Building Next-Generation Technologies for Low-Cost Gene Synthesis and High-Accuracy Genome Engineering

Citation
Eroshenko, Nikolai A. 2015. Building Next-Generation Technologies for Low-Cost Gene Synthesis and High-Accuracy Genome Engineering. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Permanent link
http://nrs.harvard.edu/urn-3:HUL.InstRepos:14226086

Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Building next-generation technologies for low-cost gene synthesis and high-accuracy genome engineering

A dissertation presented by Nikolai Eroshenko to the School of Engineering and Applied Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Engineering Sciences

Harvard University
Cambridge, Massachusetts
September 2014

© 2014 Nikolai Eroshenko
All rights reserved

Dissertation Advisor: Professor George Church
Author: Nikolai Eroshenko

Abstract

The technologies that enable writing and editing of DNA form the foundation of modern molecular biology and biotechnology. However, a number of methodological barriers have limited the widespread adoption of both high-throughput de novo gene synthesis and large-scale genome alteration. Increasingly, work in the fields of synthetic biology, protein design, and gene therapy has been hindered by shortcomings in current DNA writing and editing technologies.
The goal of this dissertation has been to improve both the throughput of chemical gene synthesis and the accuracy of genome editing tools. The scale of gene synthesis is most acutely limited by the high cost of using column-synthesized oligonucleotides as the base material. It has been clear for some time that using the much cheaper microarray-synthesized DNA instead would significantly decrease the cost of making long, double-stranded DNA. Unfortunately, microarrays’ high chemical complexity, high error rates, and low synthesis yield have prevented their adoption into gene synthesis workflows. We have used selective nucleotide pool amplification and enzymatic error removal to develop a synthesis pipeline that uses Agilent’s Oligonucleotide Library Synthesis microarrays to build 500-850 base pair-long double-stranded constructs. At the time of initial publication the size of the total pool (13,000 oligonucleotides encoding ~2.5 megabases of DNA) was at least one order of magnitude larger than previously reported attempts.

Following our initial progress on increasing the throughput of gene synthesis, it became apparent that applying synthetic DNA in vivo is bottlenecked by the low efficiencies and unpredictable accuracies of existing genome engineering techniques. In an effort to build tools that can be used in a large variety of organisms, we focused our efforts on engineering site-specific recombinases, many of which can function without using host-encoded proteins. Unfortunately, many groups have reported that site-specific recombinases can cause toxicity, possibly due to off-target binding and recombination activities. To address this problem, we proposed that the accuracy of DNA-binding proteins can be altered through mutations in protein-protein interaction domains. To test this idea we obtained single-residue mutants of Cre recombinase that exhibited improved site discrimination in in vivo and in vitro recombination experiments.

We have also been interested in developing rapid and reproducible protein binding assays. Towards this goal, we conducted proof-of-concept experiments that demonstrated that gel shift assays could be used to generate binding curves in a multiplexed fashion. We propose that the slope of the information content as a function of binding affinity can be used to compare the binding accuracy of dissimilar proteins.

Table of contents

Title page
Abstract
Table of contents
Citations to previously published work
Acknowledgements

Chapter 1: Introduction
  1.1 Summary and structure of thesis
  1.2 Motivation
  1.3 Chemical DNA synthesis
  1.4 Genome editing
  1.5 References

Chapter 2: Scalable gene synthesis from high-fidelity microchips
  2.1 Abstract
  2.2 Introduction
  2.3 Results
    2.3.1 Description of approach
    2.3.2 First set of gene assemblies
    2.3.3 Second set of gene assemblies and error depletion
  2.4 Conclusion
  2.5 Methods
    2.5.1 Reanalysis of OLS pool error rates
    2.5.2 Design and synthesis of OLS pools
    2.5.3 Amplification and processing of OLS subpools
    2.5.4 Assembly of fluorescent proteins
    2.5.5 ErrASE
    2.5.6 Flow cytometry
    2.5.7 Synthesis of antibodies
  2.6 Bibliography

Chapter 3: Mutants of Cre recombinase with improved accuracy
  3.1 Abstract
  3.2 Introduction
  3.3 Results
    3.3.1 A theoretical model of DNA-binding accuracy
    3.3.2 Identifying candidate mutations using bacterial selection
    3.3.3 Mutants better discriminate a model off-target site
    3.3.4 Mutants are functional in vitro and in human cells
    3.3.5 Mutants have less off-target activity genome wide
  3.4 Discussion
  3.5 Methods
    3.5.1 Plasmid construction
    3.5.2 Identifying and characterizing functional loxP variants
    3.5.3 Negative and positive selections
    3.5.4 Bacterial recombination assay
    3.5.5 In vitro recombination assay
    3.5.6 Human cell culture recombination assay
    3.5.7 Genome-wide off-target integration assay
  3.6 Bibliography

Chapter 4: Systematic evaluation of protein-DNA binding accuracies
  4.1 Introduction
  4.2 Results
    4.2.1 Constraining library size using SELEX
    4.2.2 Multiplexed affinity measurements
  4.3 Discussion
  4.4 Methods
    4.4.1 Protein purification
    4.4.2 EMSA
    4.4.3 SELEX
    4.4.4 Multiplexed binding assay
    4.4.5 Data analysis
  4.5 Bibliography

Chapter 5: Discussion

Appendix: Protocols for gene assembly from microarray oligonucleotides
  A.1 Abstract
  A.2 Introduction
  A.3 Protocols
    A.3.1 Oligonucleotide design and synthesis
    A.3.2 PCR amplification of oligonucleotide subpools
    A.3.3 Enzymatic removal of priming sites
    A.3.4 Assembly of processed subpools into dsDNA constructs
    A.3.5 Gel-stab PCR
  A.4 Commentary
    A.4.1 Background information
    A.4.2 Critical parameters
    A.4.3 Troubleshooting
    A.4.4 Anticipated results
    A.4.5 Time considerations
  A.5 Bibliography

Citations to previously published work

Most of the results described in this dissertation are reproduced from previously published materials. Any changes or additions reflect solely the opinion of the current author. Asterisks (*) denote authors with equal contributions.

Chapter 2 was published as “Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips”, S. Kosuri*, N. Eroshenko*, E.M. LeProust, M. Super, J. Way, J.B. Li & G.M. Church, Nature Biotechnology 28: 1295 (2010).

Chapter 3 was published as “Mutants of Cre recombinase with improved accuracy”, N. Eroshenko & G.M. Church, Nature Communications 4: 2509 (2013).

The Appendix was published as “Gene assembly from chip-synthesized oligonucleotides”, N. Eroshenko*, S. Kosuri*, A.H. Marblestone, N. Conway & G.M. Church, Current Protocols in Chemical Biology 4: 1 (2012).

Acknowledgements

I would like to thank George Church for mentoring my research for the past five years. George is not only an endless source of far-reaching ideas, but also has a way of inspiring everyone who works with him to be more inventive, more daring, and to work towards making their vision of the future a reality. I am particularly indebted to George for providing me with the creative freedom and encouragement I needed to grow as a scientist.

I am privileged to have spent five years in the creative and intellectually rigorous atmosphere of the Church lab. When I first joined the lab I was immensely lucky to work with Sri Kosuri, who was the main co-conspirator on the chip synthesis project. Through our conversations and collaborations I learned much about how to go about identifying interesting scientific questions, as well as how to refine those questions into tractable projects.
I have also been influenced, on both a personal and scientific level, by Tara Gianoulis, whose boundless enthusiasm, curiosity, and warmth are missed by all who knew her. From those early days of the Wyss I would also like to thank Jay Lee, Feng Zhang, and Mike Sismour.

It is impossible to fully acknowledge everyone who has helped and influenced me during my time in the Church lab. In NRB 238 I have enjoyed all of the discussions (scientific and otherwise) that I have had with: Mike Mee, Uri Laserson, Xavier Rios, Marc Lajoie, Josh Mosberg, Adrian Briggs, Joyce Yang, Sara Vassallo, James DiCarlo, Raj Chari, Michael Napolitano, Dan Mandel, and George Chao. I am also grateful for ideas and career advice provided by Vatsan Raman, Andy Tolonen, Prashant Mali, Alex Chavez, Ben Stranges, and Henry Lee.

I would like to thank my parents and my sister Liza Eroshenko for all of their love and encouragement. I am also grateful for the support of the rest of my family, including Yuri Makarov, who has guided my scientific career from the beginning; and my grandmothers Vera Makarova and Maria Eroshenko.

And most importantly, I would like to thank Alice McElhinney, whose love, support and inspiration have made both graduate school and the rest of my life immeasurably more meaningful and fun.