Insights Into Evolution and Adaptation Using Computational Methods and Next Generation Sequencing

by
Alexander G. Shanku

A dissertation submitted to the
Graduate School–New Brunswick
Rutgers, The State University of New Jersey
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
Graduate Program in Computational Biology and Molecular Biophysics
Written under the direction of
Andrew D. Kern
and approved by

New Brunswick, New Jersey
May, 2016

© 2016
Alexander G. Shanku
ALL RIGHTS RESERVED

ABSTRACT OF THE DISSERTATION

Insights Into Evolution and Adaptation Using Computational Methods and Next Generation Sequencing

by Alexander G. Shanku

Dissertation Director: Andrew D. Kern

Historically, much of the research in evolutionary biology and population genetics has involved analysis at the level of either a single locus or a small number of loci. However, "Next Generation" sequencing technology has opened the floodgates with respect to both the sheer volume and the quality of the sequence data that researchers need to address long-standing questions in their fields. Scientists are now, by and large, no longer hampered in their efforts by technological hurdles to obtaining data; instead, they face the problem of how best to use the vast amount of data accumulating at an ever-increasing rate. This is a good problem to have.

The research described in this dissertation is an attempt to derive answers to questions in the fields of population genetics and evolutionary biology that, until recently, have been either intractable or, at best, extremely difficult to address. In the first chapter I provide an introduction and a brief historical look at the research efforts that have preceded my own.
In the second chapter I describe how modern sequencing methods and computational analysis can be used to study, analyze, and answer evolutionary questions about the non-model organism Enallagma hageni, in order to 1) determine this organism's phylogenetic position within Arthropoda, 2) provide answers and insight into the evolutionary history of the protein-encoding genes in the Enallagma transcriptome, and 3) give functional annotation to these expressed proteins.

In the third chapter I examine how natural selection acts on the genome and derive a method that can accurately determine the evolutionary cause of nucleotide fixations, that is, whether a fixation occurred through positive selection or through neutral processes. I then apply the methodology to North American populations of Drosophila melanogaster, providing further evidence as to how adaptive evolution proceeds in a newly established population. This is an important question: although multiple approaches have been devised to determine the targets and modes of evolution in the genome, no definitive method has yet emerged that can determine both the location and the type of a selective process, and as a result, the picture of how and where adaptive evolution proceeds in the genome has remained opaque.

In the fourth chapter I examine how levels of natural selection within the genome have the potential to inhibit the ability to accurately learn population demographic history. Using a number of modern algorithms and extensive simulations, I first examine whether demographic histories that are learned under simple biological assumptions will yield accurate results when the actual data do not adhere to these assumptions. Further, I go on to examine more complicated models of demographic history, looking specifically at how positive selection biases inference, in which directions these biases occur, and at what levels of selection inference methods fail to be robust.
Finally, I describe potential evolutionary scenarios where these inference methods may be more prone to fail, as well as methods which might mitigate positive selection's effects, thus allowing for more accurate histories to be inferred.

The work contained in this dissertation, at the broadest scale, is an effort to marry state-of-the-art techniques in statistics, computer science, and machine learning algorithms to the technological advances of next generation sequencing; the potent combination of these technologies has provided a means with which to derive answers to multiple, long-standing questions in population genetics and evolutionary biology.

Preface

The time will come when diligent research over long periods will bring to light things which now lie hidden. A single lifetime, even though entirely devoted to the sky, would not be enough for the investigation of so vast a subject... And so this knowledge will be unfolded only through long successive ages. There will come a time when our descendants will be amazed that we did not know things that are so plain to them... Many discoveries are reserved for ages still to come, when memory of us will have been effaced.

- Seneca, Natural Questions

Portions of this dissertation are based on work previously published or submitted for publication by the author [1].

Acknowledgements

This dissertation is based upon the research undertaken as a Ph.D. student while attending Dartmouth College and Rutgers University from 2010 through 2016. These five and a half years have been divided between my research, the classroom, and ultimately, the creation of this document. However, none of these things would have been possible without the help, guidance, and assistance of the many other people that I have had the good fortune of meeting along my academic path.

Accordingly, I must first acknowledge and express my gratitude to my advisor and mentor, Andrew D. Kern.
Our first meeting was over a pint while we talked science, and I'm fortunate to say that five years later our meetings often take the same form. The scope of what I've learned under your direction is immense, and for that alone I am grateful. Your love for science, and inquiry in general, permeated the lab and was motivating and inspiring. Your patience and understanding never went unnoticed. I am also happy to say that, in addition to the academic relationship we have cultivated, I am able to call you a friend. Andy, thank you so much for everything.

I would like to thank my Dissertation Committee: Jinchuan Xing, Isaac Edery, and Kevin Chen - your guidance, your suggestions, and the wealth of your combined experience are things I will benefit from throughout my career. Having the chance to interact with you, be it in journal club, in the classroom, or as part of a collaboration, has been an extremely valuable experience, and I thank each one of you for it.

I have been fortunate to work alongside a remarkable colleague and friend, Daniel Schrider. Dan, I certainly owe you quite a bit. You've taught me a great deal and shared so much of what you know, and for that I must thank you. Your energy and passion for science are truly motivating. Along with A.D.K., we've made a point to keep mind and body strong - shouting "ALL MEAT MAN!!" will now only bring stares.

I would certainly not have made it this far, let alone submitted this dissertation, if I weren't so fortunate as to have known the Associate Director of my graduate program - a confidant, motivator, and friend, Gail Ferstandig Arnold. Thank you so much, Gail. I am so appreciative of everything you've done for me.

I couldn't leave the 4th floor without thanking my friend Kathleen McDonald and recalling the many hours we've spent talking in your office. Being able to vent to each other about the "RUt" and about the daily happenings in Nelson was often needed and very cathartic. You certainly helped to keep my spirits up. So, Kathleen, I say thank you for all of your help, your friendship, and the great conversations.
This dissertation starts in Tennessee and finishes in New Jersey, with stops in New Hampshire and Brooklyn along the way. During those transitions, serendipity led to a close relationship with the Cline and Cutraro families. Clayton, Allison, Andy, and Erin, your kindness and support mean the world to us. Watching our families grow together during these past five years has been very special.

To the hardest Devil Dogs around - Nathaniel E. Allen, Patrick J. Belmonte, and Craig H. Strohl - Semper Fidelis, and REPORT!! I'm especially grateful for the time I've spent with you, Nat, over the past twenty years. You've kept me engaged with the world, and I treasure every one of our late-night conversations, debates, and arguments.

Not everyone is as fortunate as I am to have such wonderful in-laws. I will always remember your support, care, and love during this time. Kathy and David Crosslin, and Hal and Kimberly Danson - thank you so very much.

In the same light, I am grateful for my whole family's support and must specifically acknowledge here my sister and brother-in-law, Jennie and Patrick Osman. Your love and support have been crucial, your encouragement has always kept my spirits up, and knowing that you have always been in my corner means a great deal. Thank you both so much. I am excited for the chance to live close to you both again, or for the first time!

My mother, Carol Shanku - your faith in me never faltered, even during those times when my own did. You have given selflessly to me for so long and have supported me in all of my endeavors. I am forever grateful. I could not ask for a better mother, mother-in-law to my wife, and grandmother to my son. I love you very much.

Benjamin, my son, maybe one day you will read this and discover what I was working on during the first 5 years of your life. Maybe not, though; it's a long read. I am lucky that whenever I think back on my Ph.D., I'll be thinking about you, too. I love you, I am very proud of you, and I am excited for our next adventure.

Most importantly, I have to acknowledge my wife, Bethany.
This has certainly been some adventure. Beth, we've been side by side this entire time, and through all the ups and downs, your love, kindness, and patience - immeasurable as they are - have never wavered. Your endless support has meant so much to me. I could never have made it without you. I love you very much and am so lucky to share my life with you.