Marquette University
e-Publications@Marquette
Master's Theses (2009 -) Dissertations, Theses, and Professional Projects

Bistro-Primer - Tool to design and validate specific PCR primer pairs for phylogenetic analysis
Praful Aggarwal
Marquette University

Recommended Citation
Aggarwal, Praful, "Bistro-Primer - Tool to design and validate specific PCR primer pairs for phylogenetic analysis" (2011). Master's Theses (2009 -). Paper 90. http://epublications.marquette.edu/theses_open/90

BISTRO-PRIMER - TOOL TO DESIGN AND VALIDATE SPECIFIC PCR PRIMER PAIRS FOR PHYLOGENETIC ANALYSIS

by

PRAFUL AGGARWAL

A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree of Master of Science

Milwaukee, Wisconsin
May 2011

ABSTRACT

The Polymerase Chain Reaction (PCR) is a widely used biological technique that amplifies small quantities of DNA. The amplified DNA copies are then used in other experiments such as DNA sequencing and phylogenetic analysis. PCR primers are short sequences of nucleotides (the basic units of DNA) that help identify larger regions of a DNA sequence. They enable amplification of the target DNA sequence by binding to complementary regions on the DNA template. Therefore, designing good-quality primers is essential for a successful PCR.

PCR can be used to determine the phylogenetic classification of an organism. For example, an anaerobic digester contains a diverse microbial community that digests waste material into carbon dioxide and methane, and the methane produced can be used as a renewable fuel. To identify the microbes involved, researchers run PCR with primer pair(s) that target a specific group of microbes on the 16S rRNA region of their sequences. In this way, they are more likely to amplify DNA only from microbes that belong to the target group, which can help in the phylogenetic classification of unknown microbes.
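As a simplified illustration of how a primer pair binds complementary template regions, and why a group-specific pair amplifies targets but not non-targets, the following Python sketch performs exact-match in silico PCR. It is an assumption-laden sketch, not Bistro-Primer's method: it ignores mismatches, degenerate bases, melting temperature and the thesis's scoring function, and the helper names (`reverse_complement`, `amplicon_length`, `count_hits`) and toy sequences are hypothetical.

```python
"""Minimal exact-match in silico PCR sketch (hypothetical helpers, not Bistro-Primer)."""
from typing import Iterable, Optional

# Watson-Crick base pairing for uppercase A/C/G/T only (no IUPAC codes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Return the product length spanning both primers, or None if no product.

    The forward primer (written 5'-3') must occur verbatim on the template
    strand. The reverse primer (also 5'-3') anneals to the opposite strand,
    so its reverse complement must occur downstream of the forward site.
    Only the first forward-primer site is considered (a simplification).
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    rc_reverse = reverse_complement(reverse.upper())
    end = template.find(rc_reverse, start + len(forward))
    if end == -1:
        return None
    return end + len(rc_reverse) - start


def count_hits(sequences: Iterable[str], forward: str, reverse: str) -> int:
    """Count how many sequences would yield a product with this primer pair."""
    return sum(amplicon_length(s, forward, reverse) is not None for s in sequences)


# Toy data (hypothetical): the non-target differs at the reverse-primer site.
FWD = "ACGGATTC"
REV = "AGGCTAGC"  # reverse complement is GCTAGCCT
target = "TTT" + "ACGGATTC" + "A" * 10 + "GCTAGCCT" + "TTT"
non_target = "TTT" + "ACGGATTC" + "A" * 10 + "GCTAGGGT" + "TTT"

if __name__ == "__main__":
    print(amplicon_length(target, FWD, REV))      # 26
    print(amplicon_length(non_target, FWD, REV))  # None
    print(count_hits([target, non_target], FWD, REV))  # 1
```

Counting such hits separately over a target set and a non-target set yields the kind of target/non-target tallies reported in the result tables; how Bistro-Primer combines them into its score is not reproduced here.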
In this thesis, a software tool for PCR primer pair design and validation has been developed. The tool helps design primer pairs that amplify a target region in a specific taxonomic rank (e.g., genus). It uses a novel scoring function to distinguish specific primer pairs from less specific ones. 16S rRNA sequences from four genera (Syntrophobacter, Syntrophomonas, Methanosarcina and Streptococcus) were used to develop and test the tool. To the best of my knowledge, primer pairs specific for amplifying Syntrophobacter or Syntrophomonas have not yet been published, and the results from Bistro-Primer, after further validation, would be the first specific primer pairs for targeted amplification of these genera.

ACKNOWLEDGEMENTS
Praful Aggarwal

I am thankful to my committee members for guiding me throughout my thesis. I also want to thank the members of Dr. Maki's lab and Dr. Zitomer's lab for helping me understand some of the aspects used in this work. I would also like to thank Prince Peter Mathai for coming up with the requirements for this software. Finally, I want to thank my family for standing by me and being a constant source of motivation.

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
1.1 Motivation
1.2 Statement of Problem
1.3 Summary of Results
1.4 Structure of the Thesis
2 BACKGROUND
2.1 PCR amplification
2.1.1 PCR requirements
2.1.2 PCR amplification procedure
2.2 Primer Design
2.2.1 What is a Primer?
2.2.2 Primer design problem
2.3 Algorithm Used
2.3.1 Primer3
2.3.2 Primer-BLAST
2.3.3 PRIMROSE
3 APPROACH
3.1 Design Module
3.2 Validation Module
4 EVALUATION AND RESULTS
4.1 Overview of Evaluation Techniques and Datasets
4.2 Design Results
4.3 Validation of the Design Results
5 CONCLUSIONS AND FUTURE WORK
5.1 Summary
5.2 Conclusion
5.3 Future Work
BIBLIOGRAPHY
APPENDIX
A HOW TO USE BISTRO-PRIMER
B BISTRO-PRIMER DATASETS AND RESULTS
B.1 TARGETS
B.1.1 Syntrophobacter
B.1.2 Syntrophomonas
B.1.3 Methanosarcina
B.1.4 Streptococcus
B.2 NON-TARGETS
B.2.1 Syntrophobacter
B.2.2 Syntrophomonas
B.2.3 Methanosarcina
B.2.4 Streptococcus
B.3 BISTRO-PRIMER RESULTS
C BISTRO-PRIMER DESIGN MODULE
D BISTRO-PRIMER VALIDATION MODULE

LIST OF TABLES
4.1 Streptococcus and 16S rRNA specific primer pairs using Bistro-Primer. In this table, a subset of the designed primer pairs is reported with the corresponding target and non-target hits and the score value.
4.2 Primer pair evaluation results using in silico PCR amplification for Streptococcus. This table shows the in silico amplification results on the targets and non-targets along with the corresponding Bistro-Primer score.
4.3 Syntrophobacter and 16S rRNA specific primer pairs using Bistro-Primer. This table contains a subset of the final output generated for Syntrophobacter-specific PCR primer pairs. It depicts high-scoring, medium-scoring and low-scoring primer pairs.
4.4 Syntrophomonas and 16S rRNA specific primer pairs using Bistro-Primer. This table shows a subset of the designed primer pairs, depicting the different cases that can be observed, i.e., high-, intermediate- and low-scoring primer pairs.
4.5 Methanosarcina and 16S rRNA specific primer pairs using Bistro-Primer. In this table, a subset of the designed primer pairs is reported with the corresponding target and non-target hits and the score value. The non-targets consist of archaeal and bacterial sequences.
B.1 Bistro-Primer results for the genus Methanosarcina with the 16S rRNA target region. The forward and reverse primers are in the 5'-3' direction. The maximum number of targets is 324 and the maximum number of non-targets is 1167.
B.2 Bistro-Primer results for Syntrophobacter for the 16S rRNA target region. The forward and reverse primers are in the 5'-3' direction. The maximum number of targets is 87 and the maximum number of non-targets is 198.
B.3 Bistro-Primer results for the 16S rRNA target in the genus Syntrophomonas. The forward and reverse primers are in the 5'-3' direction. The maximum number of targets is 93 and the maximum number of non-targets is 198.
B.4 Bistro-Primer results for the genus Streptococcus with the 16S rRNA target region. The forward and reverse primers are in the 5'-3' direction. The maximum number of targets is 400 and the maximum number of non-targets is 602.

LIST OF FIGURES
2.1 An illustration of the polymerase chain reaction process. It depicts how a single copy of the target region in a DNA molecule is amplified into millions of copies using the PCR technique. Obtained from National Institutes of Health, National Human Genome Research Institute, "Polymerase Chain Reaction - PCR." Retrieved April 12, 2011 from http://www.genome.gov/Glossary/resources/polymerasechain reaction.pdf
2.2 The generic primer design problem. The user provides an input template file, and the expected output is a single primer pair or a set of primer pairs that satisfy certain parametric conditions.
2.3 Example of a potential primer. The red sequence does not satisfy certain primer parameters and is therefore ignored. The green sequence satisfies the initial conditions and is carried forward for further checking as a primer candidate.
2.4 Primer3 workflow. This figure shows a general workflow for Primer3. The user inputs a template sequence and the other parameter values. Primer3 then checks the oligos against all the parameters. The oligos that satisfy all these conditions are listed as potential primers.
3.1 Bistro-Primer workflow. The MSA file represents the target multiple sequence alignment file that the user provides to the design module. The user files contain the target and non-target sequence files that are used to validate the primer pairs.
4.1 Bistro-Primer score vs. in silico PCR targets for the genus Streptococcus. In general, it was observed that the higher the score, the higher the number of amplified targets.
4.2 Bistro-Primer score vs. in silico PCR non-targets for the genus Streptococcus. In general, it was observed that the higher the score, the lower the number of amplified non-targets.
A.1 Snapshot of the Bistro-Primer Design Module
A.2 Snapshot of the Bistro-Primer Validation Module

CHAPTER 1
INTRODUCTION

The polymerase chain reaction (PCR) is a widely used biological technique to amplify specific genes from DNA samples. These amplified genes can then be used in a variety of analyses, such as identifying new DNA sequences and placing them into an existing classification system.
PCR primers are short sequences of nucleotides (the basic units of DNA) that bind to complementary sites within a larger DNA sequence and enable amplification of a gene fragment. The main goal of this thesis is to develop a software tool that helps researchers design PCR primer pair(s) for classifying target organisms into phylogenetic taxonomies.

1.1 Motivation

The Water Quality Center at Marquette University is a collaboration between groups from Civil and Environmental Engineering, Biological Sciences and Bioinformatics [1]. One of the team's major areas of work is the study of the anaerobic digestion process as a source of renewable energy. Understanding this process requires understanding the microbial communities involved [1, 2]. In some cases, several unknown microbes are observed along with the known ones. Hence, to identify these microbes inside a digester, PCR has been employed to amplify DNA obtained from sludge samples.

While studying the various pathways involved in anaerobic digestion, the research team sometimes comes across unknown microbial DNA samples. For further study, it is important to identify these unknowns, and PCR can play a major role in accomplishing this task. To run a successful PCR on unknown DNA samples, it is necessary to design specific PCR primer pair(s) that can help identify them. Designing these primer pairs in the lab is not the most efficient approach. Thus, software that can design such specific primer pairs can be very useful to the biological community.

Over the past two decades, a good number of PCR primer design tools have been developed [35]. However, to my knowledge, none of these does a good job of designing primer pairs specific to a taxonomic group. Some of these tools try to accomplish this task, but they