Marquette University
e-Publications@Marquette
Master's Theses (2009 -) Dissertations, Theses, and Professional Projects
Bistro-Primer - Tool to design and validate specific
PCR primer pairs for phylogenetic analysis
Praful Aggarwal
Marquette University
Recommended Citation
Aggarwal, Praful, "Bistro-Primer - Tool to design and validate specific PCR primer pairs for phylogenetic analysis" (2011). Master's
Theses (2009 -). Paper 90.
http://epublications.marquette.edu/theses_open/90
BISTRO-PRIMER - TOOL TO DESIGN AND VALIDATE
SPECIFIC PCR PRIMER PAIRS
FOR PHYLOGENETIC ANALYSIS
by
PRAFUL AGGARWAL
A Thesis submitted to the Faculty of the Graduate School,
Marquette University,
in Partial Fulfillment of the Requirements for
the Degree of Master of Science
Milwaukee, Wisconsin
May 2011
ABSTRACT
BISTRO-PRIMER - TOOL TO DESIGN AND VALIDATE
SPECIFIC PCR PRIMER PAIRS
FOR PHYLOGENETIC ANALYSIS
Praful Aggarwal
Marquette University, 2011
Polymerase Chain Reaction (PCR) is a widely used biological technique that amplifies small
quantities of DNA. The amplified DNA copies are then used in several other experiments, such as
DNA sequencing and phylogenetic analysis. PCR primers are short subsequences of nucleotides
(the basic units of DNA) that help identify larger regions of the DNA sequence. They help in
successfully amplifying the target DNA sequence by identifying complementary regions on the
DNA template. Therefore, to perform PCR successfully, it is imperative to design good-quality
primers.
PCR can be used for identifying the phylogenetic classification of an organism. For example,
in an anaerobic digester, a diverse microbial community works to digest the waste material into
carbon dioxide and methane. The methane produced can be used in the future as a renewable
fuel. To identify the microbes involved, researchers use PCR with primer pair(s) targeting some
specific group of microbes, on the 16S rRNA region of their sequences. This way they are more
likely to amplify DNA only from the specific microbes that are present in the target group. This
could help in the phylogenetic classification of unknown microbes.
In this thesis work, a PCR primer pair design and validation software tool has been developed.
This tool helps in designing primer pairs that amplify a target region in a specific taxonomic
rank (e.g., genus). It uses a novel scoring function to differentiate between the specific and the
not-so-specific primer pairs. 16S rRNA sequences for four different genera (Syntrophobacter,
Syntrophomonas, Methanosarcina and Streptococcus) were used to develop and test the tool. To
the best of my knowledge, primer pair(s) specific for amplifying Syntrophobacter or
Syntrophomonas have not yet been published, and the results from Bistro-Primer, after further
validation, would be the first specific primer pairs for target amplification of these genera.
ACKNOWLEDGEMENTS
Praful Aggarwal
I am thankful to my committee members for guiding me throughout my thesis. I also want
to thank the members of Dr. Maki's lab and Dr. Zitomer's lab for helping me in understanding
some of the aspects used in this work. I would also like to thank Prince Peter Mathai for coming
up with the requirement of this software. Finally, I want to thank my family for standing by me
and being a constant source of motivation.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER
1 INTRODUCTION
1.1 Motivation
1.2 Statement of Problem
1.3 Summary of Results
1.4 Structure of the Thesis
2 BACKGROUND
2.1 PCR amplification
2.1.1 PCR requirements
2.1.2 PCR amplification procedure
2.2 Primer Design
2.2.1 What is a Primer?
2.2.2 Primer design problem
2.3 Algorithm Used
2.3.1 Primer3
2.3.2 Primer-BLAST
2.3.3 PRIMROSE
3 APPROACH
3.1 Design Module
3.2 Validation Module
4 EVALUATION AND RESULTS
4.1 Overview of Evaluation Techniques and Datasets
4.2 Design Results
4.3 Validation of the Design Results
5 CONCLUSIONS AND FUTURE WORK
5.1 Summary
5.2 Conclusion
5.3 Future Work
BIBLIOGRAPHY
APPENDIX
A HOW TO USE BISTRO-PRIMER
B BISTRO-PRIMER DATASETS AND RESULTS
B.1 TARGETS
B.1.1 Syntrophobacter
B.1.2 Syntrophomonas
B.1.3 Methanosarcina
B.1.4 Streptococcus
B.2 NON-TARGETS
B.2.1 Syntrophobacter
B.2.2 Syntrophomonas
B.2.3 Methanosarcina
B.2.4 Streptococcus
B.3 BISTRO-PRIMER RESULTS
C BISTRO-PRIMER DESIGN MODULE
D BISTRO-PRIMER VALIDATION MODULE
LIST OF TABLES
4.1 Streptococcus and 16S rRNA specific primer pairs using Bistro-Primer. In this
table a subset of the primer pairs designed have been reported with the
corresponding target and non-target hits and the score value.
4.2 Primer pair evaluation results using in silico PCR amplification for
Streptococcus. This table shows the in silico amplification results on the targets
and non-targets along with the corresponding Bistro-Primer score.
4.3 Syntrophobacter and 16S rRNA specific primer pairs using Bistro-Primer. This
table consists of a subset of the final output generated for Syntrophobacter specific
PCR primer pairs. It depicts high scoring, medium scoring and low scoring
primer pairs.
4.4 Syntrophomonas and 16S rRNA specific primer pairs using Bistro-Primer. In this
table a subset of the designed primer pairs has been shown. These depict the
different cases that can be observed, i.e. high, intermediate and low scoring primer
pairs.
4.5 Methanosarcina and 16S rRNA specific primer pairs using Bistro-Primer. In this
table a subset of the primer pairs designed have been reported with the
corresponding target and non-target hits and the score value. The non-targets
consist of archaeal and bacterial sequences.
B.1 Bistro-Primer results for the genus Methanosarcina with 16S rRNA target region.
The forward and reverse primers are in 5'-3' direction. The maximum number of
targets is 324 and the maximum number of non-targets is 1167.
B.2 Bistro-Primer results for Syntrophobacter for 16S rRNA target region. The
forward and reverse primers are in 5'-3' direction. The maximum number of
targets is 87 and the maximum number of non-targets is 198.
B.3 Bistro-Primer results for 16S rRNA target in the genus Syntrophomonas. The
forward and reverse primers are in 5'-3' direction. The maximum number of
targets is 93 and the maximum number of non-targets is 198.
B.4 Bistro-Primer results for the genus Streptococcus with 16S rRNA target region.
The forward and reverse primers are in 5'-3' direction. The maximum number of
targets is 400 and the maximum number of non-targets is 602.
LIST OF FIGURES
2.1 An illustration of the polymerase chain reaction process. It depicts how a single
copy of the target region in a DNA molecule is amplified into millions of copies by
using the PCR technique. Obtained from National Institutes of Health. National
Human Genome Research Institute. "Polymerase Chain Reaction - PCR." Retrieved
April 12, 2011 from http://www.genome.gov/Glossary/resources/polymerasechain
reaction.pdf
2.2 The generic primer design problem. The user provides an input template file and
the expected output is a single or a set of primer pairs that satisfy certain
parametric conditions.
2.3 Example of a potential primer. The red colored sequence does not satisfy certain
parameters for a primer and therefore it is ignored. The green colored sequence
satisfies the initial conditions and will be carried on for further checking for being
used as a primer.
2.4 Primer3 workflow. This figure shows a general workflow for Primer3. The user
inputs a template sequence and the other parametric values. Primer3 then checks
the oligos for all the parameters. The oligos that satisfy all these conditions are
listed as potential primers.
3.1 Bistro-Primer workflow. The MSA file represents the target multiple sequence
alignment file that the user provides to the design module. The user files contain
the target and the non-target sequence files that will be used for the validation of
the primer pairs.
4.1 Bistro-Primer score vs in silico PCR targets for genus: Streptococcus. In general, it
was observed that the higher the score, the higher the number of amplified targets.
4.2 Bistro-Primer score vs in silico PCR non-targets for genus: Streptococcus. In
general, it was observed that the higher the score, the lower the number of
amplified non-targets.
A.1 Snapshot of the Bistro-Primer Design Module
A.2 Snapshot of the Bistro-Primer Validation Module
CHAPTER 1
INTRODUCTION
The polymerase chain reaction (PCR) is a widely used biological technique to amplify
quantities of specific genes from DNA samples. These amplified genes can then be used in a
variety of analyses, such as identifying new DNA sequences and placing them into an already
existing classification system. PCR primers are short sequences of nucleotides (the basic units of
DNA) that bind to larger regions of the DNA sequence and aid in the amplification of a gene
fragment. The main goal of this thesis work is to develop a software tool that helps researchers
design PCR primer pair(s) for classifying the target organisms into phylogenetic taxonomies.
1.1 Motivation
The Water Quality Center at Marquette University is a collaboration between groups from
Civil and Environmental Engineering, Biological Sciences and Bioinformatics [1]. One of the
major fields of work done by this team involves the study of the anaerobic digestion process as a
source of renewable energy. In order to understand this process it is important to understand the
microbial communities involved [1, 2]. In certain cases, several unknown microbes are observed
along with the known microbes. Hence, to be able to identify these microbes inside a digester,
PCR has been employed to amplify DNA obtained from these sludge samples.
In certain cases, while studying the several pathways involved in anaerobic digestion, the
research team comes across unknown microbial DNA samples. For further study, it is
important to identify these unknowns, and PCR can play a major role in accomplishing this task.
To run a successful PCR on unknown DNA samples, it is necessary to design specific PCR
primer pair(s) that could help identify these unknowns. Designing these primer pair(s) in the lab
is not the most efficient way. Thus, software that could accomplish the task of designing such
specific primer pair(s) can be very useful to the biological community.
Over the past two decades a good number of PCR primer design software tools have been
developed [35]. However, to my knowledge, none of these does a good job of designing
taxonomic-group-specific primer pairs. Some of the software try to accomplish this task, but they