RESEARCH ARTICLE

Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes

Tokurou Shimizu1*, Akira Kitajima2, Keisuke Nonaka1, Terutaka Yoshioka1, Satoshi Ohta1, Shingo Goto1, Atsushi Toyoda3, Asao Fujiyama3, Takako Mochizuki4, Hideki Nagasaki4¤, Eli Kaminuma4, Yasukazu Nakamura4

1 Division of Citrus Research, Institute of Fruit Tree and Tea Science, NARO, Shimizu, Shizuoka, Japan, 2 Experimental Farm, Graduate School of Agriculture, Kyoto University, Kizugawa, Kyoto, Japan, 3 National Institute of Genetics, Comparative Genomics laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan, 4 National Institute of Genetics, Center for Information Biology, National Institute of Genetics, Mishima, Shizuoka, Japan

¤ Current address: Department of Frontier Research, Kazusa DNA Research Institute, Kisarazu, Chiba, Japan
* [email protected]

OPEN ACCESS

Citation: Shimizu T, Kitajima A, Nonaka K, Yoshioka T, Ohta S, Goto S, et al. (2016) Hybrid Origins of Citrus Varieties Inferred from DNA Marker Analysis of Nuclear and Organelle Genomes. PLoS ONE 11(11): e0166969. doi:10.1371/journal.pone.0166969

Editor: David D Fang, USDA-ARS Southern Regional Research Center, UNITED STATES

Received: August 13, 2016
Accepted: November 7, 2016
Published: November 30, 2016

Copyright: © 2016 Shimizu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Most indigenous citrus varieties are assumed to be natural hybrids, but their parentage has so far been determined in only a few cases because of their wide genetic diversity and the low transferability of DNA markers. Here we infer the parentage of indigenous citrus varieties using simple sequence repeat and indel markers developed from various citrus genome sequence resources. Parentage tests with 122 known hybrids using the selected DNA markers certify their transferability among those hybrids. Identity tests confirm that most variant strains are selected mutants, but we find four types of kunenbo (Citrus nobilis) and three types of tachibana (Citrus tachibana) for which we suggest different origins. Structure analysis with DNA markers that are in Hardy–Weinberg equilibrium deduces three basic taxa coinciding with the current understanding of citrus ancestors. Genotyping analysis of 101 indigenous citrus varieties with 123 selected DNA markers infers the parentages of 22 indigenous citrus varieties including Satsuma, Temple, and iyo, and single parents of 45 indigenous citrus varieties, including kunenbo, C. ichangensis, and Ichang lemon by allele-sharing and parentage tests. Genotyping analysis of chloroplast and mitochondrial genomes using 11 DNA markers classifies their cytoplasmic genotypes into 18 categories and deduces the
combination of seed and pollen parents. Likelihood ratio analysis verifies the inferred parentages with significant scores. The reconstructed genealogy identifies 12 types of varieties consisting of Kishu, kunenbo, yuzu, koji, sour orange, dancy, kobeni mikan, sweet orange, tachibana, Cleopatra, willowleaf mandarin, and pummelo, which have played pivotal roles in the occurrence of these indigenous varieties. The inferred parentage of the indigenous varieties confirms their hybrid origins, as found by recent studies.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files. All nucleotide sequence data that were obtained by the authors have been deposited in the public database with accession numbers given in this paper. Accession numbers can be found in the file titled S1 Table.

Funding: This work was supported by a grant for Research Project "Genomics for Agricultural Innovation NGB1006" from Japanese Ministry of Agriculture, Forestry and Fisheries for TS. A part of this work was also supported by KAKEN (No. 24405025) for AK.

Competing Interests: The authors have declared that no competing interests exist.

PLOS ONE | DOI:10.1371/journal.pone.0166969 November 30, 2016

Introduction

The genus Citrus L. (Family Rutaceae, subfamily Aurantioideae) covers a wide range of edible and commercial varieties, including sweet orange, lemon, lime, grapefruit, and mandarins
such as Clementine, Satsuma, King, and ponkan [1–4]. The production of major citrus varieties in tropical to sub-tropical and temperate zones exceeds 90 million tons, and the citrus industry occupies a significant position not only in the fruit industry but also in global agriculture [5,6]. In addition to the worldwide production of these major citrus varieties, numerous indigenous citrus varieties have also been produced in specific regions, and consumed locally [2,7]. Wide genetic diversity observed in Citrus, however, has made it difficult for taxonomists to draw a clear picture of their classification. Furthermore, mutants have occasionally been selected from limb sports or nucellar seedlings, and these constitute large variant strains [2,8–10]. Understanding how these modern citrus varieties arose from the ancestral basic species would bring us important insights for future citrus breeding.

Many botanists and taxonomists have proposed various approaches for the classification of a wide range of citrus varieties. Among them, two systems proposed by Swingle [11] and Tanaka [7,12] have been used in many studies. These two systems presume that most indigenous and commercial varieties arose from hybridization of ancestral ones, but differ in the way they treat indigenous varieties and cultivated varieties. Swingle primarily classified indigenous varieties rather than the cultivated varieties, placing two subgenera Papeda and Citrus in the genus Citrus [11]. The subgenus Papeda consists of section Papeda with four species, and section Papedocitrus with two species. He classified ten species in the subgenus Citrus, and regarded most cultivated varieties as natural hybrids of these indigenous species. He assigned most mandarin varieties to the scientific name Citrus reticulata, classified tachibana separately as C. tachibana, and also classified grapefruit, which arose from a chance seedling [2,9], separately as C. paradisi. In contrast, Tanaka stressed the importance of both indigenous varieties and cultivated varieties, and classified them equally as a species. He primarily placed two subgenera (Archicitrus and Metacitrus) in genus Citrus. The subgenus Archicitrus consists of five
sections (Papeda, Limonellus, Citrophorum, Cephalocitrus and Aurantium) with 111 species, including grapefruit as C. paradisi. The subgenus Metacitrus consists of three sections (Osmocitrus, Acrumen and Pseudofortunella) with 48 species [12]. According to Tanaka's system, individual mandarin varieties and tachibana were classified as a species with individual scientific names, and C. reticulata was assigned to the ponkan mandarin. Tanaka classified 145 citrus species in 22 different categories [12]. Since then, he has added several indigenous varieties to his classification system, and he released the ultimate list consisting of 159 species in 1969 [13]. Swingle considered C. ichangensis as a species of subgenus Papeda, and did not assign a scientific name to yuzu because he regarded it as a natural hybrid of C. ichangensis. In contrast, Tanaka classified C. ichangensis in subgenus Metacitrus section Osmocitrus, and classified yuzu to subgenus Metacitrus section Euosmocitrus as C. junos [12].

By the 1970s, various studies had been launched to classify citrus varieties using biochemical markers. In 1975, Scora published a novel paper based on his own chemotaxonomical study of citrus together with a survey of past literature [14]. He postulated three hypothetical taxa, mandarin (C. reticulata), citron (C. medica) and pummelo (C. maxima, formerly C.
grandis), as the ancestors, and proposed that modern citrus varieties arose from repeated hybridization of these ancestors. In 1976, Barrett and Rhodes examined correlations among 22 indigenous varieties based on similarities for 146 traits, then estimated their affinities according to their deduced distance [15]. Similar chemotaxonomical studies gradually revealed the phylogenies of citrus varieties [16–21]. When DNA marker technology became available, taxonomical studies attempted classification of citrus using various DNA markers such as RAPD [22–26], RFLP [27], AFLP [28,29], ISSR [29–31] and SRAP [8,32]. Nicolosi and colleagues deduced a citrus phylogeny according to the genotypes of nuclear and chloroplast markers, and demonstrated that the origins of citrus varieties proposed by Scora [14] and Barrett and Rhodes [15] were acceptable [33,34]. Since then, the origins of some citrus varieties have gradually been revealed, and new classifications have been proposed [35,36]. Nowadays, codominant precision simple sequence repeat (SSR) or single nucleotide polymorphism (SNP) markers have been developed and used in most studies (see the reviews [34,37–40]). In addition, the chloroplast genome sequence of sweet orange has been released [41], and genome sequences of major citrus varieties are now public [42,43]. These genome sequence resources enable the design of precision DNA markers, and have revealed the parentage of Clementine, grapefruit, sweet orange, and limes and lemons [43–48]. However, the parentage of most indigenous varieties has not yet been determined.

Identifying the combination of seed parent and pollen parent is another important issue to be solved in parentage analysis. Many studies have revealed the phylogeny of citrus varieties by evaluating polymorphisms in the chloroplast or mitochondrial genome, or both [33,47,49–57].
However, some of these studies have only evaluated local citrus varieties [51,52], or limited numbers of varieties in the genus Citrus [50,57,58]. Next generation sequencing (NGS) technology has become commonplace, and it has been applied to the genotyping of citrus chloroplast genomes [56], but it is still a costly and time-consuming approach. Simple but reproducible and low-cost technologies that reveal sufficient polymorphisms are needed for the parentage analysis of a wide range of citrus varieties.

DNA marker analysis has been used in forensic genetics for inferring parentage or paternity, and identifying missing persons from their remains [59,60]. These techniques have also been used to infer sibships of wild populations [61–64], and are anticipated to be able to reveal unknown genealogy among indigenous citrus varieties. Two basic approaches have been adopted for parentage estimation with DNA marker analysis [64]. The first uses allele-sharing tests that estimate the number of alleles shared between two individuals at codominant DNA markers according to the Mendelian rules of inheritance. These tests estimate the probability of parentage from the proportion of DNA markers with shared alleles, and can also eliminate unrelated individuals. The discriminatory power of the test is proportional to the number of loci evaluated and the polymorphism of each DNA marker. However, these tests are susceptible to genotyping errors, and may give false positive or negative results [64]. Another approach is a likelihood ratio analysis, which compares the probabilities of alternate hypotheses for the parentage of two individuals (e.g., whether they are parent and offspring or unrelated) then estimates an odds score between these two hypotheses [62–64]. This is a widely used technique for examining proposed paternity or parentage and also for identifying individuals [59,60,65]. The likelihood ratio analysis estimates the probability of the proposed parentage according to the likelihood of alleged parents and child, then compares it with a null relation between them deduced from the allele frequency within the population. The logarithm of likelihood ratio odds (LOD score) is often used to indicate the estimated score, but the number of DNA markers used for the evaluation and their allele frequency in the population influence the score [64]. Genotyping errors can also influence the score, and it is thus difficult to demonstrate a clear threshold for discrimination [63]. These two methods each have pros and cons; therefore, an approach that first excludes unrelated individuals using an allele-sharing test, then examines the probability of the proposed parentage using likelihood ratio analysis, will be a simple but effective way to infer parentage in a given population.

Because genotyping error severely affects the reliability of both methods, detecting such error and evaluating parentage with error-free DNA markers is a prerequisite for reliability. In the genotyping analysis of citrus varieties, however, wide genetic diversity among natural varieties reduces the transferability of DNA markers, resulting in false genotypes [44,46,64,66]. Selected somatic mutants could also be a drawback because some of them, but not all, have mutations in their genotype that make it difficult to estimate their identity.

The objective of the present study is to infer parentage among various citrus varieties using DNA marker analysis, and verify the inferred parentage statistically. We have attempted 1) to develop sufficient DNA markers for parentage analysis and eliminate erroneous DNA markers by examining them with a large enough set of known hybrid varieties, 2) to estimate genetic structures of indigenous varieties using these certified DNA markers, 3) to determine the cytoplasmic genotypes of individual varieties by evaluating chloroplast and mitochondrial genomes with DNA marker analysis, and 4) to infer parentage among indigenous citrus varieties and verify it using a likelihood ratio approach.
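
As a rough illustration of the allele-sharing idea described above, the following Python sketch counts the markers at which two samples share at least one allele. The data layout (a dictionary mapping a marker name to a pair of allele sizes), the function names, and the marker names are hypothetical; none of them is taken from the software used in this study, and the handling of missing genotypes is an assumption of the sketch.

```python
def shares_allele(genotype_a, genotype_b):
    """True if two diploid genotypes have at least one allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def allele_sharing(sample_a, sample_b):
    """Return (shared, compared) marker counts for two samples.

    Each sample is a dict: marker name -> (allele1, allele2), or None
    when the genotype is missing. Markers missing in either sample are
    skipped (an assumption of this sketch).
    """
    shared = compared = 0
    for marker in sorted(set(sample_a) & set(sample_b)):
        ga, gb = sample_a[marker], sample_b[marker]
        if ga is None or gb is None:
            continue
        compared += 1
        if shares_allele(ga, gb):
            shared += 1
    return shared, compared


# Hypothetical fragment sizes (bp) at three hypothetical markers.
candidate_parent = {"M1": (150, 154), "M2": (200, 200), "M3": (98, 102)}
offspring = {"M1": (154, 158), "M2": (200, 204), "M3": (100, 104)}

shared, compared = allele_sharing(candidate_parent, offspring)
print(f"{shared}/{compared} markers share an allele")
```

In this toy data the pair shares an allele at M1 and M2 but not at M3. Under Mendelian inheritance a true parent and its offspring should share an allele at every error-free marker, so a mismatch of this kind would either argue against parentage or point to a genotyping error.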
Materials and Methods

Plant materials

We selected 371 citrus accessions consisting of 208 indigenous varieties, 78 hybrid varieties, and 85 selected strains (Table 1 and S1 Table). The indigenous varieties are from the collections of the Institute of Fruit Tree and Tea Science, NARO (NIFTS) that have been maintained at the Okitsu Citrus Research Division in Shizuoka prefecture, Japan. These varieties were selected from major mandarins (C. reticulata, C. tangerina, C. unshiu, C. clementina, C. kinokuni, C. tachibana, C. nobilis), pummelos (C. maxima and its hybrids), lemon (C. limon), sweet orange (C. sinensis), yuzu (C. junos), ichanchii (C. ichangensis) and their assumed natural hybrids. Sixteen varieties included variant selections to evaluate their genetic identity: four Clementines, two varieties classified to C. tangerina hort. ex Tanaka (Dancy and Obeni mikan), three grapefruits, five hyuganatsu, two iyos, 16 Kishus, 10 kunenbos, four ponkans, 12 pummelos, 21 Satsumas, two shiikuwashas, five sour oranges, 20 sweet oranges, 12 tachibanas, four tankans, and two willowleaf mandarins. Among them, kunenbo included both C. nobilis Lour. (King) and C. nobilis Lour. var. kunep Tanaka. Hybrid varieties used in this study are from the collections of NIFTS. Forty-five of them were developed by NIFTS, 11 by UC Riverside, 10 by the USDA, and the other 12 varieties were developed by seven other institutes or by farmers. We also used 85 strains that were selections from various crosses in NIFTS.

Table 1. Summary of citrus samples used in this study.

| Category | Scientific name (Swingle's system) | Scientific name (Tanaka's system) | Samples | Genotyped samples | Representative samples |
|---|---|---|---|---|---|
| Indigenous varieties | | | 208 | 269 | 101 |
| Clementine | C. reticulata Blanco | C. clementina hort. ex Tanaka | 4 | 4 | 1 |
| Dancy | C. reticulata Blanco | C. tangerina hort. ex Tanaka | 2 | 2 | 1 |
| Grapefruit | C. paradisi Macf. | C. paradisi Macf. | 3 | 3 | 1 |
| Hyuganatsu | C. sinensis (L.) Osbeck | C. tamurana hort. ex Tanaka | 5 | 6 | 1 |
| Iyo | C. sinensis (L.) Osbeck | C. iyo hort. ex Tanaka | 2 | 2 | 1 |
| Kishu | C. reticulata Blanco | C. kinokuni hort. ex Tanaka | 16 | 21 | 1 |
| Kunenbo 1) | C. reticulata Blanco | C. nobilis Lour. var. kunep Tanaka | 10 | 13 | 4 |
| Natsudaidai | C. paradisi Macf. | C. natsudaidai Hayata | 4 | 5 | 1 |
| Ponkan | C. reticulata Blanco | C. reticulata Blanco | 4 | 5 | 1 |
| Pummelo | C. grandis Osbeck 2) | C. grandis Osbeck 2) | 12 | 14 | 12 |
| Satsuma | C. reticulata Blanco | C. unshiu Marcov. | 21 | 33 | 1 |
| Shiikuwasha | C. indica | C. depressa Hayata | 2 | 3 | 2 |
| Sour orange | C. aurantium L. | C. aurantium L. | 5 | 6 | 1 |
| Sweet orange | C. sinensis (L.) Osbeck | C. sinensis (L.) Osbeck | 20 | 22 | 1 |
| Tachibana | C. tachibana Makino | C. tachibana (Makino) Tanaka | 12 | 13 | 3 |
| Tankan | C. sinensis (L.) Osbeck | C. tankan Hayata | 4 | 4 | 1 |
| Willowleaf mandarin | C. reticulata Blanco | C. deliciosa Ten. | 2 | 2 | 2 |
| Others | | | 80 | 111 | 66 |
| Hybrid varieties | C. spp | C. spp | 78 | 83 | 75 |
| Selected strains | C. spp | C. spp | 85 | 90 | 85 |
| Total | | | 371 | 442 | 261 |

1) Kunenbo (C. nobilis Lour. var. kunep Tanaka) includes King mandarin (C. nobilis Lour.)
2) Now classified as C. maxima Merr.
doi:10.1371/journal.pone.0166969.t001

DNA extraction

Fully matured leaves were collected from each sample in the field at Okitsu, Shizuoka, then provided for DNA extraction using a modified protocol with a Nucleon Phytopure kit (GE Healthcare Life Science, NJ, USA) [67]. For certain varieties, several samples were collected from different trees. These were used as biological replicates to confirm the reproducibility of genotyping (RA in S1 Table). DNA concentration of the prepared DNA samples was determined using a Qubit Assay kit (Thermo Fisher Scientific, Tokyo, Japan). UV absorbance analysis was used to confirm sample quality (A260/A280 > 1.8, and A260/A230 > 2.0), and gel electrophoresis analysis to verify the size and integrity of the extracted DNA samples.

Citrus sequence resources for DNA marker design

Nucleotide sequences of expressed genes of citrus were obtained from public cDNA sequence databases dbEST (http://www.ncbi.nlm.nih.gov/dbEST/), RefSeq (http://www.ncbi.nlm.nih.gov/refseq/) and HarvEST (http://harvest.ucr.edu/) [68]. Citrus genome sequence resources in public databases, including BAC end sequences of Clementine [69] and Satsuma [70,71], and whole genome shotgun sequences of sweet orange ‘Ridge Pineapple’ in the trace file repository of Sanger reads (ftp://ftp.ncbi.nlm.nih.gov/pub/TraceDB/citrus_sinensis/), were also used for DNA marker design. Preliminary evaluation of the quality and length of each of these datasets was carried out using pregap4 [72], then a consensus sequence set was obtained for each set with Mira assembler [73] to reduce redundancy.

NGS analysis of citrus varieties

NGS analysis of citrus varieties for mining SSR and indel regions was performed with a HiSeq 2000 sequencing system (Illumina, CA, USA) in paired-end mode [67]. Quality-checked NGS reads were mapped to the haploid Clementine reference sequence v.0.9 or v.1.0 [43] using BWA [74]. Candidate SSR or indel regions in the re-sequenced data were scored and identified using SAMtools and BCFtools [75], or using mreps [76].

DNA marker design for genotyping nuclear genomes

SSR regions of each sequence were mined using mreps [76], then candidate regions with motif length between two and six nucleotides were selected. The identified candidate regions found in expressed genes or genomic sequences were used for oligonucleotide primer design with PerlPrimer [77] or Primer3 [78]. Previously reported SSR markers designed from BAC end sequences [46], or from EST sequences [79,80] were also used in this study.

DNA marker design for genotyping organelle genomes

SSR markers for detecting polymorphisms in the chloroplast genome were designed from the chloroplast genome sequence of sweet orange ‘Ridge Pineapple’ (accession No. DQ864733) [41] by searching candidate SSR regions using mreps [76] as described in the previous section. Oligonucleotide primer sets for citrus mitochondrial genomes [53], and universal primer sets for the chloroplast genomes of dicotyledonous angiosperms [81] were also used for genotyping organelle genomes.
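
The selection criterion used in the two marker-design sections above (tandem repeats with a motif length of two to six nucleotides) can be illustrated with a minimal Python sketch. This is not mreps, which uses its own algorithm and scoring; the regular expression, the function name `find_ssr_candidates`, and the `min_repeats` threshold are illustrative assumptions, and only perfect repeats are detected.

```python
import re


def find_ssr_candidates(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Hypothetical helper for illustration only; the thresholds are
    assumptions and do not reproduce the mreps settings of the study.
    """
    seq = seq.upper()
    # A motif of min_motif..max_motif bases (shortest tried first),
    # followed by at least (min_repeats - 1) further copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


# Toy sequence: an (AT)6 repeat and an (AGA)5 repeat with short flanks.
toy = "CCGG" + "AT" * 6 + "GGCT" + "AGA" * 5 + "TTC"
for start, motif, repeats in find_ssr_candidates(toy):
    print(start, motif, repeats)
```

Candidate regions found this way would still need flanking sequence for primer design and, as in the study, support from resequencing data before being taken forward.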
Genotyping analysis

All genotyping analysis of nuclear or organelle genomes followed the multiplexed and multicolored post-labeling method in single tube with BStag reported by Shimizu and Yano [82]. Post-labeling of the PCR product with BStag is a simple and inexpensive method that does not require large alteration of the PCR program, and it reduces the total cost of analysis significantly. One of the six standard BStag sequences or an additional BStag sequence (F9TCC: 5'-CTAGTATCAGGACTCC-3') was added at the 5' end of the designed forward primer. A short 'pigtail' sequence was added at the 5' end of the reverse primer in order to suppress stuttering of the detected peak [83]. For each genotyping analysis, four oligonucleotide primer sets that were individually attached to different BStag sequences were mixed with the corresponding fluorescently labeled BStag primers. A typical PCR program for the amplification and post-labeling of the target region of the nuclear genome was: initial denaturation at 94°C for 3 min; 32 cycles of target amplification (20 s at 94°C followed by 35 s at 52–65°C); then three post-labeling cycles (20 s at 94°C followed by 10 s at 49°C and 5 s at 72°C); and final extension at 72°C for 10 min, then terminated at 4°C. Each DNA marker was labeled separately with one of four different fluorescent dyes in a single tube at the labeling step. The reaction mixture was diluted twofold with water after the PCR. Then, a 0.4-μL aliquot of the diluted mixture was mixed with 0.1 μL GeneScan 600 LIZ dye Size Standard (Thermo Fisher Scientific, Tokyo, Japan) and adjusted to 10 μL with deionized formamide, and then heat denatured at 95°C for 4 min. Electrophoresis of the labeled product was carried out on an ABI 3130xl DNA sequencer (Thermo Fisher Scientific, Tokyo, Japan) with 36 cm length capillary using the standard program. Genotypes of each DNA marker/sample were called using GeneMapper 4.0 software (Thermo Fisher Scientific, Tokyo, Japan).
Parentage test and identity test

Parentage was confirmed for assumed parent–offspring triads by considering the inheritance of each allele from parents to offspring according to the Mendelian rule. Any DNA markers showing discrepancies in known hybrids were excluded from the analysis. The evaluation was carried out using a function of GUGS (General Utilities for Genotyping Study) software (Shimizu, T. in preparation). The identity test is a simple exact match test of each genotype to others for all combinations. If a pair of samples coincided with each other for the genotypes of all of the DNA markers, they were treated as identical. In this study, we counted the number of DNA markers that did not agree between any given pair of samples.

Statistical evaluation of the genotype data

Observed heterozygosity ($H_o$), expected heterozygosity ($H_e$, equivalent to the unbiased estimator of gene diversity given by equation 8.4 of Nei [84]), number of unique alleles, and polymorphic information content (PIC, representing the probability of distinguishing a marker allele derived from either one of the parents [85]) were calculated using the frequency analysis function of Cervus [62] and confirmed with GUGS. The probability of match (PM), representing the probability that an unrelated individual happens to have the same genotype to others [60] is given by:

$$\mathrm{PM} = \sum_{k=1}^{m} p_k^{2}. \tag{1}$$

Here, $p_k$ is the observed frequency of each unique genotype $k$ in the population, and $m$ is the number of unique genotypes at a given nuclear locus. The gene diversity (GD) of a single allelic organelle genotype at a given locus was evaluated by

$$\mathrm{GD} = 1 - \sum_{i=1}^{m} x_i^{2} \tag{2}$$

(equation 8.1 of Nei [84]). Here, $x_i$ is the observed frequency of the $i$th single allele in the population, and $m$ is the number of alleles at an organelle locus. This parameter (Nei's GD) is an equivalent of the expected heterozygosity for diploid organisms. The values of the unique genotypes, PM and GD, were obtained using a function of GUGS. Wright's fixation index ($F_w$) was obtained by the equation $F_w = (H_e - H_o)/H_e$ (equation 12.9 of Nei and Kumar [86]).

All statistical evaluations of the normal distribution (Shapiro–Wilk test) and one-way ANOVA (Kruskal–Wallis test) were conducted with the stats package of R (version 3.1.3, https://www.r-project.org/) in the Rstudio environment (version 0.99.893, https://www.rstudio.com/). Tests for equal variance and stochastic equality of two samples were conducted according to Brown–Forsythe test and Brunner–Munzel test using functions levene.test and brunner.munzel.test in the lawstat package [87]. The p-value adjustment for multiple samples was carried out by Benjamini–Hochberg (BH) correction with the p.adjust function of R. F-statistics for population analysis ($F_{IT}$, $F_{IS}$) [86,88,89] were estimated for each sample category or individual DNA marker using R packages hierfstat [90] and pegas [91] in combination with adegenet [92]. Additionally, Hedrick's $G''_{ST}$ [88], which is an equivalent of $F_{ST}$ extended to multiallelic DNA markers, was estimated globally or pairwise using the mmod package of R [93] in combination with adegenet [92].

Evaluation of Hardy–Weinberg equilibrium

An exact test of Hardy–Weinberg proportions for multiallelic genotype data was estimated with a Markov Chain Monte Carlo (MCMC) simulation method developed by Guo and Thompson [94], that was implemented as a function of Arlequin (version 3.5.2.2) [95]. The genotype data file used as input for Arlequin was formatted with CONVERT software [96] with no prior inferred population structure. We continued the MCMC simulation runs 10 times each for 1,000,000 iterations in both initial burn-in and de-memorization steps, and then the average of the estimated p-values was provided for evaluation.
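
The per-marker statistics defined in the statistical evaluation section (observed and expected heterozygosity, PM of Eq (1), GD of Eq (2), and Wright's fixation index) can be sketched in a few lines of Python. This is not the Cervus or GUGS implementation. The function names and the toy genotypes are hypothetical, and the unbiased estimator is assumed here to take the usual form 2n/(2n − 1) × (1 − Σp²) for n diploid individuals.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies from a list of diploid genotypes (a, b)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """Unbiased gene diversity; assumed form 2n/(2n-1) * (1 - sum p_i^2)."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    return 2 * n / (2 * n - 1) * (1 - sum(p * p for p in freqs.values()))


def probability_of_match(genotypes):
    """Eq (1): sum of squared frequencies of the unique genotypes."""
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    n = len(genotypes)
    return sum((c / n) ** 2 for c in counts.values())


def gene_diversity(haplotypes):
    """Eq (2): 1 - sum x_i^2 for single-allele (organelle) genotypes."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return 1 - sum((c / n) ** 2 for c in counts.values())


def fixation_index(h_obs, h_exp):
    """Wright's fixation index: (He - Ho) / He."""
    return (h_exp - h_obs) / h_exp


# Hypothetical data: four diploid genotypes and four organelle haplotypes.
nuclear = [(1, 2), (1, 1), (2, 3), (2, 1)]
organelle = ["A", "A", "B", "C"]

h_o = observed_heterozygosity(nuclear)
h_e = expected_heterozygosity(nuclear)
print(h_o, h_e, fixation_index(h_o, h_e))
print(probability_of_match(nuclear), gene_diversity(organelle))
```

Genotypes are sorted before counting so that (1, 2) and (2, 1) are treated as the same unique genotype when PM is computed.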
Factorial analysis and phylogenetic evaluation

Principal coordinate analysis (PCoA) and phylogenetic analysis of the obtained genotype data were carried out with DARWin (version 6.0.13) [97,98]. A dissimilarity matrix was obtained from the genotypes of each sample pair using a simple matching method (nuclear genotypes) or from modalities by Rogers and Tanimoto's coefficient (organelle genotypes). The PCoA analysis assumed two to six axes (typically five), and data for the first two axes were used to draw a scatter plot. Phylogenetic trees were inferred with the weighted neighbor-joining method [99] from bootstrapped dissimilarity matrices obtained from 30,000 iterations for the nuclear genotype data or 5,000 iterations for the organelle genotype data, and consensus trees were then obtained.

Structure analysis

Structure analysis for the inference of the basic taxa and their proportions was carried out using STRUCTURE [100]. The genotype data for the 101 representative indigenous varieties obtained with the 123 selected DNA markers were formatted using CONVERT software [96] with no prior inferred population structure. Missing data were treated as lost (assigned '-9' for the genotype data). The analysis assumed the admixture model for ancestry and that allele frequencies were correlated. In the estimation of the number of basic taxa (K), we varied K stepwise from two to ten, then evaluated the probability ten times for each K with 100,000 iterations of the initial burn-in and 1,000,000 MCMC runs. The inferred proportions of the K populations, and the estimated ln Pr(X|K), mean ln P(K) and its variance were used to obtain stdev LnP(K), L'(K) and |L''(K)|, then ΔK was estimated as the mean of (|L''(K)|/stdev LnP(K)), following Evanno et al [101]. We used the Structure Harvester web service [102] at http://taylor0.biology.ucla.edu/structureHarvester/ for this purpose. The inferred proportions of the K basic taxa were deduced individually from the output of Structure Harvester using the Greedy algorithm of CLUMPP [103]. We compared the full search and random input order running modes of CLUMPP, and also changed the running period for the permutation analysis from 1,000 to 1,000,000, but all results were identical. We therefore used the simulation results from CLUMPP run in Greedy mode with 100,000 permutation runs. The bar plot of inferred proportions was drawn with MS Excel.

Allele-sharing test and stochastic verification of inferred parentage

Possible parent-to-offspring relationships between varieties were examined using an allele-sharing test. The test evaluates the ratio of the number of DNA markers that share at least one allele between two varieties to the total number of DNA markers. Any pair of varieties in which nearly all DNA markers shared an allele between the two varieties was selected as a candidate parent–offspring pair. When two varieties were assumed to be the parents of a particular offspring variety, the parentage of the assumed triad was examined using the parentage test.

The probability of the inferred dyad or triad being true single parent-to-offspring or parents-to-offspring combinations was examined by likelihood ratio analysis according to Marshall et al and Jones and Ardren [62,63]. In this analysis, the probabilities of two hypotheses ($H_1$ and $H_2$) are compared. Assume $P(G|H_1)$ is the probability of observing a particular pair of genotypes $G$ under the hypothesis $H_1$, and $P(G|H_2)$ is the probability of $G$ under the hypothesis $H_2$. The evaluated $P(G|H_1)$ relative to the evaluated $P(G|H_2)$ will give a likelihood ratio $L(H_1, H_2|G)$ that $G$ will be observed under the two hypotheses $H_1$ and $H_2$:

$$L(H_1, H_2 | G) = \frac{P(G|H_1)}{P(G|H_2)}. \tag{3}$$

In the parentage test, $H_1$ presumes that a particular variety is an offspring of the alleged parent or parents, and $H_2$ presumes that it is not an offspring of the alleged parents but a chance seedling that has arisen from a given population. The likelihood ratio $L$ represents the probability that the offspring was obtained from the alleged parent(s) rather than being a chance seedling.

For the stochastic evaluation of the parentage test, let $g_S$, $g_P$ and $g_O$ represent the genotypes of the alleged seed parent, alleged pollen parent and offspring, respectively, at a DNA marker.
The likelihood ratio that the alleged parents are the true parents of the given offspring variety was estimated according to Eq (3) from Jones and Ardren [63]:

$$L(H_1, H_2 | g_S, g_P, g_O) = \frac{T(g_O | g_S, g_P)}{P(g_B)}. \tag{4}$$

Here, the numerator $T(g_O|g_S, g_P)$ is the transition probability of $g_O$ given $g_S$ and $g_P$. This probability was estimated from the allele frequencies and a genotype combination according to Table 1 of Marshall et al [62]. The denominator $P(g_B)$ is the frequency of the offspring's genotype in a particular population obtained according to Table 2 of Marshall et al [62]. The value $L$ is the likelihood ratio that the parentage of this triad is correct compared with the hypothesis that the offspring obtained its genotype from an unknown hybrid combination.

In a similar manner, another likelihood ratio for the alleged single parent to an offspring was estimated according to Eq (2) of Jones and Ardren [63], or Eq (5) of Marshall et al [62]:

$$L(H_1, H_2 | g_S, g_O) = \frac{T(g_O | g_S)}{P(g_B)}. \tag{5}$$

Here, the numerator $T(g_O|g_S)$ is the transition probability of $g_O$ given $g_S$, estimated from their allele frequencies and genotype combination according to Brenner [104] or Table 2 of Marshall et al [62]. In most parentage analyses of wild plant populations, it is unknown which variety is the seed parent or the pollen parent. Thus, a particular alleged parent sample without any prior supporting information was assigned to either $g_S$ or $g_P$ arbitrarily. The probability of obtaining a particular genotype in a population was estimated from the allele frequencies at a given DNA marker, as $x^2$ for a homozygous genotype, or $2xy$ for a heterozygous genotype, where $x$ and $y$ are the allele frequencies in a population. The obtained value $L$ is the ratio of the likelihood that this is a parent–offspring dyad to the likelihood that the offspring is from some unknown hybrid combination. All DNA markers used in the parentage test were presumed to be at Hardy–Weinberg equilibrium (HWE) in the given population. The LOD score (the natural logarithm of the likelihood ratio, LR) for the set of genotypes at multiple DNA markers is given by the logarithm of the product of LR over the markers:

$$\mathrm{LOD\ score} = \log\left(\prod_{m=1}^{k} LR_m\right), \tag{6}$$

where $LR_m$ is a likelihood ratio for a triad or dyad at the $m$th DNA marker. Any DNA markers that showed discrepancies in the parentage test or allele-sharing test were excluded from LOD score estimation. The required cross trial index (RCI) was obtained by:

$$\mathrm{RCI} = \log\left(\frac{1}{N \prod_{k=1}^{m} f_k}\right). \tag{7}$$

Here, $N$ is the number of individuals with unique genotype in the proposed population, $f_k$ is the expected frequency of a particular genotype at the $k$th DNA marker estimated from the allele frequencies of the two alleles in the population (equation 7.4 in Nei [84]), and $m$ is the total number of DNA markers used for the evaluation. Single parent–offspring probability (SPP) is not a likelihood ratio value but a cumulative probability between two particular individuals assuming that one is the alleged parent of a particular offspring variety without prior information on the other parent. The SPP value for the particular offspring ($g_O$) and the alleged parent ($g_P$) is obtained from the transition probability $T(g_O|g_P)$ of $g_O$ given $g_P$ in a similar manner to that described above by:

$$\mathrm{SPP} = \sum_{k=1}^{m} T_k(g_O | g_P), \tag{8}$$

where $m$ is the total number of DNA markers used for the evaluation. These tests, frequency analyses and probability estimations were carried out using functions of GUGS software. The inferred genealogy was drawn as a family tree manually, or using Helium [105].

Table 2. Summary of DNA markers used in this study.

| Type/source | Evaluated | Selected | (%) | Certified | (%) | Reference |
|---|---|---|---|---|---|---|
| Genomic SSR/INDEL | 154 | 104 | 67.5% | 58 | 37.7% | This study |
| EST/cDNA SSR | 201 | 110 | 54.7% | 87 | 43.3% | This study |
| Ollitrault, F et al. 2010 | 79 | 6 | 7.6% | 6 | 7.6% | 1) |
| Chen, C. et al. 2008 | 106 | 19 | 17.9% | 12 | 11.3% | 2) |
| Chen, C. et al. 2006 | 56 | 7 | 12.5% | 6 | 10.7% | 3) |
| Total | 596 | 246 | 41.3% | 169 | 28.4% | |

1) Ollitrault, F et al. (2010) Am. J. Bot. e124-e129.
2) Chen, C et al. (2008) Tree Genet. Genom. 4: 1–10.
3) Chen, C et al. (2006) Theor Appl Genet. 112: 1248–1257.
doi:10.1371/journal.pone.0166969.t002
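
The single-parent likelihood ratio of Eq (5) and the LOD score of Eq (6) can be illustrated with a short Python sketch. It assumes the standard Mendelian form of the dyad transition probability (the alleged parent transmits each of its two alleles with probability 1/2 and the other allele is drawn from the population at its allele frequency), which is taken here to correspond to the tabulated values cited above. It also assumes Hardy–Weinberg genotype frequencies for the denominator. The function names, marker names and frequencies are hypothetical, and this is not the GUGS implementation.

```python
import math


def transmit_prob(parent, allele):
    """Probability that a diploid parent transmits the given allele."""
    return parent.count(allele) / 2.0


def genotype_freq(genotype, freq):
    """Hardy-Weinberg genotype frequency: x^2 (homozygote) or 2xy."""
    a, b = genotype
    return freq[a] ** 2 if a == b else 2 * freq[a] * freq[b]


def dyad_transition(offspring, parent, freq):
    """T(g_O | g_S): one allele from the parent, the other from the population."""
    a, b = offspring
    if a == b:
        return transmit_prob(parent, a) * freq[a]
    return transmit_prob(parent, a) * freq[b] + transmit_prob(parent, b) * freq[a]


def dyad_lod(offspring, parent, freqs):
    """Return (LOD, excluded markers) for an alleged parent-offspring dyad.

    Markers at which the pair shares no allele (T = 0) are excluded,
    mirroring the exclusion of discrepant markers described in the text.
    """
    lod, excluded = 0.0, []
    for marker in sorted(offspring):
        t = dyad_transition(offspring[marker], parent[marker], freqs[marker])
        if t == 0:
            excluded.append(marker)
            continue
        # Natural log of the per-marker likelihood ratio; summing logs
        # equals the log of the product in Eq (6).
        lod += math.log(t / genotype_freq(offspring[marker], freqs[marker]))
    return lod, excluded


# Hypothetical allele frequencies and genotypes at three markers.
freqs = {
    "M1": {150: 0.6, 154: 0.1, 158: 0.3},
    "M2": {200: 0.2, 204: 0.8},
    "M3": {98: 0.25, 100: 0.25, 102: 0.25, 104: 0.25},
}
parent = {"M1": (150, 154), "M2": (200, 200), "M3": (98, 102)}
offspring = {"M1": (154, 158), "M2": (200, 204), "M3": (100, 104)}

lod, excluded = dyad_lod(offspring, parent, freqs)
print(round(lod, 4), excluded)
```

A positive LOD score means the genotypes are more likely under the parent–offspring hypothesis than under the chance-seedling hypothesis. Sharing a rare allele (such as the hypothetical 154 allele at M1) raises the score more than sharing a common one.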
Results

Development and evaluation of DNA markers for nuclear genotyping of citrus

DNA sequences of citrus expressed genes from cloned cDNA, EST, and RefSeq in public sequence database repositories or the HarvEST citrus database were used for DNA marker design. Preliminary clustering analysis of EST sequences with a sequence assembler reduced duplication in these datasets, and yielded 98,869 consensus sequences from 582,270 EST sequences. Another clustering analysis of whole genome shotgun sequences of sweet orange ‘Ridge Pineapple’ yielded 381,909 consensus sequences from 866,700 reads, but 46,341 Clementine BAC end sequences were not used for assembly because of their low redundancy. SSR mining of these datasets with mreps [76] identified 143,825 candidate regions from the consensus EST sequences, 314,967 from the consensus sweet orange whole genome shotgun sequences, and 16,159 from the Clementine BAC end sequences. SSR mining of the Clementine haploid genome sequence [43] (https://www.citrusgenomedb.org/) also identified 310,413 candidate SSR regions for both v0.9 (release 165) and v1.0 (release 182) genomes. These candidate regions were verified with resequencing data obtained from NGS analysis of 15 citrus varieties (banpeiyu A004, Clementine A009, dancy A016, hyuganatsu A036 and A038, King A054, Kishu A066, ponkan A108, Satsuma A113 and A122, sweet orange A162, willowleaf (Mediterranean) mandarin A200, ‘Encore’ B014, ‘Harehime’ B017, and ‘Kiyomi’ tangor B031). Candidate SSR regions that were supported with more than 40× Illumina read coverage were selected for primer design by referring to their motif size, repeat length, genome position, gene annotation, specificity and versatility among citrus varieties. We also identified indel regions by referring to resequencing data, and these were also used for primer design. Consequently, we designed SSR and indel markers (S2 Table lists DNA markers by type and gives their sources).

Verifying genotyping errors to select certified DNA markers

The genotypes of the DNA markers were preliminarily evaluated for peak height and peak height ratio, product size, and number of alleles in a small sample set consisting of Satsuma,