ebook img

Multiple paternity: determining the minimum number of sires of a large brood PDF

0.18 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Multiple paternity: determining the minimum number of sires of a large brood

Multiple paternity: determining the minimum number of sires of a large brood A.Erikssona,b,B.Mehliga,M.Panovac,C.Andre´c,andK.Johannessonc aDepartmentofPhysics,Go¨teborgUniversity,Go¨teborg,Sweden bDepartmentofMarineEcology,UniversityofGothenburg,SE-43005Go¨teborg,Sweden cDepartmentofMarineEcology-Tja¨rno¨,UniversityofGothenburg,SE-45296Stro¨mstad,Sweden. Wedescribe anefficient algorithm for determining exactly the minimum numberofsires consistent with the multi-locusgenotypesofamotherandherprogeny.Weconsidercaseswhereasimpleexhaustivesearchthrough allpossiblesetsofsiresisimpossibleinpractice(becauseitwouldtaketoolongtocomplete). Ouralgorithm forsolvingthiscombinatorialoptimisationproblemavoidsvisitinglargepartsofsearchspacewhichwouldnot 9 improvethesolutionfoundsofar(i.e., resultinasolution withfewernumberofsires). Thisisofparticular 0 importancewhenthenumberofallelictypesintheprogenyarrayislargeandwhentheminimumnumberofsires 0 isexpectedtobelarge.Preciselyinsuchcasesitisimportanttoknowtheminimumnumberofsires:thisnumber 2 givesanexactboundonthemostlikelynumberofsiresestimatedbyarandomsearchalgorithminaparameter regionwhereitmaybedifficulttodeterminewhetherithasconverged. Weapplyouralgorithmtodatafromthe n marinesnail,Littorinasaxatilis. a J Keywords:Multiplepaternity,Combinatorialoptimisation,MicrosatelliteDNA 1 2 I. INTRODUCTION ] E A number of species from different taxa are known to mate numerous times during a mating season. Females of such P species are likely to give birth to offspring fathered by more than one sire, and in some cases, the offspring may be fa- . o thered by a large number of sires. For example, queens of the honeybee are known to leave the nest followed by hundreds i b ofmales(Wattanachaiyingcharoenetal.,2003). Femalesofthesaltwaterflymatemanytimesadayduringthematingseason - (BlythandGilbourn,2006)andinspeciesofperiwinkles(marinegastropods)femalesmaterepeatedlyduringthematingseason q whichisseveralmonthslong(Reid,1996;Saur,1990). [ Thedegreeofmultiplepaternitycanbe inferredfromempiricaldata, obtainedforexamplebygenotypingfemalesandoff- 1 springusinghigh-resolutiongeneticmarkers,suchasmicrosatellitesorsinglenucleotidepolymorphisms.Toestimatethenum- v berofsirescorrespondingtoagivenmulti-locusdatasetisusuallystraightforwardwhenthenumberofoffspring,thenumber 3 oftheirallelictypes,andthenumberofsirestobedeterminedaresmall. 3 Intheexamplesmentionedabove,however,thelevelsofmultiplepaternityareoftensohighthatthemathematicalanalysis 2 oftheempiricaldatabecomesverycomplexandtime-consuming.Inthispaperwedescribeanewefficientalgorithmtoanalyse 3 . multiplepaternityinempiricaldatasetswhenthepaternalgenotypesareunknown.Weapplythealgorithmtodatasetsfromthe 1 periwinklespeciesLittorinasaxatilisexhibitinghighlevelsofmultiplepaternity[ithasbeenshownpreviouslythatatleast7-10 0 malesarefatherstotheoffspringofonebrood(Ma¨kinenetal.,2007)]. 9 Consider an example. Tab. I shows the multi-locusgenotypesof 42 progenyfrom a brood of a female periwinkle. How 0 : doesonedeterminethenumberofsiresfromthedatashowninTab.I? Moreprecisely,thisquestionmaybeposedinatleast v threedifferentways: whatistheactualnumberofsires,asopposedtothemostlikelynumber,ortheminimumnumberofsires i X consistentwiththemotherandherprogenyarray? Unlessoneisabletodirectlyobservethematingsitisingeneralnotpossibletodeterminetheactualnumberofsiresforan r a arraysuchastheoneshowninTab.I. Analternativeistoinsteaddeterminethemostlikelynumberofsiresconsistentwiththe multi-locusgenotypesofmotherandprogeny(Wang,2004). Inthisapproachitiscommonlyassumedthatthepopulationisin Hardy-Weinbergequilibrium,thatalllociaresubjecttoneutralevolution,andthatalllociareinpairwiselinkageequilibrium. Themostlikelysetofsiresisdeterminedbyarandomsearchalgorithm(Pressetal.,1986),locallymaximisingthelikelihood constrainedbyrequiringconsistencywithmotherandprogeny.Arandomsearchhoweverdoesnotguaranteeconvergence.The onlywaytomakesurethatthealgorithmhasconvergedistocomparewithanexhaustivesearch. Exhaustivesearch algorithmshave beenpublishedin the literature. An exampleavailable for downloadis GERUD (Jones, 2001,2005). Theexhaustivesearchiscommonlyconductedasfollows: alistofpaternalallelesatalllociisdeterminedfrom theallelesintheprogenyarray,subtractingthoseofthemother. Ifforagivenchildatagivenlocusthepaternalallelescannot be uniquelyextracted, both are kept in the list. From this list, a set of potentialsires is obtained by constructingall possible multi-locuspaternalhaplotypes.Thislistisprunedbyremovingindividualswhicharenotconsistentwithanyoftheprogenyin thedata.Theminimumnumberofsiresisdeterminedbyexhaustivelysearchingthisprunedset:firstthealgorithmtestswhether asinglefatherfromthissetisconsistentwiththeprogenyarray. Ifthisisnotthecase,allpairsofsiresaretested,ifnecessary alltripletsofsires, andsoforth. Thisalgorithmensuresthattheminimumnumberofsiresconsistentwiththedataarefound. Thisalgorithmhasbeensuccessfullyusedinmanycircumstances.Itworkswellwhenthelistofpaternalallelesisnottoolong, andwhentheminimumnumberoffatherstobedeterminedisnottoolarge. 2 TheminimumnumberofsiresforthedatasetsshowninTab.Iisfoundtobetwelve(usingthealgorithmdescribedbelow). With the search algorithm summarised above, one would have to go througha prohibitivelylarge number of sets of possible sires. Sincethenumberofsetstosearchtypicallyincreasescombinatoriallywiththenumberofsires,theexhaustivesearchinthe algorithmproposedby(Jones,2005)islimitedtomaximallysixsiresforagivensample.Inpractice,whenthenumberofsiresis fiveorsix,thealgorithmmaytakeverylongtoconverge(Sefcetal.,2008). Inmanypracticalapplications(Amavetetal.,2008; Portnoyetal.,2007;RispoliandWilson,2008;Simmonsetal.,2008;Songetal.,2007;Takagietal.,2008;VanDorniketal., 2008)thenumberofsiresdeterminedwithGERUDdoesnotexceedfour. (ItshouldbenotedthatGERUD2.0allowstheuser to truncate the pruned set of sires in an ad-hoc fashion in order to check for up to eight sires. But due to the truncation, the minimumnumberofsiresispotentiallyoverestimated.Byhowmuchisunclear.) AmaximumlikelihoodapproachindicatesthatthenumberofsiresinsamplesofL.saxatilis(Ma¨kinenetal.,2007)istypically largerthansix. We havethereforedevelopedanewalgorithmfordeterminingtheminimumnumberofsiresconsistentwitha givenprogenyarray,suchasthatshowninTab.I. ThealgorithmisdescribedinSec. II. Itmakesitpossibletoexactlydetermine the minimumnumberof sires for the empiricaldata sets listed in Tab. II, in seven of the nine data sets the minimumnumber of sires is found to be larger than six. Further, the new algorithm enables us to estimate the number of sires for a data set whenconvergenceofmaximum-likelihoodalgorithmsisdifficulttoascertain. Fig.3forexampleshowsarunofthemaximum- likelihood algorithm COLONY (Wang, 2004) (which estimates the most likely number of fathers given the populationallele frequencies)forthedatashowninTab.I,comparedtotheminimumnumberofsires(whichistwelveforthisdataset). Fig.2 raisesthequestionhowlikelyitisthattheminimumnumberofsiresisequaltotheactualnumberofsiresforagivensample. Usingouralgorithmwehaveinvestigatedthisquestionemployingdatasetsgeneratedbyacoalescentalgorithmwithinastep- wisemutationmodel: theanswerdependsuponthepropertiesofthepopulationinquestion. Incaseswheretheprobabilitythat thetwonumbersarethesameishigh,wecaninferthattheactualnmberofsiresinthissituationcanbereliablyestimatedfrom empiricalsamples(itshouldbeemphasizedthattheresultofouralgorithmalwaysisanexactlowerbound). Lastbutnotleast, weemployouralgorithmtoanswerthefollowingquestion:givenanempiricaldataset,howcouldonemostefficientlyincrease theaccuracyoftheestimateofthenumberofsires: bygenotypingmorelociforagivensetofprogeny,orbygenotypingmore progenyforagivennumberofloci? Again,theanswerdependsuponthepropertiesofthepopulationinquestion. The remainder of this article is organised as follows. In Sec. II we describe the new algorithm, called ‘MinFathers’. We alsobrieflydescribehowweproducedartificialsamplesusingthecoalescentinordertotestthenewalgorithm. Ourresultsare summarisedinSec. III. ConclusionsaredrawninSec.IV. II. METHODS A. Anefficientsearchalgorithm Inthissectionwedescribeournewalgorithmwhichdeterminestheminimumnumberofsiresconsistentwithagivenprogeny array,suchasthatshowninTab.I. Tofindtheminimalsetofsiresisequivalenttofindingapartitionoftheprogeny,suchthatallprogenyinagivenmemberof thepartitioncanbeinheritedfromasinglefather. Afatherisrepresentedbyalistofthetwoallelesateachlocus.Eachpaternal allelemayeitherhaveadefinitevalue,ornovaluemayyethavebeenassignedtothisallele. Eachprogenytooisrepresented byalistofalleles,oneforeachlocuswhenthealleleofthemotherhasbeensubtracted. Usually,theallelictypesareuniquely determined. Thereare,however,twoexceptions: whenagenotypeerrorhasoccurred,andwhenthemotherandtheoffspring areidenticalandheterozygous. Inthiscase,atthelocusinquestion,therearetwopossibleallelesfortheprogeny. We ignore thesecomplicationsforthemoment,andreturntothemlater,afterhavingdescribedthealgorithminitssimplestform. In ouralgorithm,the mostgeneralcommonfatherfor a set of progenyis foundthrougha sequenceof mergingoperations. Thisoperationmapsthetwomostgeneralfathersoftwosetsofprogenytothemostgeneralfatherofthecombinedset. Since wearealwayssearchingforminimumnumberoffathers,thefathersarealwaystakentobeheterozygousateachlocus(orsome lociremainundetermined). Themergingoftwofathersf andf′proceedsindependentlyateachlocus(sincefreerecombinationisassumed).Assumethat acommonfatherf forasetofj progenyhasbeenfound. Nowaddanotherprogenytothisset. Assumethatthenewprogeny hasallelic typea atagivenlocus. Itsmostgeneralfatherhasthegeneticconfiguration{a,0}atthislocus. Themostgeneral commonfatherf′ofthejointsetofj+1progenyisobtainedbymergingf and{a,0}asdescribedinTab.III. Atagivenlocus, thefatherf′mayhaveseveralpossibleconfigurations,dependingontheconfigurationsoff andthefatherofp. Inthetable,the asterisk denotesthatthe correspondingallelic typehasnotyetbeendetermined,or is unknown. The mostgeneralfather ofa singleprogenyis{a,∗}. Now consider a partition F of j progeny. We introduce the following terminology. A partition is called valid if for each elementof the partitionthere is a commonfather for all progenyin the element. In other words, a valid partition of progeny correspondsto a set of fathers for the progenyin question. Our algorithm can now be formulated as follows: for each valid partitionF ofprogeny1,...,j,generateallvalidpartitionsF′ofprogeny1,...,j+1byaddinganewprogenyj+1toeach 3 elementofF providedthenewfathermergeswiththecommonfatheroftheelementofF intoavalidcommonfather. Starting fromanemptysetoffathers,F =∅,andasetofprogenyS,wecanfindallvalidsetsoffathersofS bythisalgorithm. It is possible to find the minimum number of fathers from this method by taking the minimum modulus of all partitions found.Thisisusuallymuchbetterthanfirstgeneratingthefullsetofpartitionsoftheprogeny,andsubsequentlycheckingwhich partitionsare valid; how muchmore efficient this is dependson the data. Our algorithm for findingthe minimumnumberof fathersissummarisedinFig.1. ItrecursivelybuildsallvalidpartitionsofasetofprogenyS,exceptsomepartitionsthatcanbe showntonotcorrespondtotheminimumnumberoffathers. InFig.2weshowasearchtreefora simplesetoffourprogeny. Eachprogenyhastwoloci,andateachlocustheallelecorrespondingtothemotherhasbeensubtracted,sothateachprogenyis describedbyalistoftwoalleles(oneforeachlocus): p = (a,b),p = (c,d),p = (c,e),andp = (a,d). Itisassumedthat 1 2 3 3 a,b,c,anddarefourdifferentallelictypes. We conclude this section by emphasizing four important points. First, note that if we have a given number of fathers for progeny1,...,j,thenumberoffathersforprogeny1,...,j+1mustbeatleastashigh. Hence,wheneverthesetoffathersis atleastaslargeastheminimumnumberfoundsofar,wecanstopsearchingforpartitionsofS basedonthecurrentpartition. When we have a complete partition of S which is smaller than the minimum found so far, we can update the minimum. As a startingpoint, we can use any validupperbound; in the presentimplementationwe use the trivialboundthatthe minimum numberoffathersisn∗ ≤⌈|S|/2⌉.Usingouralgorithm,wehave n∗ =MinFathers(∅,S,⌈|S|/2⌉). (1) Second, the algorithm as described above is valid only if the genetic material inherited from the father at each locus is uniquelydetermined. In general, this is notthe case, as pointed out above. Instead, there may be one or more loci with two possiblechoices.Inthiscase,whengeneratingthevalidpartitionscontainingagivenprogenyweloopoverallpossiblevariants ofallelesintheseloci. Inpracticaldata,itisraretohavemorethanonesuchlocus,butifmanylociareconsideredthiscould causeproblems: thenumberofvariantsoftheallelethatmayneedtobeconsideredis2m,wheremisthenumberoflociwith multiplechoicesintheprogeny. Fordatasetswherethisisaproblem,itispossibletoextendthealgorithmtoamorecomplex mergingoperation,whereforeachlocusofafather,wekeeptrackofallpossiblepairsofallelesthatcansimultaneouslymatch allprogenyderivingfromthefather. Third,ifwefindthatsomeoftheprogenynotyetincludedinthecurrentsetoffatherscanbedirectlyinheritedfromanyof thefathers(i.e. thefathercontainsthenecessarygeneticmaterialatallloci),wecansafelyremovethemfromfromthesetof progeny. In other words, if a new progenyp can be directly inherited froma father f, a merge f′ between f and p will lead tof′ = f, i.e. nochangeinf. Hence,itisclearthatgiventhepresentsetoffathers,itisnotpossibleto findanotherwayof mergingthese progenywhichwilllead tofewerfathersforthe wholeset. In ouralgorithm,wheneverweadda new fatherto the partition, we removethe progenythat can be directly inherited from the new father, beforerecursively searchingfor new partitions. Fourth, in general the key to an efficient solution of a combinatorial problem such as the present one lies in cutting away as large parts of the search space as early as possible. In the present context, this means that if we can consider the most constrainingprogenyfirst, we can discard partitions that will not be valid for the whole set of progeny,or that will be larger than the minimum, at an early stage. We sort the progenyfirst with respectto the numberof undeterminedsymbols(if any), andwherethennumberofno-caresymbolsisequal,withrespecttothenumberofmultiplechoicelociintheprogeny,sothat individualswithmultiple-choiceorundeterminedlociwillbeconsideredfirst. Beforethesorting,wecheckthemultiplechoice loci. Consider a multiple choice locus in a given progeny. If the two alleles only occur togetherat that locus in all progeny, orifonlyoneofthelocioccuraloneortogetherwithsomethirdallele,wecansafelyreplacethemultiplechoicebythemost frequentoftheallelictypes. Pickingtheleastfrequentallelictypecanonlyexcludesomepossiblemergesthatwouldhavebeen possiblewiththemorecommonallele.However,ifbothallelesoccuralsoaloneorwithsomethirdallelictype,wecannotsafely concludewhichchoicewillleadtotheminimumnumberoffathers. Inthiscaseitisnecessarytokeepbothoptions. B. Generatingsamplesusingthecoalescent Inthissectionwedescribehowwehaveusedthecoalescenttogenerateartificialsamplesinordertotestthesearchalgorithm describeintheprevioussection. Inordertounderstandunderwhichcircumstancestheminimumnumberoffathersisequaltothetruenumberoffathers,we generaten progenywithaknownnumberoffathers,n ,asfollows. First,thegenegenealogiesoftheLlociinamotherand p f n fathersare generatedaccordingto the standardneutralcoalescenttheory for an unstructuredpopulationwith constantsize f N. Because weassume thatthe lociareunlinked,thegenegenealogiesofdifferentlociare statistically independent. Ineach branchof the genealogies, mutationsoccur with probabilityµ per generation, so thatthe numberof mutationsin a branchof T generations is Poisson distributed with mean µT. In the coalescent, time is measured in units of 2N generations and the mutationrateisgivenbythescaledparameterθ =4Nµ. 4 Wemodelmicrosatellitedatausingthestepwisemutationmodel,whereeachmutationleadstoeitherthegainorthelossof a single repeat unit(KimuraandOhta, 1975, 1978). Thus, given the genealogy, for each locuswe start from the most recent commonancestorofthewholesample,andassignitallele0(inthestepwisemutationnumber,onlydifferencesinthenumber ofrepeatunitsarerelevant). Wethenrecursivelygeneratetheallelesofeachnodeinthegenealogybygeneratingthestepwise mutationsalongeachbranchasdescribedabove,untilwehaveassignedtheallelesforallindividualsinthesample. Giventheallelictypesofthemotherandthefathers,weproducetheoffspringasfollows:foreachprogenywepickarandomly chosenfather,suchthateachfatherisequallylikelytobepicked.Ifintheendnotallfathershavebeenpickedforatleastsome offspring,werepeatthewholeprocessuntilthisisthecase. Thisguaranteesthatthetruenumberoffathersisexactlyn . For f each locus in the progeny we then form the progeny according to Mendelian inheritance from the mother and the father, by picking one allele from the mother and one from the father. This procedure guarantees that the marginal distribution of the numberofoffspringperfatherisapproximatelybinomial,whichisconsistentwiththeneutraltheoryandwithempiricaldatafor thesnails(Ma¨kinenetal.,2007). Theperformanceof‘MinFathers’dependsontheamountofgeneticvariation(determinedbythemutationrateθ)andonthe number of progeny per father, n = n /n . When θ is small, the minimum number of fathers is small and the algorithm p/f p f terminatesquickly.Alsowhenθislargethealgorithmisefficientbecauseitcanusuallyeliminateimpossiblemergesatanearly stage. When θ is intermediate and n is large, however, the algorithm may have to investigate a significant fraction of the p/f possiblecombinations,andinthesecasesitmaynotbepracticaltousethealgorithm.Despitethiscaveat,Tab.IIandtheresults presentinthenextsectionshowthatthealgorithmcanbeusedonempiricaldatawithalargenumberofprogenyandacrossa largerangeofparametersforθandn . p/f III. RESULTS InthissectionwedescribetheresultsobtainedwiththenewalgorithmproposedinsectionII. A. Applicationtoempiricaldata We have applied our algorithm to the L. saxatilis data by (Ma¨kinenetal., 2007), and to a new L. saxatilis data set (Bostro¨metal. , 2009, in preparation),given in Tab. I. We begin by describingour results for the new data set (Tab. I). The original data contains more progeny than listed in Tab. I. Using our algorithm we find that the minimum number of sires is twelve,asgiveninthefirstrowofTab. II. ThealgorithmGERUD2.0couldnotberunbecausethisalgorithmdeterminesthe exactsolutiononlyforupto sixsires. We havealso runCOLONY (Wang,2004) (whichestimatesthe mostlikelynumberof sires), andthe correspondingresultsareshownin Fig. 3. Itisnotentirelyclearwhetherthealgorithmhasconverged;the log likelihood may have reached a plateau but it is not clear whether further explorationof the state-space may yield still higher likelihoodvalues.Itisthereforevaluabletohavetheexactlowerbound(alsoshowninFig.3)fromournewalgorithm. Usingthenewalgorithm,MinFathers,wehavere-analysedthedataof(Ma¨kinenetal.,2007);thecorrespondingresultsare alsogiveninTab. II. ThelasteightdatasetsinTab.IIwereanalysedusingGERUDby(Ma¨kinenetal.,2007)andarebroadly consistentwiththecorrespondingresultsofCOLONY.Itmustbenotedhoweverthatin(Ma¨kinenetal.,2007),thesearchwas performedonthreelocionly,andusinganadhoctruncationofthepossiblesetoffathers. Itisthereforeofinteresttodetermine whattheminimumnumberofsiresactuallyis. ThecorrespondingresultsareshowninTab.II,andprovideanexactlowerbound forthemostlikelynumberoffathersdeterminedby(Ma¨kinenetal.,2007). Exceptintwocases,GERUD2.0couldnotberun becausethetrueminimumnumberoffathersexceededthemaximumvalueofsix. TheresultssummarisedinTab. IIraisethequestionofhowmuchlargerthantheexactminimumoneexpectsthemostlikely numberof fathersto be. The answer dependsuponthe populationmodel, and upon the parametersdescribing it, such as the mutation rate θ, the number L of loci, and the numbern of progenyin the data. This question is addressed in the following section. B. Thedifferencebetweentheminimumandthemostlikelynumberofsires InthissectionweconsiderapopulationinHardy-Weinbergequilibrium,weassumethatalllociaresubjecttoneutralevolu- tion,andthatalllociareinpairwiselinkageequilibrium. Weposethequestion: howmuchlargerthantheminimumnumberis themostlikelynumberofsiresinabroodofagivenmother?Tothisendwegeneratedsamplesusingthecoalescentasdescribed inSec. II.B. Figs. 4and5summariseourresults. First, Fig.4showstheprobabilitythatthenumberoffathersis equalto theminimum numberoffathers,p ,asafunctionofthenumbern ofprogenyperfather,forthreevaluesofthemutationrateθ;θ = 1, same p/f θ = 10,andθ = 100. Whenθ islarge,weobserveasharptransitionwherep increasesfromalmostzerotoalmostunity. same Whenθissmall,however,samplingmanyoffspringdoesnotresultinasignficiantincreaseofp becausethefathersaretoo same 5 geneticallysimilar. Forintermediatevaluesofθ, p doesincreasewiththenumberofprogenyperfather,butneverreaches same unitybecausethefathersarestillsignificantlygeneticallycorrelated. Second,Fig.5showshowp changesasafunctionofθforfivedifferentvaluesofthenumberoflociL. WhenL=1,the same minimumnumberoffathersisalwayssmallerthanthetruenumberoffathers(unlessthereisonlyonefather),thereforep same equalszero. ForL≥2wefindthatp increasesasafunctionofbothθandL. Theextenttowhichtheminimumnumberof same fathersagreeswiththetruenumberoffathersdependsontheprobabilitythatthefathersareallgeneticallydistinct. WhenLis increasing,thishappensforsmallervaluesofθ,asindicatedbytheincreasinglysharptransitionsfromp =0top =1in same same thefigure. Thus,whenthemutationrate,numberofloci,andnumberofoffspringperfatheraresufficientlyhighsothatp ≈ 1,the same minimumnumberoffathersalmostalwaysequalsthetruenumberoffathers. Asaconsequence,theresultingnumberofsires contributingtoagivenfamilydoesnotdependonwhetherthepopulationisstructuredorpanmitic. Inotherwords,theresultis insensitivetoassumptionsabouttheunderlyingpopulationstructure. IV. DISCUSSION Weconcludewithadiscussionof,first,howtheminimumnumberoffathersisinfluencedbygenotypingerrors. Second,we addressthefollowingquestion. Iftheaimistoincreasetheprobabilityofdeducingthetruenumberoffathersfromempirical datasuchasTab.I,isitbetteroftosamplemoreoffspringormoreloci? Microsatellites can be prone to genotyping errors (see reviews by DeWoodyetal., 2006; HoffmanandAmos, 2005; SelkoeandToonen, 2006). Sourcesof error when genotypingmicrosatellites include stuttering (appearanceof PCR products oneormorerepeatsshorterthananactualallele),alleledropout(non-amplificationofoneofthetwoallelesinaheterozygote, usuallyalongerone),non-specificPCR-productsduetoannealingofprimerstomultiplesites, nullalleles(alleleswithamu- tationintheprimerregion,whichpreventtheiramplificationinPCR)and,finally,mistypingandothermistakesduringmanual scoringoftheresults. Changestothenumberofrepeats(PCRstutteringevents)maycausetheminimumnumberoffatherstoappearlargerthanit actuallyis. PCRstutteringeventsinamicrosatellitelocusareusuallyincremental(singleadditionsordeletionsofarepeatunit). Hence,whenthesamplesizeislarge,itislikelythattheresultingallelesarepresentinotherfathers(theeffectofPCRstuttering issimilartomutationsoccurringduringmeiosis). Therefore,whenthefrequencyofstutteringeventsissmall,theeffectonthe minimumnumberoffathersisexpectedtobesmall. Whenonlyasinglealleleisamplified(e.g. becauseofalleledropoutornullalleles),theminimumnumberoffathersofthe samplemayin-ordecrease. Ifthefrequencyofsucherrorsissmall,however,theeffectisexpectedtobesmall: assumingthat theincidenceoftheseerrorsareindependentacrossdifferentloci,thelikelihoodthattheerrorschangetheminimumnumberof fathersdecreasesrapidlywithincreasingnumberofloci. Wenowturntothequestionofhowtobestincreasetheaccuracyofestimatingthetruenumberoffathers. Whenthenumber ofoffspringislarge,asinthemarinesnails,therearetwopossibilitiesforincreasingtheprobabilitythatwecandeducethetrue numberof fathers. One mayeither samplethe same lociin moreoffspring,orone maysample morelociin theoffspringwe alreadyhave. Whichisthebetteroption? Ourresultsshowthatincreasingthenumberoflocigenerallyprovidesthequickest wayofincreasingtheprobabilityoffindingthetruenumberoffathersfromtheminimumnumberoffathers,butwhetherthisis feasibleornotdependsontheavailabilityandcostofadditionalhigh-qualitymarkers(seeFigs.4and5). Itmaybelesscostly to sample moreindividualswith fewerloci, if possible. In this case, however,the accuracymay be limited by the numberof offspringavailablebutalsothegeneticvariationintheloci.Iffewlociaresampled,andthemutationrateislow,ourresultsshow thatitmaynotbepossibletoincreasetheaccuracybysamplingmoreindividualsbeyondacertainlimit,whichisdeterminedby theprobabilitythatthefatherssharethesamealleles. InordertorelatethetheoreticaldiscussionofSec.3.2totheempiricaldata(Tabs.IandII),wehaveestimatedtheparameterθ fromthedatainMa¨kinenetal.(2007)usingtwostandardestimators,θˆ =h(x −y )2i(Wehrhahn,1975)andθˆ =(F−2−1)/2 v i j F (OhtaandKimura, 1973). Here x and y are alleles of progeny i and j from mothers x and y, respectively, and F is the i j homozygosity(theprobabilitythatx =y ).Itisknownthatθˆ isunbiased[buthasalargevariance(ZhivotovskyandFeldman, i j v 1995)],whereasθˆ isbiasedforlargevaluesofθ, buthasasmallervariance. ForthedatainTab.IIweobtainθˆ = 124and F v θˆ = 49whenaveragedoverallfiveloci. While theseestimatesareuncertainbecauseofthesmallsamplesize, wesee from F Figs.4and5thatthenumberofprogenysampledin(Ma¨kinenetal.,2007)(n = 21)isprobablytoolowtoreliablyestimate p the true numberof fathers (the estimated n is the range 2...4 for these data). Increasingthe number of loci is not likely p/f tohelpverymuch. Theprogenysampledwerechosenfromlargefamilies(of70to100progeny). Ourresultsindicatethatin thiscase,thebeststrategyforincreasingtheaccuracyofthenumberoffathersistosamplestillmoreprogeny.Indeed,analysis ofthefullsetofprogenythisdatasetindicatesthatthetruenumberoffathersinthesefamiliesissignificantlyhigherthanthe minimumnumberoffathersreportedinTab.II(Bostro¨metal. ,2009,inpreparation). Acknowledgments. We thank J. Bostro¨m, T. Hofving, T. Areskoug and T. Ma¨kinen for providing data, and the Swedish ResearchCouncil,theLinneusinitiativeAdaptationtoChangingMarineEnvironments(ACME)andtheCentreforTheoretical 6 BiologyatGothenburgUniversityforsupport. References AmavetP.,RossoE.,MarkarianiR.,etal.,MicrosatelliteDNAMarkersAppliedtoDetectionofMultiplePaternityinCaimanlatirostrisin SantaFe,Argentina.J.Exp.Zool.309A:637-642,2008. Bostro¨mJ.,HofvingT.,Andre´C.,AreskougT.,ErikssonA.,JohannessonK.,MehligB.,Ma¨kinenT.,andPanovaM.,unpublished,2009. BlythJ.E.andGilburnA.S.,Extremepromiscuityinamatingsystemdominatedbysexualconflict.J.Ins.Behav.,19:447-455,2006. DeWoodyJ.,NasonJ.D.andHipkins.V.D.,Mitigatingscoringerrorsinmicrosatellitedatafromwildpopulations.MolecularEcologyNotes, 6(4):951–957,2006. HoffmanJ.I.andAmosW.,Microsatellitegenotypingerrors:detectionapproaches,commonsourcesandconsequencesforpaternalexclusion. MolecularEcology,14(2):599–612,2005. JonesA.G.,GERUD1.0: acomputerprogramforthereconstructionofparentalgenotypesfromprogenyarraysusingmultilocusDNAdata. MolecularEcologyNotes,1:215-218,2001. Jones A. G., GERUD 2.0: a computer program for the reconstruction of parental genotypes from half-sib progeny arrays with known or unknownparentage.MolecularEcologyNotes,5:207-711,2005. KimuraM.andOhtaT.,Distributionofallelicfrequenciesinafinitepopulationunderstepwiseproductionofneutralalleles.PNAS,72:2761– 2764,1975. KimuraM.andOhtaT.,Stepwisemutationmodelanddistributionofallelicfrequenciesinafinitepopulation.PNAS,75:2868–2872,1978. Ma¨kinenT.,PanovaM.,Andre´C.,HighLevelsofMultiplePaternityinLittorinasaxatilis:HedgingtheBets?J.Hered.,98:705-711,2007. OhtaT.andKimuraM.,Amodelofmutationappropriatetoestimatethenumberofelectrophoreticallydetectableallelesinafinitepopulation. GenetRes,22(2):201–204,1973. PressW.H.,TeukolskyS.A.,VetterlingW.T.,andFlanneryB.P.,NumericalRecipes,CambridgeUniversityPress1986. PortnoyD.S.,PiercyA.N.,MusickJ.A.,etal.,Geneticpolyandryandsexualconflictinthesandbarshark,Carcharhinusplumbeus,inthe westernNorthAtlanticandGulfofMexico.Mol.Evol.,16:187-197,2007. ReidD.G.,SystematicsandevolutionofLittorina.RaySociety,London,1996. RispoliV.F.andWilsonA.B.,Sexualsizedimorphismpredictsthefrequencyofmultiplematinginthesex-rolereversedpipefishSyngnathus typhle.J.Evol.Biol.,21:30-38,2008. SaurM.,MatediscriminationinLittorinalittorea(L.)andLittorinasaxatilis(Olivi)(Mollusca:Prosobranchia).Hydrobiologia193:261-270, 1990. SefcK.M.,MattirsdorferK.,SturmbauerCh.,andKoblmu¨llerS.,Highfrequencyofmultiplepaternityinbroodsofasocialymonogamous cichlidfishwithbiparentalnestdefence.Mol.Ecol.,17:2531,2008. SelkoeK.A.andToonenR.J.,Microsatellitesforecologists:apracticalguidetousingandevaluatingmicrosatellitemarkers.EcologyLetters, 9(5):615–629,2006. SimmonsL.W.,BeveridgeM.,andEvansJ.P.,MolecularEvidenceforMultiplePaternityinaFeralPopulationofGreenSwordtails.J.Hered., 99:610-615,2008. SongS.D.,DrewR.A.I.,andHughesJ.M.,Multiplepaternityinanaturalpopulationofawildtobaccofly,Bactroceracacuminata(Diptera :Tephritidae),assessedbymicrosatelliteDNA.Mol.Ecol.,16:2353-2361,2007. TakagiM.,SakaiK.,andTaniguchiN.,DirectevidenceofmultiplepaternitiesinnaturalpopulationofviviparousJapanesesurfperchbyallelic markersofmicrosatelliteDNAloci.Fish.Science,74:976-982,2008. VanDoornikDM,ParkerSJ,MillardSR,etal.,MultiplepaternityisprevalentinPacificoceanperch(Sebastesalutus)offtheOregoncoast, andiscorrelatedwithfemalesizeandage.Env.Biol.Fish.,83:269-275,2008. WangJ.,Sibshipreconstructionfromgeneticdatawithtypingerrors.Genetics,166:1963-1979,2004. Wattanachaiyingcharoen W.,OldroydB.P.,WongsiriS.,PalmerK.,PaarJ.,AscientificnoteonthematingfrequencyofApisdorsata.Api- dologie,34:85-86,2003. WehrhahnC.F.,Theevolutionofselectivelysimilarelectrophoreticallydetectableallelesinfinitenaturalpopulations.Genetics,80(2):375– 394,1975. ZhivotovskyL.A.andFeldmanM.W.,Microsatellitevariabilityandgeneticdistances.PNAS,92(25):11549–11552,1995. 7 Figures 8 functionMinFathers(F,S,nˆ)returnintegeris begin ifS isemptyand|F|<nˆthen nˆ :=|F| PrintfathersinF elseif|F|<nˆ then pickaprogenyp∈S foreachvariantqofploop foreachvalidfatherf′ ∈{Merge(f,q):f ∈F}loop S′ :={s∈S\{p}: scannotbeinheritedfromf} nˆ :=min(nˆ,MinFathers(S′,(F\{f})∪{f′},nˆ)) endloop S′ :={s∈S\{p}: scannotbeinheritedfromq} nˆ :=min(nˆ,MinFathers(S′,(F ∪{q},nˆ)) endloop endif returnnˆ endMinFathers FIG.1 Algorithmfor finding the minimum number of fathers. S istheset of progeny, F isthe set of fathers (F = ∅initially), nˆ isthe minimumnumber offathersforthewholesetofprogeny foundsofar(⌈|S|/2⌉initially). Seethetextforanexplanationofthealgorithm. Underwhichcircumstancesaloopovervariantsqofpisnecessaryisexplainedinthetext. 9 F={} S={1,2,3,4} F={{1}} S={2,3,4} F={{1,2}} F={{1},{2}} S={3,4} S={3,4} F={{1,2,3}} F={{1,2},{3} F={{1,3},{2}} F={{1},{2,3}} F={{1},{2},{3}} S={4} S={4} S={4} S={4} S={4} F={{1,2,3,4}} F={{1,2,4},{3}} F={{1,3,4},{2}} F={{1,4},{2,3}} F={{1,4},{2},{3}} S={} S={} S={} S={} S={} F={{1,2,3},{4}} F={{1,2},{3,4}} F={{1,3},{2,4}} F={{1},{2,3,4}} F={{1},{2,4},{3}} S={} S={} S={} S={} S={} F={{1,2},{3},{4}} F={{1,3},{2},{4}} F={{1},{2,3},{4}} F={{1},{2},{3,4}} S={} S={} S={} S={} F={{1},{2},{3},{4}} S={} FIG.2 ShowsasearchtreeforfourprogenyconstructedbythealgorithmdescribedinSec. II. Eachprogenyhastwoloci,andateachlocus the allelecorresponding to themother has been subtracted, sothat each progeny isdescribed by a listof twoalleles(one for each locus): p1 = (a,b),p2 = (c,d),p3 = (c,e),andp3 = (a,d). Itisassumedthata,b,c,anddarefourdifferentallelictypes. Thefigureillustrates howlargepartsofthesearchtreecanbecutaway(yellow)becausetheyneednotbevisited. Thismayconsiderablyspeedupthealgorithm. Inthefigure,eachboxrepresentsthestateofthealgorithmateachiteration;F isthesetoffathers(eachfatherisrepresentedbythelistof offspringassignedtoit),andSisthesetofoffspringthealgorithmhasyettobeassignedtoafather.Thestatesarevisitedfromtoptobottom, and fromleft toright, followingthe linesthat emanate from thebottom of each node, except theterminal nodes (shown assquares). The algorithmstopswhenthesetSisempty.Asetofindividualsthatcannotinheritfromasinglefatherisshowninbolditalicfont(i.e.thefather isinvalid). Thestateswhicharecolouredyellowarenevervisited,eitherbecausetheydescendfromastatewithaninvalidfather,orbecause itcanbeseenthatthestatedoesnotleadtoasolutionwithfewerfathersthanthebeststatefoundsofarinthesearch. 10 −500 d o o −505 h keli −510 li g −515 o L −520 2 4 6 8 10 No. iterations[×105] 20 18 s r e h at16 f o. N 14 12 0 2 4 6 8 10 No. iterations[×105] FIG.3 ShowsarunofCOLONY(Wang,2004)forthedatashowninTab.I.Top:Thetimeevolutionoftheloglikelihoodofthedata.Bottom: Themostlikelynumberofsiresasafunctionofthenumberofiterations(dots). Alsoshownistheexactlowerboundprovidedbythenew algorithm,MinFathers(dashedline).

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.