G3: Genes|Genomes|Genetics Early Online, published on August 1, 2018 as doi:10.1534/g3.118.200340 Recursive algorithms for modeling genomic ancestral origins in a fixed pedigree ChaozhiZheng∗,1,MartinP.Boer∗andFredA.vanEeuwijk∗ ∗Biometris,WageningenUniversityandResearch,Wageningen,TheNetherlands ABSTRACT The study of gene flow in pedigrees is of strong interest for the development of quantitative KEYWORDS trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for Junctiontheory modelingancestraloriginsalongtwohomologouschromosomeswithinindividualsinfixedpedigrees. Ahighly Identical by beneficial property of our method is that the size of state space depends linearly or quadratically on the descent numberofpedigreefounders,whereasthisincreasesexponentiallywithpedigreesizeinalternativemethods. Ancestral infer- TocalculatetheparametervaluesoftheMarkovprocess,wedescribetwonovelrecursivealgorithmsthat ence differwithrespecttothepedigreefoundersbeingassumedtobeexchangeableornot. Ouralgorithmsapply QTL mapping equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the resolution accuracyofthealgorithmsbyamillionsimulationsonapedigree. Wedemonstratedtwoapplicationsofthe Collaborative recursivealgorithmsinmultiparentalpopulations: designabreedingschemeformaximizingtheoveralldensity Cross(CC) ofrecombinationbreakpointsandthustheQTLmappingresolution,andincorporatepedigreeinformation Recombinant intohiddenMarkovmodelsinancestralinferencefromgenotypicdata;theconditionalprobabilitiesandthe InbredLine(RIL) recombinationbreakpointdataresultingfromancestralinferencecanfacilitatefollow-upQTLmapping. The Multiparent results show that the generality of the recursive algorithms can greatly increase the application range of AdvancedGener- geneticanalysissuchasancestralinferenceinmultiparentalpopulations. ation InterCross (MAGIC) INTRODUCTION ManycomplicatedexperimentalcrosseshaverecentlybeenproducedformappingQTL,particularlyinplants (e.g.Koveretal.2009;Sannemannetal.2015). Incontrasttotraditionalbiparentalpopulations,multipleparents have been used to increase genetic diversity and thus QTL segregation probability, and many generations of intercrossmatinghavebeenusedtoincreaseaccumulatedrecombinationbreakpointsinsampledoffspringin thefinalgenerationandthusQTLmappingresolution. Theexpecteddensityofrecombinationpointsisoneof keyquantitiesforoptimizingexperimentaldesignspriortocollectinggenotypicandphenotypicdata(Rockman andKruglyak2008). Furthermore,theidentificationofrecombinationbreakpointsfromgenotypicdataprovides usefulinformationforincreasingdetectionpowerandmappingresolution(Xu2013;Lietal.2015). Theprimary Copyright ©2018ChaozhiZheng etal. Manuscriptcompiled:Wednesday1stAugust,2018% 1Biometris,WageningenUniversityandResearch,POBox16,6700AAWageningen,TheNetherlands.Email:[email protected] VolumeX | August2018 | 1 © The Author(s) 2013. Published by the Genetics Society of America. aimsofthispaperaretodevelopthetheoryofgeneflowinafixedpedigreewitharbitrarystructuretocalculate thepriordistributionofrecombinationpoints,andtoapplythetheoryforancestralinferenceanddetectionof recombinationpointsfromgenotypicdatainmultiparentalpopulations. Thetheoryofone-locusgeneflowinapedigreehasbeenwelldeveloped. Thehaploidgenomesofpedigree foundersaredesignatedbydistinctlabels,calledfoundergenomelabels(FGLs). Asetofgenesareidenticalby descent (IBD) if they carry the same FGLs. The identity state for two genes is either IBD or non-IBD. The IBD probabilityfortwodistinctgeneswithinanindividualistheinbreedingcoefficient,anditisthekinshipcoefficient forthetwogenesrandomlysampledfromtwodistinctindividuals. Rostron(1978)developedarecursivealgorithm forcalculatingthetwo-genecoefficientsofinbreedingandkinship. Karigl(1981)developedarecursivealgorithm forcalculatingthecoefficientsofthefifteengeneidentitystatesforfourgenesbetweentwoindividuals. Thompson (1983)showedthatasimilaralgorithmappliestotheprobabilitiesofjointdescentofmultiplegenesfromaspecific FGL. Bauman et al. (2008) and Zhou et al. (2012) developed a recursive algorithm for calculating the ancestral coefficientssuchastheprobabilityofonegenedescendingfromagivenFGLandtheprobabilityoftwogenes descendingfromtwogivenFGLs(distinctornot). The theory of gene flow at two linked loci in a pedigree has also been developed. Weir and Cockerham (1969)andCockerhamandWeir(1973)developedarecursivealgorithmforcalculatingthetwo-locusinbreeding coefficientofanindividualthatisdefinedasalinearfunctionoftheprobabilitiesofthefifteenidentitystatesfor fourgenes,twoateachoftwolinkedloci. Thompson(1988)developedarecursivealgorithmforcalculatingthe two-locuskinshipbetweentwoindividualsdefinedastheprobabilityofIBDatbothloci,seeHillandWeir(2007) forcalculatingmulti-locusIBDprobabilitiesinrandommatingpopulations. Incontrasttotheidentitycoefficients, theancestralcoefficientattwolociisdefinedasthetwo-locusdiplotypeprobabilities,andthereare4L possible diplotypesforfourgenesinanindividualwithrespectto LdistinctFGLs. Thus,thecalculationofthetwo-locus ancestral inferences has been developed only for simple breeding systems including selfing, brother-sister (or sibling)mating,andparent-offspringmating(HaldaneandWaddington1931;JohannesandColome-Tatche2011; Broman2012). Therehasbeenmuchinterestindevelopingthetheoryofgeneflowinapedigreewithchromosomesassumed tobeagenomiccontinuum. Donnelly(1983)developedaformalmathematicalframework,wherethecrossover processesinachromosomepedigreefollowjointlyacontinuoustimeMarkovrandomwalkontheverticesofa hypercube,thetimeparameterbeingpositionalonghomologouschromosomes. Hereavertexrepresentsoneof possiblegenetransmissionsfromfounderstoallnon-founders,andthenumberofverticesinad-dimensional 2 | ChaozhiZheng etal. hypercube is 2d, d being the number of non-founders. This framework has been used in many kinds of small pedigreesystems(e.g.BickebollerandThompson1996;Stefanov2000;MartinandHospital2011). However,itis impracticaltoapplyDonnelly’sMarkovianframeworktoalargecomplexpedigreesincethenumberofstates (hypercubevertices)increasesexponentiallywiththepedigreesize. Alternatively, the theory of junctions has been developed to study the gene flow for a genomic continuum (Fisher1949,1954). Ajunctionisdefinedasaboundarypoint(recombinationbreakpoint)onachromosomewhere twodistinctFGLsmeet. Fisher(1949,1954)andBennett(1953,1954)developedthetheoryofjunctionsforsimple breedingsystemsincludingselfing,brother-sistermating,andparent-offspringmating. Ithasbeenextendedto populations(Stam1980;Bairdetal.2003;ChapmanandThompson2003;MacLeodetal.2005;Zhengetal.2014; Zheng2015). Inparticular,Zhengetal.(2014)andZheng(2015)developedaMarkovianframeworkformodeling ancestraloriginswithinanindividualwiththestatespacebeingthepossiblepairsofFGLs,andthusthenumber ofstatesismuchsmallerthanthatofDonnelly’sMarkovianframework(Donnelly1983). Ontheotherhand,our Markovianframework(Zhengetal.2014;Zheng2015)isrestrictedbytwoassumptions: thematingschemefrom onegenerationtothenextisrandommating,andtheFGLsareassumedtobeexchangeablesothattheprobability distributionofancestraloriginsinoffspringisinvarianttoallpossiblepermutationsoftheFGLs. However,the exchangeabilitygenerallydoesnotholdforamappingpopulationproducedviaanarbitrarybreedingpedigree. Inthispaper,werelaxtherestrictionofourpreviousMarkovianframeworkbyextendingittoafixedpedigree. Wefirstdescribearecursivealgorithm(denotedbyEXCH)undertheassumptionofFGLsbeingexchangeable, whichisapplicablefortheconstructionofMarkovianframeworkinsimplebreedingschemes. Thenwedevelop a recursive algorithm (denoted by NON-EXCH) for modeling ancestral origins to relax the exchangeability assumption. Thetworecursivealgorithmsapplytobothautosomesandsexchromosomesiftheyexist. Theresults ofbothrecursivealgorithmsarecomparedwiththosefromextensivesimulationsonaclassicalpedigree. Wefirst applythetwoalgorithmstocomparedifferentpopulationdesigns(orbreedingpedigrees)priortoexperiments, forexample,intermsoftheoveralldensityofjunctions(recombinationbreakpoints),animportantfactorofQTL mappingresolution. Inaddition,thenon-exchangeabilitycanbeillustratedundervariousbreedingdesigns. Then weapplythetwoalgorithmstoincorporatepedigreeinformationforancestralinferencefromgenotypicdatain simulatedandrealcollaborativecross(CC)populations,resultinginconditionalprobabilitiesthatarenecessary fordownstreamQTLmapping. VolumeX August2018 | Modelingancestraloriginsinpedigrees | 3 RECURSIVE ALGORITHM EXCH Notationsandoverview ThesymbolsarebrieflyexplainedinTable1,andsomeofthemareillustratedinFigure1. Pedigreememberswith unspecifiedmotherandfatherarecalledthefoundersofthepedigree,andtheothermembersarenon-founders. Therecursivealgorithmpresupposesthatpedigreemembersareorderedundertheconstraintthatparentsalways precede children. We denote by subscripts a, b, c the members of a pedigree, and denote by a > b individual a comes after individual b ((cid:54)= a). We denote by superscripts m the maternally derived genes or chromosomes, and pforthepaternallyderived. Wedenotebysuperscriptso,o ,o ,ando theunspecifiedparentalorigins(m 1 2 3 or p)ofgenesorchromosomes. Forthesakeofbrevity, f(a) = {AA,XX,XY} denotesahorizontalpiece-wise equation,sothat f(a) = AA,XX,andXYfortheautosomesofindividuala,XXchromosomesoffemalea,andXY chromosomesofmale a,respectively. Inthederivationsofrecurrencerelationsofaquantity,wewillfocusonaparticularorderinga > bifthequantity concerns genes in two individuals; we will always trace the maternally derived gene or chromosome within non-founder a back to its two parental genes or chromosomes if the maternally derived gene or chromosome concernsthequantity,andotherwisetracethepaternallyderivedgeneorchromosome. Throughoutthispaper, ♂ and alwaysdenotethefatherandmotherofindividual a,ratherthananyother,respectively. ♀ Theidentitycoefficientφo1o2(12)denotesthattwogenesatasinglelocushaveidentitystate(12),wherethefirst ab geneisinindividual aandhasparentalorigino ,andthesecondgeneisinindividualbandhasparentalorigino 1 2 (Figure1A).Thereareonlytwotwo-geneidentitystates: IBD(11)ornon-IBD(12),andφo1o2(11) = 1−φo1o2(12). ab ab Similarly,wedenotebyφo1o2o3(123)thethree-geneidentitycoefficientwiththeidentitystatebeingthenon-IBD abc state(123),whichisrequiredtogetherwithtwo-geneidentitycoefficientsforderivingtherecursiverelationsfor junctiondensities. Anexpectedidentityjunctiondensityisdefinedastheexpectednumberofthespecifiedtypeofrecombination breakpoints per Morgan along two homologous chromosomes. Here expectation concerns the stochasticity of geneflowfromfounderstodescendantsonafixedpedigree. Andidentityindicatesthatonlytheidentitypatterns (notspecificFGLs)onthetwosidesofjunctionsmatter. IntherecursivealgorithmNON-EXCH,wewillintroduce ancestral junction densities where the specific FGLs do matter. For example, Jo1o2(1232) denotes that the four ab genes,twoateachsideofabreakpointalongtwochromosomes,havegeneticidentitystate(1232)(Figure1B). Herethefirstandthirdgenesareontheleftandrightsidesofabreakpoint,respectively,andtheyareinindividual aandhaveparentalorigino . Andthesecondandfourthgenesareontheleftandrightsides,respectively,and 1 4 | ChaozhiZheng etal. theyareinindividualbandhaveparentalorigino . WeadoptthenotationofNadotandVayssiex(1973)foran 2 identitystate: thegenesarelabeledbynaturalintegersstarting1,andthesameintegerisassignedtothegenethat isIBDwithapreviousgene. Following the previous framework (Zheng et al. 2014; Zheng 2015), we model ancestral origins along two homologouschromosomeswithinanindividualbyacontinuoustimeMarkovprocess,whichcanbedescribed by an initial distribution π of ancestral origins and a rate matrix Q. Under the assumption of exchangeability amongFGLs,theinitialdistributionπ ofindividual aisdeterminedbytheidentitycoefficientφmp(11),andthe aa ratematrixQofindividual aisdeterminedbythefiveexpectedidentityjunctiondensities Jmp(1122), Jmp(1211), aa aa Jmp(1213), Jmp(1222),and Jmp(1232);seeZheng(2015)foranillustrativeconstructionofratematrixQfromthe aa aa aa fivejunctiondensities. Inthefollowing,wedescribearecursivealgorithmforcalculatingtheidentitycoefficientsandtheexpected junctiondensitiesforanindividualinagivenpedigree. Inshort,let1 beanindicatorfunctionthatequals1if S statementSistrueand0otherwise,andlet0S2 = 1−1 1 beanindicatorfunctionthatequals0ifstatementsS1 S1 S1 S2 andS2aretrueand1otherwise. Identitycoefficients Therecurrencerelationsforthetwo-andthree-geneidentitycoefficientsaresimilartotherecursivealgorithm of Karigl (1981). However, the calculation proceeds by tracing back genes instead of individuals, see also e.g. Jacquard(1974),NadotandVayssiex(1973),Denniston(1974),andGarcia-Cortes(2015). Thus,wecanaccountfor theasymmetrybetweenmaternallyandpaternallyderivedchromosomes,andaccountforautosomesandsex chromosomessimultaneously. Throughoutthispaper,weassumethattherearenocrossoversbetweenXandY sexchromosomeswithinamale. Inaddition,wefocusonnon-IBDstate(12)insteadofIBDstate(11)tosimplify therecurrencerelations. WederivetherecurrencerelationsaccordingtotheMendelianinheritancerules: (1)amaternally(orpaternally) derived autosomal gene descends from the two genes within the mother (or father, respectively) with equal probability1/2,(2)similarto(1)foramaternallyderivedX-linkedgene,and(3)apaternallyderivedsex-linked genewithinafemale(ormale)mustdescendfromtheX-linked(orY-linked,respectively)geneofthefather. The recurrencerelationsofthetwo-geneidentitycoefficientφo1o2(12)fornon-founder aaregivenby ab 1 (cid:104) (cid:105) φmo(12) = φmo(12)+φpo(12) 0m=o,a ≥ b ab 2 b b a=b φpo(12) =(cid:26)1 (cid:104)♀φmo(12)+♀φpo (12)(cid:105),φmo(12),φpo (12)(cid:27)0p=o,a ≥ b, ab 2 b b b b a=b ♂ ♂ ♂ ♂ VolumeX August2018 | Modelingancestraloriginsinpedigrees | 5 φo1o2(12) =φo2o1(12),a < b, ab ba for o,o ,o ∈ {m,p}. Here the first equation follows from the rules 1 and 2 for the maternally derived gene 1 2 withinindividual a,thesecondequationfollowsfromtherules1and3forthepaternallyderivedgenewithin individual a,andthelastequationisobtainedbyreversingtheorderingoftwogenes. Thesymmetriessuchas φo1o2o3(123) = φo1o3o2(123) = ... = φo3o2o1(123) apply to three-gene identity coefficients by permuting the three abc acb cba genes,andthusweconsideronlyaparticularordering a ≥ b ≥ c,andwehave 1 (cid:104) (cid:105) φmo1o2(123) = φmo1o2(123)+φpo1o2(123) 0m=o10m=o20o1=o2, abc 2 bc bc a=b a=c b=c (cid:26) (cid:27) φapboc1o2(123) = 12 (cid:104)♀φmob1co2(123)+♀φpo1boc2(123)(cid:105),φmob1co2(123),φpo1boc2(123) 0ap==bo10ap==co20bo1==co2, ♂ ♂ ♂ ♂ for o ,o ∈ {m,p},wherethefirst(second)equationtracesthematernally(orpaternally,respectively)derived 1 2 genewithinindividual a. TheboundaryconditionsoftheserecurrencerelationsaregivenaccordingtotheassignmentofFGLstothe foundersinapedigree. IfweassigndistinctFGLstoeachhaploidgenomeofanoutbredfounder,allmulti-gene non-IBDprobabilitiesare1ifanypairofgenesisnotfromthesamehaploidgenome,and0otherwise. Four-gene or higher order identity coefficients may be derived similarly, but they are not required for the derivations of junctiondensities. Expectedidentityjunctiondensities Let Rm (Rp)bethematernal(paternal)mapexpansion,theexpecteddensityofrecombinationbreakpointsonthe a a maternally(paternally)derivedchromosomeofindividual a. Theimplicittwo-geneidentitystateforthemap expansionsis(12). Itholdsthat Rm =Jmp(1122)+ Jmp(1121)+ Jmp(1222)+ Jmp(1232), a aa aa aa aa Rp =Jmp(1122)+ Jmp(1112)+ Jmp(1211)+ Jmp(1213), a aa aa aa aa where Jmp(1121) = Jmp(1222)and Jmp(1112) = Jmp(1211)accordingtothereversibilityofchromosomedirection aa aa aa aa (Zheng 2015). The expected overall junction density for individual a is given by ρmp = Rm +Rp − Jmp(1122). aa a a aa In the following, we derive the recurrence relations for the five expected junction densities Rm, Rp, Jmp(1122), a a aa Jmp(1213),and Jmp(1232),fromwhich Jmp(1211)and Jmp(1222)canbederivedaccordingtotheaboveequations aa aa aa aa 6 | ChaozhiZheng etal. ofmapexpansions. Therecurrencerelationsforthetwomapexpansionsofnon-founder aarerelativelysimple, andtheyaregivenby(Zheng2015) 1 (cid:16) (cid:17) Rm = Rm+Rp +φmp(12), a 2 (cid:26)1 (cid:16)♀ ♀ (cid:17) ♀♀ (cid:27) Rp = Rm +Rp +φmp (12),Rm,Rp , a 2 ♂ ♂ ♂♂ ♂ ♂ accordingtothetheoryofjunctionsthatanewidentityjunctionisformedwheneverarecombinationeventoccurs betweentwohomologouschromosomesthatarenon-IBDatthelocationofacrossover(Fisher1954). Since chromosomes are modeled as a genomic continuum, the shared junction type (1122) between two chromosomescanbeformedonlybyduplicatingthechromosomesegmentharboringtherecombinationbreakpoint. Fortheexpectedjunctiondensity Jo1o2(1122)ofnon-founder a,wehave aa Joo(1122) =Ro, aa a 1 (cid:104) (cid:105) Jmp(1122) =Jpm(1122) = Jpm(1122)+ Jpp(1122) , aa aa 2 a a 1 (cid:104) ♀ (cid:105) ♀ Jmo(1122) = Jmo(1122)+ Jpo(1122) ,a > b, ab 2 b b (cid:26)1♀(cid:104) ♀ (cid:105) (cid:27) Jpo(1122) = Jmo(1122)+ Jpo (1122) ,Jmo(1122),Jpo (1122) ,a > b, ab 2 b b b b ♂ ♂ ♂ ♂ Jo1o2(1122) =Jo2o1(1122),a < b, ab ba foro,o ,o ∈ {m,p},whereandthelastequationisobtainedbyreversingtheorderingofthetwohaplotypes. In 1 2 thefirstequation, Joo(1122)referstothesamechromosomeinindividual a,andthusequals Ro bydefinition. We aa a tracethematernallyderivedhaplotypewithinnon-founder abacktoitstwoparentalhaplotypesifthematernally derivedhaplotypeconcernsthejunctiondensity,andotherwisetrackthepaternallyderivedhaplotype. We derive the recurrence relations for Jo1o2(1213) and Jo1o2(1232) jointly. The recombination breakpoint for ab ab (1232)occursonthefirstchromosomeofahomologouspair,anditisonthesecondchromosomefor(1213). The junctiontype(1213)becomes(1232)whenreversingtheorderingoftwohaplotypeofjunctiontype(1213). We have 1 (cid:104) (cid:105) Jmo(1232) = Jmo(1232)+ Jpo(1232) 0m=o+φmpo(123)0m=o,a ≥ b, ab 2 b b a=b b a=b Jpo(1232) =(cid:26)1♀(cid:104)Jmo(1232)+♀Jpo (1232)(cid:105)+φmp♀o♀(123),Jmo(1232),Jpo (1232)(cid:27)0p=o,a ≥ b, ab 2 b b b b b a=b ♂ ♂ ♂♂ ♂ ♂ Jo1o2(1232) =Jo2o1(1213),a < b, ab ba VolumeX August2018 | Modelingancestraloriginsinpedigrees | 7 1 (cid:104) (cid:105) Jmo(1213) = Jmo(1213)+ Jpo(1213) 0m=o,a ≥ b, ab 2 b b a=b Jpo(1213) =(cid:26)1♀(cid:104)Jmo(1213)+♀Jpo (1213)(cid:105),Jmo(1213),Jpo (1213)(cid:27)0p=o,a ≥ b, ab 2 b b b b a=b ♂ ♂ ♂ ♂ Jo1o2(1213) =Jo2o1(1232),a < b, ab ba foro,o ,o ∈ {m,p}. Therearenocontributionsfromthethree-genenon-IBDprobabilitiesintheequationsfor 1 2 Jmo(1213)and Jpo(1213),becausecrossoversbetweenthetwoparentalhaplotypesofthetracinghaplotypewithin ab ab individual aareinvisible. TheboundaryconditionsaregivenbytheassignmentofFGLstothefounders. Thewithin-andbetween-founder identityjunctiondensitiesare0ifasingleFGLisassignedtothewholehaploidgenomeofeachfounder. RECURSIVE ALGORITHM NON-EXCH Notationsandoverview Ω Let denote the set of distinct FGLs assigned to the founders of a pedigree. We adopt the same notations in therecursivealgorithmwithFGLexchangeabilityexceptforthefollowingchanges. AsshowninFigure1,we replaceidentitystatesbyancestralstates: (1122) → (iijj),(1211) → (ijii),(1213) → (ijik),(1222) → (ijjj),and (1232) → (ijkj)forthetwo-chromosomeexpectedjunctiondensities,andRm → Rm(ij)andRp → Rp(ij)explicitly a a a a forthemapexpansions,wherei,j,k ∈ Ωaredifferentfromeachother(denotedbyi (cid:54)= j (cid:54)= k). We represent a two-gene ancestral state by (ij), and a three-gene ancestral state by (ijk), where i,j,k ∈ Ω are not necessary different from each other. In addition we introduce one-gene ancestral coefficient φo(i), the a probability that the maternally (o = m) or paternally (o = p) derived gene in individual a carries FGL i ∈ Ω. The identity coefficients can be obtained by summing the corresponding ancestral coefficients. For example, φaob1o2(11) = ∑i∈Ωφaob1o2(ii). Similar relationships hold between the expected identity junction densities and the expectedancestraljunctiondensities,forexample, Jaob1o2(1122) = ∑i,j∈Ω,i(cid:54)=j Jaob1o2(iijj)and Roa = ∑i,j∈Ω,i(cid:54)=jRoa(ij). Similarly, the initial distribution π of the Markov chain can be constructed from the two-gene ancestral coefficients, and the rate matrix Q can be constructed from the expected ancestral junction densities, without assumingtheexchangeabilityofFGLs. Ancestralcoefficients Baumanetal.(2008)andZhouetal.(2012)derivedtherecurrencerelationsoftheone-andtwo-geneancestral coefficientsforautosomesbytracingtwodistinctgenessimultaneouslyintheirparents. Wederiveequivalent 8 | ChaozhiZheng etal. recurrence relations for both autosomes and sex chromosomes by tracing one gene once in its parent, instead ofsimultaneouslytracingtwogeneswithinanindividual. Inaddition,wederivetherecurrentrelationsofthe three-geneancestralcoefficientsthatarerequiredintherecurrencerelationsoftheexpectedancestraljunction densities. The relations of the ancestral coefficients are very similar to those of the identity coefficients, and they are based on the same rules of Mendelian inheritance. For example for the one-gene ancestral coefficient φo(i) of a non-founder a,wehave 1 (cid:104) (cid:105) φm(i) = φm(i)+φp(i) , a 2 (cid:26)1 (cid:104)♀ ♀ (cid:105) (cid:27) φp(i) = φm(i)+φp (i) ,φm(i),φp (i) . a 2 ♂ ♂ ♂ ♂ SeeAppendixAfortherecurrencerelationsofthetwo-geneancestralcoefficientφo1o2(ij)andthree-geneancestral abc coefficientφo1o2o3(ijk). abc TheboundaryconditionsoftherecurrencerelationsoftheancestralcoefficientsaregivenbytheFGLsassigned tothefoundersinapedigree. Expectedancestraljunctiondensities Theancestralmapexpansionscanbeexpressedintermsoftheexpectedancestraljunctiondensitiesasfollows Rm(ij) =Jmp(iijj)+ Jmp(iiji)+ Jmp(ijjj)+ ∑ Jmp(ikjk), a aa aa aa aa k∈Ω,i(cid:54)=j(cid:54)=k Rp(ij) =Jmp(iijj)+ Jmp(iiij)+ Jmp(jijj)+ ∑ Jmp(kikj). a aa aa aa aa k∈Ω,i(cid:54)=j(cid:54)=k Here Ro(ij),o ∈ {m,p} refers to the junctions on the single chromosome of a with parental origin o, while a the terms on the right-hand side refer to the junctions on the two homologous chromosomes of a. We have Jmp(iiji) = Jmp(jiii) (cid:54)= Jmp(ijjj) and Jmp(iiij) = Jmp(ijii) (cid:54)= Jmp(jijj), where the equal signs are based on the aa aa aa aa aa aa reversibilityofchromosomedirection,andtheunequalsignsrefertothenon-exchangeabilityofFGLsiand j. Therecurrencerelationsfortheancestralmapexpansionsaregivenby 1 (cid:104) (cid:105) 1 (cid:104) (cid:105) Rm(ij) = Rm(ij)+Rp(ij) + φmp(ij)+φpm(ij) 1 , a 2 2 i(cid:54)=j (cid:26)1 (cid:104)♀ ♀ (cid:105) 1♀(cid:104)♀ ♀♀ (cid:105) (cid:27) Rp(ij) = Rm (ij)+Rp (ij) + φmp (ij)+φpm (ij) 1 ,Rm (ij),Rp (ij) , a i(cid:54)=j 2 2 ♂ ♂ ♂♂ ♂♂ ♂ ♂ VolumeX August2018 | Modelingancestraloriginsinpedigrees | 9 where the contributions of the two-gene ancestral coefficients account for the asymmetry of FGLs i and j, for example,φmp(ij) = φpm(ji) (cid:54)= φpm(ij). Anewancestraljunctionisformedwheneverarecombinationeventoccurs ♀♀ ♀♀ ♀♀ betweentwohomologouschromosomesthathavetheunorderedFGLsiand jatthelocationofacrossover. Therecurrencerelationsfor Jo1o2(iijj)arethesameasthosefor Jo1o2(1122),exceptthatidentitystatesarereplaced ab ab byancestralstates. Wehave Joo(iijj) =Ro(ij), aa a 1 (cid:104) (cid:105) Jmp(iijj) =Jpm(iijj) = Jpm(iijj)+ Jpp(iijj) , aa aa 2 a a 1 (cid:104) ♀ (cid:105) ♀ Jmo(iijj) = Jmo(iijj)+ Jpo(iijj) ,a > b, ab 2 b b (cid:26)1♀(cid:104) ♀ (cid:105) (cid:27) Jpo(iijj) = Jmo(iijj)+ Jpo (iijj) ,Jmo(iijj),Jpo (iijj) ,a > b, ab 2 b b b b ♂ ♂ ♂ ♂ Jo1o2(iijj) =Jo2o1(jjii),a < b, ab ba for o,o ,o ∈ {m,p}, where the last equation is obtained by reversing the ordering of the two haplotypes. See 1 2 AppendixBfortherecurrencerelationsof Jo1o2(ijkj), Jo1o2(ijik) Jo1o2(ijjj),and Jo1o2(ijii). ab ab ab ab Asbefore,thewithin-andbetween-founderancestraljunctiondensitiesare0ifasingleFGLisassignedtothe wholehaploidgenomeofeachfounder. Dataavailability AncestralinferenceusingthetworecursivealgorithmsisimplementedinthepreviousdevelopedRABBITsoftware, wherethefunctionmagicReconstructisextendedtoincorporatepedigreeinformation. RABBITisavailableat https://github.com/chaozhi/RABBIT.git, and it is offered under the GNU Affero general public license, version 3 (AGPL-3.0). Examplescriptsforusingtherecursivealgorithms,simulatinggenotypicdata,andancestralinference areincluded. TherealCCdatahavebeendescribedbyDurrantetal.(2011). APPLICATION TO MULTIPARENTAL POPULATIONS Simulationevaluation BeforeapplyingthetworecursivealgorithmsEXCHandNON-EXCHtomultiparentalpopulations,weevaluate theiraccuracybyforwardsimulationsontheclassicalpedigreeofNativeAmericans(Figure3)(Chapmanand Jacquard1971). Thepedigreehasbeenpreviouslyusedforevaluatingrecursivealgorithms(Jacquard1974;Karigl 1981;Garcia-Cortes2015). Wesimulatetwolinkagegroups: onepairofhomologousautosomesandonepairof 10 | ChaozhiZheng etal.
Description: