ebook img

Finite Ancestral Populations and Across Population Relationships PDF

18 Pages·2015·1.08 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Finite Ancestral Populations and Across Population Relationships

GENETICS | GENOMIC SELECTION Ancestral Relationships Using Metafounders: Finite Ancestral Populations and Across Population Relationships Andres Legarra,*,1Ole F.Christensen,†ZulmaG.Vitezica,‡IgnacioAguilar,§andIgnacyMisztal** *InstitutNationaldelaRechercheAgronomiqueand‡UniversitédeToulouse,INP,ENSAT,GenPhySE,Génétique,Physiologieet Systèmesd’Elevage,F-31326Castanet-Tolosan,France,†AarhusUniversity,CenterforQuantitativeGeneticsandGenomics, DepartmentofMolecularBiologyandGenetics,DK-8830Tjele,Denmark,§InstitutoNacionaldeInvestigaciónAgropecuaria, Canelones90200,Uruguay,and**AnimalandDairyScience,UniversityofGeorgia,Athens,Georgia30602 ABSTRACTRecentuseofgenomic(marker-based)relationshipsshowsthatrelationshipsexistwithinandacrossbasepopulation(breeds or lines). However, current treatment of pedigree relationships is unable to consider relationships within or across base populations, although such relationships must exist due to finite size of the ancestral population and connections between populations. This complicatestheconciliationofbothapproachesand,inparticular,combiningpedigreewithgenomicrelationships.Wepresentacoherent theoreticalframework to consider base population inpedigree relationships.Wesuggesta conceptualframework that considers each ancestralpopulationasafinite-sizedpoolofgametes.Thisgeneratesacross-individualrelationshipsandcontrastswiththeclassicalview whicheachpopulationisconsideredasaninfinite,unrelatedpool.Severalancestralpopulationsmaybeconnectedandthereforerelated. Eachancestralpopulationcanberepresentedasa“metafounder,”apseudo-individualincludedasfounderofthepedigreeandsimilarto an“unknownparentgroup.”Metafoundershaveself-andacrossrelationshipsaccordingtoasetofparameters,whichmeasureancestral relationships, i.e., homozygozities within populations and relationships across populations. These parameters can be estimated from existing pedigree and marker genotypes using maximum likelihood or a method based on summary statistics, for arbitrarily complex pedigrees.Equivalencesofgeneticvarianceandvariancecomponentsbetweentheclassicalandthisnewparameterizationareshown. Segregationvarianceoncrossesofpopulationsismodeled.Efficientalgorithmsforcomputationofrelationshipmatrices,theirinverses, andinbreedingcoefficientsarepresented.Useofmetafoundersleadstocompatibilityofgenomicandpedigreerelationshipmatricesand tosimplecomputingalgorithms.Examplesandcodearegiven. KEYWORDSrelationships;pedigree;geneticdrift;basepopulations;markergenotypes;shareddataresource;GenPred POWELLetal.(2010)pointedouttheconceptualconflict Vitezica et al. 2011). Powell et al. (2010) showed that one between identity-by-descent (IBD) relationships based can (at least conceptually) refer genomic relationship coef- on pedigree and identity-by-state (IBS) relationships based ficientstothepedigreescaleandviceversa.Inthecontextof onmarkergenotypes.Thesearealsoknownaspedigreeand appliedgeneticevaluationoflivestock,similarnotionswere genomic (VanRaden 2008) relationships, respectively, and introduced by VanRaden (2008) and Vitezica et al. (2011), we use this terminology hereinafter. Whereas reference for explicitly modifying genomic relationships to refer to pedi- pedigree relationships is formed by founders of the pedi- gree coefficients. However, an implicit assumption in these gree, reference for the genomic relationships is most often proposals is that the genotyped population has no pedigree the current genotyped population (e.g., Powell et al. 2010; structure,e.g.,nosibgroupsandonlyonegeneration(Christensen 2012),andtheproposalsarealsodifficulttoextendtoseveral Copyright©2015bytheGeneticsSocietyofAmerica base populations (Harris and Johnson 2010; Misztal et al. doi:10.1534/genetics.115.177014 2013;Makgahlela et al. 2014). Manuscript received February 12, 2015; accepted for publication April 3, 2015; Inaddition,pedigreerelationshipshaveseveralproblems. publishedEarlyOnlineApril14,2015. Supporting information is available online at www.genetics.org/lookup/suppl/ Pedigrees,whichareincompletebydefinition,endupinone doi:10.1534/genetics.115.177014/-/DC1. orseveralbasepopulations(linesorbreeds).Forinstance,the 1Correspondingauthor:INRA,UMR1388GenPhySE,CS52627,31326Castanet- Tolosan,France.E-mail:[email protected] pedigreeoftheRomanesheepsyntheticbreedtracesbackto Genetics,Vol.200,455–468 June2015 455 two base populations, the Romanov and Berrichon du Cher alternative is truncation of the pedigree, to have a more ho- breeds,andthepedigreeofglobalHolsteincattlepopulation mogeneous base population(e.g., Lourenco etal. 2014), but canoftenbetracedbackto“European”and“NorthAmerican” this is not always a feasible option. Furthermore, defining base populations. In another more complex case, pedigrees pedigree founders as unrelated is contradictory with results are incomplete for some categories of animals. For instance, obtained if these individuals are genotyped. Christensen in dairy sheep, the father of all males is known, but only (2012) suggested taking for genomic relationships an arbi- 5–80%offemaleshaveaknownfather.Tofurthercomplicate trary reference and an ideal population with 0.5 allele fre- things, in the presence of selection, assuming that all un- quencyatthemarkers,andreferringpedigreerelationshipsto knownparentsbelongtothesamebasepopulationandhave thisbasepopulation.Bydoingso,heshowedthatfoundersof thesamegeneticlevelisunfairsinceyounger(orsometimes the base population should become related, and this extra foreign)animalsareselectedandtherefore“better”thanthe relatedness can be understood as an excess of identical-by- base population. If not properlyaccounted for, this structure descenthomozygosity.Theapproachcan beunderstoodasa inseveralbasepopulationsresultsinbiases(e.g.,Ugarteetal. marginalization with respect to uncertainty in allelic fre- 1996; Misztal et al. 2013). Therefore, unknown fathers are quencies,andastabledefinitionofthegeneticbaseacross assigned to different base populations, e.g., depending on time and different populations is obtained. Extension of year of birth, country of origin, sex, or path of selection. this method to several founder populations is, alas, not Currentpracticeofgenetic evaluations assumesthat individ- straightforward. uals in the different base populations (typically known as In this work, we present a theory to consider relation- genetic groups or unknown parent groups) have different ships within and across base, or founder, populations. This a priori average values, and these values are estimated as theoryprovidesthetools,ontheonehand,togeneralizethe fixed effects within the model (Thompson 1979; Quaas “unknown parent groups” used in genetic evaluations and, 1988).However,thequantitativegeneticstheoryofunknown ontheotherhand,togeneralizeChristensen’sresults,which parent groups has not been much furtherdeveloped. For in- conciliatepedigreeandgenomicrelationships.Theconcepts stance, Kennedy (1991) pointed out that genetic groups are developed in this work aim to be rather general and are incorrectly assumed to be unrelated to each other and that based on pedigree considerations, but their use is of large reduction of variance due to drift and selection should be interest in two cases: first, when combining genomic and accounted for. With molecular markers, there are more and pedigree relationships across individuals (as the SSGBLUP moreexamplesofobservedrelationshipsacrossbasepopula- mentioned above) and, second, when considering several tions that were a priori unrelated (Kijaas et al. 2009 ; Gibbs base populations simultaneously. etal. 2009;Ter Braaket al. 2010; VanRaden etal. 2011). Theoutlineofthisarticleisasfollows.First,weshowthat The hypothesis of unrelatedness of founders in a base base populations withrelated individuals canbeunderstood population implies that the base population is drawn from asissuedfromfinitesizeancestralpopulations.This,although averylargeancestralpopulation.Notonlyisthisfalsebutit notstrictlynecessaryforpracticalpurposes,givesaconceptual also contradicts marker-based information (animals seem- modelandageneticinterpretation(Jacquard1974).Second, inglyunrelatedshareallelesatmarkers). Althoughunrelat- such an ancestral population can be represented as a single ednesscansimplybeseenasanarbitrarystartingpoint,we pseudo-individual (a metafounder) with a particular self- suggestthatrelaxingthishypothesisgivesmoreflexibilityto relationship (a measure of homozygosity) and represents a the models. poolofgametes.Severalbasepopulationscanberepresented Ontheotherhand,genomicmeasuresofrelationshipsare asseveral,possiblyrelated,metafounders.Metafounders are not dependent on knowledge of pedigree. Further, they are convenient because they simplify the representation and the more accurate because they consider realized, not expected, algorithms for computing relationships and inbreeding. relationships (VanRaden 2008; Hayes, et al. 2009; Hill and Finally,weshowhowparameters(ancestralhomozygosities Weir 2011). Genomic relationships can be projected along andrelationshipsacrosspopulations)ofancestralpopulations the pedigree for animals with no genotypes (Legarra et al. canbeestimatedfromthecombineduseofmarkerandped- 2009;ChristensenandLund2010).Theso-calledsingle-step igreedata.Ourworkisanextensionandunificationofexist- GBLUP (SSGBLUP) thus mixes pedigree and genomic rela- ing works by Jacquard (1969, 1974), VanRaden (1992), tionships, and is becoming the de facto standard in genomic AguilarandMisztal(2008),VanRadenetal.(2011),Colleau evaluationsforlivestock(e.g.,Legarraetal.2014b).However, and Sargolzaei(2011),and Christensen(2012). SSGBLUPrequiresgenomicandpedigreerelationshiptorefer to the same base. This base is however, hard to define. Genomic relationships of the current population change Theory as more individuals are being included and are poorly de- Relationshipsinafinitepopulation fined if populations are structured (i.e., in lines, breeds or origins) (e.g., Harris and Johnson 2010). Defining a base is Relationshipsacrossbaseindividuals:Let“ancestral”bethe also difficult for pedigree relationships as pedigrees are in- population from which founders of the pedigree are drawn completeandpossiblyendupinseveralbasepopulations.An and “base” population be the set of these pedigree founders 456 A.Legarraetal. Figure1 Ancestralandbasepopulationandpedigree. Figure2 Relationships(reddashedline),inarelatedbasepopulation,of twoindividualsXandY seenasgametes(left)or individuals(right).In- dividualX (Y) containsgametesa andb (c andd). Relationships across (Figure 1). Typically, individuals in the base population are andwithingametesarerespectively g=2and1;relationshipsacrossand withinindividualsaregand1þg=2. assumedtobe drawn from alarge, unrelated,ancestral pop- ulation mating at random, so that the base population individualswillnotberelated.Jacquard(1969,1974)con- compare genomic relationships and pedigree relationships. sidered relationships in a finite-size population, showing Due to random sampling of a limited number of founders, that inbreeding and relationships increase steadily. We re- themeanofthebasepopulationcomposedofnindividuals develop his treatment in a simplified form. Pedigree found- (u ¼19u =n) will drift around the mean of the ancestral 0 0 ers in the base population are drawn at random, with population with variance Varðu Þ¼ gþð12g=2Þ=n: 0 replacement, from an ancestral finite monoecious popula- tion with effective size Ne, 2Ne gametes, “true” average Pedigree relationships from related base populations: breeding valuem, andgenetic variances2u. Inthisancestral VanRaden (1992) (unaware of the work of Jacquard 1974) population gametes are assumed to be independent (in assigned nonzero relatedness toanimals in the base popula- a sense, the ancestral population becomes the new base). tions to correctly estimate inbreeding when pedigree infor- Imagine two gametes sampled with replacement from the mation is missing. The value assigned to this relatedness, finiteancestralpopulationtoformthebasepopulation.The which is equivalent to g, was set to the average relatedness second gamete will be identical to the first gamete 1=ð2NeÞ of contemporary individuals with known relationships. of the times. Therefore, the relationship coefficient (proba- Lutaayaetal.(1999)showedthattheclassicalalgorithmfor bilityofidentitybydescent)betweenallpairsofgametesis calculatinginbreedingisverysensitivetoevenasmalllossof g=2¼1=2Ne, and this relationship g=2 can be understood pedigree while VanRaden (1992) algorithm is much better asthecorrelationbetweengametesofWright(1922).Jacquard although not perfect. This idea was also applied by Aguilar (1974) used a¼g=2 and called it the “inbreeding coefficient andMisztal(2008),andColleauandSargolzaei(2011)used ofapopulation.” a closely related idea ina similar setting. Across-individualrelationshipsinthebasepopulationare Obtaining pedigree relationships from related base pop- depicted in Figure 2. Consider diploid individuals X and Y. ulationsisconceptuallystraightforward,canbedonefollow- They are constituted by four gametes, a,b,c,d. These game- ing the tabular rules (Emik and Terrill 1949), and leads to tes have been drawn from a pool of gametes where the a matrixof additiverelationships probability of being identical (by descent) is g=2 across (cid:1) (cid:3) g gametes and 1 with itself (Figure 2, left). Therefore, the Ag ¼A 12 þgJ; 2 coancestry coefficient between X and Y is the four-way av- erageofprobabilitiesofbeingidenticalforeachpossiblepair where A is the matrix with regular relationships and J a of gametes, which sums to g=2. Additive relationship be- matrix of 1’s. In Jacquard (1974, p. 169), this formula is tweenXandYistwicethecoancestryandthereforeg (Fig- presentedusingcoancestriesinsteadofrelationships.How- ure2,right).NowconsiderindividualX.Theself-coancestry ever,algorithmsforcomputationofinbreeding(e.g.,Quaas considers four ways of sampling alleles a and b (with re- 1976; Meuwissen and Luo 1992), Henderson’s (1976) placement), and because Pða[bÞ¼g=2, self-coancestry is sparse inverse of the pedigree relationships, and other equal to 1=2þg=4, and therefore self-relationship is equal algorithms (Colleau 2002) need to be modified to account to 1þg=2. fornonzerorelatednessoffounders(e.g.,AguilarandMisztal The base population has associated breeding values u . 2008; Christensen 2012). These changes are rather complex 0 Fromthedevelopmentsabove,thevariance-covariancematrix and do not generalize well to the case of several base pop- ofbreedingvaluesisVarðu Þ¼½Ið12g=2ÞþJg(cid:2)s2; whereI ulations that are presented later. For this reason, and 0 u istheidentitymatrixandJisamatrixofones.Thiscovariance for its conceptual appeal, we have conceived the use of structure was suggested by Christensen (2012) to correctly metafounders. MetafoundersinPedigrees 457 coefficients.Theystartbyassigningself-relationshipsof1to all animals in the base population and later two rules are used, a ¼0:5ða þa Þ ij dj sj a ¼1þ0:5ða Þ; ii sd where d and s are the dam and sire of i, which must be youngerthanj.Toincludethemetafounder,theonlychange is to set its self-relationship (a in the example) to g. The 11 EmikandTerrillrulesdonototherwiseneedtobechanged. Forinstance,forindividual2inFigure3,a ¼1þ0:5a ¼ 22 11 Figure3 Basepopulationwithametafounderandcorrespondingpedi- 1þg=2,andforindividuals1and2,a ¼0:5ða þa Þ¼ gree. 12 11 11 g.Forindividuals2and3,a ¼0:5ða þa Þ¼g.There- 23 12 12 fore, assigning a metafounder with self-relationship g is Metafounder strictly equivalent to considering across-founder population relationships g and founder self-relationships 1þg=2. The We now introduce a different, but equivalent, representation of related base populations that allows a greater flexibility. recursivealgorithmsofKarigl(1981)andAguilarandMisztal (2008)areversionsofEmikandTerrill(1949)andtherefore Thisrepresentationusesso-calledmetafounders. need no modification beyond setting a to g. Using these 11 Definition: The notion of metafounder comes as an exten- rules,Ag iseasilycreated. sion of VanRaden (1992) method for estimation of across- Consider Henderson’s (1976) inverse of the relationship breed relationships. Imagine a pseudo-individual who can matrix A. This consists in a product on the form A21 ¼ be considered as, simultaneously, father and mother of all L21D21L219, where D is usually a diagonal matrix contain- base animals (Figure 3). We call this pseudo-individual ingvariancesoftheMendeliansamplingterms(deviationof a metafounder. The metafounder in Figure 3 represents anindividual’sbreedingvaluefromitsparents’average)and the ancestral population in Figure 1. L21 contains ones in the diagonal and 0.5 coefficients link- In Figure 3, the metafounder (individual 1) represents a ing parents to offspring. Elements of D are a function of finite-size pool of gametes, from which the gametes consti- inbreedingoftheparents(seeThompson1977fortheproof tuting individuals 2–6 (the base population) are drawn. andElzo(2008)foradetailedexplanation).Thisreasoning Picking two gametes at random with replacement, these applies equally well to the use of one metafounder. Thus, gameteshaveanacross-gameterelationshipofg=2.Therefore, using pedigrees with a metafounder, all the information themetafoundercanbeconsideredashavingaself-relationship aboutcovarianceofgametestransmittedfrombaseanimals of a11 ¼g and an individual inbreeding coefficient of to their descendants is contained in the inbreeding of the Fi ¼a1121¼g21, which will usually be negative. In- baseanimals,andthealgorithmofHenderson(1976)works breeding means departure from Hardy–Weinberg equilib- without changes, provided (and this is important) that in- rium, and negative inbreeding represents excess of breeding for all individuals is computed previously. This is heterozygotes. Therefore, negative inbreeding means that opposite to Christensen (2012), who had to devise modifi- in most cases two gametes are different, i.e., the size of cations of the algorithm. the pool is large, which is a tenable genetic hypothesis. InbreedingcoefficientscanbecomputedbyEmikandTerrill For instance, considering g¼0 (and therefore F ¼ 21) (1949)or,equivalently,usingrecursion(Karigl1981;Aguilar means that the two gametes are always different (by de- andMisztal2008). However, efficientalgorithmsforcompu- scent) and unrelated, i.e., the size of the pool is infinite, tation of inbreeding use Henderson’s (1976) decomposition heterozygosity (bydescent) is complete, andallindividuals ofthenumeratorrelationshipmatrix.Thesealgorithms(e.g., in the base population are unrelated. Considering g¼2 Quaas1976;MeuwissenandLuo1992)proceedbycomputing (and F ¼1) means that two gametes drawn at random are the variance of theMendelian sampling term, D . Meuwissen ii alwaysidentical,i.e.,thepoolconsistsofonegamete,there andLuo(1992)presentedonerule, is complete homozygosity, and all individuals in the base population are identical and completely inbred. Dii ¼0:520:25ðFsþFdÞ; Algorithmsforrelationshipsandinbreedingwithasingle where, in the case of unknown ancestor, s¼0 (or d¼0), metafounder:Withthisrepresentationusingmetafounders, their programming set F0 ¼ 21. The same rule for compu- regular rules for computation of relationships and inbreed- tationofD appliestothepedigreewithonemetafounderin ii ing change only slightly. Consider the Emik and Terrill Figure 3, by setting F ¼g21. In fact, the Meuwissen and 1 (1949) rules for computation of additive relationship Luo (1992) algorithm can be understood as having one 458 A.Legarraetal. Figure4 Severalrelatedbasepopulations. Figure 5 Population with two related metafounders 1 and 2, self- relationship coefficients g1; g2; and relationship coefficient g1;2 and associatedpedigree. metafounder with g¼0. Finally, the algorithm of Colleau (2002)forfastmultiplicationofmatrixAwithvectorx,Ax, or extraction of elements of A also works. (Figure5).Actualnumbersfortherelationshipswithin and across metafounders in G either can come from knowledge Multiple basepopulations ofthehistoryofthepopulations(i.e.,theydivergedsomany Across-population relationships: An important case is the generations ago) or can be inferred from genomic relation- analysis of several populations at the same time, possibly ships; this is detailed later. with crosses. The conceptual model can easily be extended to several base populations, possibly with overlap as re- Algorithms for relationships and inbreeding with several presentedinFigure4.Inthiscase,weneedtodefinewithin- metafounders: A pedigree with several metafounders and across-population relationships defines a relationship matrix AG. Algorithms for creation of 0 gA gA;B1 this matrix are extensions of previous ones. To form AGusing G¼@symm gB A: thetabularrules(EmikandTerrill1949),thefirststepistoset ... G as relationships of the metafounders and then apply the regularrules.RulesfortheinverseAG21consistin,first,invert- ThiswassuggestedbyVanRaden(1992)andusedbyVanRaden ing G to create a small submatrix of AG21 and then using et al. (2011). The interpretation of the across-base population Henderson’srules(1976)withtheelementsDii forallindivid- coefficientslikegA;B isthattheancestorpopulationsoverlap,as uals modified according to self-relationships ofmetafounders, seen in Figure 4. If population A is composed of n gametes, as in the previous section. Using generalized inverses for in- A populationBofn gametes,andtheyoverlaptoanextentofn version of G results in an algorithm that, for G¼0, gives the B AB gametes(forinstance,inFigure4theseare6,6,and2,respec- same AG21 as with unknown parent groups as in Thompson tively),thengA ¼1=n ,gB ¼1=n ;andgA;B ¼n =n n .The (1979) or Quaas (1988). The reason for this is that the gen- A B AB A B last result can be explained as follows: gA;B is the probability eralized inverse of G¼0 is 0, and otherwise the rules for that the gamete from A comes from the overlap (nAB=nA), inversion and the values of Dii are identical. This shows that times the probability that the gamete from B comes from the metafoundersareageneralizationofunknownparentgroups. overlap (nAB=nB), times the probability that both gametes are Computing Dii involves computation of inbreeding coeffi- actually the same, given that they come from the overlap cients,whichcanbedonebyrecursionormodifyingMeuwis- (1=n ). Weallowvaluesof gA,gB; andgA;B inacontinuous sen and Luo (1992). The Meuwissen and Luo (1992) AB range, even though the formulas only support values corre- algorithmgoesuptheancestorsofagivenanimaliandadds sponding to integer values of nA, nB and nAB. We also allow contributions LijDjj to the inbreeding coefficient of i; then gA;B to potentially be negative, in order to consider the situa- animal j is deleted from the list of ancestors, and Lij is set tion where populations have diverged due to selection in op- tozero.However,thisdoesnotworkintheparticularcaseof posite directions. However, there is the restriction that the acrossbredindividualissueddirectlyfromtworelatedmeta- matrixG shouldbepositivedefinite. founders, i.e., an F1 crossbred individual with unknown parents.Thisisacasethatdoessometimesexist,e.g.,insheep Metafounders: The consideration of each ancestral popula- andcattle.Inthiscase,thecontributionfroPmthemetafound- tion as a metafounder is straightforward. Metafounders ers to Aii is a sum over all metafounders k¼1;nmfðLi;kK:;kÞ2, would be related by relationships whereK:;kisthekthcolumnofK,thelowertriangularCholesky 0 1 decomposition of G¼KK9, and nmf is the number of meta- gA gA;B founders.Therefore,inthecaseofseveralmetafounders,their @ A G¼ gB contributionsneedtobeprocessedforsimultaneously.Thecore ... modificationfortheMeuwissenandLuocodeis MetafoundersinPedigrees 459 K¼lower choleskiðgammaÞ VarðyÞ¼Js2m*þJgs2u2relatedþAð12g=2Þs2u2relatedþR: Two equivalent models (with equivalent likelihoods under forðiinmetafoundersÞfFðiÞ¼1(cid:3)gammaðiÞg multivariate normality) should have the same covariance for y and therefore forðiinðmetafoundersþ1;allanimalsÞÞf s2 forðjinancestorsðiÞÞf s2u2related ¼ u122ungre=la2ted ... and s2m ¼s2m*þgs2u2related. In other words, the general across-individual covariance g is absorbed by the overall mean(anditwillbethecaseevenifthemeanisconsidered if ðjnotinmetafoundersÞthen as a “fixed” effect; Strandén and Christensen 2011). An in- tuitiveexplanationisthat,whensamplingafinitenumberof AddL D toA ij jj ii animals from a population, animals will tend to be related andthereforethemeanwilldriftfromzero;butthisdriftof L ¼0 ij the mean will be accounted for by the general mean of the model. The expression above agrees with the numerical endif results in Christensen (2012). This result looks puzzling because it suggests that an “in- g bred”populationhashighergeneticvariancethananon-inbred X​ (cid:1)(cid:4) (cid:5)2(cid:3) one,butthisisnotactuallythecase.Theparameters2u2related Add Li;1:nmfK toAii has to be interpreted as a parameter of the statistical linear model used for the analysis, and it cannot be interpreted as Li;1:nmf ¼0 agenetic variance within the population (whereass2u2unrelated canbe).Infact,thes2 wouldbegeneticvarianceintheir u2related hypothetical unrelated ancestral monoecious parents, and it g wouldbereducedtos2 assumingarateofinbreeding u2unrelated g=2 from parents to offspring, as relatedness g decreases the Finally, the algorithm of Colleau (2002) to efficiently genetic variance within a population. Thus, the genetic vari- compute products AGx as LDL9x multiplies the result of ance within the population is always s2 , and the var- L9x by D, which has an upper diagonal block equal to G iance component associated to the lineua2runmreloatdedel is s2 . u2related butthatisdiagonalotherwise.Acompletecodeisfurnished Along the same line, genetic gain in the “related” base popu- in Supporting Information, File S2. lationisnotproportionaltos2 (becausewhenselecting u2related Geneticvarianceconsideringrelated basepopulations individuals,theywillberelated)buttos2u2unrelated. Singlebasepopulation:Theadditivegeneticvarianceisthe Multiple base populations: The reasoning extends to the variance of the breeding values of the set of individuals constitutingapopulation.Thisdefinitiondoesnotinvolvea case with several populations but no crosses. For simplicity, we consider only two purebred populations. For breeds notion of (un)relatedness in itself. However, in the base b¼A;B the model for phenotypes is population,theseindividualsaretypicallyassumedunrelated, whichsimplifiesthereasoning.Aquestionishowtorelatethe yb ¼1mbþub; genetic variance of a populationmodeledas “related” to the geneticvarianceofapopulationmodeledas“unrelated.”The wherethevariance–covariancematrixofthecombinedvec- breeding value is defined as relative to the average of the tor of breeding values is population.Forthisreason,anystatisticalmodelrelatingphe- 0 ! 1 notypes to breeding values is forced to include an overall gA meanoranenvironmentaleffectconfoundedwithit.Atypical (cid:6) (cid:7) BBAA;A 12 2 0 CC uA B C modelforthephenotypecanbewrittenas var ¼s2B ! C uB B C @ gB A y¼1mþuþe: 0 AB;B 12 2 (cid:6) (cid:7) We follow the argument of Strandén and Christensen gAJ gA;BJ (2011),butforthesakeofdiscussion,considerthemean þs2 gA;BJAA gBJ AB ; BA BB as a random variable with variance s2. The covariance m of y is, for the classical model with unrelated base ani- with Ab;b being the relationship matrix of breed b¼A;B mals, VarðyÞ¼Js2 þAs2 þR; where VarðeÞ¼R. and J ; J ; J ;J being matrices consisting of 1’s. m u2unrelated AA AB BA BB As for the new model with related base animals Therefore,thevectorofbreedingvaluescanbeexpressedas 460 A.Legarraetal. (cid:6)uA(cid:7)¼0BB@uAu(cid:1)(cid:1)12g2A(cid:3)(cid:3)0:51CCAþ(cid:6)bA1n1(cid:7); E(cid:4)S2u(cid:5)¼(cid:1)diagðKÞ2K(cid:3)s2u ¼(cid:6)12n1m(cid:7)s2unrelated; uB uBu 12g2B 0:5 bB1n2 which is equal to s2 if the population is reasonably unrelated large (a popular assumption) and therefore s2 ¼s2 u unrelated where subindexuonbreeding values denotesthatthey are iffoundersareunrelated.Thismeansthatwhenthefound- in the model with unrelated base populations, and ers are unrelated, the genetic variance is, on expectation, (cid:6) (cid:7) (cid:6) (cid:7) (cid:6) (cid:7)! equaltothevariancecomponentofthecovariancestructure. bbAB (cid:4) N 00 ;s2 ggAA;B ggAB;B termCosnasriedeerqunaolwtothe structure above for Varðu0Þ. The two and assumed independence of breeding values uA and uB. diagðGÞ u u diagðKÞ¼1þ Byanargumentsimilartoabove(i.e.,StrandenandChristensen 2 2011), the parameters b and b are absorbed into the two (cid:1) (cid:1) (cid:3)(cid:3) A B general mean parameters mA and mB, respectively. Therefore, n2m2Gþnm2nm diagðGÞ=2 thetwomodelsareequivalentinthesensethatgeneticvariance K ¼ parameters are just scaled by ð12gb=2Þ and breeding values ðnmÞ2 arejustscaledandshifted.Thismodelimpliesthatphenotypes 12diagðGÞ=2 ¼Gþ are separate by population and a mean (or distinct levels of nm fixedeffects,e.g.,herds)hastobefitbypopulation.Theargu- (cid:4) (cid:5) (cid:1) (cid:3) ment above is not difficult to generalize to any number of E S2 ¼ diagðKÞ2K s2 u u populations,asfarascrossesthatdonotexist. ! diagðGÞ 12diagðGÞ=2 ¼ 1þ 2G2 s2 2 nm related Multiple base populations with crossing: For crossbred populations the equivalence above does not hold because in which we neglect the last term. This means that the gAB enters into the covariances across individuals. A differ- genetic variance is, on expectation, equal to the variance ent approximate equivalence of variances can be con- component s2 times a constant ð1þdiagðGÞ=22GÞ; structed as follows. Assume a set of n base population related whichis,1.EquatingthesetwoexpressionsforEðS2Þgives individuals(nisassumedlarge)drawnfromeachofmpop- u ulations. Let the ge(cid:6)netic(cid:7)values of the across-breed base s2 populationsbeu0 ¼ uuA0 .Thevariance–covariancematrixis s2related (cid:5)(cid:1)1þdiaugnrðeGlaÞt(cid:8)ed22G(cid:3): B0 0 1 gA This expression gives the previous result s2 ¼s2 = B1þ gA ⋯ gAB gAB ⋯C related unrelated B 2 C ð12g=2Þ for a single population. Compared to the result in B C BB gA CC the previous subsection about multiple populations, this ap- B gA 1þ ⋯ gAB gAB ⋯C B 2 C proximate equivalence is quite different. The result in the B C Varðu0Þ¼BBBBB g⋯AB g⋯AB ⋯⋯ 1þ⋯gB g⋯B ⋯⋯CCCCCs2related: pvarerivainoucesisnuabsmecotdioenlwisitahnreelqauteivdableansceeinbdeitvwideueanlsoannedgberneeetdic- B 2 C specific genetic variances in a model with unrelated base B C BB@ gAB gAB ⋯ gB 1þgB ⋯CCA individuals, whereas the result here is an approximate 2 equivalence between two genetic variances, one being in ⋯ ⋯ ⋯ ⋯ ⋯ ⋯ arelatedbasepopulationandanotherbeinginanunrelated (cid:8) base population. This last expression s2 (cid:5)s2 (cid:9)(cid:4) (cid:5)(cid:10) related unrelated The sample variance of u , across all populations, is 0 1þdiagðGÞ=22G is more general because it can con- u 9u sidercorrectlycrossesacrossindividuals.Thedifferencecomes S2u ¼ n0m02u02; also because in the previous expression there were separate meansforeachpopulation,somethingthatisnotrequiredhere. which, for Varðu Þ¼Ks2 (s2 is a parameter), has expecta- 0 u u tion (Searle 1982, p. 355) Segregation variance: When crossing pure breeds, there is (cid:4) (cid:5) (cid:1) (cid:3) an increase of genetic variance due to the increase of E S2 ¼ diagðKÞ2K s2: heterozygosityoftheQTL;forinstance,ifalternativealleles u u arefixedateachline.TheadditionalvarianceintheF2cross Intheclassicalparameterization(unrelatedfounders)K¼I comparedtothevarianceintheF1crossistermedsegregation and thus variance (Lande 1981; Lo et al. 1993). This is typically MetafoundersinPedigrees 461 ignored in a classical framework, although methods exist theserelationshipsusingmolecularmarkers,referringthem (Loetal.1993;Garcia-CortesandToro2006).Thisincrease toageneticbasedefinedaccordingtogenomicrelationships in the genetic variance can be considered using related (Christensen2012).Theobjectiveofthissectionistoobtain metafounders, as we show here. Two individuals in an F1 estimators of G based on two kinds of statistical inference: population (assuming—in a pedigree sense—unrelated and amethodofmaximumlikelihoodandamethodofmoments non-inbred parents, and factorizing out s2 ) have (roughly, make first- and second-order statistics of genomic related 0 1 and pedigree relationships comparable). gAB gA gB gAB B 1þ þ þ C B 2 4 4 2 C Maximum likelihood: Genomic information sheds light on Varðu Þ¼B C AB @gA gB gAB gAB A relationships across breeds (Gibbs et al. 2009; Kijaas et al. þ þ 1þ 2009;VanRadenetal.2011;Legarraetal.2014a).Genomic 4 4 2 2 relationships (VanRaden 2008; Hayes et al. 2009) are esti- whereastwoindividualsinanF2population(parentsinF1 matorsofrelatednessbasedontheobservationofthousands above) have of molecular markers, and typically matrix G¼ZZ9=s is 0 1 used, where Z contains centered genotypes andPs is a mea- gm gAB gm gAB ​ B1þ þ þ C sure of global heterozygosity, for instance, s¼2 piqi, the VarðuAB3ABÞ¼BB@ gm4 gAB4 2gm 2gABCCA tportinalcihpeleterboezyugsoesditytoatitnhfeermtahrekerGs.cToheifsfiicnifeonrtmsaatsionfocllaonwisn. þ 1þ þ 2 2 4 4 Marker genotypes follow Mendelian transmission, and 0 1 gF2 therefore the covariance of genotypes of two individuals is B1þ gF2 C determined by their relationship. Christensen (2012) used B 2 C ¼B@ CA; thistoestimateginasinglepopulation.First,heintegrated gF2 gF2 1þ the likelihood over the unknown allelic frequencies, which 2 results in using allelic frequencies of 0.5 as a reference (Z withgm ¼ðgAþgBÞ=2andgF2 ¼ðgAþgBÞ=4þgAB=2.The codedasf21;0;1g).AssumingmultivariatenormalityforZ, themarkers’genotype,thelikelihoodofobservedgenotypes gF2 is transmitted forward and does not change in the F3, F4, etc.Thegeneticvarianceofsuchapopulationisthus12gF2= conditional to g and s is 2¼12½ðgAþgBÞ=42gAB=2(cid:2)=2: The variance–covariance logpðZjg;sÞ¼const2pn2logðsÞ2plog(cid:4)(cid:11)(cid:11)Ag (cid:11)(cid:11)(cid:5) matrix of two individuals in the F2 can be expressed as (cid:1) 2 (cid:3) 2 22 2 p tr Ag21ZZ9 ; VarðuAB3ABÞ¼VarðuABÞ 2s 22 (cid:6) (cid:8) (cid:7) þ ðgm20gABÞ 4 ðgm20gABÞ(cid:8)4 twhheesruebnm2aitsritxheofnAugmcboerrreosfpgoenndointygpteodtihnedgiveindoutaylpseadndindAig2v2idis- uals. The parameter s is a measure of heterozygosity in the showing that the segregation variance is ðgm2gABÞ=4. Be- causeGispositivedefinite,thenthistermmustbe$0.Slatkin gePno​ typed population, and it is not equal to observed 2 pq. The extension of this likelihood to multiple popu- andLande(1994)showedthatsegregationvarianceisafunc- i i lations with different g’s in G is straightforward tion of within-loci squared differences of means at the two (cid:4)(cid:11) (cid:11)(cid:5) breeds,pluscross-productsofdifferencesacrosslociweighted logpðZjG;sÞ¼const2pn2logðsÞ2plog (cid:11)AG (cid:11) bylinkage.IfgABisestimatedusingmarkersasabove,thenitis (cid:1) 2 (cid:3) 2 22 implicitlyassumedthatgenotypesatlociforthetraitofinterest 2 p tr AG21ZZ9 ; have the same distribution across breeds and within the ge- 2s 22 nomeasmarkergenotypes.Reportsofsegregationvariancesin whereAGistherelationshipmatrixconstructedwithagiven the livestock genetics literature are scarce (e.g., Cardoso and G matrix and AG is the submatrix corresponding to the Tempelman 2004; Munilla-Leguizamon and Cantet 2011), 22 genotyped individuals. This likelihood can be factorized by partly because of poor data sets, partly because of computa- markers as tional difficulties, and partly because the bulk of crossbred (cid:4)(cid:11) (cid:11)(cid:5) animals is in poultry and swine, where crosses do not go be- logpðZjG;sÞ¼const2pn2logðsÞ2plog (cid:11)AG (cid:11) yondF1populations.Soitisuncertainwhetheraccountingfor 2 2 22 segregationvarianceisofanypracticalrelevance. 2 p nXsnpz9AG21z: 2s i 22 i Estimation ofmetafoundersancestralrelationships i¼1 fromgenomicdata The procedure can be completed by adding a prior distri- Because the within- and across-founder relationships butiontoGandusingaBayesianestimatorinsteadofmaxi- cannot be inferred from pedigree, we suggest estimating mumlikelihood.ThepriordistributionforGcanbeassigned 462 A.Legarraetal. based on spatial or temporal distances; for instance, Latxa expressionderivedbyVitezicaetal.(2011),whoadjustedGto sheep founders in 1990 and 1992 should be closer than match A and called g¼a. The expression shows that g is 1990 and 2000. However, in none of these forms of the a correction for underestimation of inbreeding of A with re- likelihood can G be factorized out, and the maximization specttoG,followingWright’sFcoefficientstheory.Anadvan- ofthelikelihoodneedstobedonebyasearchmethodsuch tageofthemethodisthatitneedsonlystatisticsoftheA and 22 as Simplex or Monte Carlo methods. For this reason, we Gmatrices,whichmightbemoreavailablethanfullmatrices. present a method based on summary statistics. Multiple pure populations: Assume that a sample from each pure breed is genotyped. Consider the purebred parts Method of moments based on summary statistics: This of G and AG , for simplicity 2 breeds A and B: (cid:6) 22 (cid:7) (cid:6) (cid:7)(cid:12) mwteinetthdhienodd-inpmeddiavitgicdrhueeeas-lbsraeuslmeadtmioranerlsyahtisipotsantsiinhstiipbcsos)thoafnAadG2c2Gro(st(shV-eiannmdRiavatidrdieuxnaoleftaenaxdl-. G¼ GGAB;;AA GGAB;;BB ¼ ZZABZZ9A9A ZZABZZ9B9B s ! 2011;Vitezicaetal.2011;Christensenetal.2012).Thisforces the equivalence between expected changes of the mean and AG ¼ AG22A;A AG22A;B varianceundergeneticdrift(Vitezicaetal.2011;Christensen 22 AG22B;A AG22B;B 0 ! 1 et al. 2012) for the populationsdescribedby eitherthepedi- gA greeorthegenomicrelationshipmatrices.Forasetofn ran- BBA22A;A 12 2 þJgA JgA;B CC dom variables u with variance–covariance matrix K, the ¼BB ! CC: sample average u¼19u=n has a variance VarðuÞ¼K, B@ gB CA whereas the sample variance S2 ¼u9u=n2u2 has expecta- JgA;B A22B;B 12 þJgB u 2 tion EðS2Þ¼trðKÞ=n2K¼diagðKÞ2K (Searle, 1982, p. u 355). The idea in the method is to force these two statistics of K (VarðuÞ and EðS2Þ) to be equivalent across both param- Tomeettheconditionsofunbiasednessweneedtoforcethe u eterizations (K¼AG and K¼G). We consider three equality ofaveragediagonalandaveragesofGandAG and situations. set up the four equations ! Single population: Two single unknowns need to be gA estimated: g and s. Since g¼Að12g=2ÞþgJ, the average A22;A;A 12 þgA ¼ZAZA9=s 2 of all elements is Ag ¼A ð12g=2Þþg, and the average 22 22 of the diagonal is diagðAg Þ¼diagðA Þð1þg=2Þþg, ! 22 22 gB where A22 is the regular pedigree-relationship matrix for A22;B;B 12 þgB ¼ZBZB9=s 2 genotypedindividuals.Therefore,asystemoftwoequations needs to be set up, (cid:6) (cid:7) A22;A;BþgA;B ¼ZAZB9=s g A22 12 þg¼ZZ9=s ! ! 2 gA gB (cid:6) (cid:7) diagðA22;A;AÞ 12 þgAþdiagðA22;B;BÞ 12 þgB 2 2 g diagðA Þ 12 þg¼diagðZZ9Þ=s 22 2 ¼diagðZZ9Þ=s: with solutions (cid:4) (cid:5) (cid:4) (cid:3) The solution is a generalization of the solutions for single diagðZZ9Þ 12A =2 2ZZ9 12diagðA Þ=2 populations.Thescalingestimatesforsinglepopulationsare 22 22 s¼ s ¼ d =m and s ¼ d =m with diagðA Þ2A A A A B B B 22 22 dA ¼diagðZAZ"A9Þð12A22;A;A=2#Þ g¼ ZZ9=s2A22: 2Z Z 9 12diagðA22;A;AÞ 12A22=2 A A 2 Thesesolutionshaveaninterpretationintermsofmeasures and mA ¼diagðA22;A;AÞ2A22;A;A, and dB and mB defined of inbreeding in the population. In a population large similarly. The solutions for two populations are enoughandmatingatrandom,inbreedingoftheindividuals is equal to half the relationships of their parents, dAnAþdBnB s¼ diagðA22Þ¼1þA22=2 ¼1þFA (FA is average pedigree mAnAþmBnB inbreeding), and diagðZZ9Þ=s¼1þZZ9=2s¼1þF (F igs=2av¼erðaFgGe2gFeAnÞo=mð1ic2FinAbÞr.eTehdiisnigs)b.aTsihcaelrleyfothree,reinvertsheisoGfcathsGee gA ¼G1A2;AA222A;A2;A2A=;2A;gB ¼G1B2;BA222A;B2;B2B=;2B;gA;B ¼GA;B MetafoundersinPedigrees 463 sothatwithin-breedandacross-breedaveragerelationships Bayesian estimation using a prior structure for G can be agree. Assuming A22;A;A A22;B;B are close to zero, considered. gA ¼ðGA;A2AA;AÞ; gB ¼ðGB;B2AB;BÞ; gA;B ¼GA;B;which Combiningpedigreerelationships withmetafounders consistinsettingp¼0:5toconstructtheGmatrices(VanRaden andgenomic relationshipswhennotallindividuals 2008) and then simply quantify average relationships across aregenotyped breeds. This is the simple method used by VanRaden et al. (2011),althoughtheydidnotdefinescalingsaswehavedone. TheSSGBLUPmethodforgenomicevaluation(Aguilaretal. This reasoning can be extended to as many breeds as needed. 2010; Christensen and Lund 2010; Legarra et al. 2014b) Again,thismethodcanbeusedfrompublishedstatisticswithout completesgenomicinformationwithpedigree-basedinforma- accesstorawdata. tionandinfactproceedsbycorrectingpedigreerelationships Populations with crosses: In some cases, pure populations in view of genomic relationships. Pedigree relationships are may not be genotyped. For instance, Angus bulls may be modifiedas(Legarraetal.2009;ChristensenandLund2010) mated to Limousine females and only the crossbreds and (cid:6) (cid:7) Angus genotyped. Another example is unknown parent H¼ A11þA12AG222A1ð2G1A2A22ÞA2221A21 A12AG2221G ; groups (Quaas 1988), base populations that account for 22 21 missing parentages. However, at some point descendants where H is a matrix with relationships after including pedi- of these populations may be genotyped, and this informa- gree and genomic relationships, G is a matrix including ge- tionisusable.Weproposeanalgorithmverysimilartothat nomic relationships for genotyped individuals ðu Þ, which is ofHarrisandJohnson(2010).LetQbeamatrixcontaining 2 projecteduponrelationshipsofungenotypedanimalsðu Þ,A in the i;j cell the expected fraction of metafounder j in the 1 is the pedigree-based relationship matrix, and A is a rela- individual i (Quaas 1988). This matrix can be efficiently 22 tionship matrix across genotyped individuals. This joint ma- obtained using Colleau (2002), recursion, or tracing down trixHcanbeunderstoodasalinearimputationofgenotypes the pedigree. The following identity, which is an extension over all nongenotyped individuals (Christensen and Lund of Ag ¼Að12g=2ÞþgJ, approximately holds 2010), considering also the uncertainty in the imputation. (cid:4) (cid:5)(cid:5) AG(cid:5)AðI20:5 diag QGQ9 þQGQ9: This covariance matrix is increasingly used in genomic pre- dictions of genetic merit (Aguilar et al. 2010; Christensen (cid:4) (cid:5) And therefore, AG (cid:5)A ½I20:5 diag Q GQ 9 (cid:2)þQ GQ9. etal.2012)andalsoinQTLdetection(Dikmenetal.2013). 22 22 2 2 2 2 AlinearmodelcanbefitasG¼AG þE;whereEisanerror ThealgebraicdevelopmentofmatrixHassumesthatbase 22 allelicfrequenciesareknownor,equivalently,thatmeanand term and Q is the section of Q containing proportions of 2 variance of the population donot changewith time.This is metafoundersingenotypedindividuals.Weneglecttheterm 20:5 diagðQ GQ 9Þ, which is small with respect to the rest notoriouslyfalsewithsmallpopulations,deeppedigrees,or 2 2 in presence of selection. Different adjustments had been of elements and obtain a further approximation G¼A þ 22 Q GQ9 þE, in which G is explicit. This expression can be suggested to modify genomic relationships so that their ge- 2 2 netic base is the same as that of pedigree relationships linearized using the vec operator (Henderson and Searle (Vitezicaetal.2011;Christensenetal.2012).Thisimplicitly 1979), and the least-squares estimator can be transformed back to a matrix form. This least-squares estimator of G is estimatestheshiftinbreedingvalues(orallelicfrequencies) fromthepedigreebasetothegenotypedpopulation(Vitezica (cid:1) (cid:3) (cid:1) (cid:3) Gb ¼ Q9Q 21Q9ðG2A ÞQ Q9Q 21 etal.2011).However,theseadjustmentsdonotconsiderthe 2 2 2 22 2 2 2 pedigree structure of the populations, and their generaliza- tions to crosses of lines or breeds are neither completely sat- usingG¼ZZ9=sandassumingthatthevalueofsisknown. isfactory nor well understood (but see Harris and Johnson If only pure population animals aregenotyped, thisis iden- 2010;Makgahlelaetal. 2014). tical to the approximation of the estimator above for “pure Christensen (2012) argued that, contrary to pedigree populations.”ThissolutionforGisidenticaltotheestimator relationships,genomicrelationshipsareindependentofped- proposed by Harris and Johnson (2010, Equations 13 and igreecompletenessandtheyshoulddefinethegeneticbase. 14). As for s, one can use He thus considered matching pedigree relationships to ge- diagðZZ9Þ2ZZ9 nomicrelationshipsinsteadoftheopposite.Heshowedthat s¼ (cid:4) (cid:5) ; after marginalizing the allelic frequencies from the joint diag AG22 2AG22 likelihood, the result was a related base population and suggested estimating g and s using maximum likelihood. where the approximation is used for AG [in this case in- 22 All our developments rely on this base and therefore, the cluding 20:5 diagðQGQ9Þ] such that diagðAG Þ and A G extended pedigrees with metafounders do automatically 22 22 arelinearfunctionsofG:Thissystemoftwoequationswith conciliate marker and pedigree-based relationships, using two unknown is iterated until convergence. If there is little estimatesofGandsfrommarkers.Inparticular,theinverse informationforsomemetafounders(asisthecaseinruminants), of the joint pedigree and markers relationship matrix is 464 A.Legarraetal.

Description:
KEYWORDS relationships; pedigree; genetic drift; base populations; marker genotypes; shared data resource; GenPred. POWELL et al. (2010) pointed out the conceptual conflict Supporting information is available online at www.genetics.org/lookup/suppl/ · doi:10.1534/genetics.115.177014/-/DC1.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.