
Quantitative Trait Loci. Methods and Protocols

341 Pages·2002·2.109 MB·English


Methods in Molecular Biology™
VOLUME 195

Quantitative Trait Loci
Methods and Protocols

Edited by Nicola J. Camp and Angela Cox

HUMANA PRESS

1

Association Studies

Jennifer H. Barrett

From: Methods in Molecular Biology: vol. 195: Quantitative Trait Loci: Methods and Protocols. Edited by: N. J. Camp and A. Cox. Humana Press, Inc., Totowa, NJ

1. Introduction

A classical case-control study design is frequently used in genetic epidemiology to investigate the association between genotype and the presence or absence of disease. Association studies can also be useful in the investigation of quantitative traits. The aim of such studies is to test for association at the population level between the quantitative trait and genotype at a particular locus. Whether investigating qualitative or quantitative traits, such studies depend on the prior identification of a candidate gene or genes. The genotyped locus could either be a polymorphism within a potentially trait-affecting gene or a marker in linkage disequilibrium with such a gene. Currently, screening of the whole genome is only feasible using linkage analysis, which is discussed elsewhere, because linkage extends over much greater distances than does linkage disequilibrium.

Quantitative trait association studies are based on a sample of unrelated subjects from the population. Various sampling designs are possible, including random sampling and sampling on the basis of an extreme phenotype. The advantages and disadvantages of these alternative designs are discussed.

The basic method of analysis is called analysis of variance (see Subheading 2.1.), a standard statistical technique for testing for differences in mean between two or more groups, on the basis of the comparison of between- and within-group variances. An alternative if subjects are sampled on the basis of extreme phenotype is to compare genotypes between groups with high and low trait values (see Subheading 2.2.).

2. Methods

2.1. Analysis of Variance and Linear Regression

The standard approach to the analysis of quantitative trait association studies assumes the following model. The phenotype $y_{ij}$ of individual $i$ with genotype $j$ at the locus of interest is given by

    $$y_{ij} = \mu_j + e_i \qquad (1)$$

where $\mu_j$ is the mean for the $j$th genotype and $e_i$ represents residual environmental and possibly polygenic effects for individual $i$, assumed to be Normally distributed with mean 0 and variance $\sigma_e^2$. The data required consist of measured phenotypes and genotypes on a sample of unrelated individuals. The parameters $\mu_j$ are estimated in the obvious way by the mean values of individuals with genotype $j$. The F-statistic from analysis of variance (ANOVA), the ratio of between- and within-genotype variances, is used to test for the association between genotype and phenotype, because under the null hypothesis that all genotypes have the same mean and variance, this ratio is expected to be close to 1. This approach has been called the measured-genotype test (1), in contrast to earlier biometrical methods that use information on the distribution of the phenotype only (i.e., with unmeasured genotype) discussed briefly in Note 1.

Equivalently, a linear regression analysis of phenotype on genotype can be carried out, possibly including as covariates other factors that may be related to phenotype. Where the genotype is determined by one biallelic polymorphism (with possible genotypes AA, AB, and BB), a test for trend is provided by regressing the phenotype on the number of copies of the A allele.

There are many examples of this type of approach in the literature. For example, O’Donnell et al. (2) used multiple linear regression to investigate the relationship between diastolic blood pressure and different genotypes of the angiotensin-converting enzyme (ACE) gene. Hegele et al. (3) used analysis of variance to demonstrate association between serum concentrations of creatinine and urea and the gene encoding angiotensinogen (AGT).
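As a minimal sketch of the two tests described in Subheading 2.1., the Python code below computes the genotype-based ANOVA F-statistic and the trend F-statistic from regressing phenotype on the number of copies of the A allele. The chapter itself uses Stata; Python is used here only for illustration. The function names `anova_f` and `trend_f` and the six-subject data set are hypothetical and are not taken from the chapter.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for a list of groups of trait values."""
    values = [y for g in groups for y in g]
    grand = mean(values)
    # Between-genotype and within-genotype sums of squares.
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    within_ss = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (between_ss / df_between) / (within_ss / df_within)


def trend_f(allele_counts, traits):
    """Slope and F-statistic (1 and n-2 df) from regressing trait on allele count."""
    n = len(traits)
    xbar, ybar = mean(allele_counts), mean(traits)
    sxx = sum((x - xbar) ** 2 for x in allele_counts)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(allele_counts, traits))
    total_ss = sum((y - ybar) ** 2 for y in traits)
    regression_ss = sxy ** 2 / sxx
    residual_ss = total_ss - regression_ss
    return sxy / sxx, regression_ss / (residual_ss / (n - 2))


# Hypothetical toy data: two subjects per genotype (BB, AB, AA).
copies_of_a = [0, 0, 1, 1, 2, 2]
phenotype = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0]
groups = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

f_genotype = anova_f(groups)                      # 2 and 3 degrees of freedom
slope, f_trend = trend_f(copies_of_a, phenotype)  # 1 and 4 degrees of freedom
print(f"genotype F = {f_genotype:.2f}, trend slope = {slope:.2f}, trend F = {f_trend:.2f}")
```

In this toy example the genotype means increase linearly with allele count, so the one-degree-of-freedom trend test gives a larger F-statistic than the two-degree-of-freedom genotype test. That pattern depends on the additive structure of the made-up data and need not hold in general.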
2.2. Analysis of Extreme Groups

An alternative approach is to use a sampling scheme that selects individuals on the basis of extreme phenotypes (4,5). There is considerable literature on the use of such sampling schemes for sibling pair linkage studies (e.g., ref. 6). Extreme sampling is advocated to increase power and efficiency, as extremes are more informative. The approach is particularly useful when the phenotype is relatively easy to measure, so that large numbers of individuals can easily be screened to select extremes for genotyping.

In association studies adopting this method, individuals are randomly selected conditional on their phenotype being below a specified lower threshold or exceeding a specified upper threshold. Alternatively, the upper and lower n percentiles of a random sample from the population may be included. A cross-tabulation is then formed by classifying subjects by genotype and by high/low phenotype. The genotype frequencies are then compared between subjects with high and low trait values using a chi-squared test. For example, Hegele et al. (3) compared allele and genotype frequencies at the AGT locus in subjects with the lowest and highest quartiles of serum creatinine and urea levels.

3. Interpretation

In common with association studies for qualitative traits, a significant association does not demonstrate an effect of the polymorphism considered, because it may also arise through linkage disequilibrium with another locus. A further similarity is that population admixture can lead to spurious associations. For this reason, family-based approaches, such as the transmission-disequilibrium test for quantitative traits (7), have been developed (see Chapter 5).

3.1. Heterogeneity

Published results of associations with quantitative as with qualitative traits are not always in agreement. Because for most complex traits the effect of any one locus is likely to be small, individual studies are often not sufficiently powerful to detect association. To address this issue, Juo et al. (8) carried out a meta-analysis of studies investigating association between apolipoprotein A-I levels and variants of the apolipoprotein gene, which had produced conflicting results. This is a potentially useful approach, but may be flawed by publication bias, which is likely to be more of an issue in epidemiological studies than in clinical trials. There is also an assumption that patients are genetically and clinically homogeneous, with similar environmental exposures.

3.2. Using Extremes

An important consideration when using extreme sampling strategies (as outlined in Subheading 2.2.) is that extremes may be untypical of the quantitative trait as a whole in that they may be under the influence of other genes. A clear example of this, cited in ref. 4, is that studying individuals with achondroplastic dwarfism would be inappropriate if the primary interest were in identifying genes controlling height.

3.3. Power of Association Studies

An attractive feature of association studies is that they may require smaller sample sizes than methods based on linkage (9).

Schork et al. (5) investigated the power of the extreme sampling method analytically (Subheading 2.2.) to detect association between the trait and a single biallelic marker in linkage disequilibrium with a trait-affecting locus. Power depends on many factors, including locus-specific heritability, degree of linkage disequilibrium, allele frequencies, mode of inheritance, and choice of threshold. In some settings, overall sample sizes of less than 500 provided adequate power to detect association with a locus accounting for 10% of the trait variance.

The power of several methods of analysis, variants of those described here, has been compared in a simulation study (10). Under the models considered, ANOVA/linear regression (see Subheading 2.1.) generally performed better than a variant of the extremes method (see Subheading 2.2.), based on the same number of genotyped individuals, as most of the information on phenotype is lost by categorizing into “high” and “low” values. As with any method based on selective sampling, another drawback is that it is also necessary to phenotype a larger number of subjects to achieve the same sample size for analysis. The same authors suggested a variation on ANOVA/linear regression, the truncated measured genotype (TMG) test, where only extremes are included in the analysis (see Note 4). This TMG test was found to be more powerful than ANOVA/linear regression for the same sample size of genotyped individuals, although, again, a larger number of subjects must be phenotyped to achieve this.

These results are, however, dependent on the underlying genetic model. Allison et al. (4) showed that extreme sampling can actually lead to a decrease in power in the presence of another gene influencing the trait.

Page and Amos (10) also found that variants of ANOVA/linear regression and of the TMG test, which are based on alleles, were more powerful than the genotype-based methods discussed earlier. In these approaches, the phenotype of each individual contributes to two groups, one for each allele or, in the case of homozygotes, contributes twice to one group. Allele-based methods, which “double the sample size,” are generally only valid under the assumption of Hardy–Weinberg equilibrium (11). Furthermore, the greater power of this approach is to be expected for the models used in these simulations, all of which assumed an additive effect of the trait allele, and may not apply more generally.

Long and Langley (12) investigated the power to detect association using a number of single nucleotide polymorphisms in the region of a quantitative trait locus, but excluding the functional locus itself.
Their test statistic was based on ANOVA (see Subheading 2.1.); the significance of the largest F-statistic obtained from any marker was estimated from its empirical distribution based on 1000 random permutations of the phenotype/marker data. From their simulations, they concluded that, using about 500 individuals, there was generally sufficient power to detect association if 5–10% of the phenotypic variation was attributable to the locus. Furthermore, tests using single markers had greater power than haplotype-based tests. The latter were based on comparing mean trait values across all distinct haplotypes, and the authors concede that other haplotype-based tests making use of additional information may perform better.

4. Software

The basic methods described in this chapter can be carried out in standard statistical software packages such as Stata (13), which is used here, SAS, or SPSS. The data would generally be expected to consist of one record for each subject, recording their measured trait value, their genotype, and any covariates of interest.

5. Worked Example

5.1. Analysis of Variance

An insertion/deletion (I/D) polymorphism of the ACE gene is associated with plasma ACE levels in some populations. Plasma ACE levels were measured and I/D genotype obtained for 300 Pima Indians to investigate the relationship in this population (14). The data consist of 300 records, including ACE levels (ranging from 7 to 238 units) and genotype (II, ID, or DD).

Table 1
Summary Data on ACE Levels According to Genotype

    ace_geno    Mean         Std. dev.    Freq.
    II          74.496732    31.729764    153
    ID          90.233871    39.484505    124
    DD          103.73913    46.564928     23
    Total       83.243333    37.475487    300
In Stata, ANOVA can be carried out by the command

    oneway ace_leve ace_geno, tabulate

where ace_leve and ace_geno are the variables for ACE levels and genotype, respectively. This produces Tables 1 and 2. Table 1 is produced by specifying the tabulate option after the oneway command (for one-way analysis of variance) and provides useful summary information. In addition to the mean ACE levels within each genotype group (i.e., estimates of $\mu_1$, $\mu_2$, and $\mu_3$), the standard deviation and the number of subjects with each genotype are displayed. It can be seen that individuals with the DD genotype have much higher levels on average than those with the II genotype, with intermediate levels found in heterozygotes.

Table 2
Analysis of Variance Results for the Data in Table 1

    Source            SS            df     MS            F        Prob > F
    Between groups    27426.3358      2    13713.1679    10.38    0.0000
    Within groups     392492.901    297    1321.52492
    Total             419919.237    299    1404.41216

Table 2 is the basic ANOVA table. The total variability of the data is measured by the total sum of squares (419,919) (i.e., the sum of squares of the differences between each of the observations and the overall mean). This figure can be separated into the between-genotype sum of squares (the sum of squares of the difference between the group mean and the overall mean) and the within-genotype sum of squares (the sum of squares of the differences between each observation and the mean for the corresponding genotype). These are used to estimate the corresponding variance, shown in the mean square (MS) column, by dividing by the number of degrees of freedom. [The number of degrees of freedom is one less than the number of groups or observations within groups (i.e., 3−1 for between genotypes and 152+123+22 within genotypes).] The F-statistic (10.38) is the ratio of these estimated variances. Under the null hypothesis of no difference between groups, its expected value is close to 1 and it should follow an F-distribution with (2, 297) degrees of freedom.
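The decomposition just described can be checked by hand from the summary statistics in Table 1 alone. The following Python sketch (an illustration only; the chapter's analysis was run in Stata) rebuilds the between- and within-genotype sums of squares from the group means, standard deviations, and counts, then forms the F-statistic and the proportion of variance explained. The variable names are illustrative.

```python
# (genotype, mean, standard deviation, count) as reported in Table 1
table1 = [
    ("II", 74.496732, 31.729764, 153),
    ("ID", 90.233871, 39.484505, 124),
    ("DD", 103.73913, 46.564928, 23),
]

n_total = sum(n for _, _, _, n in table1)
grand_mean = sum(n * m for _, m, _, n in table1) / n_total

# Between-genotype SS: squared distance of each group mean from the
# overall mean, counted once per subject in that group.
between_ss = sum(n * (m - grand_mean) ** 2 for _, m, _, n in table1)

# Within-genotype SS: each group's sample variance times (n - 1).
within_ss = sum((n - 1) * sd ** 2 for _, _, sd, n in table1)
total_ss = between_ss + within_ss

df_between = len(table1) - 1       # 3 - 1 = 2
df_within = n_total - len(table1)  # 152 + 123 + 22 = 297

f_stat = (between_ss / df_between) / (within_ss / df_within)
r_squared = between_ss / total_ss  # proportion of variance explained

print(f"between SS = {between_ss:.1f} on {df_between} df")
print(f"within SS  = {within_ss:.1f} on {df_within} df")
print(f"F = {f_stat:.2f}, R-squared = {r_squared:.4f}")
```

Up to rounding of the published summary values, the results should agree with the SS, df, and F columns of Table 2. The p-value is omitted because the Python standard library has no F-distribution function.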
In this case, there is overwhelming evidence for a difference in level according to genotype. The differences seen in Table 1 are very unlikely to be the result of random variation.

The analysis of variance table (Table 2) can also be obtained by using the Stata command

    anova ace_leve ace_geno

This gives the additional information

    R-squared = 0.0653

indicating that the I/D genotype explains 6.5% of the variance in plasma ACE levels in this population.

Slightly different output, but exactly the same F-test and estimate of R-squared, can alternatively be obtained by carrying out a regression analysis:

    xi: regress ace_leve i.ace_geno

The i in front of the ACE genotype variable shows that this is to be treated as a categorical variable in the analysis. If, instead, interest was in testing for a trend in ACE levels with the number of D alleles, then genotype could be coded as 0, 1, or 2 to indicate the number of D alleles, and the following regression carried out:

    regress ace_leve ace_geno

This produces an F-statistic of 20.77 on (1, 298) degrees of freedom.

5.2. Analysis of Extremes

Using the same dataset, a new variable is created, recording the appropriate quantile for each subject’s ACE level. In this example, quintiles are used, creating 5 groups of approximately 60 subjects. This is easily done in Stata as follows:

    xtile acegp5=ace_leve, nq(5)

A chi-squared test is then carried out comparing the top and bottom quintiles:

    tab acegp5 ace_geno if acegp5==1 | acegp5==5, chi row

producing Table 3.

Table 3
Genotype Frequencies in Two Extreme Groups Defined by the Top and Bottom Quintiles of ACE Levels (a)

    Five quantiles               ace_geno
    of ace_leve        II        ID        DD     Total
    1                  39        20         3        62
                    62.90     32.26      4.84    100.00
    5                  17        33        10        60
                    28.33     55.00     16.67    100.00
    Total              56        53        13       122
                    45.90     43.44     10.66    100.00

(a) Pearson chi2(2) = 15.5722, Pr = 0.000.
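The Pearson chi-squared statistic reported with Table 3 can be recomputed from the six genotype counts. The Python sketch below is an illustration only (the chapter's analysis was run in Stata) and uses illustrative variable names. It relies on one standard fact: for exactly 2 degrees of freedom, the upper-tail probability of the chi-squared distribution is exp(-x/2), so no statistics library is needed.

```python
from math import exp

# Genotype counts (II, ID, DD) in the bottom and top quintiles, from Table 3
observed = {
    "bottom quintile": [39, 20, 3],
    "top quintile": [17, 33, 10],
}

row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(col_totals)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = 0.0
for label, counts in observed.items():
    for count, col_total in zip(counts, col_totals):
        expected = row_totals[label] * col_total / n
        chi2 += (count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed-form upper tail, valid only because df == 2 here.
p_value = exp(-chi2 / 2)

print(f"chi2({df}) = {chi2:.4f}, p = {p_value:.5f}")
```

The closed-form tail probability is specific to 2 degrees of freedom; a table with a different number of genotype classes would need a general chi-squared distribution function.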
The chi-squared statistic of 15.57 on 2 degrees of freedom again indicates very strong evidence of association between ACE levels and genotype, even though only 40% of the original subjects are used in the analysis. Nearly 63% of those with low ACE levels had II genotype compared with only 28% of those with high levels, and the DD genotype was over three times as common in those with high levels compared with those with low levels.

6. Notes

1. Commingling analysis. The model underlying ANOVA (see Subheading 2.1.) assumes that the data consist of a mixture of Normal distributions, one corresponding to each genotype, each with the same variance. Even in the absence of genotype data, statistical methods can be used to test for evidence of a mixture of more than one Normal distribution. This “unmeasured genotype” approach is sometimes known as commingling analysis. Evidence for a mixture of two or three distributions is supportive of the hypothesis that a major gene underlies the trait, although, of course, environmental factors could also give rise to distinct distributions. Model fitting allows estimates to be made of parameters of interest such as $\mu_j$ and $\sigma_e^2$ and the proportion of subjects in each class.

   In the presence of genotype data in a candidate gene, the method of commingling analysis can be extended to condition on the measured polymorphism(s). In addition to testing for evidence of a mixture of distributions, this method also provides evidence of whether the measured genotype itself gives rise to the mixture or whether another polymorphism in the gene is a more likely explanation (15,16).

2. Distributional assumptions. In view of the underlying model for ANOVA, a Normalizing transformation may be applied to the data. It is important to note that the model assumes a Normal distribution within each genotype rather than overall. (In commingling analysis, Normalizing the data leads to a conservative test for mixture, as this may remove skewness in the overall distribution of the data arising from the mixing of distributions.) The further assumption of a common within-genotype variance can be tested, and homogeneity of variance may sometimes be achieved by transformation. In the worked example in this chapter, there is some evidence for heterogeneity in the variances. One advantage of the extremes method outlined in Subheading 2.2. is that it does not rely on these distributional assumptions.

3. Nonparametric alternatives. Another nonparametric alternative to ANOVA is the Kruskal–Wallis test. In this approach, the complete set of N trait values is ranked from 1 to N, and the average rank in each genotype group is calculated. The test statistic is based on comparing the genotype-specific average ranks with the overall average rank of (N+1)/2. Under the null hypothesis of no genotype–phenotype association, the test statistic follows a chi-squared distribution with two degrees of freedom (assuming three genotypes), and a significantly higher value indicates that the distributions differ. Applying this method to the example in Subheading 5., the test statistic takes the value 18.2 (p=0.0001). This method is only slightly less powerful than ANOVA when the data are Normally distributed and has the advantage that distributional assumptions are not made. However, the test alone is not very informative, and, in general, the estimates provided by ANOVA are also useful.

4. Analysis of extremes. An alternative suggestion for the analysis of extreme samples, the TMG method mentioned earlier, is to use analysis of variance, ignoring the sampling scheme.
   The analysis of variance assumption of random sampling from a Normal distribution is violated, but it has been argued that, for large enough sample sizes, the significance level of the test is still correct (10). The analogs of this test and of those outlined in Subheadings 2.1. and 2.2. based on alleles rather than genotypes, where each individual’s phenotype contributes twice to the analysis, violate the further assumption of independence of observations.

   Slatkin (17) suggested selecting individuals on the basis of unusually high (or low) trait values and testing (1) for a difference in genotype frequency between the selected sample and a random sample and (2) for differences in phenotype distribution according to genotype within the selected sample. These two tests are approximately independent and so can be combined into one overall test. This approach is particularly powerful when a rare allele has a substantial effect on phenotype, even though the overall proportion of phenotypic variance attributable to the locus is small.

5. Family-based samples. Although association studies as described in this chapter are applicable to unrelated sets of cases and controls, extensions have been suggested to allow for relatedness between subjects. Tregouet et al. (18) suggested using estimating equations, a statistical method for estimating regression parameters based on correlated data. They found that, for nuclear families of equal size, the power of this approach was comparable to maximum likelihood and was similar to the power expected in a sample of the same number of unrelated individuals. However, the type 1 error rate could be substantially inflated in the presence of strong clustering if the number of families is relatively small (<50).

References

1. Boerwinkle, E., Chakraborty, R., and Sing, C. F. (1986) The use of measured genotype information in the analysis of quantitative phenotypes in man. Ann. Hum. Genet. 50, 181–194.
2. O’Donnell, C. J., Lindpaintner, K., Larson, M. G., Rao, V. S., Ordovas, J. M., Schaefer, E. J., et al. (1998) Evidence for association and genetic linkage of the angiotensin-converting enzyme locus with hypertension and blood pressure in men but not women in the Framingham Heart Study. Circulation 97, 1766–1772.
3. Hegele, R. A., Harris, S. B., Hanley, A. J. G., and Zinman, B. (1999) Association between AGT codon 235 polymorphism and variation in serum concentrations of creatinine and urea in Canadian Oji-Cree. Clin. Genet. 55, 438–443.
4. Allison, D. B., Heo, M., Schork, N. J., and Elston, R. C. (1998) Extreme selection strategies in gene mapping studies of oligogenic quantitative traits do not always increase power. Hum. Heredity 48, 97–107.
5. Schork, N. J., Nath, S. K., Fallin, D., and Chakravarti, A. (2000) Linkage disequilibrium analysis of biallelic DNA markers, human quantitative trait loci, and threshold-defined case and control subjects. Am. J. Hum. Genet. 67, 1208–1218.
6. Risch, N. and Zhang, H. (1995) Extreme discordant sib pairs for mapping quantitative trait loci in humans. Science 268, 1584–1589.
7. Allison, D. B. (1997) Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60, 676–690.
8. Juo, S.-H. H., Wyszynski, D. F., Beaty, T. H., Huang, H.-Y., and Bailey-Wilson, J. E. (1999) Mild association between the A/G polymorphism in the promoter of the apolipoprotein A-I gene and apolipoprotein A-I levels: a meta-analysis. Am. J. Med. Genet. 82, 235–241.
9. Risch, N. J. (2000) Searching for genetic determinants in the new millennium. Nature 405, 847–856.
10. Page, G. P. and Amos, C. I. (1999) Comparison of linkage-disequilibrium methods for localization of genes influencing quantitative traits in humans. Am. J. Hum. Genet. 64, 1194–1205.
