STATISTICAL APPLICATIONS IN PLANT BREEDING AND GENETICS

By
CARL ALAN WALKER

A dissertation submitted in partial fulfillment of
the requirements for the degree of

DOCTOR OF PHILOSOPHY IN CROP SCIENCE

WASHINGTON STATE UNIVERSITY
Department of Crop and Soil Sciences

MAY 2012

To the Faculty of Washington State University:

The members of the Committee appointed to examine the dissertation of CARL ALAN WALKER find it satisfactory and recommend that it be accepted.

Kimberly Garland-Campbell, Ph.D., Chair
Fabiano Pita, Ph.D.
J. Richard Alldredge, Ph.D.
Richard Gomulkiewicz, Ph.D.
Daniel Skinner, Ph.D.

ACKNOWLEDGEMENT

I would like to thank my committee members for their advice and assistance with this research and with writing this dissertation. I would like to thank all the members of both the Campbell and Steber labs for their advice when I presented my work in lab meetings. I began the project presented in Chapter 3 as part of a paid internship with Dow AgroSciences. I would like to thank the members of the Dow AgroSciences Quantitative Genetics group for their assistance during that internship, especially Kelly Robins, who provided some initial programs and data. I would also like to acknowledge Bruce Walsh, Rebecca Doerge, and Radu Totir for the valuable advice they gave me at conferences where I presented my work. I would not have been able to conduct this research without the funding provided for these projects by the Washington Grain Commission and USDA project 5348-21000-023-00. Finally, I'd like to thank my parents for all their help getting me this far and my wife Elizabeth for her help editing and moral support.

STATISTICAL APPLICATIONS IN PLANT BREEDING AND GENETICS

ABSTRACT

by Carl Alan Walker, Ph.D.
Washington State University
May 2012

Chair: Kimberly Garland-Campbell

Statistical analysis has many applications in ensuring the validity and reproducibility of plant breeding and genetics research.

Crop plant germplasm collections are often too large to be of use regularly. A core subset with fewer accessions can increase utility while maintaining most of the genetic diversity of the complete collection. This study evaluated methods for selecting core subsets using sparse data. Cores were selected by forming clusters of accessions based on distances estimated with phenotypic data. Accessions were randomly selected relative to the number of accessions in each cluster. The method using all the available data to calculate distances, average linkage clustering, and sampling in proportion to the natural logarithm of cluster size produced the most diverse cores.

Evaluations of genotypes in varied environmental conditions are referred to as multiple environment trials (MET) and often necessitate estimation of effects of genotypes within environments. Empirical best linear unbiased predictions can provide more accurate estimates of these effects, depending upon the mixed model used. An objective of this work was to simulate and analyze MET data sets to determine which models provide the most accurate estimates in varied MET conditions. Simulated MET were fit with mixed models with or without genetic relationship matrices (GRM) and with structures of varying complexity used to model relationships among environments. The model that included a GRM and a constant variance-constant correlation structure was the most accurate for the largest number of scenarios. More complex models were the most effective for a smaller subset of scenarios, most involving many genotypes and low experimental error.

Statistical analyses were applied in consultation with other researchers for two projects studying Fusarium crown rot of wheat and one on cold tolerance of wheat. Heritability and genetic correlations were calculated for Fusarium resistance assays in field, growth chamber, and terrace bed settings.
Factor analysis was used to estimate latent factors from field characteristic variables, which were used as predictor variables in linear mixed models and generalized linear mixed models. Cold tolerance among genotypes was assessed with logistic regression.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LITERATURE REVIEW
    CORE SUBSETS OF GERMPLASM COLLECTIONS
    MIXED MODELS FOR MULTIPLE ENVIRONMENT TRIALS
    HERITABILITY AND GENETIC CORRELATION
    DIMENSION REDUCTION FOR LINEAR MODELING
    EXTREME COLD TOLERANCE IN WHEAT
    REFERENCES
METHODS FOR SELECTING GERMPLASM CORE SUBSETS USING SPARSE PHENOTYPIC DATA
    ABSTRACT
    INTRODUCTION
    MATERIALS AND METHODS
    RESULTS
    DISCUSSION
        Conclusion
    APPENDIX
    REFERENCES
COMPARISON OF LINEAR MIXED MODELS FOR MULTIPLE ENVIRONMENT PLANT BREEDING TRIALS
    ABSTRACT
    INTRODUCTION
    METHODS
        Simulations
        Analyses
    RESULTS AND DISCUSSION
        Justification of Approach
        Choice of a Default Model
        Models for Specific Scenarios
    DISCUSSION
        Conclusions
    APPENDIX: REAL DATA AS A BASIS FOR SIMULATIONS
    REFERENCES
CONSULTING PROJECTS
    HERITABILITY AND GENETIC CORRELATION ANALYSES FOR FUSARIUM CROWN ROT RESISTANCE ASSAYS OF WHEAT MAPPING POPULATION
        Abstract
        Discussion of Statistical Methods
    LINEAR MODELING OF THE RELATIONSHIPS BETWEEN WHEAT FIELD CHARACTERISTICS AND FUSARIUM CROWN ROT OBSERVATIONS
        Abstract
        Discussion of Statistical Methods
    LOGISTIC REGRESSION ANALYSIS OF WHEAT COLD TOLERANCE TESTING
        Summary
        Discussion of Methods
    REFERENCES

LIST OF TABLES

Table 1. Measurement levels and missing value percentages of variables evaluated on the Triticum aestivum L. subsp. aestivum complete collection.

Table 2. Removal percentages by variable for simulating data sets with missing values by removing values from the "complete collection".

Table 3. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 1 (see Table 2) from accessions selected randomly from a uniform distribution.

Table 4. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 1 (see Table 2) from accessions selected as a contiguous group.

Table 5.
Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 2 (see Table 2) from accessions selected randomly from a uniform distribution.

Table 6. Comparisons of core subset selection methods in terms of diversity of 1000 potential core subsets selected from 200 complete collections simulated with values removed at the rates given by set 2 (see Table 2) from accessions selected as a contiguous group.

LIST OF FIGURES

Figure 1. Plot of cumulative means, over simulations, of median recovery of interquartile range, over 1000 potential core subsets per simulation. Simulations were generated by removing values from randomly chosen individual accessions with missingness rates given by set 1. The values of the means of all 200 simulations are shown in Table 3.

Figure 2. Plot of cumulative means, over simulations, of median recovery of interquartile range, over 1000 potential core subsets, ranked across methods within each simulation. Simulations were generated by removing values from randomly chosen individual accessions with missingness rates given by set 1. The mean ranks, over all 200 simulations, are shown in Table 3.

Figure 1. Means, over simulations, of model ranks, where models were ranked in terms of RMSEP within each simulation. All scenarios evaluated are included, and index denotes each scenario's position in the order. Scenarios are ordered CS_A, CS_A H, CS_A VH, CS_B, CS_B H, CS_B VH, Toep, ToepH, and then ToepVH, with the indices of the final scenarios of each group equal to 76, 154, 230, 304, 380, 456, 532, 608, and 682, respectively. Within each of these patterns, numbers of environments are ordered 5, 10, 20, and then 40 environments. Within each number of environments, the numbers of genotypes are ordered 25, 50, 100, and then 150 genotypes. Within each number of genotypes, the experimental designs are ordered RCBD, MAD, and then unreplicated designs. Within each design, error variances are ordered 0.5 then 2.0.

Figure 2. A standardized version of Figure 1, where models have been ranked within each scenario in terms of their mean ranks. The order of scenarios is the same as in Figure 1.