EXPLORING THE GENETIC ARCHITECTURE OF LATE-ONSET ALZHEIMER DISEASE IN AN AMISH POPULATION

By

Anna Christine Cummings

Dissertation
Submitted to the Faculty of the
Graduate School of Vanderbilt University
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
in
Human Genetics

December, 2012
Nashville, Tennessee

Approved:
Professor Dana C. Crawford
Professor Jonathan L. Haines
Professor William K. Scott
Professor Michael G. Tramontana
Professor Bingshan Li

To my husband, Christopher, and son, Titus
And
To my parents, Bernard and Christine Davis

ACKNOWLEDGEMENTS

The work presented in this dissertation was supported by NIH grants AG019085 to Jonathan L. Haines and AG019726 to William K. Scott, a Discovery Grant from Vanderbilt University, and grants from the Michael J. Fox Foundation. I would like to thank all of the individuals and communities who so graciously participated in these studies. None of this work would have been possible without them.

The work presented here was guided and greatly improved by input from my thesis committee: Dana Crawford (my committee chair), William Scott, Bingshan Li, and Michael Tramontana. Special acknowledgements are due to my mentor, Jonathan Haines. I am especially grateful for his expertise, guidance, time, and patience.

I am thankful to all the members of the Haines lab (Nathalie Schnetz-Boutaud, Ping Mayo, Brent Anderson, Melissa Allen, Jacob McCauley, William Bush, Kylee Spencer, Sharon Liang, Rebecca Zuvich, Olivia Veatch, Mary Davis, Joshua Hoffman, and Laura D’Aoust) for being helpful in so many ways and for creating an enjoyable, fun, and collaborative work environment. I would also like to thank all members of the CHGR. So many people have played a role in this work, and I am very grateful to all of them. Special thanks are due to Lan Jiang, who patiently provided extensive training in genetic analyses in the Amish.
I would also like to thank all of my fellow students, who have been so kind and supportive along the way.

TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES

Chapter

I. INTRODUCTION
   Pathophysiology and diagnosis of Alzheimer disease
   Epidemiology of and risk factors for Alzheimer disease
   The search for genetic risk factors for late-onset Alzheimer disease
   The utility of genetically isolated populations
   The Amish
   Previous work
   Summary

II. QUALITY CONTROL PROCEDURES FOR A GENOME-WIDE STUDY IN AN AMISH POPULATION
   Introduction
   Methods
   Results
   Discussion

III. GENOME-WIDE LINKAGE AND ASSOCIATION STUDY FOR ALZHEIMER DISEASE IN AN AMISH POPULATION
   Introduction
   Methods
      Subjects
      Clinical data
      Genotyping
      Statistical analysis
      Evaluation of the MMSE and Word List Learning
   Results
      APOE
      Genome-wide Association
      Genome-wide Linkage
      Evaluation of the MMSE and Word List Learning
   Discussion
   Acknowledgements

IV. SEQUENCE ANALYSIS OF A NOVEL ALZHEIMER DISEASE CANDIDATE GENE: CTNNA2
   Introduction
   Methods
      Study population
      Sequencing
      Sequence processing
      Analysis
      Genotyping
   Results
      Variants in the exons
      Extra-exonic variants
   Discussion
   Acknowledgements

V. CONCLUSION
   Summary
   Future Directions

REFERENCES

LIST OF TABLES

Table
1.1 Late-onset Alzheimer disease genes
1.2 Expected kinship coefficients for some familial relationships
1.3 Regions with LOD score >3 in previously published linkage scans in subsets of the current Amish dataset
2.1 Average percentage of heterozygosity for SNPs on the X chromosome for individuals whose reported and genetic sex are potentially discrepant
2.2 Lowest mean IBS for sibling pairs
2.3 Highest mean IBS for other relative pairs
3.1 Genome-wide dataset
3.2 MQLS-corrected APOE allele frequencies
3.3 Age of onset and number of affected versus unaffected individuals by APOE genotype
3.4 Most significant genome-wide association results
3.5 Most significant multipoint linkage results
3.6 MMSE and Word List learning Z scores per LOAD risk group defined by APOE
3.7 Kruskal-Wallis test results with follow-up two-sample Wilcoxon rank sum test results
3.8 Analysis of covariance test results with follow-up pairwise test results
3.9 Spearman’s correlation between 2p12 LOD scores and Z scores of MMSE and Word List learning
4.1 Sequencing dataset characteristics, including total number of individuals, APOE genotype, and mean and range of ages at exam and onset
4.2 Whole exome sequence quality of dataset used for analysis
4.3 Summary of all detected SNVs in the exons of CTNNA2 and LRRTM1
4.4 Summary of selected non-exonic SNVs with at least a 30% difference in allele frequency between LOAD and cognitively normal individuals in the subpedigrees showing the most evidence for linkage at 2p12
4.5 Sequenom-generated genotype results of rs72822556

LIST OF FIGURES

Figure
2.1 Quantile-quantile plots of MQLS p-values before (a) and after (b) removing additional SNPs with MQLS-adjusted minor allele frequencies <0.05
2.2 Manhattan plots of the MQLS results before and after removing additional SNPs with MQLS-adjusted minor allele frequencies <5%
2.3 Population structure of the Amish
2.4 Output from Graphical Representation of Relationships using raw data
2.5 Output from Graphical Representation of Relationships
2.6 Flowchart of SNP and sample quality control procedures
3.1 MQLS Manhattan plot
3.2 Strongest multipoint linkage peaks

LIST OF APPENDICES

Appendix
A. Most significant genome-wide association results, stratified
B. Regions with at least one SNP with a two-point HLOD ≥ 3
C. Distributions of Z scores from the Mini-Mental State Exam (MMSE_Z), Word List Memory trials 1-3, delayed recall, savings, recognition-yes, and recognition-no
D. Scatter plots of recessive (left) and dominant (right) per-family LOD scores versus Z scores from the Mini-Mental State Exam (MMSE_Z), Word List Memory trials 1-3, delayed recall, savings, recognition-yes, and recognition-no
E. Spearman’s correlation between 2p12 LOD scores and Z scores of Word List learning with MMSE Z as a covariate

CHAPTER I

INTRODUCTION

Alzheimer disease (AD) is the most common cause of dementia. In the United States, it affects over 5 million individuals over the age of 65 (1) and is the fifth-leading cause of death in that age group (2), making it an increasingly serious public health issue. AD is a progressive neurodegenerative disorder of the brain characterized by loss of memory and cognitive abilities, development of neuropsychiatric symptoms and behavioral changes, and loss of independent daily function. With inadequate treatments and no cure, AD places a heavy burden on affected individuals, their families, caregivers, and society as a whole. As the population ages, this burden will only increase: the number of affected individuals is expected to triple by 2050 (1).

AD can be divided into two categories: early-onset and late-onset. Early-onset AD, which occurs in individuals younger than 65, accounts for less than 5% of all AD cases (3). Dominant mutations in three genes underlie the majority of early-onset familial AD: amyloid precursor protein [APP] (4) and presenilin 1 and 2 [PS1, PS2] (5-7).
However, these three genes together account for less than 2% of all AD cases. The much more common form, late-onset Alzheimer disease (LOAD), describes AD occurring in individuals aged 65 or older (8). Unlike early-onset AD, in which most of the genetic risk has been identified and follows a simple Mendelian pattern, the majority of the genetic risk of LOAD is unexplained and has a much more