Topics on high dimensional statistical inference and ANOVA for longitudinal data

189 Pages·2015·2.02 MB·English
Iowa State University Capstones, Theses and Dissertations
Graduate Theses and Dissertations
2011

Topics on high dimensional statistical inference and ANOVA for longitudinal data
Pingshou Zhong
Iowa State University

Follow this and additional works at: https://lib.dr.iastate.edu/etd
Part of the Statistics and Probability Commons

Recommended Citation
Zhong, Pingshou, "Topics on high dimensional statistical inference and ANOVA for longitudinal data" (2011). Graduate Theses and Dissertations. 12245. https://lib.dr.iastate.edu/etd/12245

This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please [email protected].

Topics on high dimensional statistical inference and ANOVA for longitudinal data
by
Pingshou Zhong

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

Major: Statistics

Program of Study Committee:
Song Xi Chen, Major Professor
Peng Liu
William Q. Meeker
Dan Nettleton
Zhengyuan Zhu

Iowa State University
Ames, Iowa
2011

Copyright © Pingshou Zhong, 2011. All rights reserved.

DEDICATION

To my family

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS

CHAPTER 1. Introduction
  1.1 Introduction
  1.2 High Dimensional Tests
    1.2.1 High Dimensional Tests for Regression Coefficients
    1.2.2 Threshold Test for High Dimensional Mean under Dependency
  1.3 ANOVA Tests for Longitudinal Data
  1.4 Empirical Likelihood
  1.5 Thesis Organization
  References

CHAPTER 2. Tests for High Dimensional Regression Coefficients with Factorial Designs
  2.1 Introduction
  2.2 Models and Test Statistics
    2.2.1 F-test and Its Performances under High Dimensionality
    2.2.2 A New Test Statistic
  2.3 U-Statistics under High Dimensionality
  2.4 Main Results
  2.5 Generalization to Factorial Designs
  2.6 Simulation Study
  2.7 Association Test for Gene-sets
  2.8 Appendix: Technical Details
  References

CHAPTER 3. Threshold Test for High Dimensional Mean under Dependency
  3.1 Introduction
  3.2 Large Deviation Approximation to the Mean and Variance
  3.3 Asymptotic Distribution of $T_n$
  3.4 Extension to the Maximum Test
  3.5 Optimal Detection Boundary and the Best Power
  3.6 Simulation Results
  3.7 Appendix: Technical Details
  References

CHAPTER 4. ANOVA for Longitudinal Data with Missing Values
  4.1 Introduction
  4.2 Models, Hypotheses and Missing Values
  4.3 ANOVA Test for Covariate Effects
  4.4 ANOVA Test for Time Effects
  4.5 Tests on Interactions
  4.6 Bootstrap Calibration
  4.7 Simulation Results
  4.8 Analysis on HIV-CD4 Data
    4.8.1 Monotone Missingness
    4.8.2 Not-monotone Missingness
  4.9 Appendix: Technical Details
  References

CHAPTER 5. Summary and General Discussion
  References

LIST OF TABLES

Table 2.1 Empirical size and power of the F-test, the EB test and the proposed test (new) for $H_0: \beta = 0_{p \times 1}$ vs $H_1: \beta \neq 0_{p \times 1}$ at significance level 5% for normal residual. LP represents the theoretical local power.
Table 2.2 Empirical size and power of the F-test, the EB test and the proposed test (new) for $H_0: \beta = 0_{p \times 1}$ vs $H_1: \beta \neq 0_{p \times 1}$ at significance level 5% for centralized gamma residual. LP represents the theoretical local power.
Table 2.3 Empirical size and power of the EB test and the proposed test (new) for $H_0: \beta = 0_{p \times 1}$ vs $H_1: \beta \neq 0_{p \times 1}$ at significance level 5% for normal residual. LP represents the theoretical local power.
Table 2.4 Empirical size and power of the EB test and the proposed test (new) for $H_0: \beta = 0_{p \times 1}$ vs $H_1: \beta \neq 0_{p \times 1}$ at significance level 5% for centralized gamma residual. LP represents the theoretical local power.
Table 2.5 Empirical size and power of the proposed test for $H_0: \beta = 0_{p \times 1}$ in a 2×2 factorial design with $n_1 = 20$ and $n_2 = 30$ replicates in each cell.
Table 2.6 P-values of the GO terms which are significant under at least three designs using the proposed test, and their number of genes.
Table 3.1 Empirical sizes of the Oracle test, C-Q test, FDR, maximum test and threshold tests with different threshold levels $\lambda_n = 2s\log(p)$ for Gaussian process.
Table 3.2 Empirical sizes of the Oracle test, C-Q test, FDR, maximum test and threshold tests with different threshold levels $\lambda_n = 2s\log(p)$ for process with standardized Gamma(2,2) marginal distribution.
Table 4.1 Empirical size and power of the 5% ANOVA test for $H_{0a}: \beta_{10} = \beta_{20} = \beta_{30}$.
Table 4.2 Empirical size and power of the 5% test for the existence of interaction $H_{0c}: \gamma_{20} = 0$.
Table 4.3 Empirical size and power of the 5% ANOVA test for $H_{0d}: \gamma_{10} = \gamma_{20} = \gamma_{30}$.
Table 4.4 Empirical size and power of the 5% ANOVA test for $H_{0b}: g_1(\cdot) = g_2(\cdot) = g_3(\cdot)$ with $\Delta_{2n}(t) = U_n \sin(2\pi t)$ and $\Delta_{3n}(t) = 2\sin(2\pi t) - 2\sin(2\pi(t + V_n))$.
Table 4.5 The empirical sizes and powers of the proposed test (CZ) and the test (SZ) proposed by Scheike and Zhang (1998) for $H_{0b}: g_1(\cdot) = g_2(\cdot)$ vs $H_{1b}: g_1(\cdot) = g_2(\cdot) + \Delta_{2n}(\cdot)$.
Table 4.6 Differences in the AIC and BIC scores among three models (M1-M3).
Table 4.7 Parameter estimates and their standard errors.
Table 4.8 P-values of ANOVA tests for $\beta_j$s.
Table 4.9 P-values of ANOVA tests on $g_j(\cdot)$s.
Table 4.10 Differences in the AIC and BIC scores among three models (M1-M3) for d = 1.
Table 4.11 Parameter estimates and their standard errors with d = 1, 2, 3.
Table 4.12 P-values of ANOVA tests on $\beta$s with d = 1, 2, 3.
Table 4.13 P-values of ANOVA tests on $g_j(\cdot)$s with d = 1, 2, 3.

LIST OF FIGURES

Figure 2.1 The auto-correlation functions for series $\{X_{ij}\}_{j=1}^{p}$.
Figure 2.2 The null distributions of standardized $T_{n,p}$.
Figure 2.3 Histograms of the p-values on all GO terms using the proposed tests.
Figure 2.4 Histograms of the p-values on all GO terms using Empirical Bayes (EB) tests.
Figure 2.5 Differences in the p-values among Designs I-IV.
Figure 3.1 The detectable region of the threshold and the maximum test in the $(\beta, r)$ plane, which is the union of areas I-IV in the plot.
Figure 3.2 The histograms for the simulated null distributions of standardized $T_n$ using plug-in, theoretical variance (3.3.14) estimate and spectral smoothed variance estimate introduced in the Appendix. The (p, n) is (1000, 20). Marginal distribution: Gaussian.
Figure 3.3 The histograms for the simulated null distributions of standardized $T_n$ using plug-in, theoretical variance (3.3.14) estimate and spectral smoothed variance estimate introduced in the Appendix. The (p, n) is (2000, 30). Marginal distribution: Gaussian.
Figure 3.4 The histograms for the simulated null distributions of standardized $T_n$ using plug-in, theoretical variance (3.3.14) estimate and spectral smoothed variance estimate introduced in the Appendix. The (p, n) is (2500, 40). Marginal distribution: Gaussian.
Figure 3.5 The ROC curves of the Oracle test, C-Q test, FDR test, Maximum test and the threshold test at different levels with Type I error between 0 and 0.2. From top to bottom, r = 0.4, 0.6 and 0.9. From left to right panels, $\beta$ = 0.6, 0.7, 0.8. (p = 1000, n = 20)
Figure 3.6 The ROC curves of the Oracle test, C-Q test, FDR test, Maximum test and the threshold test at different levels with Type I error between 0 and 0.2. From top to bottom, r = 0.4, 0.6 and 0.9. From left to right panels, $\beta$ = 0.6, 0.7, 0.8. (p = 2000, n = 30)
Figure 3.7 The ROC curves of the Oracle test, C-Q test, FDR test, Maximum test and the threshold test at different levels with Type I error between 0 and 0.2. From top to bottom, r = 0.4, 0.6 and 0.9. From left to right panels, $\beta$ = 0.6, 0.7, 0.8. (p = 2500, n = 40)
Figure 4.1 (a) The raw data plots with the estimates of $g_j(t)$ (j = 1, 2, 3, 4). (b) The estimates of $g_j(t)$ in the same plot: Treatment I (solid line), Treatment II (short dashed line), Treatment III (dashed and dotted line) and Treatment IV (long dashed line).

ACKNOWLEDGEMENTS

First and foremost, I would like to thank Dr. Song Xi Chen for his guidance, support, encouragement and tremendous effort on my research projects during my Ph.D. studies. His enthusiasm and dedication toward research will always inspire me. I am also grateful to Dr. Dan Nettleton and Dr. Peng Liu for many useful discussions and suggestions. I also appreciate my committee members for their efforts and advice on my work: Dr. Wayne Fuller, Dr. William Meeker and Dr. Zhengyuan Zhu. I sincerely thank Dr. Ken Koehler for various help. I also thank Dr. Long Qu for providing me the Yorkshire gilt data set and introducing the biological background. Thanks also go to Heng Wang for her support and encouragement. Finally, I thank my family for their consistent support.
