ebook img

mixed anova model analysis of microarray experiments with locally PDF

44 Pages·2004·0.3 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview mixed anova model analysis of microarray experiments with locally

MIXED ANOVA MODEL ANALYSIS OF MICROARRAY EXPERIMENTS WITH LOCALLY POOLED ERROR Yuan Liu A Thesis Submitted to the University of North Carolina at Wilmington in Partial Fulfillment Of the Requirements for the Degree of Master of Arts Department of Mathematics and Statistics University of North Carolina at Wilmington 2004 Approved by Advisory Committee Chair Accepted by Dean, Graduate School TABLE OF CONTENTS ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi 1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Mixed ANOVA Model . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Local Pooling of Errors (LPE) . . . . . . . . . . . . . . . . . . 7 2 METHOD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3 DATA ANALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3.1 Yeast Data Background . . . . . . . . . . . . . . . . . . . . . 12 3.2 Mixed ANOVA Model . . . . . . . . . . . . . . . . . . . . . . 12 3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 4 DISCUSSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 APPENDIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 Appendix A. SAS code for the residual variance method . . . . . . . 33 ii ABSTRACT The determination of a list of differentially expressed genes is a basic objective in many cDNA microarray experiments. Combining information across genes in the statistical analysis of microarray data is desirable because of relatively small number of data points obtained for each individual gene. Our LPE approach finds a middle ground between global F test and gene-specific F test by pooling the information across a group of genes that have similar variance estimates and shrinks the within- gene variance estimate towards an estimate including more genes. This method provides a powerful and robust approach to test differential expression of genes but does not suffer from biases of the global F test and low power of gene-specific F test. In our approach the two-stage Mixed ANOVA model provides a conceptually and computationally efficient means to analyze the microarray data. iii DEDICATION This thesis is dedicated to my parents, my husband and all statistics faculty at de- partment of Mathematics and Statistics of UNCW. Thanks for their encouragement, love and support. iv LIST OF TABLES 1 Phases of Microarray Experimentation . . . . . . . . . . . . . . . . . 3 2 Pairwise comparisons of four methods. . . . . . . . . . . . . . . . . . 19 3 Pairwise comparisons of four methods without the 474 genes not eval- uated by gene-specific method. . . . . . . . . . . . . . . . . . . . . . . 20 4 Pairwise comparisons of four methods for those genes whose fold- change ≥ 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 5 Pairwisecomparisonsoffourmethodsforthosegeneswhoselog2(fold- change) ≤ 0.5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 v LIST OF FIGURES 1 Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2 Gene significance results by Gene specific method. . . . . . . . . . . . 23 3 Gene significance results by the residual variance method. . . . . . . . 24 4 Gene significance results comparison for the gene-specific method and the residual variance method. . . . . . . . . . . . . . . . . . . . . . . 25 5 Gene significance results comparison for the gene-specific method and the expression mean method . . . . . . . . . . . . . . . . . . . . . . . 26 6 Gene significance results comparison for the gene-specific method and the expression variance method . . . . . . . . . . . . . . . . . . . . . 27 7 Gene significance results comparison for the expression mean method and the residual variance method . . . . . . . . . . . . . . . . . . . . 28 8 Genesignificanceresultscomparisonfortheexpressionvariancemethod and the residual variance method . . . . . . . . . . . . . . . . . . . . 29 9 The distribution of σ(cid:98) . . . . . . . . . . . . . . . . . . . . . . . . . . 30 g vi 1 INTRODUCTION 1.1 Background The advent of the genome project has vastly increased our knowledge of the ge- nomic sequences of humans and other organisms. Various techniques such as cDNA microarrays and high-density oligonucleotide arrays have been developed to exploit this growing body of science and promise a wealth of data that can be used to de- velopamorecompleteunderstandingofgenefunction, regulationandinteraction[1]. In this paper, our discussion is mainly relevant to cDNA microarrays. Spot- ted cDNA microarrays are a tool for high-throughput analysis of gene expression which provides rapid, parallel surveys of gene expression patterns for hundreds or thousands of genes in a single array. In the first step of the technique, DNA is ”spotted” and immobilized on glass slides or other substrate, the microarrays. Each spot on an array contains a particular sequence, although a sequence may be spotted multiple times per array. Next, mRNA from cell population under study is reverse- transcribed into cDNA and one of two fluorescent dye labels, Cy3 (green) and Cy5 (red), is incorporated. Two pools of differently-labeled cDNA are mixed and washed over an array. Dye-labeled cDNA can hybridize with complementary sequences on the array, and any unhybridized cDNA is washed off. The array is then scanned for Cy3 and Cy5 fluorescent intensities. Although there are many unknown quantities in a microarray hybridization, such as the sizes and densities of the probe spots, and the hybridization and labeling efficiencies of different sequences, the basic principle is the following: for a given sequence spotted on the array, if one sample contains more of the corresponding transcript, the signal intensity for the dye used to label that sample should be higher than the other dye. Aside from the enormous scientific potential of microarrays to help in understanding gene regulation and interactions, they have very important applications in pharmaceutical and clinical research. By comparing gene expression in normal and disease cells, microarrays may be used to identify disease genes and target for therapeutic drugs [2]. Any microarray experiment involves a number of distinct phases. Table 1 gives a schematic view of these phases of microarray experimentation that involve data- analytic steps [3]. In this paper, we focus on the identification of differentially expressed genes across experimental conditions in Data Analysis step by exploring a statistical approach to improving estimates of variability of differential expression. Microarray experiments generate large and complex multivariate data sets. On a single glass slide, 10,000 to 20,000 cDNA probes can be spotted [4]. The current bottleneck in the processing of microarray data occurs after the data are generated. The difficulties stem primarily from myriad potential sources of random and sys- tematic measurement error in the microarray process and from the small number of replications (both biological and technical replications) relative to the large number of variables (probes)[5]. Statistical methods have been used as a way to systemati- cally extract biological information and to assess the associated uncertainty. The simplest statistical method for detecting differential expression is the t test, which can be used to compare two conditions when there is replication of samples, based on the the fold change or the base 2 log of the expression ratio. Since gene- specific t test (use an estimate of error variance from one gene at a time) and global t test (assume the homogeneous variance between different genes and use an esti- mate of pooled error variance across all genes) are subject to low power and bias, respectively; while modified versions of t test find a middle ground. One version is regularized t test proposed by Baldi and Long [6] replaces the denominator for 2 Table 1: Phases of Microarray Experimentation Experimental Design Choice of sample size Assignment of experimental conditions to arrays Signal Extraction Image analysis Gene filtering Probe level analysis of oligonucleotide arrays Normalization and removal of artifacts for comparisons across arrays Data Analysis Selection of genes that are differentially expressed across experimental conditions Clustering and classification of biological samples Clustering and classification of genes Validation and Interpretation Comparisons across platform Use of multiple independent datasets 3 a gene-specific t test with a Bayesian estimator based on a hierarchical prior dis- tribution. In SAM (significance analysis of microarrays) version of t test, a small positive constant is added to the denominator of the gene-specific t-test to stabilize the small variances. When it comes to more than two conditions or more complex (multi-factor) experimental design, it is not enough simply to compute ratios. The analysis of variance (ANOVA) model can be applied to cDNA microarray data from any experimental design; however, microarray ANOVA models are not based on ra- tios but are applied directly to relative expression values. [7] 1.2 Mixed ANOVA Model Every measurement in a microarray experiment is associated with a particular com- bination of an array in the experiment, a dye (red or green), a variety (can be treatment or experimental conditions), and a gene. Let y be the log fluorescent ijkgr intensity from the rth spot for gene g on array i for dye j and variety k [8]. A typical ANOVA model for a micorarray experiment can be the form of y = µ+A +D +(AD) +G +(AG) +(DG) +(VG) +ε (1) ijkgr i j ij g igr jg kg ijkgr Here, µ is the overall mean expression level; the array effects A account for dif- i ferences between arrays averaged over all genes, dyes, and varieties; the dye effects D account for differences between the average signal form each dye; (AD) is the j ij term accounting for effects of the interaction between the array and the dye. These three kinds of effects account for overall variation in array and dyes and also are considered as ”global” effects. They are not of interest, but accounting for them amounts to data normalization. In addition to these ”global” normalization terms, there are source of variation to consider at the level of individual genes. They are the 4

Description:
Appendix A. SAS code for the residual variance method . statistical analysis of microarray data is desirable because of relatively small number of data points
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.