
Computational and Statistical Approaches to Genomics PDF

425 Pages·2006·4.38 MB·English


COMPUTATIONAL AND STATISTICAL APPROACHES TO GENOMICS, SECOND EDITION

edited by

Wei Zhang
University of Texas, M.D. Anderson Cancer Center

and

Ilya Shmulevich
Institute for Systems Biology, Seattle, WA

Library of Congress Cataloging-in-Publication Data
Computational and statistical approaches to genomics / edited by Wei Zhang and Ilya Shmulevich. – 2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-387-26287-1
ISBN-10: 0-387-26287-3 (alk. paper)
ISBN-13: 978-0-387-26288-8 (e-book)
ISBN-10: 0-387-26288-1 (e-book)
1. Genomics—Mathematical models. 2. Genomics—Statistical methods. 3. Genomics—Data processing. 4. DNA microarrays. I. Zhang, Wei, 1963- II. Shmulevich, Ilya, 1969-
QH438.4.M3C65 2005
572.8’6’015118—dc22
2005049761

© 2006 Springer Science+Business Media, Inc.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed in the United States of America.

9 8 7 6 5 4 3 2 1 SPIN 11419631

springeronline.com

This book is dedicated to our families and colleagues

Contents

Preface

1 Microarray Image Analysis and Gene Expression Ratio Statistics
  Yidong Chen, Edward R. Dougherty, Michael L. Bittner, Paul Meltzer, and Jeffery Trent

2 Statistical Considerations in the Assessment of cDNA Microarray Data Obtained Using Amplification
  Jing Wang, Kevin R. Coombes, Keith Baggerly, Limei Hu, Stanley R. Hamilton, and Wei Zhang

3 Sources of Variation in Microarray Experiments
  Kathleen F. Kerr, Edward H. Leiter, Laurent Picard, and Gary A. Churchill

4 Studentizing Microarray Data
  Keith A. Baggerly, Kevin R. Coombes, Kenneth R. Hess, David N. Stivers, Lynne V. Abruzzo, and Wei Zhang

5 Exploratory Clustering of Gene Expression Profiles of Mutated Yeast Strains
  Merja Oja, Janne Nikkilä, Petri Törönen, Garry Wong, Eero Castrén, and Samuel Kaski

6 Selecting Informative Genes for Cancer Classification Using Gene Expression Data
  Tatsuya Akutsu and Satoru Miyano

7 Finding Functional Structures in Glioma Gene-Expressions Using Gene Shaving Clustering and MDL Principle
  Ciprian D. Giurcaneanu, Cristian Mircean, Gregory N. Fuller, and Ioan Tabus

8 Design Issues and Comparison of Methods for Microarray-Based Classification
  Edward R. Dougherty and Sanju N. Attoor

9 Analyzing Protein Sequences Using Signal Analysis Techniques
  Karen M. Bloch and Gonzalo R. Arce

10 Scale-Dependent Statistics of the Numbers of Transcripts and Protein Sequences Encoded in the Genome
  Vladimir A. Kuznetsov

11 Statistical Methods in Serial Analysis of Gene Expression (SAGE)
  Ricardo Z. N. Vêncio and Helena Brentani

12 Normalized Maximum Likelihood Models for Boolean Regression with Application to Prediction and Classification in Genomics
  Ioan Tabus, Jorma Rissanen, and Jaakko Astola

13 Inference of Genetic Regulatory Networks via Best-Fit Extensions
  Harri Lähdesmäki, Ilya Shmulevich, Olli Yli-Harja, and Jaakko Astola

14 Regularization and Noise Injection for Improving Genetic Network Models
  Eugene van Someren, Lodewyk Wessels, Marcel Reinders, and Eric Backer

15 Parallel Computation and Visualization Tools for Codetermination Analysis of Multivariate Gene Expression Relations
  Edward B. Suh, Edward R. Dougherty, Seungchan Kim, Michael L. Bittner, Yidong Chen, Daniel E. Russ, and Robert L. Martino

16 Single Nucleotide Polymorphisms and Their Applications
  Rudy Guerra and Zhaoxia Yu

17 The Contribution of Alternative Transcription and Alternative Splicing to the Complexity of Mammalian Transcriptomes
  Mihaela Zavolan and Christian Schönbach

18 Computational Imaging, and Statistical Analysis of Tissue Microarrays: Quantitative Automated Analysis of Tissue Microarrays
  Aaron J. Berger, Robert L. Camp, and David L. Rimm

Index

Preface

During the three years after the publication of the first edition of this book, the computational and statistical research in genomics have become increasingly more important and indispensable for understanding cellular behavior under a variety of environmental conditions and for tackling challenging clinical problems. In the first edition, the organizational structure was: data → analysis → synthesis → application. In the second edition, we have kept the same structure but have decided to eliminate chapters that primarily focused on applications.

Our decision was motivated by several factors. Firstly, the main focus of this book is computational and statistical approaches in genomics research. Thus, the main emphasis is on methods rather than on applications. Secondly, many of the chapters already include numerous examples of applications of the discussed methods to current problems in biology.

We have tried to further broaden the range of topics to which end we have included newly contributed chapters on topics such as alternative splicing, tissue microarray image and data analysis, single nucleotide polymorphisms, serial analysis of gene expression, and gene shaving. Additionally, a number of chapters have been updated or revised. We thank all the contributing authors for their contributions and hope that you enjoy reading this book.

Wei Zhang
Houston, TX

Ilya Shmulevich
Seattle, WA

Chapter 1

MICROARRAY IMAGE ANALYSIS AND GENE EXPRESSION RATIO STATISTICS

Yidong Chen¹, Edward R. Dougherty², Michael L. Bittner¹, Paul Meltzer¹, and Jeffery Trent¹
¹Cancer Genetics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
²Department of Electrical Engineering, Texas A&M University, College Station, Texas, USA

1. Introduction

A cell relies on its protein components for a wide variety of its functions, including energy production, biosynthesis of component macro-molecules, maintenance of cellular architecture, and the ability to act upon intra- and extra-cellular stimuli. Each cell in an organism contains the information necessary to produce the entire repertoire of proteins the organism can specify. Since a cell’s specific functionality is largely determined by the genes it is expressing, it is logical that transcription, the first step in the process of converting the genetic information stored in an organism’s genome into protein, would be highly regulated by the control network that coordinates and directs cellular activity. A primary means for regulating cellular activity is the control of protein production via the amounts of mRNA expressed by individual genes. The tools required to build an understanding of genomic regulation of expression reveal the probability characteristics of these expression levels.

Complementary DNA microarray technology provides a powerful analytical tool for human genetic research (Schena et al., 1995; Schena et al., 1996; DeRisi et al., 1996; DeRisi et al., 1997; Duggan et al., 1999). It combines robotic spotting of small amounts of individual, pure nucleic acid species on a glass surface, hybridization to this array with multiple fluorescently labeled nucleic acids, and detection and quantitation of the resulting fluor-tagged hybrids with a scanning confocal microscope (Fig. 1.1). A basic application is quantitative analysis of fluorescence signals representing the relative abundance of mRNA from distinct tissue samples. cDNA microarrays are prepared by printing thousands of cDNAs in an array format on glass microscope slides, which provide gene-specific hybridization targets. Distinct mRNA samples can be labeled with different fluors and then co-hybridized on to each arrayed gene. Ratios of gene-expression levels between the samples can be used to detect meaningfully different expression levels between the samples for a given gene. Given an experiment design with multiple tissue samples, microarray data can be used to cluster genes based on expression profiles (Eisen et al., 1998; Khan et al., 1998), to characterize and classify disease based on the expression levels of gene sets (Golub et al., 1999; Ben-Dor et al., 2000; Bittner et al., 2000; Hedenfalk et al., 2001; Khan et al., 2001), and for the many statistical methods presented in this book. When using cDNA microarrays, the signal must be extracted from the background. This requires image processing to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations (Chen et al., 1997; Schadt et al., 2000; Kim et al., 2001), and variability analysis and measurement quality control assessment (Bittner et al., 2001; Newton et al., 2001; Wang et al., 2001).

Figure 1.1. Illustration of a microarray system. (Panels: microarray preparation, cDNA probe hybridization, confocal microscope, computer analysis.)

This chapter discusses an image processing environment whose components have been specially designed for cDNA microarrays. It measures signals and ratio statistics to determine whether a ratio is significantly high or low in order to conclude whether the gene is up- or down-regulated, and provides related tools such as those for quality assessment.

2. Microarray Image Analysis

A typical glass-substrate and fluorescent-based cDNA microarray detection system is based on a scanning confocal microscope, where two monochrome images are obtained from laser excitations at two different wavelengths. Monochrome images of the fluorescent intensity for each fluor are combined by placing each image in the appropriate color channel of an RGB image (Fig. 1.2). In this composite image, one can visualize the differential expression of genes in the two cell types: the test sample is typically placed in the red channel, while the reference sample is placed in the green channel. Intense red fluorescence at a spot indicates a high level of expression of that gene in the test sample with little expression in the reference sample. Conversely, intense green fluorescence at a spot indicates relatively low expression of that gene in the test sample compared to the reference. When both test and reference samples express a gene at similar levels, the observed array spot is yellow. We generally assume that specific DNA products from two samples have an equal probability of hybridizing to the specific target. Thus, the fluorescent intensity measurement is a function of the amount of specific RNA available within each sample, provided samples are well-mixed and there is sufficiently abundant cDNA deposited at each target location.

The objective of the microarray image analysis is to extract probe intensities or ratios at each cDNA target location, and then cross-link printed clone information so that biologists can easily interpret the outcomes and perform further high-level analysis. The block diagram of the image analysis system is shown in Fig. 1.3. A microarray image is first segmented into individual cDNA targets, either by manual interaction or an

Figure 1.2. An example of microarray image.
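
The channel assignment described in this section (test sample in red, reference sample in green) can be illustrated with a minimal Python sketch using NumPy. It is not the chapter's software: the function names `compose_rgb` and `spot_ratio`, the scaling of both channels by a shared maximum, and the use of an unweighted mean intensity inside a spot mask are all assumptions made for illustration, and background subtraction is deliberately left out because the preview does not show how the authors perform it.

```python
import numpy as np


def compose_rgb(test, reference):
    """Build a composite image: test -> red, reference -> green, blue = 0.

    Hypothetical helper; scaling by a shared maximum is an assumption that
    keeps the relative intensity of the two channels intact.
    """
    test = np.asarray(test, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if test.shape != reference.shape:
        raise ValueError("channel images must have the same shape")
    peak = max(test.max(), reference.max())
    scale = peak if peak > 0 else 1.0
    rgb = np.zeros(test.shape + (3,), dtype=float)
    rgb[..., 0] = test / scale       # red channel: test sample
    rgb[..., 1] = reference / scale  # green channel: reference sample
    return rgb


def spot_ratio(test, reference, mask):
    """Ratio of mean test intensity to mean reference intensity in one spot.

    Hypothetical helper; no background correction is applied here.
    """
    mask = np.asarray(mask, dtype=bool)
    t = np.asarray(test, dtype=float)[mask].mean()
    r = np.asarray(reference, dtype=float)[mask].mean()
    if r == 0:
        raise ValueError("reference intensity is zero inside the spot")
    return t / r


# Toy 2x2 "images": one pixel per spot, purely illustrative values.
test_img = np.array([[200.0, 100.0], [50.0, 0.0]])
ref_img = np.array([[50.0, 100.0], [200.0, 0.0]])
composite = compose_rgb(test_img, ref_img)
first_spot = np.array([[True, False], [False, False]])
ratio = spot_ratio(test_img, ref_img, first_spot)
```

In this toy input the first pixel is dominated by red (test higher than reference), the second has equal red and green and would display as yellow, and the third is dominated by green, matching the qualitative reading of the composite image given in the text.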
