DNA Microarray Data Analysis PDF

162 Pages·2003·12.908 MB·English

Preview DNA Microarray Data Analysis

DNA Microarray Data Analysis

Tomi Pasanen, Janna Saarela, Ilana Saarikko, Teemu Toivanen, Martti Tolvanen, Mauno Vihinen and Garry Wong

Editors: Jarno Tuimala and M. Minna Laine

CSC, the Finnish IT center for Science

CSC – Scientific Computing Ltd. is a non-profit organization for high-performance computing and networking in Finland. CSC is owned by the Ministry of Education. CSC runs a national large-scale facility for computational science and engineering and supports the university and research community. CSC is also responsible for the operations of the Finnish University and Research Network (Funet).

All rights reserved. The PDF version of this book or parts of it can be used in Finnish universities as course material, provided that this copyright notice is included. However, this publication may not be sold or included as part of other publications without permission of the publisher.

© The authors and CSC – Scientific Computing Ltd. 2003
ISBN 952-9821-89-1
http://www.csc.fi/oppaat/siru/
Printed at Picaset Oy, Helsinki 2003

Preface

This is the first edition of the DNA microarray data analysis guidebook. Although invented in the mid-90s, DNA microarrays are still novelties as biomedical research tools. DNA microarrays generate large amounts of numerical data, which should be analyzed effectively.

In this book, we hope to offer a broad view of the basic theory and techniques behind DNA microarray data analysis. Our aim was not to be comprehensive, but rather to cover the basics, which are unlikely to change much over the years. We hope that especially researchers starting their data analysis can benefit from the book.

The text emphasizes gene expression analysis. Topics such as genotyping are discussed briefly. This book does not cover wet-lab practices, such as sample preparation or hybridization. Rather, we start when the microarrays have been scanned and the resulting images analyzed.
In other words, we take the files with signal intensities, which usually generate questions such as: "How is the data normalized?" or "How do I identify the genes which are upregulated?". We provide some simple solutions to these specific questions and many others.

Each chapter has a section on suggested reading, which introduces some of the relevant literature. Several chapters also include data analysis examples using GeneSpring software.

This edition of the book was written by M. Minna Laine (chapters 4, 8 and 14), Tomi Pasanen (chapter 11), Janna Saarela (chapters 2 and 3), Ilana Saarikko (chapter 8), Teemu Toivanen (chapter 14), Martti Tolvanen (chapter 12), Jarno Tuimala (chapters 4, 6, 7, 8, 9, 10, 13 and 15), Mauno Vihinen (chapters 10, 11 and 12), and Garry Wong (chapters 1 and 5).

Juha Haataja and Leena Jukka are warmly acknowledged for their support during the production of this book.

We are very interested in receiving feedback about this publication. In particular, if you feel that some essential technique has been missed, let us know. Please send your comments to the e-mail address Jarno.Tuimala@csc.fi.

Espoo, 19th May 2003
The authors

List of Contributors

M. Minna Laine, CSC, the Finnish IT center for Science, Tekniikantie 15 a D, 02101 Espoo, Finland
Tomi Pasanen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Janna Saarela, Biomedicum Biochip Center, Haartmaninkatu 8, 00290 Helsinki, Finland
Ilana Saarikko, Centre for Biotechnology, Tykistökatu 6, 20521 Turku, Finland
Teemu Toivanen, Centre for Biotechnology, Tykistökatu 6, 20521 Turku, Finland
Martti Tolvanen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Jarno Tuimala, CSC, the Finnish IT center for Science, Tekniikantie 15 a D, 02101 Espoo, Finland
Mauno Vihinen, Institute of Medical Technology, Lenkkeilijänkatu 8, 33520 Tampere, Finland
Garry Wong, A. I. Virtanen-institute, University of Kuopio, 70211 Kuopio, Finland

Contents

Preface
List of Contributors

I Introduction

1 Introduction
  1.1 Why perform microarray experiments?
  1.2 What is a microarray?
  1.3 Microarray production
  1.4 Where can I obtain microarrays?
  1.5 Extracting and labeling the RNA sample
  1.6 RNA extraction from scarce tissue samples
  1.7 Hybridization
  1.8 Scanning
  1.9 Typical research applications of microarrays
  1.10 Experimental design and controls
  1.11 Suggested reading
2 Affymetrix Genechip system
  2.1 Affymetrix technology
  2.2 Single Array analysis
  2.3 Detection p-value
  2.4 Detection call
  2.5 Signal algorithm
  2.6 Analysis tips
  2.7 Comparison analysis
  2.8 Normalization
  2.9 Change p-value
  2.10 Change call
  2.11 Signal Log Ratio Algorithm
3 Genotyping systems
  3.1 Introduction
  3.2 Methodologies
  3.3 Genotype calls
  3.4 Suggested reading
4 Overview of data analysis
  4.1 cDNA microarray data analysis
  4.2 Affymetrix data analysis
  4.3 Data analysis pipeline
5 Experimental design
  5.1 Why do we need to consider experimental design?
  5.2 Choosing and using controls
  5.3 Choosing and using replicates
  5.4 Choosing a technology platform
  5.5 Gene clustering v. gene classification
  5.6 Conclusions
  5.7 Suggested reading
6 Basic statistics
  6.1 Why statistics are needed
  6.2 Basic concepts
    6.2.1 Variables
    6.2.2 Constants
    6.2.3 Distribution
    6.2.4 Errors
  6.3 Simple statistics
    6.3.1 Number of subjects
    6.3.2 Mean (m)
    6.3.3 Trimmed mean
    6.3.4 Median
    6.3.5 Percentile
    6.3.6 Range
    6.3.7 Variance and the standard deviation
    6.3.8 Coefficient of variation
  6.4 Effect statistics
    6.4.1 Scatter plot
    6.4.2 Correlation (r)
    6.4.3 Linear regression
  6.5 Frequency distributions
    6.5.1 Normal distribution
    6.5.2 t-distribution
    6.5.3 Skewed distribution
    6.5.4 Checking the distribution of the data
  6.6 Transformation
    6.6.1 Log2-transformation
  6.7 Outliers
  6.8 Missing values and imputation
  6.9 Statistical testing
    6.9.1 Basics of statistical testing
    6.9.2 Choosing a test
    6.9.3 Threshold for p-value
    6.9.4 Hypothesis pair
    6.9.5 Calculation of test statistic and degrees of freedom
    6.9.6 Critical values table
    6.9.7 Drawing conclusions
    6.9.8 Multiple testing
  6.10 Analysis of variance
    6.10.1 Basics of ANOVA
    6.10.2 Completely randomized experiment
  6.11 Statistics using GeneSpring
    6.11.1 Simple statistics
    6.11.2 Transformations
    6.11.3 Scatter plot and histogram
    6.11.4 Correlation
    6.11.5 Linear regression
    6.11.6 One-sample t-test
    6.11.7 Independent samples t-test and ANOVA
  6.12 Suggested reading

II Analysis

7 Preprocessing of data
  7.1 Rationale for preprocessing
  7.2 Missing values
  7.3 Checking the background reading
  7.4 Calculation of expression change
    7.4.1 Intensity ratio
    7.4.2 Log ratio
    7.4.3 Fold change
  7.5 Handling of replicates
    7.5.1 Types of replicates
    7.5.2 Time series
    7.5.3 Case-control studies
    7.5.4 Power analysis
    7.5.5 Averaging replicates
  7.6 Checking the quality of replicates
