
Statistical bioinformatics with R PDF

337 Pages·2010·3.52 MB·English

Preview Statistical bioinformatics with R

Statistical Bioinformatics with R

Sunil K. Mathur
University of Mississippi

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Academic Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
84 Theobald’s Road, London WC1X 8RR, UK

Copyright © 2010 Elsevier, Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Mathur, Sunil K.
Statistical bioinformatics with R / Sunil K. Mathur.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-12-375104-1 (alk. paper)
1. Bioinformatics–Statistical methods. 2. R (Computer program language) I. Title.
QH324.2.M378 2010
570.285’5133–dc22
2009050006

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-375104-1

For information on all Academic Press publications visit our Web site at www.elsevierdirect.com

Printed in the United States of America
09 10 9 8 7 6 5 4 3 2 1

Contents

Preface
Acknowledgments

Chapter 1 Introduction
  1.1 Statistical Bioinformatics
  1.2 Genetics
  1.3 Chi-Square Test
  1.4 The Cell and Its Function
  1.5 DNA
  1.6 DNA Replication and Rearrangements
  1.7 Transcription and Translation
  1.8 Genetic Code
  1.9 Protein Synthesis
  Exercise 1
  Answer Choices for Questions 1 through 15

Chapter 2 Microarrays
  2.1 Microarray Technology
  2.2 Issues in Microarray
  2.3 Microarray and Gene Expression and Its Uses
  2.4 Proteomics
  Exercise 2

Chapter 3 Probability and Statistical Theory
  3.1 Theory of Probability
  3.2 Mathematical or Classical Probability
  3.3 Sets
    3.3.1 Operations on Sets
    3.3.2 Properties of Sets
  3.4 Combinatorics
  3.5 Laws of Probability
  3.6 Random Variables
    3.6.1 Discrete Random Variable
    3.6.2 Continuous Random Variable
  3.7 Measures of Characteristics of a Continuous Probability Distribution
  3.8 Mathematical Expectation
    3.8.1 Properties of Mathematical Expectation
  3.9 Bivariate Random Variable
    3.9.1 Joint Distribution
  3.10 Regression
    3.10.1 Linear Regression
    3.10.2 The Method of Least Squares
  3.11 Correlation
  3.12 Law of Large Numbers and Central Limit Theorem

Chapter 4 Special Distributions, Properties, and Applications
  4.1 Introduction
  4.2 Discrete Probability Distributions
  4.3 Bernoulli Distribution
  4.4 Binomial Distribution
  4.5 Poisson Distribution
    4.5.1 Properties of Poisson Distribution
  4.6 Negative Binomial Distribution
  4.7 Geometric Distribution
    4.7.1 Lack of Memory
  4.8 Hypergeometric Distribution
  4.9 Multinomial Distribution
  4.10 Rectangular (or Uniform) Distribution
  4.11 Normal Distribution
    4.11.1 Some Important Properties of Normal Distribution and Normal Probability Curve
    4.11.2 Normal Approximation to the Binomial
  4.12 Gamma Distribution
    4.12.1 Additive Property of Gamma Distribution
    4.12.2 Limiting Distribution of Gamma Distribution
    4.12.3 Waiting Time Model
  4.13 The Exponential Distribution
    4.13.1 Waiting Time Model
  4.14 Beta Distribution
    4.14.1 Some Results
  4.15 Chi-Square Distribution
    4.15.1 Additive Property of Chi-Square Distribution
    4.15.2 Limiting Distribution of Chi-Square Distribution

Chapter 5 Statistical Inference and Applications
  5.1 Introduction
  5.2 Estimation
    5.2.1 Consistency
    5.2.2 Unbiasedness
    5.2.3 Efficiency
    5.2.4 Sufficiency
  5.3 Methods of Estimation
  5.4 Confidence Intervals
  5.5 Sample Size
  5.6 Testing of Hypotheses
    5.6.1 Tests about a Population Mean
  5.7 Optimal Test of Hypotheses
  5.8 Likelihood Ratio Test

Chapter 6 Nonparametric Statistics
  6.1 Chi-Square Goodness-of-Fit Test
  6.2 Kolmogorov-Smirnov One-Sample Statistic
  6.3 Sign Test
  6.4 Wilcoxon Signed-Rank Test
  6.5 Two-Sample Test
    6.5.1 Wilcoxon Rank Sum Test
    6.5.2 Mann-Whitney Test
  6.6 The Scale Problem
    6.6.1 Ansari-Bradley Test
    6.6.2 Lepage Test
    6.6.3 Kolmogorov-Smirnov Test
  6.7 Gene Selection and Clustering of Time-Course or Dose-Response Gene Expression Profiles
    6.7.1 Single Fractal Analysis
    6.7.2 Order-Restricted Inference

Chapter 7 Bayesian Statistics
  7.1 Bayesian Procedures
  7.2 Empirical Bayes Methods
  7.3 Gibbs Sampler

Chapter 8 Markov Chain Monte Carlo
  8.1 The Markov Chain
  8.2 Aperiodicity and Irreducibility
  8.3 Reversible Markov Chains
  8.4 MCMC Methods in Bioinformatics

Chapter 9 Analysis of Variance
  9.1 One-Way ANOVA
  9.2 Two-Way Classification of ANOVA

Chapter 10 The Design of Experiments
  10.1 Introduction
  10.2 Principles of the Design of Experiments
  10.3 Completely Randomized Design
  10.4 Randomized Block Design
  10.5 Latin Square Design
  10.6 Factorial Experiments
    10.6.1 2^n-Factorial Experiment
  10.7 Reference Designs and Loop Designs

Chapter 11 Multiple Testing of Hypotheses
  11.1 Introduction
  11.2 Type I Error and FDR
  11.3 Multiple Testing Procedures

References
Index

Preface

Bioinformatics is an emerging field in which statistical and computational techniques are used extensively to analyze and interpret biological data obtained from high-throughput genomic technologies. Genomic technologies allow us to monitor thousands of biological processes going on inside living organisms in one snapshot, and are rapidly growing as driving forces of research, particularly in the genetics, biomedical, biotechnology, and pharmaceutical industries.

The success of genome technologies and related techniques, however, heavily depends on correct statistical analyses of genomic data. Through statistical analyses and the graphical displays of genomic data, genomic experiments allow biologists to assimilate and explore the data in a natural and intuitive manner. The storage, retrieval, interpretation, and integration of large volumes of data generated by genomic technologies demand increasing dependence on sophisticated computer and statistical inference techniques. New statistical tools have been developed to make inferences from the genomic data obtained through genomic studies in a more meaningful way.

This textbook is of an interdisciplinary nature, and material presented here can be covered in a one- or two-semester course. It is written to give a solid base in statistics while emphasizing applications in genomics. It is my sincere attempt to integrate different fields to understand the high-throughput biological data and describe various statistical techniques to analyze data. In this textbook, new methods based on Bayesian techniques, MCMC methods, likelihood functions, design of experiments, and nonparametric methods, along with traditional methods, are discussed. Insights into some useful software such as BAMarray, ORIGEN, and SAM are provided.

Chapter 1 provides some basic knowledge in biology. Microarrays are a very useful and powerful technique available now. Chapter 2 provides some knowledge of microarray technology and a description of current problems in using this technology. Foundations of probability and basic statistics, assuming that the reader is not familiar with statistical and probabilistic concepts,

Description:
Designed for a one- or two-semester senior undergraduate or graduate bioinformatics course, Statistical Bioinformatics takes a broad view of the subject: not just gene expression and sequence analysis, but a careful balance of statistical theory in the context of bioinformatics applications. The inc
