Edited by Frank Emmert-Streib and Matthias Dehmer
Statistical Diagnostics for Cancer

Titles of the Series “Quantitative and Network Biology”

Volume 1
Dehmer, M., Emmert-Streib, F., Graber, A., Salvador, A. (eds.)
Applied Statistics for Network Biology
Methods in Systems Biology
2011
ISBN: 978-3-527-32750-8

Volume 2
Dehmer, M., Varmuza, K., Bonchev, D. (eds.)
Statistical Modelling of Molecular Descriptors in QSAR/QSPR
2012
ISBN: 978-3-527-32434-7

Related Titles

Zhou, X.-H., Obuchowski, N.A., McClish, D.K.
Statistical Methods in Diagnostic Medicine
2011
ISBN: 978-0-470-18314-4

Azuaje, F.
Bioinformatics and Biomarker Discovery
“Omic” Data Analysis for Personalized Medicine
2010
ISBN: 978-0-470-74460-4

Quantitative and Network Biology
Series Editors M. Dehmer and F. Emmert-Streib
Volume 3
Statistical Diagnostics for Cancer
Analyzing High-Dimensional Data
Edited by Frank Emmert-Streib and Matthias Dehmer

The Editors

Matthias Dehmer
UMIT
Institut für Bioinformatik und Translationale Forschung
Eduard Wallnöfer Zentrum 1
6060 Hall/Tyrol
Austria

Frank Emmert-Streib
Queen's University Belfast
Center for Cancer Research and Cell Biology
97, Lisburn Road
Belfast BT9 7BL
United Kingdom

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty can be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

Cover: Network design by Shailesh Tripathi

© 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Boschstr. 12, 69469 Weinheim, Germany

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Print ISBN: 978-3-527-33262-5
ePDF ISBN: 978-3-527-66544-0
ePub ISBN: 978-3-527-66545-7
mobi ISBN: 978-3-527-66546-4
oBook ISBN: 978-3-527-66547-1

Typesetting: Thomson Digital, Noida, India
Printing and Binding: Markono Print Media Pte Ltd, Singapore
Cover Design: Grafik-Design Schulz, Fußgönheim

Printed in Singapore
Printed on acid-free paper

Contents

Preface
List of Contributors

Part One General Overview

1 Control of Type I Error Rates for Oncology Biomarker Discovery with High-Throughput Platforms
Jeffrey Miecznikowski, Dan Wang, and Song Liu
1.1 Brief Summary
1.2 Introduction
1.3 High-Throughput Platforms
1.3.1 Gene Expression Arrays
1.3.2 RNA-Seq
1.3.3 DNA Methylation Arrays
1.3.4 Mass Spectrometry Platforms
1.3.5 aCGH Arrays
1.3.6 Preprocessing HT Platforms
1.4 Analysis of Experiments
1.4.1 Linear Regression
1.4.1.1 Simple Linear Regression
1.4.1.2 Multiple Regression
1.4.2 Logistic Regression (Y Discrete)
1.4.2.1 Multiple Logistic Regression
1.4.3 Survival Modeling
1.4.3.1 Kaplan–Meier Analysis
1.5 Multiple Testing Type I Errors
1.5.1 FWER, k-FWER Methods
1.5.1.1 Adjusted Bonferroni Method
1.5.1.2 Holm Procedure
1.5.1.3 Generalized Hochberg Procedure
1.5.1.4 Generalized Šidák Procedure
1.5.1.5 minP and maxT procedures
1.6 Discussion
1.7 Perspective
References

2 Overview of Public Cancer Databases, Resources, and Visualization Tools
Frank Emmert-Streib, Ricardo de Matos Simoes, Shailesh Tripathi, and Matthias Dehmer
2.1 Brief Overview
2.2 Introduction
2.3 Different Cancer Types are Genetically Related
2.4 Incidence and Mortality Rates of Cancer
2.5 Cancer and Disorder Databases
2.6 Visualization and Network-Based Analysis Tools
2.6.1 Web-Based Software
2.6.2 R-Based Packages
2.7 Conclusions
2.8 Perspective
References

Part Two Bayesian Methods

3 Discovery of Expression Signatures in Chronic Myeloid Leukemia by Bayesian Model Averaging
Ka Yee Yeung
3.1 Brief Introduction
3.2 Chronic Myeloid Leukemia (CML)
3.3 Variable Selection on Gene Expression Data
3.4 Bayesian Model Averaging (BMA)
3.4.1 The Iterative BMA Algorithm (iBMA)
3.4.2 Computational Assessment
3.5 Case Study: CML Progression Data
3.6 The Power of iBMA
3.7 Laboratory Validation
3.8 Conclusions
3.9 Perspective
3.10 Publicly Available Resources
References

4 Bayesian Ranking and Selection Methods in Microarray Studies
Hisashi Noma and Shigeyuki Matsui
4.1 Brief Summary
4.2 Introduction
4.3 Hierarchical Mixture Modeling and Empirical Bayes Estimation
4.4 Ranking and Selection Methods
4.4.1 Ranking Based on Effect Sizes
4.4.1.1 Posterior Mean (PM)
4.4.1.2 Rank Posterior Mean (RPM)
4.4.1.3 Tail-Area Posterior Probability (TPP)
4.4.2 Ranking Based on Selection Accuracy of Differential Genes
4.4.2.1 Posterior Probability of Differentially Expressed (PPDE)
4.4.2.2 Evaluating Selection Accuracy
4.5 Simulations
4.6 Application
4.7 Concluding Remarks
4.8 Perspective
4.9 Appendix: The EM Algorithm
References

5 Multiclass Classification via Bayesian Variable Selection with Gene Expression Data
Yang Aijun, Song Xinyuan, and Li Yunxian
5.1 Brief Summary
5.2 Introduction
5.3 Matrix Variate Distribution
5.4 Method
5.4.1 Model
5.4.2 Prior Specification
5.4.3 Computation
5.4.4 Classification
5.5 Real Data Analysis
5.5.1 Leukemia Data
5.5.2 Lymphoma Data
5.5.3 Computational Time
5.6 Discussion
5.7 Perspective
References

6 Semisupervised Methods for Analyzing High-dimensional Genomic Data
Devin C. Koestler
6.1 Brief Summary
6.2 Motivation
6.3 Existing Approaches
6.3.1 Fully Unsupervised Procedures
6.3.2 Fully Supervised Procedures
6.3.3 Semisupervised Procedures
6.3.3.1 Semisupervised Clustering
6.3.3.2 Semisupervised RPMM
6.3.3.3 Considerations Regarding Semisupervised Procedures
6.4 Data Application: Mesothelioma Cancer Data Set
6.4.1 Results: Mesothelioma Cancer Data Set
6.5 Perspective
References

Part Three Network-Based Approaches

7 Colorectal Cancer and Its Molecular Subsystems: Construction, Interpretation, and Validation
Vishal N. Patel and Mark R. Chance
7.1 Brief Summary
7.2 Colon Cancer: Etiology
7.3 Colon Cancer: Development
7.4 The Pathway Paradigm
7.5 Cancer Subtypes and Therapies
7.6 Molecular Subsystems: Introduction
7.7 Molecular Subsystems: Construction
7.7.1 Measurements
7.7.2 Manifolds
7.8 Molecular Subsystems: Interpretation
7.8.1 Examples
7.9 Molecular Subsystems: Validation
7.10 Worked Example: Label-Free Proteomics
7.10.1 Whole Protein-Level Significance
7.10.2 Peptide-Level Significance
7.10.3 Exon-Level Significance
7.10.4 Summarizing the Results
7.11 Conclusions
7.12 Perspective
References

8 Network Medicine: Disease Genes in Molecular Networks
Sreenivas Chavali and Kartiek Kanduri
8.1 Brief Summary
8.2 Introduction
8.3 Genetic Architecture of Human Diseases
8.4 Systems Properties of Disease Genes
8.4.1 Network Measures
8.4.2 Disease and Disease-Gene Networks
8.4.3 Disease Genes in Protein Interaction Networks
8.4.4 Identification of Disease Modules
8.5 Disease Gene Prioritization
8.5.1 Linkage Methods
8.5.2 Disease-Module-Based Methods
8.5.3 Diffusion-Based Methods
8.6 Conclusion
8.7 Perspectives
References

9 Inference of Gene Regulatory Networks in Breast and Ovarian Cancer by Integrating Different Genomic Data
Binhua Tang, Fei Gu, and Victor X. Jin
9.1 Brief Summary
9.2 Introduction
9.3 Theory and Contents of Gene Regulatory Network
9.3.1 Basic Theory of Gene Regulatory Network
9.3.2 Content of Gene Regulatory Network
9.3.2.1 Identify and Infer the Structure Properties and Regulatory Relationships of Gene Networks
9.3.2.2 Understand the Basic Rules of Gene Expression and Function
9.3.2.3 Discover the Transfer Rules of Genetic Information During Gene Expression
9.3.2.4 Study on the Gene Function in a Systematic Framework
9.4 Inference of Gene Regulatory Networks in Human Cancer
9.4.1 The In Silico Analytical Approach
9.4.1.1 Study Case 1: Inference of Static Gene Regulatory Network of Estrogen-Dependent Breast Cancer Cell Line
9.4.1.2 Study Case 2: Gene Regulatory Network of Genome-Wide Mapping of TGFβ/SMAD4 Targets in Ovarian Cancer Patients
9.4.2 A Bayesian Inference Approach for Genetic Regulatory Analysis
9.4.2.1 Study Case: ERα Transcriptional Regulatory Dynamics in Breast Cancer Cell
9.5 Conclusions
9.6 Perspective
References

10 Network-Module-Based Approaches in Cancer Data Analysis
Guanming Wu and Lincoln Stein
10.1 Brief Summary
10.2 Introduction
10.3 Notation and Terminology
10.4 Network Modules Containing Functionally Similar Genes or Proteins
10.5 Network Module Searching Methods
10.5.1 Greedy Network Module Search Algorithms
10.5.2 Objective Function Guided Search
10.5.3 Network Clustering Algorithms
10.5.4 Community Search Algorithms
10.5.5 Mutual Exclusivity-Based Search Algorithms
10.5.6 Weighted Gene Expression Network
10.6 Applications of Network-Module-Based Approaches in Cancer Studies
10.6.1 Network Modules and Cancer Prognostic Signatures
10.6.2 Cancer Driver Gene Search Based on Network Modules
10.6.3 Using Network Patterns to Identify Cancer Mechanisms
10.7 The Reactome FI Cytoscape Plug-in
10.7.1 Construction of a Functional Interaction Network
10.7.2 Network Clustering Algorithm
10.7.3 Cancer Gene Index Data Set
10.7.4 Analyzing the TCGA OV Mutation Data Set
10.7.4.1 Loading the Mutation File into Cytoscape and Constructing a FI Subnetwork
10.7.4.2 Network Clustering and Network Module Functional Analysis
10.7.4.3 Module-Based Survival Analysis
10.7.4.4 Cancer Gene Index Data Overlay Analysis
10.8 Conclusions
10.9 Perspective
References

11 Discriminant and Network Analysis to Study Origin of Cancer
Li Chen, Ye Tian, Guoqiang Yu, David J. Miller, Ie-Ming Shih, and Yue Wang
11.1 Brief Summary
11.2 Introduction
11.3 Overview of Relevant Machine Learning Techniques
11.3.1 Fisher’s Discriminant Analysis and ANOVA
11.3.2 Hierarchical Clustering
11.3.3 One-Versus-All Support Vector Machine and Nearest-Mean Classifier
11.3.4 Differential Dependency Network
11.4 Methods
11.4.1 CNA Data Analysis for Testing Existence of Monoclonality
11.4.1.1 Preprocessing
11.4.1.2 Assessing Statistical Significance of Monoclonality
11.4.1.3 Visualization of Monoclonality
11.4.2 A Two-Stage Analytical Method for Testing the Origin of Cancer
11.4.2.1 Basic Assumptions
11.4.2.2 Tissue Heterogeneity Correction
11.4.2.3 Stage 1: Feature Selection and Classification
11.4.2.4 Stage 2: Transcriptional Network Comparison
11.5 Experiments and Results
11.5.1 Monoclonality
11.5.1.1 Testing Existence of Monoclonality
11.5.1.2 The Significance of Monoclonality