ANALYZING THE LARGE
NUMBER OF VARIABLES
IN BIOMEDICAL AND
SATELLITE IMAGERY
PHILLIP I. GOOD
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the
1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the
Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978)
750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the
Publisher for permission should be addressed to the Permissions Department, John
Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used
their best efforts in preparing this book, they make no representations or warranties
with respect to the accuracy or completeness of the contents of this book and specifically
disclaim any implied warranties of merchantability or fitness for a particular purpose.
No warranty may be created or extended by sales representatives or written sales
materials. The advice and strategies contained herein may not be suitable for your
situation. You should consult with a professional where appropriate. Neither the
publisher nor author shall be liable for any loss of profit or any other commercial
damages, including but not limited to special, incidental, consequential, or other
damages.
For general information on our other products and services or for technical support,
please contact our Customer Care Department within the United States at (800)
762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that
appears in print may not be available in electronic formats. For more information about
Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Good, Phillip I.
Analyzing the large number of variables in biomedical and satellite imagery / Phillip I. Good.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-92714-4 (pbk.)
1. Data mining. 2. Mathematical statistics. 3. Biomedical engineering–Data processing.
4. Remote sensing–Data processing. I. Title.
QA76.9.D343G753 2011
066.3′12–dc22
2010030988
Printed in Singapore
oBook ISBN: 978-0-470-93727-3
ePDF ISBN: 978-0-470-93725-9
ePub ISBN: 978-1-118-00214-8
10 9 8 7 6 5 4 3 2 1
CONTENTS
Preface
1 VERY LARGE ARRAYS
1.1 Applications
1.2 Problems
1.3 Solutions
2 PERMUTATION TESTS
2.1 Two-Sample Comparison
2.1.1 Blocks
2.2 k-Sample Comparison
2.3 Computing the p-Value
2.3.1 Monte Carlo Method
2.3.2 An R Program
2.4 Multiple-Variable Comparisons
2.4.1 Euclidean Distance Matrix Analysis
2.4.2 Hotelling’s T²
2.4.3 Mantel’s U
2.4.4 Combining Univariate Tests
2.4.5 Gene Set Enrichment Analysis
2.5 Categorical Data
2.6 Software
2.7 Summary
3 APPLYING THE PERMUTATION TEST
3.1 Which Variables Should Be Included?
3.2 Single-Value Test Statistics
3.2.1 Categorical Data
3.2.2 A Multivariate Comparison Based on a Summary Statistic
3.2.3 A Multivariate Comparison Based on Variants of Hotelling’s T²
3.2.4 Adjusting for Covariates
3.2.5 Pre–Post Comparisons
3.2.6 Choosing a Statistic: Time-Course Microarrays
3.3 Recommended Approaches
3.4 To Learn More
4 BIOLOGICAL BACKGROUND
4.1 Medical Imaging
4.1.1 Ultrasound
4.1.2 EEG/MEG
4.1.3 Magnetic Resonance Imaging
4.1.3.1 MRI
4.1.3.2 fMRI
4.1.4 Positron Emission Tomography
4.2 Microarrays
4.3 To Learn More
5 MULTIPLE TESTS
5.1 Reducing the Number of Hypotheses to Be Tested
5.1.1 Normalization
5.1.2 Selection Methods
5.1.2.1 Univariate Statistics
5.1.2.2 Which Statistic?
5.1.2.3 Heuristic Methods
5.1.2.4 Which Method?
5.2 Controlling the Overall Error Rate
5.2.1 An Example: Analyzing Data from Microarrays
5.3 Controlling the False Discovery Rate
5.3.1 An Example: Analyzing Time-Course Data from Microarrays
5.4 Gene Set Enrichment Analysis
5.5 Software for Performing Multiple Simultaneous Tests
5.5.1 AFNI
5.5.2 Cyber-T
5.5.3 dChip
5.5.4 ExactFDR
5.5.5 GESS
5.5.6 HaploView
5.5.7 MatLab
5.5.8 R
5.5.9 SAM
5.5.10 ParaSam
5.6 Summary
5.7 To Learn More
6 THE BOOTSTRAP
6.1 Samples and Populations
6.2 Precision of an Estimate
6.2.1 R Code
6.2.2 Applying the Bootstrap
6.2.3 Bootstrap Reproducibility Index
6.2.4 Estimation in Regression Models
6.3 Confidence Intervals
6.3.1 Testing for Equivalence
6.3.2 Parametric Bootstrap
6.3.3 Blocked Bootstrap
6.3.4 Balanced Bootstrap
6.3.5 Adjusted Bootstrap
6.3.6 Which Test?
6.4 Determining Sample Size
6.4.1 Establish a Threshold
6.5 Validation
6.5.1 Cluster Analysis
6.5.2 Correspondence Analysis
6.6 Building a Model
6.7 How Large Should the Samples Be?
6.8 Summary
6.9 To Learn More
7 CLASSIFICATION METHODS
7.1 Nearest Neighbor Methods
7.2 Discriminant Analysis
7.3 Logistic Regression
7.4 Principal Components
7.5 Naive Bayes Classifier
7.6 Heuristic Methods
7.7 Decision Trees
7.7.1 A Worked-Through Example
7.8 Which Algorithm Is Best for Your Application?
7.8.1 Some Further Comparisons
7.8.2 Validation Versus Cross-validation
7.9 Improving Diagnostic Effectiveness
7.9.1 Boosting
7.9.2 Ensemble Methods
7.9.3 Random Forests
7.10 Software for Decision Trees
7.11 Summary
8 APPLYING DECISION TREES
8.1 Photographs
8.2 Ultrasound
8.3 MRI Images
8.4 EEGs and EMGs
8.5 Misclassification Costs
8.6 Receiver Operating Characteristic
8.7 When the Categories Are As Yet Undefined
8.7.1 Unsupervised Principal Components Applied to fMRI
8.7.2 Supervised Principal Components Applied to Microarrays
8.8 Ensemble Methods
8.9 Maximally Diversified Multiple Trees
8.10 Putting It All Together
8.11 Summary
8.12 To Learn More
Glossary of Biomedical Terminology
Glossary of Statistical Terminology
Appendix: An R Primer
R1 Getting Started
R1.1 R Functions
R1.2 Vector Arithmetic
R2 Store and Retrieve Data
R2.1 Storing and Retrieving Files from Within R
R2.2 The Tabular Format
R2.3 Comma Separated Format
R3 Resampling
R3.1 The While Command
R4 Expanding R’s Capabilities
R4.1 Downloading Libraries of R Functions
R4.2 Programming Your Own Functions
Bibliography
Author Index
Subject Index
Description: This book grew out of an online interactive course offered through statcourse.com, and it soon became apparent to the author that the course was too limited in terms of time and length in light of the broad backgrounds of the enrolled students. The statisticians who took the course needed to be brought up