STATISTICAL METHODS TO ANALYZE MASSIVE HIGH-DIMENSIONAL NEUROIMAGING DATA

by
Shaojie Chen

A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy.

Baltimore, Maryland
August, 2015

© Shaojie Chen 2015
All rights reserved

Abstract

The statistical analysis of neuroimaging data poses several challenges today, partly due to the size, high dimensionality and noise of the data. In this work, we present three different methods for analyzing massive, high-dimensional and noisy functional magnetic resonance imaging (fMRI) data. In the first method, parallel computing techniques are combined with an independent component analysis (ICA) algorithm to decompose resting state fMRI data. The algorithm's performance is greatly improved compared to existing methods. In the second method, a graphical model, referred to as a state space model (SSM), is extended by enforcing L-1 and L-2 penalties on parameters. The model scales well to very high dimensions and can be applied to a vast class of different neuroimaging analysis applications. In the third method, a two-stage method is developed to extract information from noisy fMRI data. We first use functional regression to extract features from fMRI data and then use the features to predict the physical pain that a person feels. A support vector machine (SVM) is trained for prediction and it achieves high prediction accuracy.

Advisor: Brian Caffo, PhD

Committee:
James Pekar, PhD (chair, SOM radiology & radiological science)
Brian Caffo, PhD (advisor, SPH biostatistics)
Michelle Carlson, PhD (SPH mental health & epidemiology)
Martin Lindquist, PhD (SPH biostatistics)

Alternates:
Gregory Pontone, MD (SOM psychiatry)
Ani Eloyan, PhD (SPH biostatistics)
Jonathan Links, PhD (SPH environmental health)

Acknowledgments

First I would like to thank my advisor, Brian Caffo, who has been a great friend, a great advisor and a great mentor. He has been a constant source of advice and encouragement. His nice personality and endless flow of ideas have made it a great pleasure to work with him.
I have learned from him how to be a good biostatistical researcher, and more importantly, how to be a great person.

Thanks to my thesis committee for their advice over the years: James Pekar, Michelle Carlson, Martin Lindquist, Gregory Pontone, Ani Eloyan and Jonathan Links.

Thanks to Joshua Vogelstein and Ciprian Crainiceanu for your great advice on my different projects.

Thanks to all the SMARTies. It is so great to be part of the group. The group is like a big family and is always full of great ideas and projects.

Warmest thanks to the students of the biostatistics department who overlapped with my time here. Special thanks to Chen Yue, Lei Huang, Huitong Qiu, Detian Deng and Yuting Xu: you guys are so amazing that I could not imagine what my PhD life would be without you. I am grateful to the students that began the program before me for their help on every aspect of my research and career (especially Juemin Yang and Shanshan Li).

Finally, I would like to say thanks to my family. Thanks to my parents, who have been supporting me all the time on every decision I made. They do not speak English and might never know what I am writing here, but I am sure they can feel it on the other side of the planet. Thanks to my brother, Shujie Chen, who has always been a last resort for inspiration and strength when I got confused and did not know what to do with my life. I feel so lucky to have been born in such a great family. Thanks to Yao, my fiancée, for all the care, support and everything we have experienced together.

Contents

Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
  1.1 Statistical challenges in neuroimaging data analysis
  1.2 Organizational overview
  1.3 Software
2 A Parallel Group Independent Component Analysis Algorithm for Massive fMRI Data Analysis
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 The ICA model
    2.2.2 Parameter Estimation
    2.2.3 1000 Functional Connectomes Project Data
    2.2.4 Autism Brain Imaging Data Exchange
    2.2.5 Simulation Studies
  2.3 Results
    2.3.1 1,000 Functional Connectomes Project Data
    2.3.2 Autism Brain Imaging Data Exchange
  2.4 Discussion
3 A Sparse High Dimensional State-Space Model with an Application to Neuroimaging Data
  3.1 Introduction
  3.2 The Model
  3.3 Parameter Estimation
    3.3.1 E Step
    3.3.2 M Step
    3.3.3 The Complete EM
  3.4 Result
    3.4.1 Parameter Estimations
    3.4.2 Making Predictions
  3.5 Application
  3.6 Discussion
4 fMRI Based Biomarker for Physical Pain
  4.1 Introduction
  4.2 Feature Extraction
  4.3 Support Vector Regression
    4.3.1 Linear Regression
    4.3.2 Non-linear Regression and the Kernel Trick
  4.4 Application to Pain Prediction
    4.4.1 Prediction Accuracy
    4.4.2 Further Improving Prediction Accuracy
    4.4.3 Clustering Voxels
  4.5 Discussions and Future Work
5 Discussion and Future Work
Appendices
  A1 Appendix to Chapter 3
Bibliography
Curriculum Vitae

List of Tables

2.1 Summary measures of the correlations in the two simulation examples.
2.2 Speed increase of PGICA
3.1 PLDS Running Time
3.2 Similarities Among Estimated A Matrices

List of Figures

2.1 Histograms of age (left), IQ (middle), and SRS (right) for participants in ABIDE plotted and colored
2.2 True signals for the simulation examples. Each component is a two dimensional array where the
2.3 Boxplots (for both fastICA and PGICA) of the average correlations (log-transformed) of the true
2.4 Axial, sagittal, and frontal (left to right) planes of the auditory, control, default mode and visual netw
2.5 3D view of auditory, control, default mode and visual networks (from top).
2.6 Axial, sagittal, and frontal (left to right) planes of the default mode, auditory and visual networks
3.1 x axis is tuning parameter λ under log scale and y axis is the distance between truth and estimations; C
3.2 Row 1: A truth; non-penalized estimation of A; optimally penalized estimation of A. Row 2: C truth;
3.3 Estimation and prediction accuracies.
3.4 Eigen-values and Corresponding Profile Likelihood Plot
3.5 Connectivity Graph: The wider edge means stronger connectivity; the red edge means negative connecti
3.6 3D Rendering of Columns of Matrix C
3.7 Prediction accuracies comparison on HCP data
4.1 Three weights for multivariate time series averaging
4.2 Weights estimated with functional regression: white color is more weight while green color is less
4.3 Prediction accuracies comparison. The mean correlations for functional regression weights and box
4.4 Prediction accuracies comparison. The mean correlations for functional regression weights and box
4.5 Prediction accuracies comparison. The mean correlations for functional regression weights and box
4.6 Pick number of clusters: the SSE drops very slowly when the number of clusters is over 5.