STATISTICAL METHODS TO ANALYZE MASSIVE
HIGH-DIMENSIONAL NEUROIMAGING DATA
by
Shaojie Chen
A dissertation submitted to The Johns Hopkins University in conformity with the
requirements for the degree of Doctor of Philosophy.
Baltimore, Maryland
August, 2015
© Shaojie Chen 2015
All rights reserved
Abstract
The statistical analysis of neuroimaging data poses several challenges today, partly due
to their size, high dimensionality and noise. In this work, we present three different methods
for analyzing massive, high-dimensional and noisy functional magnetic resonance imaging
(fMRI) data. In the first method, parallel computing techniques are combined with an
independent component analysis (ICA) algorithm to decompose resting state fMRI data.
The algorithm’s performance is greatly improved compared to existing methods. In the
second method, a graphical model, referred to as a state space model (SSM), is extended by
enforcing L1 and L2 penalties on parameters. The model scales well to very high dimensions
and can be applied to a vast class of different neuroimaging analysis applications. In
the third method, a two-stage approach is developed to extract information from noisy fMRI
data. We first use functional regression to extract features from fMRI data and then use the
features to predict the physical pain that humans feel. A support vector machine (SVM) is
trained for prediction and it achieves high prediction accuracy.
Advisor:
Brian Caffo, PhD
Committee:
James Pekar, PhD (chair, SOM radiology & radiological science)
Brian Caffo, PhD (advisor, SPH biostatistics)
Michelle Carlson, PhD (SPH mental health & epidemiology)
Martin Lindquist, PhD (SPH biostatistics)
Alternates:
Gregory Pontone, MD (SOM psychiatry)
Ani Eloyan, PhD (SPH biostatistics)
Jonathan Links, PhD (SPH environmental health)
Acknowledgments
First I would like to thank my advisor, Brian Caffo, who has been a great friend, a great
advisor and great mentor. He has been a constant source of advice and encouragement. His
nice personality and endless flow of ideas have made it a great pleasure to work with him.
I have learned from him how to be a good biostatistical researcher, and more importantly,
how to be a great person.
Thanks to my thesis committee for their advice over the years: James Pekar, Michelle
Carlson, Martin Lindquist, Gregory Pontone, Ani Eloyan and Jonathan Links.
Thanks to Joshua Vogelstein and Ciprian Crainiceanu for your great advice on my
different projects.
Thanks to all the SMARTies. It is so great to be part of the group. The group is like a
big family and is always full of great ideas and projects.
Warmest thanks to the students of the biostatistics department who overlapped with my
time here. Special thanks to Chen Yue, Lei Huang, Huitong Qiu, Detian Deng and Yuting
Xu: you guys are so amazing that I cannot imagine what my PhD life would have been without
you. I am grateful to the students who began the program before me for their help on every
aspect of my research and career (especially Juemin Yang and Shanshan Li).
Finally, I would like to thank my family. Thanks to my parents, who have supported
me all the time in every decision I made. They do not speak English and might never
know what I am writing here, but I am sure they can feel it on the other side of the planet.
Thanks to my brother, Shujie Chen, who has always been a last resort for inspiration and
strength when I was confused and did not know what to do with my life. I feel so lucky to
have been born into such a great family. Thanks to Yao, my fiancée, for all the care, support
and everything we have experienced together.
Contents
Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
1.1 Statistical challenges in neuroimaging data analysis
1.2 Organizational overview
1.3 Software
2 A Parallel Group Independent Component Analysis Algorithm for Massive fMRI Data Analysis
2.1 Introduction
2.2 Materials and Methods
2.2.1 The ICA model
2.2.2 Parameter Estimation
2.2.3 1000 Functional Connectomes Project Data
2.2.4 Autism Brain Imaging Data Exchange
2.2.5 Simulation Studies
2.3 Results
2.3.1 1000 Functional Connectomes Project Data
2.3.2 Autism Brain Imaging Data Exchange
2.4 Discussion
3 A Sparse High Dimensional State-Space Model with an Application to Neuroimaging Data
3.1 Introduction
3.2 The Model
3.3 Parameter Estimation
3.3.1 E Step
3.3.2 M Step
3.3.3 The Complete EM
3.4 Result
3.4.1 Parameter Estimations
3.4.2 Making Predictions
3.5 Application
3.6 Discussion
4 fMRI Based Biomarker for Physical Pain
4.1 Introduction
4.2 Feature Extraction
4.3 Support Vector Regression
4.3.1 Linear Regression
4.3.2 Non-linear Regression and the Kernel Trick
4.4 Application to Pain Prediction
4.4.1 Prediction Accuracy
4.4.2 Further Improving Prediction Accuracy
4.4.3 Clustering Voxels
4.5 Discussions and Future Work
5 Discussion and Future Work
Appendices
A1 Appendix to Chapter 3
Bibliography
Curriculum Vitae
List of Tables
2.1 Summary measures of the correlations in the two simulation examples.
2.2 Speed increase of PGICA
3.1 PLDS Running Time
3.2 Similarities Among Estimated A Matrices
List of Figures
2.1 Histograms of age (left), IQ (middle), and SRS (right) for participants in ABIDE plotted and colored
2.2 True signals for the simulation examples. Each component is a two dimensional array where the
2.3 Boxplots (for both fastICA and PGICA) of the average correlations (log-transformed) of the true
2.4 Axial, sagittal, and frontal (left to right) planes of the auditory, control, default mode and visual netw
2.5 3D view of auditory, control, default mode and visual networks (from top).
2.6 Axial, sagittal, and frontal (left to right) planes of the default mode, auditory and visual networks
3.1 x axis is tuning parameter λ_C under log scale and y axis is the distance between truth and estimations;
3.2 Row 1: A truth; non-penalized estimation of A; optimally penalized estimation of A. Row 2: C truth;
3.3 Estimation and prediction accuracies.
3.4 Eigen-values and Corresponding Profile Likelihood Plot
3.5 Connectivity Graph: The wider edge means stronger connectivity; the red edge means negative connecti
3.6 3D Rendering of Columns of Matrix C
3.7 Prediction accuracies comparison on HCP data
4.1 Three weights for multivariate time series averaging
4.2 Weights estimated with functional regression: white color is more weight while green color is less
4.3 Prediction accuracies comparison. The mean correlations for functional regression weights and box
4.4 Prediction accuracies comparison. The mean correlations for functional regression weights and box
4.5 Prediction accuracies comparison. The mean correlations for functional regression weights and box
4.6 Pick number of clusters: the SSE drops very slowly when the number of clusters is over 5.