Multisite functional connectivity MRI classification of autism: ABIDE results

Citation
Nielsen, Jared A., Brandon A. Zielinski, P. Thomas Fletcher, Andrew L. Alexander, Nicholas Lange, Erin D. Bigler, Janet E. Lainhart, and Jeffrey S. Anderson. 2013. “Multisite functional connectivity MRI classification of autism: ABIDE results.” Frontiers in Human Neuroscience 7 (1): 599. doi:10.3389/fnhum.2013.00599. http://dx.doi.org/10.3389/fnhum.2013.00599.

Published Version
doi:10.3389/fnhum.2013.00599

Permanent link
http://nrs.harvard.edu/urn-3:HUL.InstRepos:11878940

Terms of Use
This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

ORIGINAL RESEARCH ARTICLE
HUMAN NEUROSCIENCE
published: 25 September 2013
doi: 10.3389/fnhum.2013.00599
Frontiers in Human Neuroscience | www.frontiersin.org | September 2013 | Volume 7 | Article 599

Multisite functional connectivity MRI classification of autism: ABIDE results

Jared A. Nielsen (1,2), Brandon A. Zielinski (3), P. Thomas Fletcher (4), Andrew L. Alexander (5,6), Nicholas Lange (7,8), Erin D. Bigler (9,10), Janet E. Lainhart (5) and Jeffrey S. Anderson (1,10,11,12)*

1 Interdepartmental Program in Neuroscience, University of Utah, Salt Lake City, UT, USA
2 Department of Psychiatry, University of Utah, Salt Lake City, UT, USA
3 Departments of Pediatrics and Neurology, University of Utah and Primary Children’s Medical Center, Salt Lake City, UT, USA
4 School of Computing and Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT, USA
5 Waisman Laboratory for Brain Imaging and Behavior, Department of Psychiatry, University of Wisconsin, Madison, WI, USA
6 Department of Medical Physics, University of Wisconsin, Madison, WI, USA
7 Departments of Psychiatry and Biostatistics, Harvard University, Boston, MA, USA
8 Neurostatistics Laboratory, McLean Hospital, Belmont, MA, USA
9 Department of Psychology and Neuroscience Center, Brigham Young University, Provo, UT, USA
10 The Brain Institute of Utah, University of Utah, Salt Lake City, UT, USA
11 Department of Bioengineering, University of Utah, Salt Lake City, UT, USA
12 Division of Neuroradiology, University of Utah, Salt Lake City, UT, USA

Edited by: Rajesh K. Kana, University of Alabama at Birmingham, USA
Reviewed by: Ralph-Axel Müller, San Diego State University, USA; Gopikrishna Deshpande, Auburn University, USA
*Correspondence: Jeffrey S. Anderson, Interdepartmental Program in Neuroscience, University of Utah, 201 Presidents Cir, Salt Lake City, UT 84112, USA. e-mail: [email protected]

Background: Systematic differences in functional connectivity MRI metrics have been consistently observed in autism, with predominantly decreased cortico-cortical connectivity. Previous attempts at single subject classification in high-functioning autism using whole brain point-to-point functional connectivity have yielded about 80% accurate classification of autism vs. control subjects across a wide age range. We attempted to replicate the method and results using the Autism Brain Imaging Data Exchange (ABIDE) including resting state fMRI data obtained from 964 subjects and 16 separate international sites.

Methods: For each of 964 subjects, we obtained pairwise functional connectivity measurements from a lattice of 7266 regions of interest covering the gray matter (26.4 million “connections”) after preprocessing that included motion and slice timing correction, coregistration to an anatomic image, normalization to standard space, and voxelwise removal by regression of motion parameters, soft tissue, CSF, and white matter signals. Connections were grouped into multiple bins, and a leave-one-out classifier was evaluated on connections comprising each set of bins. Age, age-squared, gender, handedness, and site were included as covariates for the classifier.

Results: Classification accuracy significantly outperformed chance but was much lower for multisite prediction than for previous single site results. As high as 60% accuracy was obtained for whole brain classification, with the best accuracy from connections involving regions of the default mode network, parahippocampal and fusiform gyri, insula, Wernicke Area, and intraparietal sulcus. The classifier score was related to symptom severity, social function, daily living skills, and verbal IQ. Classification accuracy was significantly higher for sites with longer BOLD imaging times.

Conclusions: Multisite functional connectivity classification of autism outperformed chance using a simple leave-one-out classifier, but exhibited poorer accuracy than for single site results. Attempts to use multisite classifiers will likely require improved classification algorithms, longer BOLD imaging times, and standardized acquisition parameters for possible future clinical utility.

Keywords: functional connectivity, fcMRI, classification, autism, ABIDE

INTRODUCTION
Brain imaging classification strategies of autism have used information from structural MRI (Ecker et al., 2010a,b; Jiao et al., 2010; Uddin et al., 2011; Calderoni et al., 2012; Sato et al., 2013), functional MRI (Anderson et al., 2011d; Coutanche et al., 2011; Wang et al., 2012), diffusion tensor MRI (Lange et al., 2010; Ingalhalikar et al., 2011), positron emission tomography (Duchesnay et al., 2011), and magnetoencephalography (Roberts et al., 2010, 2011; Tsiaras et al., 2011; Khan et al., 2013). Such approaches have been undertaken for several clinical objectives. Sensitive and specific biomarkers for autism may contribute potentially useful biological information to diagnosis,
prognosis, and treatment decision-making. It is hoped that imaging biomarkers may also help delineate subtypes of individuals with autism that may have common brain neuropathology and respond to similar treatment strategies, although different methodology will likely be required for subgrouping individuals than for classifying individuals by diagnosis. Such quantitative biomarkers may also serve as a metric for biological efficacy of potential behavioral or pharmacologic interventions. Finally, imaging biomarkers may help identify pathophysiologic mechanisms of autism in the brain that can guide investigations into the specific neural circuits, developmental windows, and genetic or environmental factors that may result in improved treatments.

Abnormal functional connectivity MRI (fcMRI) has been among the most replicated imaging metrics in autism. The proposed basis for fcMRI is that connected brain regions are likely to exhibit synchronized neural activity, which can be detected as covariance of slow fluctuations in Blood Oxygen Level Dependent (BOLD) signal between the regions. Initial reports of decreased functional connectivity in autism by three independent groups (Just et al., 2004; Villalobos et al., 2005; Welchew et al., 2005) have been followed by more than 50 primary reports of abnormal functional connectivity in autism in the literature, derived from fMRI data both in a resting state and acquired during cognitive tasks (Anderson, 2013).

Most reports show decreases in connectivity between distant brain regions, including nodes of the brain’s default mode network (Cherkassky et al., 2006; Kennedy and Courchesne, 2008; Wiggins et al., 2011), social brain regions (Gotts et al., 2012; von dem Hagen et al., 2013), attentional regions (Koshino et al., 2005), language regions (Dinstein et al., 2011), interhemispheric homologues (Anderson et al., 2011a), and throughout the brain (Anderson et al., 2011d). Nevertheless, some reports have also shown abnormal increases in functional connectivity in autism (Muller et al., 2011) or unchanged connectivity (Tyszka et al., 2013). In particular, higher correlation between brain regions has been observed in negatively correlated connections (Anderson et al., 2011d), corticostriatal connections (Di Martino et al., 2011), visual search regions (Keehn et al., 2013), and brain network-level metrics (Anderson et al., 2013a; Lynch et al., 2013).

Despite the large and growing body of reports of abnormal functional connectivity in autism, uncertainty remains about the spatial distribution of decreased and increased connectivity and how this relates to the clinical heterogeneity of autism spectrum disorders (ASD). One of the challenges for answering these questions has been fractionation of the available data into individual site-specific studies with relatively small sample sizes. There is a need for analysis of multisite datasets that can improve statistical power, represent greater variance of disease and control samples, and allow replication across multiple sites with differential subject recruitment, imaging parameters, and analysis methods. Ultimately, clinically useful biomarkers will need to be replicated in diverse acquisition conditions that reflect community and academic imaging practices.

The advent of cooperative, publicly available datasets for resting state functional MRI is an important step forward. Multiple such datasets have now been released including the 1000 functional connectome project (Biswal et al., 2010), the ADHD 200 Consortium dataset (ADHD-200_Consortium, 2012), and most recently the Autism Brain Imaging Data Exchange (ABIDE) (Di Martino et al., 2013), consisting of images from 539 individuals with ASD and 573 typical control individuals, acquired at 16 international sites. In the present study, we evaluate classification accuracy of whole-brain functional connectivity across sites, and determine which abnormalities in connectivity across the brain are most informative for predicting autism from typical development, which imaging acquisition features lead to greatest accuracy, whether functional connectivity abnormalities covary with metrics of disease severity, and the extent to which abnormal functional connectivity is replicated across sites.

MATERIALS AND METHODS

SUBJECT SAMPLE
ABIDE consists of 1112 datasets comprised of 539 autism and 573 typically developing individuals (Di Martino et al., 2013). Each dataset consists of one or more resting fMRI acquisitions and a volumetric MPRAGE image. All data are fully anonymized in accordance with HIPAA guidelines, with analyses performed in accordance with pre-approved procedures by the University of Utah Institutional Review Board. All images were obtained with informed consent according to procedures established by human subjects research boards at each participating institution. Details of acquisition, informed consent, and site-specific protocols are available at http://fcon_1000.projects.nitrc.org/indi/abide/.

Inclusion criteria for subjects were successful preprocessing with manual visual inspection of normalization to MNI space of MPRAGE, coregistration of BOLD and MPRAGE images, segmentation of MPRAGE image, and full brain coverage from MNI z > −35 to z < 70 on all BOLD images. Inclusion criteria for sites were a total of at least 20 subjects meeting all other inclusion criteria. A total of 964 subjects met all inclusion criteria (517 typically developing subjects and 447 subjects with autism from 16 sites). Each site followed different criteria for diagnosing patients with autism or ascertaining typical development; however, the majority of the sites used the Autism Diagnostic Observation Schedule (Lord et al., 2000) and Autism Diagnostic Interview-Revised (Lord et al., 1994). Specific diagnostic criteria for each site can be found at fcon_1000.projects.nitrc.org/indi/abide/index.html.

Subject demographics for individuals satisfying inclusion criteria are shown in Table 1. Six different testing batteries were used to calculate verbal IQ and performance IQ, respectively. In addition to the IQ measures, the following measures were included in correlations with the classifier score (see Table 1 for summary of behavioral measures): the Social Responsiveness Scale (Constantino et al., 2003) is a measure of social function and the Vineland Adaptive Behavior Scales (Sparrow et al., 1984) is a measure of daily functioning. See the ABIDE website for more information on the specific behavioral measures used. For handedness, categorical handedness (i.e., right-handed, left-handed, or ambidextrous) was used in the leave-one-out classifier (see details below). In the case that only a quantitative handedness measure was reported, positive values were converted to right-handed, negative values to left-handed, and a value of zero to ambidextrous. Fifteen subjects lacked a categorical and quantitative measure of handedness. In those cases, a nearest neighbor classification function (ClassificationKNN.m in MATLAB) was used to assign categorical handedness. For the classifier, 862 subjects were right-handed, 95 were left-handed, and 7 were ambidextrous.

Table 1 | Subjects included from the ABIDE sample with demographic information.

| | Age | ADI-R social | ADI-R verbal | ADOS total | Verbal IQ | Performance IQ | SRS total | Vineland |
|---|---|---|---|---|---|---|---|---|
| Number of subjects | 964 | 348 | 349 | 348 | 781 | 796 | 335 | 201 |
| Control | (426 M, 91 F) | 0 | 0 | 32 | 413 | 425 | 160 | 80 |
| Autism | (396 M, 51 F) | 348 | 349 | 316 | 367 | 371 | 175 | 121 |
| Control mean ± SD | 16.9 ± 7.56 | NA | NA | 1.25 ± 1.37 | 112 ± 13.3 | 108 ± 13.3 | 21.2 ± 16.2 | 105 ± 11.6 |
| (Control range) | (6.47–56.2) | NA | NA | (0–4) | (67–147) | (67–155) | (0–103) | (77–131) |
| Autism mean ± SD | 16.6 ± 8.1 | 19.7 ± 5.65 | 15.9 ± 4.55 | 11.9 ± 3.81 | 105 ± 17.4 | 106 ± 17.2 | 91.6 ± 30.6 | 75 ± 13.2 |
| (Autism range) | (7–64) | (2–30) | (2–26) | (2–22) | (50–149) | (59–157) | (6–164) | (41–106) |

BOLD PREPROCESSING
Preprocessing was performed in MATLAB (Mathworks, Natick, MA) using SPM8 (Wellcome Trust, London) software. The following sequence of preprocessing steps was performed:

(1) Slice timing correction.
(2) Realign and reslice correction of motion for each volume relative to initial volume.
(3) Coregistration of BOLD images to MPRAGE anatomic sequence.
(4) Normalization of MPRAGE to MNI template brain, with normalization transformation also applied to coregistered BOLD images.
(5) Segmentation of gray matter, white matter, and CSF components of MPRAGE image (thorough clean).
(6) Voxelwise bandpass filter (0.001–0.1 Hz) and linear detrend.
    (a) The lower limit of 0.001 Hz was chosen in order to be certain as much neural information was included as possible (Anderson et al., 2013b). The linear detrend removed much of the contribution of low frequencies given the relatively short time series available in the dataset.
(7) Extraction of mean time courses from the restriction masks applied to BOLD images from ROIs consisting of:
    (a) CSF segmented mask with bounding box −35 < x < 35, −60 < y < 30, 0 < z < 30.
    (b) White matter segmented mask overlapping with 10 mm radii spheres centered at (x = −27, y = −7, z = 30) and (x = 27, y = −7, z = 30).
    (c) Mask of scalp and facial soft tissues (Anderson et al., 2011b).
(8) Voxelwise regression using glmfit.m (MATLAB Statistics Toolbox) software of CSF, WM, soft tissue, and 6 motion parameters from realignment step from time series of each voxel of BOLD images.
(9) Motion scrubbing (Power et al., 2012) of framewise displacement and DVARS with removal of volumes before and after a root-mean-square displacement of >0.2 mm for either parameter and concatenation of remaining volumes. In 86.2% of the participants more than 50% of the volumes remained after motion scrubbing. Among the remaining participants with fewer than 50% retained volumes, the majority belonged to the autism group (8.8%, compared to 5.0% from the typically developing group; p = 0.02). The groups differed in the number of retained volumes when considering the entire sample of 964 subjects (t = 4.11, p < 0.001) and when considering only those with greater than 50% of the volumes remaining (t = 2.04, p = 0.04).
(10) No spatial smoothing was performed. The global mean signal and gray matter time courses were not regressed from voxelwise data (Saad et al., 2012, 2013; Jo et al., 2013).

ROI ANALYSIS
From preprocessed BOLD images for each subject, mean time course was extracted from 7266 gray matter ROIs. These ROIs form a lattice covering the gray.nii image (SPM8) from z = −35 to z = 70 at 5-mm resolution, with MNI coordinates of centroids previously reported (Anderson et al., 2011d). The ROIs averaged 4.9 ± 1.3 (standard deviation) voxels in size for 3 mm isotropic voxels. A 7266 × 7266 matrix of Fisher-transformed Pearson correlation coefficients was obtained for each subject from the ROI time courses, representing an association matrix of functional connectivity in each subject between all pairs of ROIs. Each pair of ROIs is termed a “connection” for the present analysis.

LEAVE-ONE-OUT CLASSIFIER
The classification approach is summarized in Figure 1. Overall, a leave-one-out classifier was used to generate a classification score for each of the 964 subjects, leaving out one subject at a time and calculating the classification score for the left out subject. The classification approach followed the approach reported previously, with slight modifications (Anderson et al., 2011d). First, the correlation measurements for the remaining 963 subjects were extracted for one of the 26.4 million connections from the 7266 × 7266 association matrix described above (Figure 1, Step 1).
Second, a general linear model was fit to the measurements separately for autism (red fit line in Figure 1, Step 2) and control subjects (black fit line in Figure 1, Step 2) for the given connection with covariates of subject age, age-squared, gender, and handedness. From these data, estimated values for the left out subject for this connection were calculated based on the left out subject’s age, gender, and handedness. A value was estimated separately from the remaining autism subjects (blue X in Figure 1, Step 2) and remaining control subjects (green X in Figure 1, Step 2).

Because each site used slightly different scanning hardware and parameters that may systematically bias results, the estimated values of the left out subject (blue and green X in Figure 1, Step 2) were adjusted by adding the difference of the site’s mean value for that connection (minus the left out subject) from the mean value for that connection from all other sites. Finally, the actual value for the left out subject for the connection (green dot in Figure 1, Step 2) was subtracted from the estimated value obtained from autism subjects (blue vertical line in Figure 1, Step 2) and from the estimated value obtained from control subjects (green vertical line in Figure 1, Step 2). The difference between the absolute values of these two differences was then multiplied by the F-statistic for the difference between the remaining autism and control subjects. This process was iteratively carried out for all 26.4 million connections and then averaged across the 7265 connections in which each of 7266 ROIs participates. Then the averaged values for each of the 7266 ROIs were summed. The summed value was equal to the classification score for the subject. More negative values for the classification score predict the left out subject was a control subject, and more positive values for the classification score predict the left out subject was an autism subject.
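The per-connection procedure described above can be sketched in a few lines of NumPy. This is an illustrative reading of the text, not the authors' MATLAB implementation: all function names are hypothetical, handedness is coded as a single numeric column for brevity, the F-statistic is taken to be a simple two-group F (equal to the squared pooled-variance t) without covariates, and the sign convention (control distance minus autism distance) is inferred from the statement that positive scores predict autism.

```python
import numpy as np


def association_matrix(roi_timecourses):
    """Fisher-transformed Pearson correlations between ROI time courses (ROIs x time)."""
    r = np.corrcoef(roi_timecourses)
    np.fill_diagonal(r, 0.0)  # self-correlations are not connections
    return np.arctanh(r)


def design_matrix(age, gender, handedness):
    """Columns: intercept, age, age squared, gender, handedness (numeric codes)."""
    age = np.asarray(age, dtype=float)
    return np.column_stack([np.ones_like(age), age, age**2, gender, handedness])


def two_group_f(a, c):
    """F-statistic for a two-group mean difference (assumed form, no covariates)."""
    pooled = ((a.size - 1) * a.var(ddof=1) + (c.size - 1) * c.var(ddof=1)) / (a.size + c.size - 2)
    return (a.mean() - c.mean()) ** 2 / (pooled * (1 / a.size + 1 / c.size))


def connection_score(y, X, site, is_autism, left_out):
    """Contribution of one connection to the left out subject's classification score."""
    keep = np.arange(y.size) != left_out
    estimate = {}
    for group in (True, False):  # fit autism and control models separately
        rows = keep & (is_autism == group)
        beta, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
        estimate[group] = X[left_out] @ beta
    # Site adjustment: own-site mean (without the left out subject) minus other-site mean.
    same = keep & (site == site[left_out])
    other = keep & (site != site[left_out])
    shift = y[same].mean() - y[other].mean()
    d_autism = abs(estimate[True] + shift - y[left_out])
    d_control = abs(estimate[False] + shift - y[left_out])
    f_stat = two_group_f(y[keep & is_autism], y[keep & ~is_autism])
    return (d_control - d_autism) * f_stat  # positive: closer to the autism estimate


def subject_score(contrib):
    """Average per ROI over its connections, then sum over ROIs.

    contrib is a symmetric ROI x ROI array of connection scores with a zero diagonal.
    """
    n_roi = contrib.shape[0]
    return (contrib.sum(axis=1) / (n_roi - 1)).sum()


# Synthetic data for one connection (purely illustrative values).
rng = np.random.default_rng(0)
n = 200
is_autism = np.arange(n) % 2 == 0
site = rng.integers(0, 4, n)
X = design_matrix(rng.uniform(7, 40, n), rng.integers(0, 2, n), rng.integers(0, 2, n))
y = 0.4 - 0.2 * is_autism + rng.normal(0, 0.05, n)
```

With the full 7266 × 7266 matrix this loop runs over roughly 26.4 million connections per left out subject, so a practical implementation would vectorize the fits across connections rather than call `connection_score` once per cell.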
FIGURE 1 | Summary of classification approach. Step 1, Association matrices corresponding to the intrinsic connectivity between each pair of 7266 gray matter regions (about 26.4 million connections) are estimated for the left out subject and the 963 remaining subjects. Step 2, Plot depicting an example connection (i.e., single cell of the possible 26.4 million cells from the association matrices in Step 1) for the 964 subjects. The plot includes axes for correlation strength and age; however, the plot represents a multidimensional space that includes age-squared, gender, and handedness as covariates. Black line, fit line for the control group; red line, fit line for the autism group; green data point, left out subject (a control subject in this example); green X, estimated value for the control group; blue X, estimated value for autism group; green vertical line, difference between actual connection strength value for left out subject and estimated value for control group; blue vertical line, difference between actual connection strength value for left out subject and estimated value for autism group. Steps 3 and 4 are described in the text.

BINS OF “CONNECTIONS”
Connections were grouped into bins in several different ways to aggregate groups of connections to test for accuracy in discriminating autism from control subjects. First, a measurement of correlation strength was obtained for each connection from 961 independent subjects from the 1000 Functional Connectome project using identical preprocessing steps (see y-axis of Figure 6). Subjects included in this sample have been previously described (Ferguson and Anderson, 2011). Second, Euclidean distance between each pair of ROIs was calculated from the centroid coordinates for the ROIs (see x-axis of Figure 6). Connections were grouped into 2-dimensional bins based on the strength of the correlation and the distance between the ROIs, with bin spacing of 0.05 units of Fisher-transformed correlation and 5-mm distance. The results for accurately classifying the subjects using this binning system are summarized in Figure 6.

A separate binning scheme was performed during the evaluation of a leave-one-out classifier. For each left out subject, sets of connections were calculated that satisfied a two-tailed t-test between remaining autism and control subjects with p-values less than 0.01, 0.001, 0.0001, and 0.00001. These sets of connections varied slightly for each left out subject, since no data that can reflect the value of the left out subject’s connectivity measurement can be used in the classifier.

Classification accuracy, sensitivity, and specificity were calculated for the set of connections that differed between autism and control subjects at p-values of 0.01, 0.001, 0.0001, and 0.00001 (Figure 3A). We used this last binning system because there is a tradeoff between using many connections in constructing the classifier scores and using fewer but more informative connections. We wanted to determine which thresholded bin yielded the highest accuracy.

STATISTICAL ANALYSES
For each bin of connections, a vector of 964 classification scores was obtained (one for each left out subject) and the classification score was thresholded at 0 (in the case of the strength/Euclidean distance bins) or at a threshold selected to optimize the area under a receiver operating characteristic curve (in the case of the bins determined by p-values). Predicted diagnosis (autism vs. control) was compared to the actual diagnosis of each left out subject, and significant classification accuracy was determined by a binomial distribution. For 964 subjects, predicting 509 subjects (52.8%) correctly corresponded to an uncorrected p-value of less than 0.05, and predicting 531 subjects (55.1%) correctly corresponded to a p-value of less than 0.001. Two-proportion z-tests were used to test the following: (1) whether there was a group difference in the proportion of subjects with less than 50% of the BOLD volumes remaining after motion scrubbing (results above in BOLD preprocessing section), (2) whether classification accuracy differed between the eyes open and eyes closed subjects, (3) whether classification accuracy differed between the male and female subjects, and (4) whether accuracy increased when considering only those subjects with greater than 50% of the BOLD volumes remaining after motion scrubbing, rather than all 964 subjects. Two-sample t-tests were used to determine if there was a group difference in the number of remaining volumes (results above in BOLD preprocessing section).

RESULTS
First, we investigated the overall accuracy, sensitivity, and specificity of the leave-one-out classifier for all 964 subjects in the ABIDE consortium (Figure 2) and the 16 data collection sites individually (Figure 3). For the entire ABIDE consortium, we achieved the highest overall accuracy (60.0%), sensitivity (62.0%), and specificity (58.0%) when connections were included in the classification algorithm if group differences for the connection met a p-value threshold of less than 10^-4, whereas the lowest accuracy (55.7%), sensitivity (57.1%), and specificity (54.4%) were found when all 26.4 million connections were included in the leave-one-out classifier. When considering only those subjects with greater than 50% of the BOLD volumes remaining after motion scrubbing, the accuracy for the five different p-value thresholds increased between 0.6% and 3.1%, although the difference was not significant compared to the accuracy for all 964 subjects (p > 0.18). No difference in classification accuracy was found between subjects who had their eyes open during the scan vs. those who had their eyes closed, after correcting for multiple comparisons using an FDR of q < 0.05. Also, no difference in classification accuracy was found between male and female subjects, after correcting for multiple comparisons using an FDR of q < 0.05.

FIGURE 2 | Total accuracy, sensitivity, and specificity for leave-one-out classifier in 964 subjects. The total accuracy, sensitivity, and specificity are shown when all 26.4 million connections were included in the classifier and then for different p-value thresholds that determine which connections are included in the classifier.

We also compared the accuracy, sensitivity, and specificity across sites using different p-value thresholds for determining which connections to include in the leave-one-out classifier. The accuracy, sensitivity, and specificity varied at each site depending on the p-value threshold; however, we consistently achieved the highest accuracy at SBL (mean accuracy = 69.3%), USM (mean accuracy = 69.1%), Stanford (mean accuracy = 67.7%), and Pitt (mean accuracy = 65.4%); the highest sensitivity at SDSU (90.0%), Leuven (88.9%), SBL (84.0%), and Stanford (74.4%); and the highest specificity at USM (79.5%), Olin (75.0%), UCLA (71.5%), and KKI (70.6%).

FIGURE 3 | Accuracy, sensitivity, and specificity for each data acquisition site. Accuracy (A) is shown for each data acquisition site at different p-value thresholds. The sensitivity and specificity (B) are shown for each data acquisition site at a threshold of p < 0.0001 (i.e., the threshold at which optimal total accuracy was obtained in Figure 2).

Next, we determined whether the site’s sample size or the number of imaging volumes from a single run related to the site’s classification accuracy (Figure 4). The number of imaging volumes was positively correlated with accuracy (r = 0.55, p = 0.03). If the number of imaging volumes post-scrubbing was averaged across site, the relationship between number of imaging volumes and accuracy was no longer significant. Sample size did not correlate with site’s classification accuracy (r = 0.17, p = 0.53).

We then determined which brain regions and connection characteristics accurately classified the ABIDE subjects. In Figure 5, the following brain regions (and the 7265 connections in which they were involved) resulted in the highest accuracy: parahippocampal and fusiform gyri, insula, medial prefrontal cortex, posterior cingulate cortex, Wernicke Area, and intraparietal sulcus. In Figure 6, two clusters of bins resulted in the highest accuracy. The first cluster included bins with short-range (10–25 mm) and medium-strength connections (0.3 < z < 0.5). The second cluster included bins with long-range (100–125 mm) and medium-strength connections (0.15 < z < 0.4).

Finally, we investigated the relationship between the subject’s classifier score and behavioral measures (Figure 7). Estimates of symptom severity (r = 0.13, p = 0.01), as measured by the ADOS social + communication algorithm score, and SRS (r = 0.17, p = 0.002) positively correlated with the classifier score; however, symptom severity, as measured by the ADI-R verbal domain algorithm score (r = −0.06, p = 0.30) or social domain algorithm score (r = −0.04, p = 0.51), and performance IQ (r = −0.03, p = 0.38) did not correlate with the classifier score. Verbal IQ (r = −0.07, p = 0.05) and Vineland adaptive behavior composite score (r = 0.17, p = 0.002) negatively correlate with the classifier score. In other words, as social function (lower SRS score is indicative of better social function), verbal IQ, and daily living skills increased and current level of symptom severity decreased, a subject was more likely to be classified as a control.
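The binomial thresholds quoted under Statistical Analyses (509 and 531 correct out of 964) can be checked with an exact tail probability. The sketch below assumes a one-tailed test against a chance rate of 0.5, which the text does not state explicitly; the function name is hypothetical and only the Python standard library is used.

```python
from math import comb


def binomial_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5), using integer arithmetic."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return favourable / 2**n


n_subjects = 964
for n_correct in (509, 531):
    p = binomial_tail(n_correct, n_subjects)
    print(f"{n_correct}/{n_subjects} correct ({n_correct / n_subjects:.1%}): p = {p:.4f}")
```

Under that one-tailed assumption the two counts fall below 0.05 and 0.001 respectively, consistent with the thresholds reported in the text.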
FIGURE 4 | Relationship between a site’s total accuracy and the number of imaging volumes acquired by each site. Each site’s total accuracy was calculated when using a p < 0.0001 threshold (i.e., the threshold at which optimal total accuracy was obtained in Figure 2) and correlated with the number of BOLD imaging volumes acquired during the resting-state sequence.

FIGURE 5 | Total accuracy for 7266 brain regions. Accuracy was determined for each of the 7266 brain regions independently by only taking into account the 7265 connections in which a given region was involved (no p-value threshold, all connections used). The minimum accuracy displayed for a single region is 53.95%, which was the false discovery rate corrected percentage for 7266 regions and a binomial cumulative distribution.

DISCUSSION
Functional connectivity MRI data from a set of 26.4 million “connections” per subject is able to successfully classify a subject as autistic or typically developing using a leave-one-out approach with an accuracy of 60.0% (p < 2.2 × 10^-10), across a set of 964 subjects contributed from 16 different international sites. Overall specificity was 58.0% and overall sensitivity was 62.0%. Classification consisted of a weighted average of connections that used no information about the left out subject except for age, gender, site, and handedness. Using a weighted average of all 26.4 million connections resulted in a classification accuracy of 55.7% (p = 0.00017), with best accuracy (60.0%) achieved for a subset of connections that satisfied p < 10^-4 for a difference between autism and control among remaining subjects for each left out subject. Classification scores significantly covaried with metrics of current disease severity including ADOS-G (as opposed to ADI-R, which incorporates disease severity at early ages), SRS, and verbal IQ metrics. Classification accuracy significantly improved in sites for which longer BOLD imaging times were used, but no relationship was found between number of subjects contributed by a site and classification accuracy.

Classification accuracy was lower in this multisite study despite its much larger sample size when compared with a prior study using similar methods from a single site (Anderson et al., 2011d). The prior study achieved ∼80% accuracy, with 90% accuracy for subjects under 20 years of age in both a primary cohort and a replication sample of affected and unaffected individuals from multiplex families. Several reasons may explain this difference. Expanding a classifier to accommodate multisite data necessarily involves dealing with many additional sources of variance. The pulse sequence, magnetic field strength, scanner type, patient cohort and recruitment procedures, scan instructions (eyes open vs. closed vs. fixation), BOLD imaging length, age distribution, gender differences, and population ethnicity all varied across sites. Each of these variables has the potential to decrease sensitivity and specificity of functional connectivity measurements for autism. Nevertheless, a multisite cohort helps test generalizability of the results across different samples, making it more likely that connections identified as discriminatory between autism and control reflect disease properties rather than particulars of a single dataset.

Classification accuracy in the multisite cohort varied with the subset of connections used to construct the classifier. This finding reflected a tradeoff between improved accuracy when using more connections and decreased accuracy when including less specific connections in the classifier. This result argues against a homogenous regional distribution of connectivity abnormalities in autism in favor of a heterogeneous spatial distribution of connectivity disturbances that involves specific brain regions. Analysis of brain regions most affected in abnormal connections herein confirms the findings of previous reports: areas of greatest abnormality included the insula, regions of the default mode network including posterior cingulate and medial prefrontal cortex, fusiform and parahippocampal gyri, Wernicke Area (posterior middle and superior temporal gyrus), and intraparietal sulcus (Anderson et al., 2011a,d; Gotts et al., 2012). All of these regions correspond to functional domains that are known to be impaired in autism, including attention, language, interoception, and memory. We note that some of these regions are in brain areas with relatively high susceptibility artifact and sensitivity to changes in brain shape (such as the medial prefrontal cortex). However, given the coherent distribution of the default mode network, we favor an interpretation of network-based differences attributable to autism rather than underlying structural or artifactual sources of these findings.

FIGURE 6 | Total accuracy across connection strength and distance between brain regions. The 26.4 million connections were divided up into bins based on the correlation strength of the connection (determined by an independent sample) and the distance between the connection’s two endpoints. Accuracy is displayed for each bin with at least one connection.

When interrogating subsets of connections from an independent dataset based on the Euclidean distance between ROIs and connection strength in a previous study, we found that the most informative connections consisted of typically strong connections between distant ROIs that were weaker in autism, and typically negatively correlated connections that were less negative in autism (less anti-correlated) (Anderson et al., 2011d). In the current study, the connection bins based on strength and distance that showed greatest classification accuracy were not precisely the same connection bins found previously. Rather, they were adjacent to the bins in the previous study. This is the case because the classification algorithm in the current study takes advantage of larger numbers of connections. There was again a tradeoff between using more connections, given that individual connections exhibited relatively little information, and using sets of connections that differed more in autism. This cautionary finding is relevant when attempting to identify the “optimal” set of connections for constructing candidate brain imaging biomarkers for ASD. Although specific affected regions appear to have autism connectivity abnormalities, classification schemes using only a small number of connections are likely to suffer from the high variance in metrics for individual connections.

This point is reinforced by a significant positive relationship between classification accuracy across sites and the length of BOLD imaging time per subject. Previous studies of test-retest reliability using functional connectivity MRI have shown that accuracy of results varies with one over the square root of BOLD imaging time (Van Dijk et al., 2010; Anderson et al., 2011c), with only moderate reproducibility when short BOLD imaging times such as 5 min are used (Shehzad et al., 2009; Van Dijk et al., 2010; Anderson et al., 2011c). This relationship would suggest that classifiers using information from many brain regions continue to show benefit from much longer imaging times, with continued improvements even after hours of imaging across multiple sessions per subject to the extent this is practical (Anderson et al., 2011c). Improvements in pulse sequence technology may also facilitate acquisition of greater numbers of volumes in shorter periods of time (Feinberg and Yacoub, 2012). The correlation between total imaging time and accuracy was more significant than the correlation between number of volumes used after scrubbing and accuracy. This might indicate that imaging time is more important than the number of volumes used. As multiband acquisition protocols become more prevalent (Setsompop et al., 2012), it will be important to determine the extent to which finer sampling vs. longer imaging time will contribute to specificity of BOLD fcMRI measurements.

In a prior study that examined the effect of BOLD imaging time on ability to identify functional connectivity values obtained from a single individual compared to a group mean, individual “connections” could only be reliably distinguished after 25 min of BOLD imaging time. The number of connections that could be reliably distinguished increased exponentially with imaging time for at least up to 10 h of total imaging time (Anderson et al., 2011c). Indeed, there is good theoretical basis that any desired accuracy can be obtained with sufficient imaging time, stretching into many hours. Although Van Dijk and colleagues report that the intrinsic connectivity measurements stabilize around 5 min of imaging time, they also state that noise continues to decrease at a rate of 1/sqrt(n), where n is the amount of imaging time (Van Dijk et al., 2010), which is in accordance with our findings (Anderson et al., 2011c). Moreover, they report that the stabilization is of composite network-level metrics rather than connections between small individual ROIs. In contrast, we have found that coarse network-level measurements are not particularly informative in classification compared to fine-grained metrics that take into account specific differences in the spatial distribution of connectivity. There may be no upper
Thus, bins of medium strength connections limit for continued improvements if more imaging time were (0.3<z<0.5)outperformedthemorespecificbinsofstronger obtained. connections (z>0.5) because the slightly weaker sets of con- We found significant relationships between the classification nections included many more connections in the bin. This scoreandsomebehavioralmeasures,suchassocialfunctionand FrontiersinHumanNeuroscience www.frontiersin.org September2013|Volume7|Article599|8 Nielsenetal. Multisitefunctionalconnectivityclassification FIGURE7|Scatterplots depict the relationship between the algorithm score (B), verbal IQ (C), performance IQ (D), SRS total classifier scores for control subjects (black) and subjects with score (E), and Vineland Adaptive composite standard score (F). autism (red) and the following behavioral measures: ADOS-G Correlation coefficients and corresponding p-values are included on social + communication algorithm score (A), ADI-R social verbal the plots. daily living skills, however, the proportion of variance in the poor accuracy of the classification approach. As accuracy and behavioralmeasuresthatwasexplainedbythelinearrelationship techniquesforcombiningmultisitedataimproves,wealsoexpect betweentheclassificationscoreandthebehavioralmeasurewas an increase in the proportion of variance accounted for by the small (between 0.5 and 2.9%). This may be due to the overall correlations. FrontiersinHumanNeuroscience www.frontiersin.org September2013|Volume7|Article599|9
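The 1/sqrt(n) decrease in noise with imaging time discussed above can be illustrated with a small simulation. The sketch below is not the analysis used in this study: it assumes independent Gaussian samples (real BOLD time series are temporally autocorrelated), a hypothetical true correlation `rho` of 0.3, and illustrative scan lengths. The function name `correlation_noise` and all parameter values are assumptions introduced only for this example.

```python
import numpy as np

def correlation_noise(n_timepoints, rho=0.3, n_trials=1000, seed=0):
    """Spread (SD) of the sample Pearson correlation across simulated scans."""
    rng = np.random.default_rng(seed)
    # Two unit-variance Gaussian series per trial with true correlation rho.
    x = rng.standard_normal((n_trials, n_timepoints))
    e = rng.standard_normal((n_trials, n_timepoints))
    y = rho * x + np.sqrt(1.0 - rho**2) * e
    # Row-wise Pearson correlation between x and y.
    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
    return r.std()

# Hypothetical scan lengths, in number of volumes.
lengths = np.array([100, 200, 400, 800])
noise = np.array([correlation_noise(n) for n in lengths])

# Slope of log(noise) vs. log(n); 1/sqrt(n) scaling predicts about -0.5.
slope = np.polyfit(np.log(lengths), np.log(noise), 1)[0]

for n, s in zip(lengths, noise):
    print(f"n = {n:4d} volumes  sd(r) = {s:.4f}")
print(f"log-log slope = {slope:.2f}")
```

Under these assumptions, quadrupling the number of volumes roughly halves the sampling noise of a single connection, which is consistent with the argument that individual connections remain noisy at short imaging times and that pooling many connections or extending imaging time both help.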