NeuroImage 145 (2017) 180–199. https://doi.org/10.1016/j.neuroimage.2016.06.038

Computational neuroimaging strategies for single patient predictions

K.E. Stephan a,b,c, F. Schlagenhauf d,e, Q.J.M. Huys a,f, S. Raman a, E.A. Aponte a, K.H. Brodersen a, L. Rigoux a,c, R.J. Moran b,g, J. Daunizeau a,h, R.J. Dolan b,i, K.J. Friston b, A. Heinz d,j

a Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich, 8032 Zurich, Switzerland
b Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK
c Max Planck Institute for Metabolism Research, 50931 Cologne, Germany
d Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité-Universitätsmedizin Berlin, 10115 Berlin, Germany
e Max Planck Institute for Human Cognitive and Brain Sciences, 04130 Leipzig, Germany
f Department of Psychiatry, Psychosomatics and Psychotherapy, Hospital of Psychiatry, University of Zurich, Switzerland
g Virginia Institute of Technology, USA
h ICM, Paris, France
i Max Planck UCL Centre for Computational Psychiatry and Ageing Research, London, UK
j Humboldt Universität zu Berlin, Berlin School of Mind and Brain, 10115 Berlin, Germany

Article history: Received 27 November 2015; Revised 21 May 2016; Accepted 20 June 2016; Available online 22 June 2016.

Abstract

Neuroimaging increasingly exploits machine learning techniques in an attempt to achieve clinically relevant single-subject predictions. An alternative to machine learning, which tries to establish predictive links between features of the observed data and clinical variables, is the deployment of computational models for inferring on the (patho)physiological and cognitive mechanisms that generate behavioural and neuroimaging responses. This paper discusses the rationale behind a computational approach to neuroimaging-based single-subject inference, focusing on its potential for characterising disease mechanisms in individual subjects and mapping these characterisations to clinical predictions. Following an overview of two main approaches – Bayesian model selection and generative embedding – which can link computational models to individual predictions, we review how these methods accommodate heterogeneity in psychiatric and neurological spectrum disorders, help avoid erroneous interpretations of neuroimaging data, and establish a link between a mechanistic, model-based approach and the statistical perspectives afforded by machine learning.

Keywords: Generative model; fMRI; EEG; Bayesian; Model selection; Model comparison; Model evidence; Generative embedding; Classification; Clustering; Computational psychiatry; Translational neuromodeling

© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Introduction

Despite its potential to provide a non-invasive assay of whole-brain function, neuroimaging has experienced surprising difficulties in delivering diagnostic applications for clinical practice, particularly in psychiatry. While there are several reasons for this disappointing track record, which is shared by other approaches like genetics (for discussions and reviews, see Casey et al., 2013; Kapur et al., 2012; Krystal and State, 2014; Stephan et al., 2015), the perceived failure has triggered important discussions about the most promising avenues for clinical neuroimaging. One particular hope is that the influx of methods from machine learning will realise the translational potential of neuroimaging. For example, in fMRI, multivariate classification and regression schemes have recently yielded impressive successes in clinically relevant domains such as pharmacological or pain research (e.g., Duff et al., 2015; Wager et al., 2013).

However, in direct application to clinical questions, neuroimaging-directed machine learning has mainly been used to discriminate patient groups from each other or from healthy controls. This has been variably successful, as indicated by the diverse outcomes from fMRI-based classification competitions. In these competitions, the accuracies of neuroimaging-based diagnoses have ranged from poor (e.g., attention deficit hyperactivity disorder; Brown et al., 2012) to excellent (e.g., schizophrenia; Silva et al., 2014). More importantly, however, the attempt to replace or augment traditional clinical diagnostics by applying machine learning techniques to neuroimaging data is a strategy of limited long-term clinical utility. This is because a multitude of physiological, genetic and clinical studies over the past decades have made it clear that mental diseases as defined by contemporary classification schemes – such as the Diagnostic and Statistical Manual of Mental Disorders (DSM) or the International Classification of Diseases (ICD) – are highly heterogeneous. That is, disorders like schizophrenia, depression, autism etc. group patients with similar clusters of symptoms and signs that are caused by diverse pathophysiological mechanisms (Cuthbert and Insel, 2013; Kapur et al., 2012; Krystal and State, 2014; Owen, 2014; Stephan et al., 2016). This pathophysiological diversity explains why psychiatric diagnoses based on DSM/ICD have little predictive validity; that is, with few exceptions (such as differentiating mono- and bipolar affective disorders) they do not inform the clinician about individual clinical trajectories or treatment responses.
As a consequence, even if highly accurate DSM/ICD diagnoses could be derived from machine learning classifiers applied to neuroimaging data, this would simply recapitulate a diagnostic scheme that does not directly inform clinical management – and would do so using a considerably more expensive and less widely available technology compared to classical psychiatric interviews.

This is one reason why the application of machine learning to neuroimaging data has changed direction in recent years and is now being increasingly applied to problems more directly related to clinical management, such as predicting individual disease course or treatment efficacy. This has shown some promising recent results, indicating that it may become possible to predict individual trajectories of patients with schizophrenia (Anticevic et al., 2015) or mood disorders (Lythe et al., 2015; Schmaal et al., 2015) from neuroimaging data, or forecast individual treatment responses to psychotherapy (Mansson et al., 2015), antidepressants (DeBattista et al., 2011; McGrath et al., 2013; Miller et al., 2013) and antipsychotics (Hadley et al., 2014; Nejad et al., 2013).

These are remarkable successes and raise hopes that neuroimaging may finally contribute to clinical decision-making in the not-too-distant future. However, the straightforward application of machine learning to neuroimaging data faces a number of non-trivial technical and conceptual challenges that may impede its long-term clinical utility (for discussions and reviews, see Brodersen et al., 2011; Klöppel et al., 2012; Orru et al., 2012; Wolfers et al., 2015). One central challenge is that neuroimaging data are noisy and very high-dimensional, presenting with a myriad of data features that could inform prediction. Dimensionality reduction and optimal feature selection thus become critical problems. One interesting development in this regard concerns recent advances in feature extraction methods based on restricted Boltzmann machines (Hjelm et al., 2014) and deep neural networks (Plis et al., 2014), which offer novel representations of disease states with potential diagnostic opportunities.
Second, hemodynamic and electrophysiological measurements represent distal and potentially complicated transforms of underlying neuronal mechanisms. This means that conventional machine learning methods, which operate directly on observed features of neuroimaging, do not furnish mechanistic insights into pathophysiology. This can be illustrated with three examples. First, in multivariate classification studies of fMRI, even though the spatial distribution of informative voxels can be determined, this does not disclose a concrete biological process. Similarly, unsupervised learning approaches that exploit multimodal imaging measures can obtain compelling subgroup delineations (Ingalhalikar et al., 2014) but remain descriptive and do not offer a mechanistic link between the structural and functional components of any identified predictor. Finally, while functional connectivity (i.e., statistical dependencies between regional time series) has enabled successful machine learning applications (Arbabshirani et al., 2013; Craddock et al., 2009; Du et al., 2015; Richiardi et al., 2011; Rosa et al., 2015), its characterisation of neuronal processes is restricted to statistical correlations that are agnostic about the physiological causes of network dynamics. In general, machine learning applied to "raw" neuroimaging data does not easily identify mechanisms from which novel therapeutic approaches could be derived.

An alternative to machine learning is the use of theory-driven computational models to infer pathophysiological mechanisms in individual patients (Friston et al., 2014; Huys et al., 2016; Maia and Frank, 2011; Montague et al., 2012; Stephan and Mathys, 2014). This strategy has a number of key features, which we discuss in detail below. In short, computational approaches (i) allow one to invoke model comparison procedures for clarifying whether any neurophysiological differences among patients signal real differences in pathophysiology, or simply reflect different cognitive strategies; (ii) provide theory-driven dimensionality reduction; and (iii) can support powerful single-subject predictions based on inferred mechanisms, as opposed to patterns of data features.

This paper provides an overview of computational neuroimaging strategies for single-subject predictions, independently of – or in conjunction with – machine learning techniques. We attempt to illustrate central concepts of generative modelling, and the clinical utility they may afford. To this end, we focus on the general form of model classes; by contrast, we do not discuss mathematical properties of any single model in detail, but refer the reader to the relevant literature. One area which requires a slightly more detailed mathematical discussion is the framework of Bayesian model comparison. Even here, however, we restrict our treatment to interpreting the general form of key equations.

This paper has the following structure. First, for readers without much background in computational neuroimaging, we provide a brief overview of existing approaches, clarify some nomenclature, and revisit some of its previous successes. Second, we discuss the importance of model comparison for dealing with heterogeneity across individuals and introduce the principles of Bayesian model selection (BMS) and Bayesian model averaging (BMA). Third, we outline how clinical predictions can be derived from computational models, (i) illustrating the use of BMS when theories of disease mechanisms exist and (ii) introducing generative embedding as a link between computational modelling and machine learning, when disease (process) theories are not available. Moreover, we outline how BMS and generative embedding can be deployed in an unsupervised or supervised fashion in order to address problems related to nosology (i.e., detecting subgroups in heterogeneous disorders), differential diagnosis, and outcome prediction.
Computational neuroimaging – what, why and how?

The term "computational" originally derives from the theory of computation, a subfield of mathematics that examines which particular functions are computable. A function is computable if it represents a mapping (from an input to an output set) that can be implemented by an algorithm; i.e., a well-defined sequence of operations. In neuroscience, the term "computational" has been used quite flexibly, ranging from an emphasis on information processing (irrespective of its biophysical implementation), to very broad usage, encompassing any algorithmic investigation of neuronal systems (cf. "computational neuroscience"), in contrast to analytical mathematical treatments.

In neuroimaging, three main computational approaches are presently being pursued (for review, see Stephan et al., 2015). These include biophysical network models (BNMs), generative models, and "model-based fMRI". BNMs are large-scale network models whose nodes represent mean field (or neural mass) models of neuronal population activity. Augmented with a hemodynamic or electrophysiological forward model, neuronal population activity is translated into a predicted fMRI or EEG signal (for reviews, see Deco et al., 2013a; Deco and Kringelbach, 2014; Wang and Krystal, 2014). By connecting the individual nodes in accordance with anatomical connectivity data – obtained from human diffusion-weighted imaging or Macaque tract tracing studies – the dynamics of large-scale networks and ensuing whole-brain neuroimaging signals can be simulated. While the neuronal state equations of BNMs can be rich in biophysical detail, their complexity renders parameter estimation very difficult, and current models allow only the estimation of a single global scaling parameter of connectivity (Deco et al., 2013b). For this reason, this paper focuses on the other two classes of models, generative models and model-based fMRI. These rest on less complex and fine-grained formulations, but allow for estimating model parameters from measured data. In the following, we summarise the principles of these models and clarify some of the technical terms and concepts involved. The following conventions will be used for equations: functions, distributions and scalar variables are represented by lowercase italics; sets, functionals and quantities from probability theory (such as information-theoretic surprise or free energy) by uppercase italics; vectors by lowercase bold; and matrices by uppercase bold letters.

Fig. 1. Summary of a generative model for neuroimaging data. This figure contains graphics that are reproduced, with permission, from Chen et al. (2009), Garrido et al. (2008), and Stephan et al. (2003).

Fig. 2. Overview of DCM for fMRI. Reproduced, with permission, from Stephan et al. (2015).

Generative models of neuroimaging data

In statistics, a "generative model" is defined by the joint probability over all random variables (e.g., observed data and model parameters) that define a system of interest. More intuitively, one can think of a generative model as describing how observed data were generated and hence viewing it as a "recipe" for generating simulated data. A generative model is specified by defining two components: a likelihood function and a prior density. The likelihood function rests on a probabilistic mapping from hidden quantities (parameters θ) of the system of interest to observable quantities (measurements) y:

y = f(θ) + ε   (1)

This simply says that the data (feature) vector y originates from some transformation f, which encodes a putative signal-generating process, plus stochastic noise ε. We deliberately write Eq. 1 in this form, as it will provide a useful reference for the distinction between feature selection methods that do or do not exploit knowledge about the hidden causes of measurements (see the section on Generative Embedding below).
The function f encodes how system parameters determine its output (signal); this can range from extremely simple concepts (e.g., a constant term describing mean signal) to complex functions, e.g., a biophysically motivated dynamical system, as in the case of dynamic causal modelling (DCM, see below). Eq. 1 can now be used to specify the likelihood function as quantifying the probability p(y|θ) of observing a particular measurement y, given a particular parameterisation of the system. For example, assuming identically and independently distributed Gaussian noise ε, the likelihood can be written as:

p(y|θ) = N(y; f(θ), σ²I)   (2)

where I denotes the identity matrix and σ² the noise variance.

The prior density encodes the range of values the parameters are expected to take a priori, i.e., before any data are observed. Again, under Gaussian assumptions we can express this as:

p(θ) = N(θ; μ_θ, Σ_θ)   (3)

where μ_θ and Σ_θ denote the prior mean and prior covariance, respectively.

To generate data, one could simply sample from the prior density and plug the ensuing parameter values into the likelihood function. This approach to simulating data is very general, and traditional non-probabilistic simulations in computational neuroscience (with fixed parameters) can be regarded as a special case of a generative model, where the prior density reduces to a Dirac delta function (point mass) over the chosen parameters.
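To make Eqs. 1-3 concrete, the following minimal Python sketch (not from the paper; the observation function and all numerical values are illustrative assumptions) draws a parameter vector from a Gaussian prior and pushes it through a toy observation function f to produce noisy simulated data, i.e., a prior predictive simulation of the kind described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observation function f(theta): a damped oscillation whose decay rate and
# frequency are the hidden parameters (purely illustrative, not from the paper).
def f(theta, t):
    rate, freq = theta
    return np.exp(-rate * t) * np.sin(freq * t)

t = np.linspace(0, 10, 200)
mu_prior = np.array([0.3, 2.0])        # prior mean (Eq. 3), assumed values
Sigma_prior = np.diag([0.05, 0.25])    # prior covariance (Eq. 3), assumed values
sigma_noise = 0.1                      # observation noise std (Eq. 2)

# Prior predictive simulation: sample theta ~ p(theta), then y ~ p(y | theta).
theta = rng.multivariate_normal(mu_prior, Sigma_prior)
y = f(theta, t) + rng.normal(0.0, sigma_noise, size=t.shape)   # Eq. 1

print("sampled parameters:", theta)
print("first simulated data points:", y[:5])
```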
Using the probabilistic mapping from hidden parameters of the system to observed signals in the "forward" direction is very useful for simulating observable responses under different parameter values and exploring system behaviour. In neuroimaging, however, we wish to proceed in the opposite direction; i.e., estimate the parameter values from observed data. This reverse mapping is equivalently referred to as "model inversion", solving the "inverse problem", or simply "inference". Formally, this corresponds to computing the posterior probability p(θ|y) = N(θ; μ_θ|y, Σ_θ|y) of the parameters, given the data (Fig. 1). This follows directly from Bayes theorem:

p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m)   (4)

Here, we have made the dependency on a chosen model structure m explicit by conditioning all terms on m. The practical difficulty is that deriving the term in the denominator requires computing an integral (see Eq. 9) which is usually not analytically tractable, except for some very simple cases. Unfortunately, even numerical integration is rarely feasible, since computation time increases exponentially with the number of model parameters. In practice, one has to resort to approximate inference schemes. These comprise two main approaches: Markov chain Monte Carlo (MCMC) sampling and variational Bayes (VB); for in-depth discussions see MacKay (2003) and Bishop (2006). MCMC is computationally expensive and can require very long run times but is guaranteed to converge to the correct solution (in the limit of infinite time). On the other hand, VB is computationally very efficient but is susceptible to local extrema and can be affected by violations of distributional assumptions (Daunizeau et al., 2011).

In this paper, we focus on the first and most widely used generative modelling framework for neuroimaging data, dynamic causal modelling (DCM). This approach was introduced a decade ago for fMRI (Friston et al., 2003). DCM for fMRI uses differential equations to describe the dynamics of neuronal population states x(t) that interact via synaptic connections and are subject to experimentally controlled perturbations u(t). These perturbations can either induce neuronal population activity directly, e.g., in terms of sensory stimulation ("driving input"), or dynamically modulate the strengths of synaptic connections. The form of these neuronal state equations is given by a low-order (Taylor) approximation to any nonlinear system (Friston et al., 2003; Stephan et al., 2008), where the strengths of synaptic connections and the weights of driving and modulatory inputs represent the parameters of interest we wish to infer by model inversion (see Fig. 2).¹

The activity of each neuronal population is coupled to a regional blood oxygen level dependent (BOLD) signal by a cascade of differential equations describing hemodynamic processes, such as changes in blood flow and blood volume (Friston et al., 2000; Stephan et al., 2007).² Technically, this means that DCM represents a hierarchical generative model, where the likelihood function is partitioned into deterministic dynamic state equations f of hidden neuronal and hemodynamic processes (with neuronal and hemodynamic parameters θ_n, θ_h) and a static observation function g that implements discrete sampling and accounts for measurement noise ε (for the detailed equations, see Fig. 2 and Friston et al., 2003; Stephan et al., 2007):

Neuronal states:      dx_n/dt = f_n(x_n, u, θ_n)
Hemodynamic states:   dx_h/dt = f_h(x_n, x_h, θ_h)
Measurements:         y = g(x_h) + ε   (5)

Notably, in these equations, noise only enters at the observation level whereas neuronal dynamics unfolds in a deterministic fashion, given external perturbations and system parameters (e.g., synaptic connection strength). That is, in Eq. 5, the dynamic variables of interest (the neuronal and hemodynamic states x_n, x_h) are deterministic functions of designed and known inputs u and of time-invariant neuronal (θ_n) and hemodynamic (θ_h) parameters. (For ease of reading, we have omitted explicit references to time.) This means that we only need to infer the parameters – the state trajectories follow automatically from any inferred parameter values. This would be different if stochastic differential equations were chosen; in this case, the states are not fully determined by the choice of parameters and would need to be inferred, in addition to the parameters. This is known as stochastic DCM (Daunizeau et al., 2009; Daunizeau et al., 2012; Li et al., 2011).

By specifying plausible prior densities over the neuronal and hemodynamic parameters (see Friston et al., 2003 for details), the generative model is completed. Inverting this generative model allows one to infer the neuronal parameters of interest (e.g., coupling strengths and their modulation by experimental conditions) from empirically measured fMRI data. Notably, the separate representation of neuronal and hemodynamic mechanisms is crucial for generative models of fMRI, since variability in neurovascular coupling across regions and subjects can otherwise confound inference on connectivity (David et al., 2008).

¹ Simulations that provide an intuition of neuronal dynamics accounted for by DCM and illustrate how different parameters impact on the resulting signals can be found in several previous papers; for example, see Fig. 1 in Penny et al. (2004b) and Fig. 2 in Stephan et al. (2008).
² For simulations illustrating the nature of this hemodynamic model, see Figs. 3, 8, 9 in Friston et al. (2000) and Fig. 5 in Stephan et al. (2007).
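As a rough illustration of the deterministic structure of Eq. 5, the sketch below integrates a two-region toy network with a bilinear neuronal equation and a deliberately simplistic low-pass "hemodynamic" stage. It is a caricature for intuition only: the coupling matrices, time constants and observation model are invented here and do not reproduce the DCM implementation of Friston et al. (2003) or SPM.

```python
import numpy as np

# Toy two-region "DCM-like" forward model (illustrative only; not SPM's DCM).
A = np.array([[-0.5, 0.0],     # endogenous coupling; region 2 receives from region 1
              [ 0.4, -0.5]])
B = np.array([[0.0, 0.0],      # modulation of the 1->2 connection by input u2
              [0.3, 0.0]])
C = np.array([1.0, 0.0])       # driving input enters region 1

dt, T = 0.1, 60.0
n_steps = int(T / dt)
u1 = np.zeros(n_steps); u1[50:60] = 1.0      # brief driving stimulus
u2 = np.zeros(n_steps); u2[300:] = 1.0       # "modulatory" context in second half

x = np.zeros(2)                 # neuronal states
h = np.zeros(2)                 # crude hemodynamic states (leaky integrators)
bold = np.zeros((n_steps, 2))

for k in range(n_steps):
    # neuronal state equation: dx/dt = (A + u2*B) x + C u1  (bilinear form)
    dx = (A + u2[k] * B) @ x + C * u1[k]
    x = x + dt * dx
    # simplistic "hemodynamic" stage: a slow low-pass filter of neuronal activity
    dh = (x - h) / 4.0
    h = h + dt * dh
    bold[k] = h

# discrete sampling plus observation noise, as in the measurement equation of Eq. 5
y = bold[::20] + 0.01 * np.random.default_rng(1).normal(size=bold[::20].shape)
print("simulated BOLD samples:", y.shape)
```

Because the states follow deterministically from the parameters and inputs, estimating the coupling parameters from y would amount to inverting exactly this kind of forward mapping.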
Following its introduction for fMRI, DCM has been extended to other neuroimaging modalities, including event-related potentials (ERPs; David et al., 2006), induced responses (Chen et al., 2008), and spectral responses (Moran et al., 2009), as measured by electroencephalography (EEG) and magnetoencephalography (MEG). DCM has found widespread use for the analysis of effective connectivity between neuronal populations and has furnished insights into circuit-level mechanisms that eluded previous schemes. This includes, for example, physiological characterisations of predictive coding in cortical hierarchies during perceptual inference and learning, both in healthy subjects (e.g., den Ouden et al., 2009; Garrido et al., 2008; Summerfield et al., 2006) and in patients with schizophrenia (Dima et al., 2010; Dima et al., 2009; Ranlund et al., 2015) or altered levels of consciousness due to brain damage. Beyond long-range connectivity, DCM has also proven useful for inferring detailed, low-level physiological (synaptic) mechanisms within local neuronal circuits of the human brain, exploiting the rich temporal information contained in EEG/MEG data. Examples include the detection of conductance changes in AMPA and NMDA receptors under dopaminergic modulation (Moran et al., 2011), changes in postsynaptic gain of supragranular pyramidal cells in auditory cortex under cholinergic stimulation (Moran et al., 2013), or the characterisation of changes in neuronal physiology in individuals with selective mutations of particular ion channels (Gilbert et al., 2016).

Model-based fMRI

Generative models also play a central role in the second computational approach considered in this paper, "model-based fMRI". However, in contrast to the purely physiological DCMs described above, this approach asks whether a (particular component of a) computational process is reflected in BOLD signals (Gläscher and O'Doherty, 2010; O'Doherty et al., 2003). In other words, it tries to explain voxel-wise BOLD signals as a linear mixture of computational processes, which are assumed to be directly encoded by the underlying neuronal activity. The same approach can, of course, be applied to M/EEG responses; for example, in order to explain trial-by-trial amplitudes or waveforms of event-related potentials (Lieder et al., 2013; Ostwald et al., 2012). However, given its dominance in the present computational neuroimaging literature, we here focus entirely on model-based fMRI.
Model-based fMRI rests on a two-step procedure (see Fig. 3A). First, a generative model of behavioural responses, with computational states x_c (e.g., trial-wise prediction errors (PEs) or uncertainty) and parameters θ_c, is estimated using the measured behaviour y_b of an individual:

y_b = g(x_c, θ_c) + ε_b   (6)

By inverting this model, the computational states x_c can be inferred (dotted line marked with 1 in Fig. 3A). The subsequent convolution with a standard hemodynamic response function (HRF) then provides explanatory variables or regressors for a standard mass-univariate GLM analysis of voxel-wise fMRI data:

y_fMRI = (x_c ⊗ HRF) β + ε_fMRI   (7)

The parameters of this GLM, β, can now be estimated in a second inference step (dotted line marked with 2 in Fig. 3A), either under flat priors (i.e., maximum likelihood estimation) or using empirical Bayesian procedures (Friston and Penny, 2003). Overall, this two-step procedure enables one to search, across the whole brain, for the neuronal correlates of computational variables of interest which had been inferred from the simultaneously measured behaviour.³

³ For a very instructive overview with both simulations and empirical results, please see Figures 1 and 2 in Gläscher and O'Doherty (2010).
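A hedged sketch of this two-step procedure is given below: a simple Rescorla-Wagner learner supplies trial-wise prediction errors (standing in for the computational states x_c of Eq. 6), which are convolved with a crude gamma-shaped kernel (a stand-in for the canonical HRF, not SPM's exact basis function) and entered as a regressor in a least-squares GLM (Eq. 7). Task timing, learning rate and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# --- Step 1: derive computational states from behaviour -------------------
# Rescorla-Wagner learning on a toy conditioning task; the prediction errors
# play the role of the computational states x_c in Eq. 6 (illustrative values).
n_trials, alpha = 60, 0.2
rewards = rng.binomial(1, 0.8, size=n_trials)
v, pe = 0.0, np.zeros(n_trials)
for i in range(n_trials):
    pe[i] = rewards[i] - v
    v += alpha * pe[i]

# --- Step 2: build fMRI regressors and estimate the GLM of Eq. 7 ----------
TR, trial_onset_s = 2.0, 8.0
n_scans = int(n_trials * trial_onset_s / TR)
stick = np.zeros(n_scans)
stick[(np.arange(n_trials) * trial_onset_s / TR).astype(int)] = pe

# crude gamma-shaped HRF kernel, peaking around 5 s (a stand-in for the canonical HRF)
t_hrf = np.arange(0, 30, TR)
hrf = (t_hrf ** 8.6) * np.exp(-t_hrf / 0.55)
hrf /= hrf.sum()

x_pe = np.convolve(stick, hrf)[:n_scans]           # (x_c  convolved with HRF), Eq. 7
X = np.column_stack([x_pe, np.ones(n_scans)])      # PE regressor plus constant term

# simulate one "voxel" that encodes PEs, then recover beta by least squares
y_voxel = 0.7 * x_pe + 0.1 * rng.normal(size=n_scans)
beta, *_ = np.linalg.lstsq(X, y_voxel, rcond=None)
print("estimated effect of the PE regressor:", beta[0])
```

In a real analysis the same regressor would be tested at every voxel, yielding a statistical map of where the computational quantity is expressed.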
This (dottedlinemarkedwith1inFig.2A).Thesubsequentconvolutionwith highlights the fact (expressed by equations 6 and 7 above) that astandardhemodynamicresponsefunction(HRF)thenprovidesex- model-basedfMRIessentiallyrepresentsacorrelationalapproachbe- planatoryvariablesorregressorsforastandardmass-univariateGLM tweentwotypesofmeasurements,eachofwhichhasitsowngenera- analysisofvoxel-wisefMRIdata: tiveprocess.Inthelongrun,unifiedmodelsmaybedevelopedthat y ¼ðx ⊗HRFÞβþε ð7Þ explainbothneuroimagingsignalsandbehaviourofagivenindividual fMRI c fMRI fromthesameunderlyingstateequation(i.e.,neuronalprocess).Forex- ample,thiscouldbeastateequationdescribingthebiophysicalimple- TheparametersofthisGLM,β,cannowbeestimatedinasecondin- mentationofrelevantcomputationsinacircuitofinterest;thiswould ferencestep(dottedlinemarkedwith2inFig.2A),eitherunderflat requirebothamappingfrom(hidden)neuronalstatestobehavioural priors(i.e.,maximumlikelihoodestimation)orusingempiricalBayes- observationsthatconsidersthebiophysicalimplementationofrelevant ianprocedures(FristonandPenny,2003).Overall,thistwo-stepproce- computations(e.g.,predictivecoding),andamappingfromcircuitstate dureenablesonetosearch,acrossthewholebrain,fortheneuronal toneuroimagingdatadescribinghowneuronalactivitytranslatesinto correlatesofcomputationalvariablesofinterestwhichhadbeenin- measurablesignals(Fig.3B).Thiswouldallowonetoinfercircuitpa- ferredfromthesimultaneouslymeasuredbehaviour.3 rametersofinterest,simultaneouslyfrombothmeasuredbehaviour andneuroimagingdataandprovideamechanistic(andquantitative) 3 Foraveryinstructiveoverviewwithbothsimulationsandempiricalresults,pleasesee characterisation ofneuronalprocessingthatwasgroundedbothin Figures1and2inGlascherandO'Doherty(2010). termsofphysiologyandcomputationalfunction.Moreover,unified K.E.Stephanetal./NeuroImage145(2017)180–199 185 Fig.3.A.Summaryofthetwo-stepprocedurein“model-based”fMRI.Greyplatesdenoterandomvariables;lightblueellipsesrepresentobservedvariables(measurements);darkerblue diamondsrepresentstates(whichfollowdeterministically,giventheparametersandinputs).Solidarrowsrepresentdependenciesamongvariables;dashedarrowsrepresentinference.B. Summaryofaunifiedmodelinwhichbothbehaviouralandneuroimagingdataarepredictedfromthesameunderlyingstateequation.Inthismodel,theunknownparameterscanbe inferredinonestep,basedonbothbehaviouralandneuroimagingdata.Seemaintextfordetailsanddefinitionofvariables. modelsofthissortcanhelpidentifyingwhichpartsofacircuitarepar- (seetheirFigs.3–5forsimulationsthatdemonstratetheprinciplesof ticularlycriticalformaladaptiveactions.Thismightbeparticularlyuse- thismodel).Whilethisrelativelycoarsestateequationonlyallowsfor ful when addressing the problem of multiple potential causes or relativelysimplisticrelationsbetweencircuitdynamicsandbehaviour, strategiesunderlyinganobservedbehaviour;cf.(Schlagenhaufetal., theconceptualadvanceofthismodelisconsiderablesinceithelpsiden- 2014). tifyingwhichpartsofthenetwork(nodesorconnections)arecrucialfor Asignificantsteptowardssuchaunifiedmodelhasbeenmadere- funnellinginputs(stimuliortaskinstructions)intobehaviouraloutputs. 
A significant step towards such a unified model has been made recently by Rigoux and Daunizeau (2015). They proposed a mapping from a single state equation of circuit dynamics (based on the formalism of DCM for fMRI) to simultaneously acquired behaviour and fMRI signals (see their Figs. 3–5 for simulations that demonstrate the principles of this model). While this relatively coarse state equation only allows for relatively simplistic relations between circuit dynamics and behaviour, the conceptual advance of this model is considerable, since it helps identify which parts of the network (nodes or connections) are crucial for funnelling inputs (stimuli or task instructions) into behavioural outputs.

Prior work towards integrating generative models of neurophysiology and computation (information processing) has mainly examined the notion of PEs as "teaching signals" that regulate the amount of synaptic plasticity needed to update neural circuits during learning. Specifically, den Ouden et al. (2009, 2010) demonstrated that short-term plasticity during sensory learning could be measured by inferring how effective connection strengths were modulated by trial-by-trial prediction errors obtained from RL and hierarchical Bayesian models, respectively. A similar demonstration was provided in the context of learning under aversive outcomes (Roy et al., 2014). Most recently, Vossel et al. (2015) showed how attentional shifts were accompanied by changes in cortical-subcortical network connectivity that evolved according to trial-wise estimates of the certainty (precision) of target predictions, where the latter were inferred from saccadic eye movement data using a hierarchical Gaussian filter (Mathys et al., 2011).

Fig. 4. Illustration that model selection can provide a formal basis for differential diagnosis. Here, the relative plausibility of a set of competing models, representing alternative mechanisms by which the observed data could have been generated, is evaluated in terms of the posterior model probability. In the typical case of a flat prior on model space, the latter is proportional to the model evidence (see main text for details).
models;where,typically,allmodelswithinMareconsideredequally SpecificallydenOudenetal.(2009,2010)demonstratedthatshort- likelyandallotherpossiblemodelshavezeropriorprobability: termplasticityduringsensorylearningcouldbemeasuredbyinferring (cid:4) howeffectiveconnectionstrengthsweremodulatedbytrial-by-trial pðmÞ¼ 1=jMjifm∈M ð8Þ predictionerrorsobtainedfromRLandhierarchicalBayesianmodels, 0ifm∉M respectively.Asimilardemonstrationwasprovidedinthecontextof learningunderaversiveoutcomes(Royetal.,2014).Mostrecently, (Here,|M|referstothecardinalityofthehypothesisset.)Thechal- Vosseletal.(2015)showedhowattentionalshiftswereaccompanied lengethenistofindthemodelmwithinMthatprovidesthebestexpla- bychangesincortical-subcorticalnetworkconnectivitythatevolvedac- nationoftheobserveddata.Importantly,selectingamodelthat“best cordingtotrial-wiseestimatesofcertainty(precision)oftargetpredic- explains”thedataisnotsimplyastatementaboutmodelfit.Indeed,it tions,wherethelatterwereinferredfromsaccadiceyemovementdata istrivialtofind,foranydataset,modelswithexcellentorevenperfect usingahierarchicalGaussianfilter(Mathysetal.,2011). fit;forexample,foranyobservationconsistingoftdatapoints,apolyno- mialfunctionofordert−1willfitthedataperfectly.Theseoverlyaccu- Bayesianmodelselection rate models simply explain noise or random fluctuations that are specifictotheparticularmeasurementanddonotgeneralisetoother AsoutlinedintheIntroduction,manyconventionallydefinedneuro- (e.g.,future)measurementsofthesameprocess.Thistendencyofan logicalandprobablyallpsychiatricdiseasesarehighlyheterogeneous: overlyflexiblemodeltorecognisespuriouspatternsinnoiseisreferred patientswithsimilarsymptomsandbehaviourmaydifferconsiderably toas“overfitting”(seeFig.5ofthispaperandFig.1.4ofBishop(2006) intermsof(patho)physiologicalmechanismsand/orcognitiveprocess- forexamples).Ontheotherhand,thesimplestmodelpossible,which es.Acentralgoalforneuroimagingapproacheswiththeambitionof wouldconsistofaconstanttermonlyandexplainsnosignalvariance clinicalutilityisthustoidentify,inanygivenindividualpatient,the (i.e.,R2=0),canindeedbethebestexplanationofatimeseries– mostlikelymechanismthatunderliesaparticularobservation(brain whenthetimeseriescontainsnosignalandonlynoise.Insummary, activitypattern).Thisissimplythechallengeofdifferentialdiagnosis, measuresoffitaloneareinadequatetojudgemodelgoodness(Pitt whichisubiquitousthroughoutmedicine.Differentialdiagnosismaps andMyung,2002).Instead,thechallengeistoselectmodelsthatgener- directlyontohypothesistestingwhich,inparametricstatistics,corre- alisebest;thesearethemodelsthatprovideanoptimalbalancebe- spondstotheformalcomparisonofdifferentmodelsofhowobserved tweenfit(accuracy)andcomplexity. datacouldhavebeengenerated(Fig.4).Inotherwords,anelegantap- ThisbalanceisimplicitintheBayesianmodelevidenceusedduring proachtoestablishingdifferentialdiagnosesinpsychiatrybasedonneu- Bayesianmodelselection(BMS).Themodelevidenceistheprobability roimaging would be to formalise competing pathophysiological ofobservingthedataygiventhemodelm.Thisprobabilityisalsore- theoriesintermsofalternativegenerativemodels.Therelativeplausi- ferredtoasthemarginalorintegratedlikelihoodandcorrespondsto bilityofthesemodels(hypotheses)wouldthenbeevaluatedbyformal thedenominatorfromBayestheorem(seeEq.4).Itcanbecomputed modelcomparisonprocedures,givenempiricalmeasurementsofneu- roimagingand/orbehaviour. 
Model evidence

Generally, the first step of any model-driven (hypothesis-led) investigation is to decide which class of explanation accounts for the observations. This is what all scientists implicitly do when testing hypotheses – although this step might not always represent an explicit choice. Technically, hypothesis testing or model comparison corresponds to defining a hypothesis set or model space M of competing explanations that are deemed plausible a priori. This is equivalent to specifying a prior over models; where, typically, all models within M are considered equally likely and all other possible models have zero prior probability:

p(m) = 1/|M| if m ∈ M, and 0 if m ∉ M   (8)

(Here, |M| refers to the cardinality of the hypothesis set.) The challenge then is to find the model m within M that provides the best explanation of the observed data. Importantly, selecting a model that "best explains" the data is not simply a statement about model fit. Indeed, it is trivial to find, for any data set, models with excellent or even perfect fit; for example, for any observation consisting of t data points, a polynomial function of order t − 1 will fit the data perfectly. These overly accurate models simply explain noise or random fluctuations that are specific to the particular measurement and do not generalise to other (e.g., future) measurements of the same process. This tendency of an overly flexible model to recognise spurious patterns in noise is referred to as "overfitting" (see Fig. 5 of this paper and Fig. 1.4 of Bishop (2006) for examples). On the other hand, the simplest model possible, which would consist of a constant term only and explains no signal variance (i.e., R² = 0), can indeed be the best explanation of a time series – when the time series contains no signal and only noise. In summary, measures of fit alone are inadequate to judge model goodness (Pitt and Myung, 2002). Instead, the challenge is to select models that generalise best; these are the models that provide an optimal balance between fit (accuracy) and complexity.

Fig. 5. An illustration of the trade-off between model fit and model complexity, and an example of overfitting. Here, models of increasing complexity are fitted to data that were generated from an exponential function, plus added observation noise. It can be seen that a highly complex model fits the data perfectly but, because it is trying to explain the noise as well, makes predictions (such as the pronounced bumps in the middle of the data series) which will not generalise across future instantiations of the data from the same underlying process ("overfitting"). Reproduced, with permission, from Pitt and Myung (2002).

This balance is implicit in the Bayesian model evidence used during Bayesian model selection (BMS). The model evidence is the probability of observing the data y given the model m. This probability is also referred to as the marginal or integrated likelihood and corresponds to the denominator from Bayes theorem (see Eq. 4). It can be computed by integrating out (or marginalising) the parameters from the joint probability:

p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ   (9)

This is why the Bayesian model evidence is also referred to as the marginal likelihood. As a simplifying intuition, the evidence can be understood as providing an answer to the question: "If I randomly sampled from my prior and plugged the resulting value into the likelihood function, how close would the predicted data be – on average – to my observed data?"

In practice, model comparison does not utilise the model evidence directly but typically employs its logarithm (log evidence). Given the monotonic nature of the logarithmic function, ranking models based on either model evidence or log evidence yields identical results. However, the log evidence is numerically easier to deal with (the logarithm of a small number between zero and one is a large negative number) and results in more intuitive equations, some of which we encounter below. It also offers an additional nice intuition derived from information theory. Specifically, given that (Shannon) surprise S is defined as negative log probability, for an agent operating under a given model m, the log evidence corresponds to the negative surprise about observing the data y:

log p(y|m) = −S(y|m)   (10)

Put simply, log evidence – and hence model goodness – increases when we are less surprised about the data encountered.
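The following sketch makes the "sampling from the prior" intuition for Eq. 9 explicit: for a toy linear model with invented numbers, it draws parameters from the prior, averages the resulting likelihoods of the observed data (in log space), and thereby obtains a naive Monte Carlo estimate of the log evidence under two competing priors. This estimator corresponds to the prior arithmetic mean discussed later in the section on sampling approaches.

```python
import numpy as np

rng = np.random.default_rng(4)

# Naive Monte Carlo version of Eq. 9: draw parameters from the prior and
# average the likelihood of the observed data (toy linear model, made-up values).
x = np.linspace(0, 1, 20)
y_obs = 2.0 * x + 0.1 * rng.normal(size=x.size)      # "observed" data
sigma = 0.1                                           # assumed noise level

def log_likelihood(slope):
    resid = y_obs - slope * x
    return (-0.5 * np.sum((resid / sigma) ** 2)
            - x.size * np.log(sigma * np.sqrt(2 * np.pi)))

def log_evidence_mc(prior_mean, prior_sd, n_samples=20000):
    slopes = rng.normal(prior_mean, prior_sd, size=n_samples)
    log_liks = np.array([log_likelihood(s) for s in slopes])
    # log of the average likelihood, computed stably in log space
    return np.logaddexp.reduce(log_liks) - np.log(n_samples)

# two competing "models" = two different priors over the slope
print("log p(y|m1):", log_evidence_mc(prior_mean=2.0, prior_sd=0.5))
print("log p(y|m2):", log_evidence_mc(prior_mean=0.0, prior_sd=0.5))
```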
While the statistical procedure of model comparison typically rests on the log evidence, the result of comparing two models is understood more intuitively when reported as a Bayes factor (BF); this is simply the ratio of two model evidences. As with p-values in frequentist statistics, conventions exist about which thresholds are meaningful for Bayes factors (Kass and Raftery, 1995). For example, a Bayes factor larger than 20 (equivalent to a log evidence difference larger than 3) would be considered as "strong" evidence in favour of one model relative to another.

An alternative option, when reporting the results of model comparisons in an intuitively accessible form, is to compute, for each model m_i, its posterior probability. In the typical case where the prior on models is uninformative or flat (cf. Eq. 8), this simplifies to normalising the evidence for each model by the sum of all model evidences:

p(m_i|y) = p(y|m_i) p(m_i) / Σ_j p(y|m_j) p(m_j) = p(y|m_i) / Σ_j p(y|m_j)   (11)

where the sums run over j = 1, …, |M|. This makes it easy to see that, across all models, the posterior model probability sums to unity.
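In practice, once log evidences are available (however they were approximated), Bayes factors and posterior model probabilities follow with a few lines of arithmetic. The sketch below uses made-up log evidences for three hypothetical models and implements Eq. 11 under a flat prior over models.

```python
import numpy as np

# From log evidences to Bayes factors and posterior model probabilities
# (Eq. 11), assuming a flat prior over models. The log evidences below are
# invented numbers for illustration.
log_evidence = np.array([-1230.4, -1233.6, -1241.0])   # models m1, m2, m3

# Bayes factor of m1 relative to m2 (exponentiated log evidence difference)
bf_12 = np.exp(log_evidence[0] - log_evidence[1])

# posterior model probabilities: normalise in log space for numerical safety
log_post = log_evidence - np.logaddexp.reduce(log_evidence)
posterior = np.exp(log_post)

print("Bayes factor m1 vs m2:", bf_12)        # exp(3.2) > 20, i.e. "strong"
print("posterior model probabilities:", posterior, "sum:", posterior.sum())
```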
Approximations to the log evidence

One major barrier to computing the model evidence is that the integral in Eq. 9 can rarely be evaluated analytically; furthermore, numerical integration is typically prohibitively expensive. Therefore, one usually resorts to approximations of the log evidence, such as the Akaike information criterion (AIC; Akaike, 1974), Bayesian information criterion (BIC; Schwarz, 1978), or negative free energy (Friston et al., 2007; Neal and Hinton, 1998; Penny et al., 2004a). These approximations decompose model goodness into a balance of two terms – accuracy and complexity. All of them agree in the definition of accuracy as log likelihood. By contrast, they differ considerably in their approximation of complexity.

AIC and BIC have a seemingly straightforward approximation of complexity. In AIC, complexity simply corresponds to the number of free parameters; in BIC, this is additionally scaled by the log number of data points:

AIC = log p(y|θ, m) − k
BIC = log p(y|θ, m) − (k/2) log n   (12)

(with k the number of free parameters and n the number of data points). The additional scaling factor in BIC means that, once more than eight data points are available (since (1/2) log n > 1 for n > e² ≈ 7.4), BIC entails a stronger complexity penalty than AIC. The simplicity of their complexity approximations makes AIC/BIC easy to compute, but has two significant disadvantages: AIC/BIC neither account for interdependencies among model parameters (which are ubiquitous in biological systems; Gutenkunst et al., 2007) nor capture differences in prior variance across parameters.

These issues are resolved by a third approximation to the log evidence, the negative free energy F. Its name derives from close connections between free energy concepts in statistical physics and variational approaches to probability theory (see Friston et al., 2007; Neal and Hinton, 1998). Variational free energy represents a lower bound approximation to the log evidence, where the tightness of the bound depends on how well the true (but unknown) posterior can be matched by an approximate posterior q (of known form):

log p(y|m) = F + KL[q(θ) ∥ p(θ|y, m)]   (13)

Here, KL refers to the Kullback-Leibler divergence or relative entropy, an information theoretic measure of the dissimilarity between two probability densities. The KL divergence is zero when the densities are identical and becomes increasingly positive the more the two densities differ (Kullback and Leibler, 1951). Importantly, since we do not know the true posterior, the KL term cannot be evaluated directly. However, by maximising F one implicitly minimises the KL term, thus tightening the lower bound approximation to the log evidence (see Fig. 6). This is achieved by optimising the approximate posterior q (e.g., when q is Gaussian, finding the mean and variance of q that maximise F according to Eq. 13 above). In other words, by maximising F we can obtain both an approximation to the log evidence and the posterior densities of the parameters.

Fig. 6. A graphical illustration of the negative free energy approximation to the log model evidence, and its evolution during model inversion in the context of variational Bayes. Here, by adjusting the parameters of the approximate posterior such that the negative free energy F is maximised, one implicitly minimises the KL divergence between the approximate and true posterior and tightens the bound on the log evidence. See main text for details.

The negative free energy (and hence model evidence) can be decomposed into the following balance between model fit and model complexity (for details, see Penny et al., 2004a; Stephan et al., 2007):

F = ⟨log p(y|θ, m)⟩_q − KL[q(θ) ∥ p(θ|m)]   (14)

In this expression, the first term represents accuracy: the expected log likelihood, under a chosen approximate posterior q. The second term represents complexity and is given by another KL divergence; this time between the approximate posterior and the prior. When the form of the approximate posterior is the same as the true posterior, the complexity is exactly the difference between the posterior and prior and inference becomes exact. Put simply, this means that a model has high complexity if it is sufficiently flexible to allow for a substantial belief update, i.e., a pronounced divergence of the posterior from the prior belief. Another heuristic is that the complexity reflects both the effective number of parameters that need to be displaced from their prior values to provide an accurate explanation for the data and the degree of their displacement.

This heuristic can be turned into a more formal perspective by examining the analytical expression of the complexity term under an assumed distributional form for the approximate posterior (see discussions in Stephan et al. (2009b) and Penny (2012)). For example, under Gaussian assumptions:

KL[q(θ) ∥ p(θ|m)] = ½ log det(C_θ) − ½ log det(C_θ|y) + (μ_θ|y − μ_θ)ᵀ C_θ⁻¹ (μ_θ|y − μ_θ)   (15)

Here, C_θ and C_θ|y denote the prior and posterior covariance matrices, and det refers to the determinant, a matrix property that can be interpreted as a measure of "volume" (the space spanned by the eigenvectors of the matrix). This volume increases with the number of dimensions (the rank of the covariance matrix), and with the length and orthogonality of the basis vectors. With this in mind, the first term in Eq. 15 means that complexity increases with the number of free parameters, the more flexible these parameters are (the higher their prior variance), and the more orthogonal they are. The second term means that complexity decreases with increasing orthogonality of the posterior parameter estimates (a desirable property of an interpretable model) and with increasing posterior variances (highly precise posterior estimates result in brittle model predictions which are unlikely to generalise). Finally, the third term captures our heuristic above and expresses that complexity grows the more the posterior mean diverges from the prior mean.
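The sketch below evaluates the log-evidence-style AIC and BIC of Eq. 12 and the Gaussian complexity term of Eq. 15 for made-up values of the log likelihood, parameter count and prior/posterior moments. It is meant only to show how the terms are assembled, not to reproduce any particular software implementation.

```python
import numpy as np

# Illustration of Eqs. 12 and 15 with invented quantities: a model with k
# parameters fitted to n data points, and Gaussian prior/posterior moments.
log_lik, k, n = -520.3, 6, 400

aic = log_lik - k                      # Eq. 12 (log-evidence-style AIC)
bic = log_lik - 0.5 * k * np.log(n)    # Eq. 12 (log-evidence-style BIC)

# complexity term of Eq. 15 under Gaussian assumptions (illustrative moments)
mu_prior = np.zeros(k)
C_prior = np.eye(k)
mu_post = np.array([0.8, -0.3, 0.1, 0.5, -0.2, 0.05])
C_post = 0.2 * np.eye(k)

d_mu = mu_post - mu_prior
complexity = (0.5 * np.log(np.linalg.det(C_prior))
              - 0.5 * np.log(np.linalg.det(C_post))
              + d_mu @ np.linalg.inv(C_prior) @ d_mu)

print("AIC:", aic, " BIC:", bic, " Eq. 15 complexity:", complexity)
```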
ters.Onthedownside,AICandBICareagnostictoseveralimportantas- Amorerobustalternativeismulti-chainsampling.Thekeyideahere pectsofcomplexity,suchasthepriorvarianceofandinterdependence istobuildasequence(path)ofprobabilitydistributionsthatconnect amongparameters.Bycontrast,thefreeenergyapproximationprovides thepriortotheposteriordistributionbyusingatemperatureparameter amoreinformedmeasureofcomplexitythatisgenerallymoreappro- onthelikelihoodpartofthemodel(LartillotandPhilippe,2006).Inde- priateforreal-worldbiologicalsystemswhichareimbuedwithparam- pendentsinglechainscanbeusedtoobtainsamplesfromeachdistribu- eter interdependencies (Gutenkunst et al., 2007). However, tionfromthissequence;joiningthesamplesfrommultiplechainsyields distributionalassumptionsmayhavegreaterimpactthanforBICand anasymptoticallyexactestimateofthelogevidence.Anadditionalim- AIC(sincetheyconcernnotonlytheaccuracy,butalsothecomplexity provementistousepopulationMCMC(CalderheadandGirolami,2009) term),andevaluatingthetightnessofitsboundapproximationrequires which imposes a dependency between neighbouring distributions computationallyexpensivesamplingschemes(Aponteetal.,2016).On (chains) in thesequence andimproves the samples obtained from theotherhand,thenegativefreeenergyapproximationwasshownto eachsinglechain. exhibitbettermodelcomparisonperformancethanAIC/BICinthecon- Whilesampling-basedapproachestothelogevidenceareapromis- textofregressionmodelsandDCM(Penny,2012)andalsoproved ingdirectingforfuturedevelopmentsofBMS,theyhaveusuallybeen prohibitivelyexpensive(intermsofcomputetime)sofar.However,re- centadvancesinexploitingthepowerofgraphicsprocessingunits (GPUs) are now beginning to turn sampling approaches, including 5 Severalsimulationstudieshaveexaminedthevalidityoftheseapproximationsinthe multi-chainandpopulationMCMCmethods,intoaviablealternative contextofthemodelsdiscussedinthispaper;forexample,seeFigure4inPennyetal. (2004a),Figures3–5inStephanetal.(2008),andFigures6–8inPenny(2012). forcomputingaccuratelogevidenceestimates(Aponteetal.,2016).