YNIMG-08516;No.ofpages:12;4C: NeuroImagexxx(2011)xxx–xxx ContentslistsavailableatSciVerseScienceDirect NeuroImage journal homepage: www.elsevier.com/locate/ynimg Technical Note Comparing Dynamic Causal Models using AIC, BIC and Free Energy ⁎ W.D. Penny WellcomeTrustCentreforNeuroimaging,UniversityCollege,LondonWC1N3BG,UK a r t i c l e i n f o a b s t r a c t Articlehistory: Inneuroimagingitisnowbecomingstandardpractisetofitmultiplemodelstodataandcomparethemusing Received23June2011 a model selection criterion. This is especially prevalent in the analysis of brain connectivity. This paper Accepted7July2011 describesasimulationstudywhichcomparestherelativemeritsofthreemodelselectioncriteria(i)Akaike's Availableonlinexxxx InformationCriterion(AIC),(ii)theBayesianInformationCriterion(BIC)and(iii)thevariationalFreeEnergy. DifferencesinperformanceareexaminedinthecontextofGeneralLinearModels(GLMs)andDynamicCausal Keywords: Models(DCMs).WefindthattheFreeEnergyhasthebestmodelselectionabilityandrecommenditbeused Bayesian forcomparisonofDCMs. Modelcomparison Brainconnectivity ©2011ElsevierInc.Allrightsreserved. DynamicCausalModelling fMRI Introduction classofmodelwedefinea‘full’anda‘nested’model,wherethenested modelisaspecialcaseofthefullmodelwithasubsetofparameters Mathematicalmodelsofscientificdatacanbeformallycompared settozero.Weexaminehowmodelcomparisonaccuracyvariesasa using the Bayesian model evidence (Bernardo and Smith, 2000; function of number of data points and signal to noise ratio for the Gelmanetal.,1995;Mackay,2003),anapproachthatisnowwidely separatecasesofdatabeinggeneratedbyfullornestedmodels.This usedinstatistics(Hoetingetal.,1999),signalprocessing(Pennyand allowsustoassessthesensitivityandspecificityofthedifferentmodel Roberts,2002),machinelearning(BealandGhahramani,2003),and comparison criteria. The paper uses simulated data generated from neuroimaging(Fristonetal.,2008;Pennyetal.,2003;Trujillo-Barreto models with known parameters but these parameters are derived et al., 2004). By comparing the evidence or ‘score’ of one model fromempiricalneuroimagingdata.Westartbybrieflyreviewingthe relativetoanother,adecisioncanbemadeastowhichisthemore relevanttheoreticalbackgroundandthengoontopresentourresults. veridical.Thisapproachhasnowbeenwidelyadoptedfortheanalysis ofbrainconnectivitydata,especiallyinthecontextofDynamicCausal Methods Modelling(DCM)(Fristonetal.,2003;Pennyetal.,2004). Originally (Penny et al., 2004), it was proposed to score DCMs We consider Bayesian inference on data y using model m with usingacombinationofAkaike'sInformationCriterion(AIC)andthe parameters θ. In the analysis of brain connectivity, the data would Bayesian Information Criterion (BIC) criteria. Specifically, it was comprise,forexample,fMRItimeseriesfrommultiplebrainregions, proposedthat(Pennyetal.,2004)ifbothAICandBICprovidedalog the model would make specific assumptions about connectivity Bayesfactor(differenceinlogmodelevidences)ofgreaterthanthree structure, and the parameters would correspond to connections in favour of model one versus two, one could safely conclude that strengths.Agenericapproachforstatisticalinferenceinthiscontextis modelonewasthemoreveridical.Morerecentlyithasbeenproposed Bayesian estimation (Bishop, 2006; Gelman et al., 1995) which (Stephanetal.,2010),ontheoreticalgrounds,toinsteadscoreDCMs provides estimates of two quantities. The first is the posterior usingtheFreeEnergy(Fristonetal.,2007a).However,untilnowthere distribution over model parameters p(θ|m,y) which can be used to hasbeennoempiricalcomparisonofthemodelcomparisonabilities make inferences about model parameters θ. The second is the ofthedifferentapproaches. probability of the data given the model, otherwise known as the Thismotivatestheworkinthispaperwhichdescribesasimulation modelevidence.Thiscanbeusedformodelcomparison,inthatratios study comparing AIC, BIC and the Free Energy. Differences in of model evidences (Bayes factors) allow one to choose between performance areexaminedinthecontextofGeneralLinearModels models(KassandRaftery,1995;Raftery,1995).Thispaperfocusses (GLMs) and Dynamic Causal Models (DCMs). Specifically, for each onDynamicCausalModelsandonmodelinferenceusingAIC,BICor FreeEnergyapproximationstothemodelevidence.Wefirstdescribe DCM,showhowmodelparametersareestimated,describeBayesian ⁎ Fax:+442078131420. inference for General Linear Models and then go on to describe E-mailaddress:w.penny@fil.ion.ucl.ac.uk. the different model selection criteria. In what follows N(x;m,S) 1053-8119/$–seefrontmatter©2011ElsevierInc.Allrightsreserved. doi:10.1016/j.neuroimage.2011.07.039 Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 2 W.D.Penny/NeuroImagexxx(2011)xxx–xxx representsamultivariateGaussiandensityoverxwithmeanmand priorvarianceswhichdependonthenumberofregionsinthemodel covarianceS,and|S|denotesthedeterminantofmatrixS. (Friston et al., 2003), and in this paper we model activity in three regions.Fortheintrinsicself-connectionswehave DCMforfMRI (cid:2) (cid:3) pðA jmÞ=N A ;−1;σ2 ð3Þ DynamicCausal Modelling is a framework for fittingdifferential ii ii self equation models of neuronal activity to brain imaging data using withσ =0.177.Thetimeconstantassociatedwithaself-connection Bayesianinference.ThereisnowalibraryofDCMsandvariantsdiffer self isτ=−1/A ,andthetimeatwhichactivitydecaystohalfitsinitial according to their level of biological realism and the data features i ii value(half-life)is(1/A )log0.5(Fristonetal.,2003).Thepriorover whichtheyexplain.TheDCMapproachcanbeappliedtofunctional ii self-connections corresponds to a prior over half-life's that can be MagneticResonanceImaging(fMRI),Electroencephalographic(EEG), determinedbysamplingfromp(A |m)andtransformingvariablesto Magnetoencephalographic(MEG),andLocalFieldPotential(LFP)data ii τ=−1/A .Thisproducesameanhalflifeofapproximately720ms (Daunizeauetal.,2009).TheempiricalworkinthispaperusesDCM i ii with90%ofthedistributionbetween500and1000ms. forfMRI. Forthoseintrinsiccrossconnectionswhicharenotfixedatzeroby modelassumptionsmwehave Neurodynamics ThispaperusesDCMforfMRIwithbilinearneurodynamicsandan (cid:2) (cid:3) extendedBalloonmodel(Friston,2002)forthehemodynamics.The pðAikjmÞ=N Aik;1=64;σc2ross ð4Þ neurodynamicsaredescribedbythefollowingmultivariatedifferen- tialequation whereσ =0.5.Elementsofthemodulatoryandinputconnectivity cross ! matrices (which are not fixed at zero by model assumptions) have z˙ = A+∑M uðjÞBj z +Cu ð1Þ shrinkagepriors t t t t j=1 (cid:2) (cid:3) (cid:2) (cid:3) p Bjjm =N Bj;0;σ2 ð5Þ ik ik s wheretindexescontinuoustimeandthedotnotationdenotesatime derivative.Theithentryinz correspondstoneuronalactivityinthe t (cid:2) (cid:3) (cid:2) (cid:3) ithbrainregion,andut(j)isthejthexperimentalinput. p C jm =N C ;0;σ2 ð6Þ ADCMischaracterisedbyasetof‘intrinsicconnections’,A,that ij ij s specifywhichregionsareconnectedandwhethertheseconnections are unidirectional or bidirectional. We also define a set of input and σ =2. In the above, i and k index brain regions and j indexes s connections, C, that specify which inputs are connected to which experimentalinput. regions,andasetofmodulatoryconnections,Bj,thatspecifywhich Theprior varianceparameters σ2 ,σ2 and σ2 alongwiththe self cross s intrinsicconnectionscanbechangedbywhichinputs.Usually,theB prior variances on hemodynamic parameters (see Appendix A) parametersareofgreatestinterestasthesedescribehowconnections determinetheoverallpriorcovarianceonmodelparameters,Cθ(see betweenbrainregionsaredependentonexperimentalmanipulations. next section). In the free energy model comparison criterion (see The overall specification of input, intrinsic and modulatory below) these variances contribute to the penalty paid for each connectivity comprise our assumptionsabout modelstructure. This parameter. inturnrepresentsascientifichypothesisaboutthestructureofthe large-scale neuronal network mediating the underlying cognitive Optimisation function.Thesehypotheses,ormodelsareindexedbym. Thesimulationsinthispaperuse‘DCM8’(availableinSPM8prior ThestandardalgorithmusedtooptimiseDCMsistheVariational to revision 4010) with a deterministic, single-state, bilinear neuro- Laplace (VL) method described in (Friston et al., 2007a). The VL dynamicalmodelasdescribedabove. algorithmcanbeusedforBayesianestimationofanynonlinearmodel oftheform Modelpredictions InDCM,neuronalactivitygivesrisetofMRIsignalsviaanextended y=gðθÞ+e ð7Þ Balloonmodel(Buxtonetal.,2004)andBOLDsignalmodel(Stephan etal.,2007)foreachregion.Thisspecifieshowchangesinneuronal whereg(θ)issomenonlinearfunction,andeiszeromeanadditive activitygiverisetochangesinbloodoxygenationthataremeasured Gaussian noise with covariance C . This covariance depends on y withfMRI.Theequationsforthesehemodynamicsareprovidedinthe hyperparameters λ as shown below. The likelihood of the data is AppendixAanddependonasetofhemodynamicparametersh. therefore Overall,theDCMparametersarecollectivelywrittenasthevector (cid:2) (cid:3) θ={A,B,C,h}.Numericalintegrationoftheneurodynamic(Eq.1)and pðyjθ;λ;mÞ=N y;gðθ;mÞ;C ð8Þ y hemodynamic equations (Appendix A) leads to prediction of fMRI activity in each brain region. These values are concatenated to TheframeworkallowsforGaussianpriorsovermodelparameters produceasinglemodelpredictionvectorg(θ). pðθjmÞ=Nðθ;μθ;CθÞ ð9Þ Priors Thepriorsfactoriseoverparametertypes wherethepriormeanandcovarianceareassumedknown.Theerror covariancesareassumedtodecomposeintotermsoftheform pðθjmÞ=pðAjmÞpðBjmÞpðCjmÞpðhjmÞ ð2Þ C−1= ∑expðλÞQ ð10Þ y i i i andeachparameterpriorisGaussian.Thepriorsusedinthispaper correspond to those used in ‘DCM8’. The priors over the intrinsic whereQ areknownprecisionbasisfunctions.Thehyperparameters i connectionsarechosentoencouragestabledynamics.Thisresultsin that govern these error precisions are collectively written as the Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 W.D.Penny/NeuroImagexxx(2011)xxx–xxx 3 vectorλ.Thesewillbeestimated.Additionally,thehyperparameters Comparisonofalargenumberofmodelsisbestimplementedusing areconstrainedbytheprior thefullposteriordensity,p(m|y),asdescribedin(Pennyetal.,2010). pðλjmÞ=Nðλ;μλ;CλÞ ð11Þ Freeenergy Theabovedistributionsallowonetowritedownanexpressionfor Itispossibletoplacealowerboundonthelogmodelevidenceof thejointloglikelihoodofthedata,parametersandhyperparameters thefollowingform(Beal,2003) pðy;θ;λjmÞ=pðyjθ;λ;mÞpðθjmÞpðλjmÞ ð12Þ logpðyjmÞ=FðmÞ+KL½qðθ;λjmÞjjpðθ;λjy;mÞ(cid:2) ð17Þ TheVLalgorithmthenassumesanapproximateposteriordensity ofthefollowingfactorisedform where F(m) is known as the negative variational free energy (henceforth‘FreeEnergy’)andthelasttermistheKullback–Liebler qðθ;λjy;mÞ=qðθjy;mÞqðλjy;mÞ distance between the true posterior density, p(θ,λ|y,m) and an qðθjy;mÞ=Nðθ;mθ;SθÞ ð13Þ approximate posterior q(θ,λ|m). Because KL is always positive qðλjy;mÞ=Nðλ;mλ;SλÞ (Mackay,2003),F(m)providesalowerboundonthemodelevidence. TheFreeEnergyisdefinedas Theparametersoftheseapproximateposteriorsaretheniteratively (cid:4) (cid:5) updated so as to minimise the Kullback–Liebler (KL)-divergence FðmÞ=∫∫qðθ;λjy;mÞlog pðy;θ;λjmÞ dθdλ ð18Þ between the true and approximate posteriors. This algorithm is qðθ;λjy;mÞ describedfullyin(Fristonetal.,2007a). We emphasise here that the Variational Laplace framework and canbeestimatedusing aLaplaceapproximation (Fristonetal., assumes that the prior means and covariances (μθ,Cθ,μλ,Cλ) are 2007a), FL(m), as derived in Appendix B. As noted in (Wipf and known.Theyarenotestimatedfromdata,asisthecaseforEmpirical Nagarajan,2009),becausetheLaplaceapproximationisnotexactly Bayesmethods(CarlinandLouis,2000).Wewillreturntothisissuein equaltotheFreeEnergy,theabovelowerboundpropertynolonger thediscussion. holds.Thatis,theLaplaceapproximationdoesnotlowerboundthe logmodelevidence.Asweshallsee,however,itneverthelessprovides HyperparametersinDCMforfMRI averyusefulmodelcomparisoncriterion.TheLaplaceapproximation InDCMforfMRItheprecisionbasisfunctionsQ,definedinEq.(10), totheFreeEnergyisgiveninEq.(57)andcanbeexpressedasasumof i aresettoQ=Iforeachbrainregion.Thequantityγ=exp(λ)therefore accuracyandcomplexityterms(Beal,2003) i i i correspondstothenoiseprecisioninregioni. F ðmÞ=AccuracyðmÞ−ComplexityðmÞ ð19Þ The overall error covariance matrix C has a block structure L y correspondingtotheassumptionthatobservationnoiseisindepen- dentandidenticallydistributedineachregion.Thisisvalidastime AccuracyðmÞ=−1eTC−1e −1logjC j−Nlog2π ð20Þ 2 y y y 2 y 2 series data are usually pre-whitened before entering into a DCM analysis(Fristonetal.,2003).Thepriormeanandcovarianceofthe associatedlatentvariablesaresetto ComplexityðmÞ= 12eTθCθ−1eθ+ 12logjCθj−12logjSθj ð21Þ Cμλλ==01 ð14Þ + 12eTλCλ−1eλ+ 12logjCλj−12logjSλj whereNisthenumberofdatapointsandthe‘errorterms’are This corresponds to the assumption that the mean prior noise precision, γi=1:7. These values, along with the priors on the ey=y−gðmθÞ neurodynamicparameters,havebeensetsoastoproducedatasets eθ=mθ−μθ ð22Þ withtypicalsignaltonoiseratiosencounteredinfMRI. eλ=mλ−μλ Modelevidence ThefirstrowofEq.(21)isthecomplexitytermfortheparameters andthesecondrowthecomplexitytermforthehyperparameters.If Themodelevidence,alsoknownasthemarginallikelihood,isnot the hyperparameters are known then the last row of Eq. (21) straightforwardtocompute,sinceitscomputationinvolvesintegrat- disappears.Inthiscasewecanwritethecomplexityas ingoutthedependenceonmodelparameters pðyjmÞ=∫∫pðyjθ;λ;mÞpðθjmÞpðλjmÞdθdλ: ð15Þ ComplexityðmÞ= 12eTθCθ−1eθ+ 12logjjCSθθjj ð23Þ The following sections describe Free Energy, AIC and BIC Inthelimitthattheposteriorequalstheprior(eθ=0,Cθ=Sθ),the approximationstothe(log)modelevidence.Oncetheevidencehas complexitytermequalszero.ThelastterminEq.(23),12logjjCSθθjj,isalso beencomputedmodelsm andm canbecomparedusingtheBayes referred to as an Occam factor (see page 349 in (Mackay, 2003)). 1 2 factor Because the determinant of a matrix corresponds to the volume spanned by its eigenvectors, this Occam factor gets larger and the B = pðyjm1Þ ð16Þ model evidence smaller as the posterior volume, |Sθ|, reduces in 12 pðyjm2Þ proportion to the prior volume, |Cθ|. Models for which parameters have to be specified precisely (small posterior volume) are brittle. withavalueof20correspondingtoaposteriorprobabilityofgreater Theyarenotgoodmodels(complexityishigh). than0.95infavourofmodelm .ThecorrespondinglogBayesfactor The above considerations also apply to cases where hyperpara- 1 is3.TheuseofBayesfactorsformodelcomparisonisdescribedmore metersareestimated.Thereisanadditionalcomplexityterm(lastline fully elsewhere (Kass and Raftery, 1995; Penny et al., 2004). ofEq.21)andthereforeanadditionalOccamfactor. Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 4 W.D.Penny/NeuroImagexxx(2011)xxx–xxx Correlatedparameters TheseparametervaluescanthenbepluggedintoEqs.(19)to(22) Otherfactorsbeingequal,modelswithstrongcorrelationinthe to compute the Free Energy. If the hyperparameters are assumed posteriorarenotgoodmodels.Forexample,givenamodelwithjust knownthentheFreeEnergyexpressioninEq.(19)isexactlyequalto twoparametersthedeterminantoftheposteriorcovarianceisgiven thelogmodelevidence.Thatis,F(m)=logp(y|m).Wewillrevisitthis L by caseintheResultssection.Ifthehyperparametersareestimatedthen (cid:2) (cid:3) theFreeEnergyprovidesaverycloseapproximation,asconfirmedby jSθj = 1−r2 σθ2σθ2 ð24Þ samplingmethods(Fristonetal.,2007a). 1 2 where r is the posterior correlation, σθ and σθ are the posterior AICandBIC 1 2 standard deviations of the two parameters. For the case of two parametershavingasimilareffectonmodelpredictionstheposterior Asimpleapproximationtothelogmodelevidenceisgivenbythe correlationwillbehigh,thereforeimplyingalargecomplexitypenalty. BayesianInformationCriterion(Schwarz,1978) However, there is another factor at play. This is that neither parameterwillbeestimatedaccurately(theposteriorvarianceswill p behigh).Thissecondfactorcanoffsetthehighercomplexitydueto BIC=AccuracyðmÞ− logN ð28Þ 2 correlationandcanleadtoasituationinwhichadditionalextraneous parameters will not lead to a significant drop in free energy. One would then appeal to a further Occam's Razor principle (Mackay, wherepisthenumberofparameters,andN isthenumberofdata 2003), namely, that in the absence of significant free energy points.TheBICisaspecialcaseoftheFreeEnergyapproximationthat differencesoneshouldpreferthesimplermodel(seeDiscussion). dropsalltermsthatdonotscalewiththenumberofdatapoints(see WhenfittingDCMstofMRIdataitislikelythatsomeparameters e.g. Appendix A2 in (Penny et al., 2004) for a derivation). This is willbecorrelatedwitheachother.Thiscorrelationcanbeexamined equivalenttothestatementthatBICisequaltotheFreeEnergyunder bylookingattheposteriorcovariancematrixSθ.Agoodexampleof theinfinitedatalimit,andwhenthepriorsoverparametersareflat, this is provided in Fig. 6 of Stephan et al. (2007) who describe andthevariationalposteriorisexact(seesection2.3in(Attias,1999) posteriorcorrelationsamonghemodynamicandconnectivityparam- and page 217in (Bishop,2006)).In practise,as we shall see, these eters. Importantly, these correlations are accomodated in the Free threerequirementsarealmostnevermetandBICwillproducemodel Energy model comparison criterion (see Eq. 23 and above). This is comparisons that are often very different to those from the Free possiblebecauseVariationalLaplacedoesnotassumethatparameters Energy. areaposterioriindependentamongthemselves,ratheritisassumed An alternative model selection criterion is Akaike's Information thattheparametersare aposterioriindependentofthehyperpara- Criterion(or‘AnInformationCriterion’)(Akaike,1973) meters(seeEq.13). Decompositions AIC=AccuracyðmÞ−p ð29Þ It is instructive to decompose approximations to the model evidence into contributions from specific sets of parameters or predictions.InthecontextofDCM,onecandecomposetheaccuracy AICisnotaformalapproximationtothemodelevidencebutderives terms into contributions from different brain regions, as described from information theoretic considerations. Specifically, AIC model previously(Pennyetal.,2004).Thisenablesinsighttobegainedinto selectionwillchoosethatmodelinthecomparisonsetwithminimal whyonemodelisbetterthananother.Itmaybe,forexample,thatone expected KL divergence to the true model (Akaike, 1973; Burnham modelpredictsactivitymoreaccuratelyinaparticularbrainregion. andAnderson,2002).Thereareprecedentsintheliterature,however, Similarly, it is possible to decompose the complexity term into forusingitasasurrogateforthemodelevidence,inordertoderivea contributions from different sets of parameters. If we ignore posteriordensityovermodels(BurnhamandAnderson,2004)(Penny correlation among different parameter sets then the complexity is etal.,2004). approximately The AIC criterion has been reportedto performpoorly for small numbers of data points (Brockwell and Davis, 2009; Burnham and ! ComplexityðmÞ≈12∑j eTθjCθ−j1eθj+logjjCSθθjjj ð25Þ Anderson,2004).Thishasmotivatedtheinclusionofacorrectionterm j pðp+1Þ AICc=AIC− ð30Þ wherejindexesthejthparameterset.InthecontextofDCMthese N−p−1 could index input connections (j=1), intrinsic connections (j=2), modulatoryconnections(j=3)etc.Wewillseeanexampleofthisin theResultssection. knownasthe‘corrected’AIC(AICc)(HurvichandTsai,1989).TheAICc criterion thus penalises parameters more than does AIC. The two GeneralLinearModels criteria become approximately equal for N>p2 and identical in the ForGeneralLinearModels(GLMs)modelpredictionsaregivenby limitofverylargesamplesizes.Wenote,however,thatforN<p+1 thedenominatorinthecorrectiontermbecomesnegativeandAICc gðθÞ=Xθ ð26Þ penalisesparameterslessthandoesAIC.Intheempiricalworkinthis paperwethereforeavoidthis(highlyunlikely)regime. whereXisadesignmatrixandθarenowregressioncoefficients.The InapplicationsofAIC and BICtoDCMs(Pennyetal., 2004),the posteriordistributionisanalyticandgivenby(Bishop,2006) estimatedparametersaretakentobeequaltotheposteriormeansmθ andmλ.AICandBICareusefulapproximationsbecauseoneonlyneeds to quantify the fit of the model to provide an estimate of the log- Sm−θθ1==SXθ(cid:2)TXCy−TC1y−X1+y+Cθ−C1θ−1μθ(cid:3) ð27Þ eappvaiprdaremonxecitmee.raAtiinIoCnthaneindmBtohIdCaetla.rtehequsaalmitaetivfiexlyeddipffeenreanlttytoisthpeaifdreeforeneeargchy Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 W.D.Penny/NeuroImagexxx(2011)xxx–xxx 5 Results therefore different, with the full model design matrix having 12 regressorsandthenestedmodelhaving9regressors. Linearmodels Estimatedregressioncoefficients,θˆ,andnoisevarianceestimates, σ̂ =0.73 were extracted for a voxel showing a significant overall e We first compare the different approximations to the model response to faces (i.e. over all conditions). The corresponding fMRI evidenceusingBayesianGLMs.Wedefinetheseusingthefollowing time series comprised N=351 values. We then created simulated priorandlikelihood databasedonthisobservedfMRIdataasfollows. First, we estimated the deviation of the fitted regression co- pðypðjθθÞÞ==NNð(cid:2)θy;;μXθθ;;CCθÞ(cid:3) ð31Þ eesffiticmieanttiosnabwouastzbearsoedanodnspetartahmepetreiorrfiStDsftrootmhisdavtaaluaet,σaps=ing6l.e05v.oTxheils. y Theuseofacommonσ valueforallregressioncoefficientsimplies p thattheeffectsareofsimilarmagnitudeforallfourconditionsandall whereθisthe[p×1]vectorofregressioncoefficients,yisthe[N×1] threetemporalbasisfunctions,andisareasonableassumption.We vectorofdatapoints,Xisthe[N×p]designmatrix,andfortheprior then computed <σ >, the average signal standard deviation when meanwehaveμθ=0.Fortheworkinthispaperweassumeisotropic drawingparametersythepriorp(θ). covariancematrices WethenproducedsimulateddatasetswheretheSignaltoNoise ratio Cθ=σp2Ip ð32Þ <σ > C =σ2I SNR= y ð33Þ y e N σ e where σp and σe are the standard deviations of the prior and was set to a range of values by choosing an appropriate σe. SNR observationerror.Weassumethattheseparametersareknown. definedinthismannercanberelatedtotheproportionofvariance We compare Bayes factors based on AIC, BIC and FL for nested explainedbythemodel,asshowninAppendixC.TheobservedfMRI GLMsderivedfromanfMRIstudy.ThefMRIdatasetwascollectedto datahaveavalueofSNR=1.3. studyneuronalresponsestoimagesoffacesandisavailablefromthe Eachsimulateddatasetwasthengeneratedbydrawingregression SPM web site (http://www.fil.ion.ucl.ac.uk/spm/data/face_rep/ coefficients from their prior densities, producing model predictions face_rep_SPM5.html.). Each face was presented twice, and faces g=Xθ (for both full and nested models) and adding zero mean eitherbelongedtofamiliarorunfamiliarpeople.Thisgaverisetofour Gaussiannoisewithvarianceσ2. e conditions,eachofwhichwasmodelledwith3hemodynamicbasis Wethenfittedbothfullandnestedmodelstoeachsimulateddata functions(Fristonetal.,2007b).Forafulldescriptionofthisdataset setandestimatedBayesfactorsusingAIC,BICandF.Thesecriteria L andWsiemfiirlastrdaenfianlyeseas‘nseesete(dH’emnsoodneleitnawl.,h2ic0h0o2n).ly3oftheseconditions wseecrteioncoinmtpouEtqe.d(2b7y)fsourbcsotimtuptuintgingX,thCey,pCoθs,tearniodrμmθeaasndaenfidnceodvairniatnhcies aarsecomnotadienlilnedg,arneseuxlttriang3rineg9rersesgorressfsroorms.Wtheeathddenitidoenfianlceoand‘fiutlilo’nm(ofidrestl fwoerrleintehaernmcoomdeplus.teTdhefropmredEiqcsti.o(n22e)rraonrds,(e2y6,)a.nWdepcaorualmdettheernecroromrps,uetθe, responsetounfamiliarfaces).Fig.1showsthedesignmatrixforthe the accuracy and complexity terms using Eqs. (20) and (21) (the full model. The design matrices for the full and nested models are complexitytermsforλwereignoredastheobservationnoisevariance wasknownforthesesimulations). Fig.2showsresultsfordatadrawnfromthefullmodel.Thefigure plots the log Bayes factors (differences in log model evidence) at variousvaluesofSNR,whereeachpointineachcurvewasaveraged over 1000 simulated data sets. At low SNRs, experimental effects shouldbeimpossibletodetect.ThisisreflectedintheFreeEnergylog Bayesfactorwhichcorrectlyasymptotestoavalueofzero,indicating neithermodelispreferred.Inthisregime,however,BICandtoalesser extentAICbothincorrectlyfavourthenestedmodel.Theerrorbarson the plots (not shown) are extremely tight in this regime, being ± 0.0001, ±0.09 and ±0.35 for SNRs of 0.0025, 0.029 and 0.055 respectively(averagedoverthethreecriteria).Thismeanswecanbe highlyconfidentthatF isunbiasedbutthatAICandBICarebiassed L towardsthenestedmodel. Theaboveprocedurewasthenrepeatedbutthistimegenerating datafromthenestedmodel.TheresultsareshowninFig.3(notethe broader range of SNRs plotted). In the low SNR regime, model comparisonshouldagainbeimpossible.Thisiscorrectlyreflectedin theF criterionwithalogBayesfactorapproachingzero,butnotsoin L theAICorBICcriteria. Finally,weexaminedthedependenceofmodelcomparisononthe numberofdatapoints,N.WevariedNover20valuesbetween32and 512 with 1000 replications at each value, using SNR=0.5 (results werequalitativelysimilarforotherSNRs).Theresultsareshownin Fig. 4 for data generated from the full model. As expected, Bayes factorsincreasewiththenumberofdatapoints.Thefreeenergy,AIC Fig.1.DesignmatrixforthefullGLM.ThenestedGLMusesanidenticaldesignmatrix andAICcshowverysimilarperformancewithF beingslightlybetter butwiththefirstthreecolumnsremoved.ThefulldesignmatrixcomprisesN=351 L atlowNandAIC/AICcathighN.TheBICcriterionisbiassedtowards rows,oneforeachfMRIscan,andtwelvecolumns,oneforeachputativeexperimental effect. thenestedmodel. Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 6 W.D.Penny/NeuroImagexxx(2011)xxx–xxx Fig.2.LogBayesfactoroffullversusnestedmodel,LogBf,n,versusthesignaltonoiseratio,SNR,whenthetruemodelisthefullGLMforFL(black),AIC(blue)andBIC(red).(For interpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.) Fig.5showstheresultsfordatageneratedfromthenestedmodel. wehavelinearmodels,thelastrequirementismetbutthepriorover The Bayes factors from the free energy and BIC increase with the parameters is Gaussian, rather than flat. A data set comprising 512 numberofdatapoints,whereasthisisnotthecaseforAICandAICc. points is about the maximum one could hope to get from a single WeseethatAICandAICcareequivalentforlargesamplesizes.For sessionoffMRIscanning(approximately17minwithaTRof2s).We small sample sizes AICc pays a larger parameter penalty. This is thereforeconclude that for neuroimaging applicationsBIC and Free beneficialwhenthenestedmodelistrue(Fig.5)butnotwhenthefull Energyarelikelytogivedifferentresults. model is true (Fig. 4). Overall, we do not see a good reason for favouring AICc over AIC and so exclude it from subsequent model comparisons. DCMforfMRI Theory(Attias,1999)tellsusthatBICshouldconvergetotheFree Energyforlargesamplesizes.However,thisisonlythecaseforflat We now compare the model comparison criteria using DCM for priorsoverparametersandifthevariationalposterioriscorrect.As fMRI.WegeneratedatausingsyntheticDCMswithknownparameter Fig.3.LogBayesfactorofnestedversusfullmodel,LogBn,f,versusthesignaltonoiseratio,SNR,whenthetruemodelisthenestedGLMforFL(black),AIC(blue)andBIC(red). (Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.) Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 W.D.Penny/NeuroImagexxx(2011)xxx–xxx 7 Fig.4.LogBayesfactoroffullversusnestedmodel,LogBf,n,versusthenumberofdatapoints,N,whenthetruemodelisthefullGLMforFL(black),AIC(blue),BIC(red)andAICc (green).(Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.) values. However, to ensure the data are realistic we use parameter (Leffetal.,2008).Thetimeserieswhichweremodelledinthisstudy valuesthatwereestimatedfromneuroimagingdata. compriseN=488datapointsineachofthreebrainregions. Thisdataderivefromapreviouslypublishedstudyonthecortical WefocusonjusttwoofthemodelsconsideredbyLeffetal.(Leff dynamicsofintelligiblespeech(Leffetal.,2008).Weuseddatafroma et al., 2008). These are a ‘nested’ model, which has full intrinsic single representative subject. This study applied DCM for fMRI to connectivity with auditory input, u , entering region P, and a aud investigate activity among three key multimodal brain regions: the modulatoryconnectionfromregionPtoF(thisallowsregionFtobe left posterior and anterior superior temporal sulci (subsequently differentiallyresponsivetointelligibleversustime-reversedspeech). referredtoasregionsPandArespectively)andparsorbitalisofthe Wealsodefinea‘full’modelwhichisidenticalbuthasanadditional inferiorfrontalgyrus(regionF).Theaimofthestudywastoseehow modulatoryconnectionfromregionPtoA(b —seebelow).Thetwo AP connectionsamongregionsdependedonwhethertheauditoryinput networksareshowninFig.6.Thetwomodelsdifferinonlyasingle was intelligible speech or time-reversed speech. Full details of the connectionandwechosetheseverysimilarmodelstomakemodel experimental paradigm and imaging parameters are available in comparisonaschallengingaspossible. Fig.5.LogBayesfactorofnestedversusfullmodel,LogBn,f,versusthenumberofdatapoints,N,whenthetruemodelisthenestedGLMforFL(black),AIC(blue),BIC(red)andAICc (green).(Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.) Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 8 W.D.Penny/NeuroImagexxx(2011)xxx–xxx fitted both full and nested models to each simulated data set and estimatedBayesfactorsusingAIC,BICandF. L Fig.7showsresultsfordatadrawnfromthefullmodel.Thefigure plots the log Bayes factors (differences in log model evidence) at variousvaluesofSNR,whereeachpointineachcurvewasaveraged over50simulateddatasets.FortheseDCMsimulations,theaveraging wasimplementedusingthemedianoperator(ratherthanthemean) Fig.6.Anested(left)andfull(right)DCM.ThefullDCMisidenticaltothenestedDCM astheresultsweremorevariablethanfortheGLMcase.Thecurvesin exceptforhavinganadditionalmodulatoryforwardconnectionfromregionPtoregion A.Intrinsicconnectionsareindicatedbydottedarrows,modulatoryconnectionsby Fig. 7 show that only the Free Energy criterion is able to correctly overlaidsolidarrowsandinputsbysolidsquareswithanarrow. identifythefullmodel. Theaboveprocedurewasthenrepeatedbutthistimegenerating Mathematically,neurodynamicsevolveaccordingto data from the nestedmodel. Again, each point in each curve is the medianvalueover50simulateddatasets.Theresultsareshownin 2 3 02 3 2 312 3 z˙ a a a 0 0 0 z Fig.8(notethebroaderrangeofSNRsplotted). 4z˙P5=@4aPP aPF aPA5+u 4b 0 05A4zP5 Theresultsondatafromthenestedmodelareverysimilartothose F FP FF FA int FP F z˙ a a a b 0 0 z fortheGLMcase(compareFigs.3and8).Theresultsfordatafromthe A AP AF AA AP A 2 3 ð34Þ fullmodel,however,arenot(compareFigs.2and7),asAICandBIC 4cP5 are unableto correctly identifythe fullmodel evenat high SNR.In +uaud 0 ordertofindoutwhythisisthecaseweexaminedDCMsfittedtodata 0 at SNR=2, and examined the relative contributions to the model evidence,asdescribedinDecompositionssection. whereu isatrainofauditoryinputspikes,u indicateswhether ForthishighSNRscenariowefound,slightlytooursurprise,that aud int theinputisintelligible(Leffetal.,2008),a denotesthevalueofthe thefullDCMswereonlyslightlymoreaccuratethanthenestedDCMs. AF intrinsicconnectionfromregionFtoA,b andb arethestrengths Unsurprisingly, this increase in accuracy was realised in region A, FP AP ofthetwomodulatoryconnections,andc isthestrengthoftheinput which receives modulatory input in the full but not in the nested P connection.ForthenestedDCMwehaveb =0. model(seeFig.6).However,themainquantitydrivingthedifference AP Wefirstgenerateddatasetsfromthefullmodeloverarangeof inFreeEnergybetweenfullandnestedDCMswasnottheaccuracy SNRsasfollows.TobestreflecttheempiricalfMRIdataallparameters butratherthecomplexity. otherthanthemodulatoryparameterswereheldconstant.Foreach ItturnsoutthatthenestedDCMsareabletoproduceareasonable simulateddatasetthemodulatoryparameterswerefirstdrawnfrom data fit by using a very large value for the intrinsic connection, a AF their prior densities (see Eq. 5). Additionally, the modulatory (fromregionFtoA).Thisconnectionvalue(typically1.5)wasabout5 parameters were then constrained to be positive (by taking the timesbiggerthanthevalueforafullDCM(typically0.3).Thismakes absolutevalue)sothatmodulatoryeffectswouldbefacilitating. sense because, in the nested model, the connection from P to F is Synthetic fMRI data was then generated by integrating the modulatedbyintelligibility,andbyfacilitatingtheintrinsicconnec- neurodynamicandhemodynamicequationsandaddingobservation tionfromFtoAthis‘modulatorysignal’ispassedontoregionA.Since noisetoobtainthetargetSNR.TheSNRwasdefinedinthesameway thismodulationisofanadditivenature,thisthereforecrudelymimics asforthelinearmodels,butwiththesignalstandarddeviation,<σ >, adirectmodulationofthePtoAconnection.However,suchastrong y averaged over the three predicted time series (one for each brain intrinsicconnectionfromFtoAisa-prioriunlikely(thepriorisazero- region).TheobservedfMRIdatahaveavalueofSNR=0.2.Wethen mean Gaussian, with standard deviation σ =0.5). The nested cross Fig.7.LogBayesfactoroffullversusnestedmodel,LogBf,n,versusthesignaltonoiseratio,SNR,whenthetruemodelisthefullDCMforFL(black),AIC(blue)andBIC(red). (Forinterpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.) Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 W.D.Penny/NeuroImagexxx(2011)xxx–xxx 9 Fig.8.LogBayesfactorofnestedversusfullmodel,LogBn,f,versusthesignaltonoiseratio,SNR,whenthetruemodelisthenestedDCMforFL(black),AIC(blue)andBIC(red).(For interpretationofthereferencestocolourinthisfigurelegend,thereaderisreferredtothewebversionofthisarticle.) models are therefore heavily penalised for having such unlikely lackthesensitivitythatisrequired,inthiscase,toinferthatdatawas parametervalues(beingthreestandarddeviationsawayfromtheir drawnfromthefullmodel. prior means). Only the Free Energy criterion is sensitive to such Weemphasisethatthiswillnotalwaysbethecase,andAIC/BIC subtleties because AIC and BIC pay the same penalty for each can in general be sensitive to ‘full model’ effects in DCMs. This is parameter,regardlessofmagnitude. demonstrated,forexample,inourpreviouswork(Pennyetal.,2004). Asmentionedabove,theempiricalSNRforthisdataisSNR=0.2 However, if prior information about parameter values is available whichisverylow.FittingthefullandnestedDCMstothisdatayielded then it should be used, and can be used to good effect in the Free aFreeEnergydifferenceofonly0.11(infavourofthefullDCM).This Energycriterion. differenceisnegligible,andpointstothedifficultyofmodelinference ItmayalsobearguedthatintheapplicationinthispaperAICand for very similar models and at low SNR (as exemplified by Figs. 7 BICareimplicitlyusingpriorinformationinthattheaccuracytermis and8).Inthisregimeitmaybeabetterideatomakeinferencesover computedatthemaximumposteriorvalue.Beingaposteriorestimate families of models (Penny et al., 2010) and to look for consistent thisisnaturallyconstrainedbytheprior.Toavoidthisonewouldhave differencesoveragroupofsubjects(Stephanetal.,2009). to implement a separate Maximum Likelihood optimisation. Given thisfact,itthereforeseemsconsistenttoalsousepriorinformation whenapproximatingtheevidence. Discussion AccordingtoconventionsinBayesianstatistics(KassandRaftery, 1995), and as stated above, models can be considered clearly We have described a simulation study which compared the distinguishable once the log Bayes factor exceeds three. The relativemeritsofAIC,BICand FreeEnergymodelselection criteria. simulation results for both GLMs and DCMs show smaller Bayes Differences in performance were examined in the context of GLMs factors when the true model is nested rather than full. This is and DCMs and we found that the Free Energy has the best model particularlypronouncedforthe(challenginglysimilar)DCMsexam- selection ability and recommended it be used for comparison of inedinthispaperforwhichtheFreeEnergyonlyachievesaLogBayes DCMs. Similar conclusions have been reached in earlier work FactorofthreeatanSNRof10.Insuchacase,modellersandimaging comparing Free Energy with BIC in the context of non-Gaussian neuroscientistsshouldappealtoasecondOccamprinciple(Mackay, autoregressive modelling (Roberts and Penny, 2002) and Hidden 2003),notthenumericaloneembeddedintheequationfortheFree MarkovModelling(ValenteandWellekens,2004). Energy,butaconceptualonethatwhentwomodelscannotbeclearly TheGLMsimulationresultsshowedthat,atlowSNR,AICandBIC distinguishedoneshouldpreferthesimplerone. incorrectlyselectednestedmodelswhendataweregeneratedbyfull In previous work (Penny et al., 2004) we have advocated the models. At higher SNR, however, this bias disappeared and AIC/BIC combineduseofAICandBICcriteriaforthecomparisonofDCMs.This showed increased sensitivity. We also investigated a corrected AIC wasmotivatedbyaconcernabouthowFreeEnergymodelinference criterionbutthisshowednobenefitoverthestandardAICmeasure. dependsonthechosenvaluesofthepriormeansandvariances(see TheDCMsimulationresultsshowedthatonlytheFreeEnergywas earlier section on priors). Specifically, the values σ , σ and σ self cross s able to correctly detect that data had been generated from the full implicitly set the penalty paid for intrinsic, modulatory and input model.BydecomposingtheFreeEnergydifferenceintocontributions parameters(asgovernedbyEq.(21)viatheoverallpriorcovariance fromdifferentregionsandparameters,wefoundthatthisabilitywas matrixCθ.). mainlyduetopenalisingthenestedmodelforhavingaverylarge,and This therefore motivates the future application of an Empirical a-priori unlikely, intrinsic connection from brain region F to A. Bayes(CarlinandLouis,2000)approachwhichwouldestimatethese Because AIC and BIC use the same complexity penalty for every variance parameters from data. This would effectively perform a parameter, and one that is not matched to prior expectations, they search in the continuous space of prior variances instead of the Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039 10 W.D.Penny/NeuroImagexxx(2011)xxx–xxx discretespace(e.g.,nestedversusfull)examinedinthispaper.Such TheonlyfreeparameteroftheBOLDsignalmodelis(cid:2),theratioof anapproachcanbeimplementedwithinthenewframeworkofpost- intra-toextra-vascularsignal.Togethertheaboveequationsdescribe hocmodelselection(FristonandPenny,2011). a nonlinear hemodynamic process and BOLD signal model that convertneuronalactivityintheithregion,z,tothefMRIsignal,y. i i Acknowledgments A.2.Priors ThisworkwassupportedbytheWellcomeTrust.WethankGareth Barnes, Guillaume Flandin, Karl Friston, Vladimir Litvak, Klaas Theunknownparametersare{κ,τ,(cid:2)}.Thesearerepresentedas i i Stephan, Maria Rosa and Ged Ridgway for useful feedback on this (cid:2) (cid:3) work. κi=0:64e(cid:2)xp(cid:3)θκi AppendixA.Hemodynamics τi=2exp θτi ð38Þ (cid:2)=expðθ(cid:2)Þ InDCM,neuronalactivitygivesrisetofMRIactivitybyadynamic andwehaveGaussianpriors processdescribedbyanextendedBalloonmodel(Buxtonetal.,2004) (cid:2) (cid:3) (cid:2) (cid:3) and BOLD signal model (Stephan et al., 2007) for each region. This specifies how changes in neuronal activity give rise to changes in p(cid:2)θκi(cid:3)= =N(cid:2)θκi;0;0:135(cid:3) bloodoxygenationthataremeasuredwithfMRI. p θτ = =N θτ;0;0:135 ð39Þ i i The hemodynamic model involves a set of hemodynamic state pð(cid:2)Þ= =Nð(cid:2);0;0:135Þ variables,stateequationsandhemodynamicparameters.Fortheith region,neuronalactivityz(i)causesanincreaseinvasodilatorysignal where h={θκ,θτ,(cid:2)} are the hemodynamic parameters to be s that is subject to autoregulatory feedback. Inflow f responds in estimated. i i i i proportiontothissignalwithconcomitantchangesinbloodvolumev i anddeoxyhemoglobincontentqi. AppendixB.Laplaceapproximation s˙ =zðiÞ−κs−γðf−1Þ In what follows we have simplified notation by dropping the i i i i ˙ dependence on model m. The negative variational free energy f =s i i (henceforth‘FreeEnergy’)isdefinedas τiv˙i=fi−v1i=α ð35Þ (cid:4) (cid:5) τiq˙i=fiEðfρi;ρÞ−vi1=αqvi F=∫∫qðθjyÞqðλjyÞlog qðpθðjyyÞ;qθð;λλÞjyÞ dθdλ ð40Þ i where Outflowisrelatedtovolumef =v1/αthroughGrubb'sexponent out α(Fristonetal.,2003).Theoxygenextractionisafunctionofflow pðy;θ;λÞ=pðyjθ;λÞpðθÞpðλÞ ð41Þ Eðf;ρÞ=1−ð1−ρÞ1=f ð36Þ Wecanrewritethisas F=I+HðθÞ+HðλÞ ð42Þ where ρ is resting oxygen extraction fraction. The free parameters ofthemodelaretherateofsignaldecayineachregion,κi,andthe where transit time in each region, τ. The other parameters are fixed to i γ=α=ρ=0.32. I=∫∫qðθjyÞqðλjyÞUðθ;λÞdθdλ ð43Þ A.1.BOLDsignalmodel andH(x)isthe(differential)entropyofxand The Blood Oxygenation Level Dependent (BOLD) signal is then Uðθ;λÞ=logpðy;θ;λÞ ð44Þ takentobeastaticnonlinearfunctionofvolumeanddeoxyhemoglo- bin that comprises a volume-weighted sum of extra- and intra- ForaGaussiandensityp(x)=N(x;m,S)theentropyis vascularsignals.ThisisbasedonasimplifiedapproachfromStephan etal.(Stephanetal.,2007)(Eq.12)thatimprovesupontheearlier HðxÞ= 1ðklog2πe+logjSjÞ ð45Þ model(Fristonetal.,2003) 2 (cid:4) (cid:6) (cid:7) (cid:5) wherek=dim(S).Hence q y =V k ð1−qÞ+k 1− i +k ð1−vÞ i 0 1 i 2 v 3 i i 1 k1=4:3θ0ρTE ð37Þ F=I+2ðplog2πe+logjSθjÞ ð46Þ kk2==1(cid:2)r−0ρ(cid:2)TE + 12ðhlog2πe+logjSλjÞ 3 where p is the number of parameters and h is the number of whereV0isrestingbloodvolumefraction,θ0isthefrequencyoffsetat hyperparameters.TheVariationalLaplaceapproximationtotheFree the outer surface of the magnetised vessel for fully deoxygenated Energyisthengivenby bloodat1.5T,TEistheechotimeandr istheslopeoftherelation 0 b(Settewpeheannethteal.i,n2tr0a0v7a)s.cIunlatrhirseplaaxpaetriowneruatseetahnedstoaxnydgaerndpsaatruarmateitoenr FL=IL+12ðplog2πe+logjSθjÞ ð47Þ valuesV =4,r =25,θ =40.3andforourfMRIimagingsequencewe 1 haveTE=0 0.04.0 0 +2ðhlog2πe+logjSλjÞ Pleasecitethisarticleas:Penny,W.D.,ComparingDynamicCausalModelsusingAIC,BICandFreeEnergy,NeuroImage(2011),doi:10.1016/ j.neuroimage.2011.07.039
Description: