Table Of Content

The Causal Frame Problem: An Algorithmic Perspective ArdavanS.Nobandegani1,2 IoannisN.Psaromiligkos1 [email protected],[email protected] { } 1DepartmentofElectrical&ComputerEngineering,McGillUniversity 2DepartmentofPsychology,McGillUniversity Abstract intimeandcomputationalresources—wouldnevergettothe secondstep,hadsheadheredtosuchamethodology. Inother The Frame Problem (FP) is a puzzle in philosophy of mind words,inlinewiththenotionofboundedrationality(Simon, andepistemology,articulatedbytheStanfordEncyclopediaof Philosophyasfollows: “Howdoweaccountforourapparent 1957),aboundedly-rationalreasonermusthavetheoption,if abilitytomakedecisionsonthebasisonlyofwhatisrelevant needbe,tomerelyconsultafractionofthepotentiallylarge— 7 to an ongoing situation without having explicitly to consider ifnotinfinitelyso—relevantsubmodel. 1 allthatisnotrelevant?”Inthiswork,wefocusonthecausal 0 variantoftheFP,theCausalFrameProblem(CFP).Assuming RecentworkbyIcardandGoodman(2015)elegantlypro- 2 thatareasoner’smentalcausalmodelcanbe(implicitly)repre- motesthisinsightwhentheywrite:“Somehowthemindmust n sPeontteendtibalyLaecvaeuls(aPlLB)a.yPeLs,nient,ewsseefincrset,ienntrcoodduecsetahenoretiloantivcealpleod- focus in on some “submodel” of the “full” model (includ- a sitionofanodewithrespecttoitsneighborsinacausalBayes ing all possibly relevant variables) that suffices for the task J net. Drawing on the psychological literature on causal judg- athandandisnottoocostlytouse.”1 Theythenaskthefol- 6 ment,wesubstantiatetheclaimthatPLmaybearonhowtime lowingquestion: “whatkindofsimplermodelshouldarea- is encoded in the mind. Using PL, we propose an inference 2 framework,calledthePL-basedInferenceFramework(PLIF), sonerconsultforagiventask?”Thisisaninspiringquestion whichpermitsaboundedly-rationalapproachtotheCFPtobe hintingtoaninterestinglineofinquiryastohowtoformally ] formallyarticulatedatMarr’salgorithmiclevelofanalysis.We I articulate a boundedly-rational approach to the FP at Marr’s A showthatourproposedframework,PLIF,isconsistentwitha widerangeoffindingsincausaljudgmentliterature,andthat algorithmiclevelofanalysis(1982). s. PLandPLIFmakeanumberofpredictions,someofwhichare Inthiswork,wefocusonthecausalvariantoftheFP,the c alreadysupportedbyexistingfindings. CausalFrameProblem(CFP),statedasfollows: Uponbeing [ Keywords: Causal Frame Problem; Time and Causality; presentedwithacausalquery,howdoesthereasonermanage BoundedRationality;AlgorithmicLevelAnalysis 1 toattendtohercausalknowledgerelevanttothederivationof v thequerywhilerightfullydismissingtheirrelevant?Weadopt 0 1 Introduction 0 Causal Bayesian Networks (CBNs) (Pearl, 1988; Gopnik et 1 At the core of any decision-making or reasoning task, real., 2004, inter alia) as a normative model to represent how 8 sides an innocent-looking yet challenging question: Given thereasoner’sinternalcausalmodeloftheworldisstructured 0 an inconceivably large body of knowledge available to the (i.e.,reasoner’smentalmodel). First,weintroducethenotion 1. reasoner, what constitutes the relevant for the task and what of Potential Level (PL). PL, in essence, encodes the relative 0 theirrelevant? Thequestion, asitisposed, echoesthewell- positionofanode(representingapropositionalvariableora 7 knownFrameProblem(FP)inepistemologyandphilosophy concept)withrespecttoitsneighborsinaCBN.Drawingon 1 of mind, articulated by Glymour (1987) as follows: “Given thepsychologicalliteratureoncausaljudgment,wesubstan- : v an enormous amount of stuff, and some task to be done us- tiate the claim that PL may bear on how time is encoded in Xi ingsomeofthestuff,whatistherelevantstuffforthetask?” themind. EquippedwithPL,weembarkoninvestigatingthe Fodor (1987) comments: “The frame problem goes very CFP at Marr’s algorithmic level of analysis. We propose an r a deep;itgoesasdeepastheanalysisofrationality.” inferenceframework,termedPL-basedInferenceFramework Thequestionposedaboveperfectlycaptureswhatisreally (PLIF), which aims at empowering the boundedly-rational atthecoreoftheFP,yet, itmaysuggestanunsatisfyingap- reasonertoconsult(orretrieve2)partsoftheunderlyingCBN proach to the FP at the algorithmic level of analysis (Marr, deemedrelevantforthederivationoftheposedquery(therel- 1982). Indeed, the question may suggest the following two- evant submodel) in a local, bottom-up fashion until the sub- step methodology: In the first step, out of all the body of modelisfullyretrieved.PLIFallowsthereasonertocarryout knowledgeavailabletothereasoner(termed,themodel),she inferenceatintermediatestagesoftheretrievalprocessover has to identify what is relevant to the task (termed, the rele- thethus-farretrievedparts,therebyobtaininglowerandupper vantsubmodel);itisonlythenthatsheadvancestothesecond bounds on the posed causal query. We show, in the Discus- step by performing reasoning or inference on the identified submodel.Thereissomethingfundamentallywrongwiththis 1InaninformativeexampleonHiddenMarkovModels(HMMs), methodology(whichweterm,sequentialapproachtoreason- Icard&Goodman(2015)presentasettingwhereintherelevantsub- ing) which bears on the following understanding: The rele- model is infinitely large—an example which makes it pronounced whatiswrongwiththesequentialapproachstatedearlier. vant submodel, i.e., the portion of the reasoner’s knowledge 2The terms “consult” and “retrieve” will be used interchange- deemed relevant to the task, oftentimes is so enormous (or ably. WeelaborateontherationalebehindthatinSec. 5,wherewe even infinitely large) that the reasoner—inevitably bounded connectourworktoLongTermMemoryandWorkingMemory. sionsection,thatourproposedframework,PLIF,isconsistent Before formally introducing the notion of PL (unavoid- with a wide range of findings in causal judgment literature, ably, with some mathematical jargon), we articulate in sim- andthatPLandPLIFmakeanumberofpredictions,someof ple terms what the idea behind PL is. PL simply induces a whicharealreadysupportedbythefindingsinthepsychology chronologicalorderonthenodesofaCBN,allowingtherea- literature. sonertoencodethetimingbetweencauseandeffect.5 Aswe In their work, Icard and Goodman (2015) articulate a will see, PL plays an important role in guiding the retrieval boundedly-rational approach to the CFP at Marr’s computa- process used in our proposed framework. Next, PL is for- tional level of analysis, which, as they point out, is from a mallydefined,followedbytwoclarifyingexamples. “god’s eye” point of view. In sharp contrast, our proposed Def. 1. (Potential Level (PL)) Let par(x) and child(x) frameworkPLIFisnotfroma“god’seye”pointofviewand denote, respectively, the sets of parents (i.e., immediate hence could be regarded, potentially, as a psychologically causes) and children (i.e., immediate effects) of x. Also let plausible proposal at Marr’s algorithmic level of analysis as T0 R ∞ . The PL of x, denoted by pl(x), is de- tohowthemindbothretrievesand,atthesametime,carries fine∈d as∪fo{l−low}s: (i) If par(x)=∅, pl(x)=T0, and (ii) If out inference over the retrieved submodel to derive bounds par(x) = ∅, pl(x) is a real-valued quantity selected from (cid:54) onacausalquery. Wetermthisconcurrent approachtorea- the interval (maxy par(x)pl(y),minz child(x)pl(z)) such that soning, as opposed to the flawed sequential approach stated pl(x) maxy par(x)∈pl(y)indicatesth∈eamountoftimewhich earlier.3 Theretrievalprocessprogressesinalocal, bottom- elapse−s betwe∈en intervening simultaneously on all the RVs upfashion,hencethesubmodelisretrievedincrementally,in in par(x) (i.e., do(par(x)= parx)) and x taking its value x a nested manner.4 Our analysis (Sec. 4.1) confirms Icard in accord with the distribution P(x parx). If child(x)=∅, andGoodman’sinsight(2015)thatevenintheextremecase substitutetheupperboundofthegiv|enintervalby+∞. (cid:4) of having an infinitely large relevant submodel, the portion Parameter T0 symbolizes the origin of time, as perceived ofwhichthereasonerhastoconsultsoastoobtaina“suffi- by the reasoner. T0 =0 is a natural choice, unless the rea- cientlygood”answertoaquerycouldindeedbeverysmall. soner believes that time continues indefinitely into the past, inwhichcaseT = ∞. Thenexttwoexamplesfurtherclar- 0 − 2 PotentialLevelandTime ifytheideabehindPL.InbothexamplesweassumeT =0. 0 Beforeproceedingfurther,letusintroducesomepreliminary −∞ −∞ notations. Random Variables (RVs) are denoted by lower- casebold-facedletters,e.g.,x,andtheirrealizationsbynon- pl(x)=4 x pl(x)=4 x boldlower-caseletters,e.g.,x. Likewise,setsofRVsarede- p(y)=4.7 y l z notedbyupper-casebold-facedletters,e.g.,X,andtheircor- pl(y)=4.7 y pl(z)=5 responding realizations by upper-case non-bold letters, e.g., p(z)=5 z l p(t)=5.6 X. Val() denotes the set of possible values a random quan- l t · tity can take on. Random quantities are assumed to be dis- + + crete unless stated otherwise. The joint probability distribu- ∞ (a) ∞ (b) tionoverx , ,x isdenotedbyP(x , ,x ). Wewilluse 1 n 1 n thenotationx··· todenotethesequence·o··fnRVsx , ,x , Figure1: RelationbetweenPLandtime: Example. 1:n 1 n hence P(x , ,x )=P(x ). The terms “node” and··“·vari- 1 n 1:n ··· able” will be used interchangeably throughout. To simplify For the first example, let us consider the CBN depicted presentation, we adopt the following notation: We denote in Fig. 1(a) containing the RVs x,y, and z with pl(x) = the probability P(x=x) by P(x) for some RV x and its re- 4,pl(y) = 4.7, and pl(z) = 5. According to Def. 1, the alization x Val(x). For conditional probabilities, we will given PLs can be construed in terms of the relative time be- usethenota∈tionP(xy)insteadofP(x=xy=y). Likewise, tweentheoccurrenceofcauseandeffectasarticulatednext. P(X Y)=P(X=X |Y=Y)forX Val(X|)andY Val(Y). Upon intervening on x (i.e., do(x=x)), after the elapse of A ge|neric condition|al independen∈ce relationship i∈s denoted pl(y) pl(x)=0.7unitsoftime,theRVytakesitsvalueyin by (A BC) where A,B, and C represent three mutually accord−withthedistributionP(yx). Likewise,uponinterven- disjoint⊥⊥sets|of variables belonging to a CBN. Furthermore, ingony(i.e.,do(y=y)),aftert|heelapseof pl(z) pl(y)= throughout the paper, we assume that ε is some negligibly 0.3unitsoftime,ztakesitsvaluezaccordingtoP(−zy). | small positive real-valued quantity. Whenever we subtract ε Forthesecondexample,considertheCBNdepictedinFig. fromaquantity,wesimplyimplyaquantitylessthanbutar- 1(b)containingtheRVsx,y,z,andtwith pl(x)=4,pl(y)= bitrarily close to the original quantity. The rationale behind 4.7,pl(z)=5, and pl(t)=5.6. Upon intervening on x (i.e., adoptingsuchanotationwillbecomeclearerinSec. 4. do(x = x)) the following happens: (i) after the elapse of p(y) p(x)=0.7unitsoftime,ytakesitsvalueyaccording l l 3WeelaboratemoreonthisintheDiscussionsection. toP(y−x),and(ii)aftertheelapseof pl(z) pl(x)=1unitof 4Theterm“nested”impliesthatthethus-farretrievedsubmodel | − issubsumedbyeverylatersubmodel(shouldthereasonerproceeds 5Moreprecisely,PLinducesatopologicalorderonthenodesof withtheretrievalprocess). aCBN,withtemporalinterpretationssuggestedinDef.1. time,ztakesitsvaluezaccordingtoP(zx). Also,uponinter- −∞ | veningsimultaneouslyonRVsy,z(i.e.,do(y=y,z=z)),aftertheelapseof p(t) max p(r)=0.6unitsoftime, ttakesitsvaluet alcco−rdingtro∈pPa(rt(t)y,zl). pl(y) y y y y y | Insum,thenotionofPLbearsontheunderlyingtime-grid pl(t2) t2 t2 t2 t2 upon which a CBN is constructed, and adheres to Hume’s principle of temporal precedence of cause to effect (Hume, pl(t1) t1 t1 t1 t1 t1 1748/1975). A growing body of work in psychology literature corroborates Hume’s centuries-old insight, suggesting pl(x) x x x x x that the timing and temporal order between events strongly influences how humans induce causal structure over them +∞ (a) (b) (c) (d) (e) (Bramley, Gerstenberg, & Lagnado, 2014; Lagnado & Slo- man, 2006). The introduced notion of PL is based on the Figure2: Example. Queryvariablesareshowninorange. following hypothesis: When learning the underlying causal structure of a domain, humans may as well encode the tem- poralpatterns(orsomeestimatesthereof)onwhichtheyrely Starting from the target RV x in the original CBN (Fig. to infer the causal structure. This hypothesis is supported 2(a)) and moving one step backwards,7 t1 is reached (Fig. by recent findings suggesting that people have expectations 2(b)). Since pl(y)< pl(t1), y must be a non-descendant of aboutthedelaylengthbetweencauseandeffect(Greville& t1,andtherefore,ofx. Hence,conditioningont1 d-separates Buehner,2010; Buehner&May,2004; Schlottmann,1999). x from y (Pearl, 1988), yielding (x yt1). Thus P(xy)= rIretelliaasttiiwvveeoraetbhxpsnoeolcuttietnedgtittmihmaeet.wbUeentwdceoeruesnlducchahauavsneeidanentfiedrnpeerfdefetPacLtti,oirnna,tthteherermttsihmaonef i∑isntg1c∈:rVumaclii(ant1lt)1∈PtoV(axln|(yto1,t)te1P)(tPhx(a|ttt11|)tyh≤)e=Pg∑(ivxte|1yn∈)Vb≤alo⊥(⊥utm1n)adPx|s(tx1c∈|taV1na)lP(bt(1et)1P|cy(o)xm|itm1p|)up.tleyId-t whichelapsesbetweentheinterventiononacauseandtheoc- using the information thus-far retrieved, i.e., the informa- currenceofitseffectwouldbemodeledbyaprobabilitydis- tion encoded in the submodel shown in Fig. 2(b). Taking tribution, and PL would be defined in terms of the expected a step backwards from t1, t2 is reached (Fig. 2(c)). Using valueofthatdistribution. Ourproposedframework,PLIF,is a similar line of reasoning to the one presented for t1, hav- indifferentastowhetherPLshouldbeconstruedintermsof ing pl(y)< pl(t2) ensures (x yt2). Therefore, the fol- ⊥⊥ | absoluteorexpectedtime.GrevilleandBuehner(2010)show lowing bounds on the posed query can be derived, which, that causal relations with fixed temporal intervals are con- crucially,canbecomputedusingtheinformationthus-farre- steismtepnotrlayljiundtegrevdalass.sTtrhoinsgfienrdcionmg,ptahreerdeftooreth,oseseemwsitthovsaurgiagbelset Itrtiiesvsetdr:aimghintfto2∈rwVaal(rtd2)tPo(sxh|to2w)≤thaPt(txh|ey)b≤oumndasxdt2e∈rViavle(td2)iPn(txe|rtm2)s. that people expect, to a greater extent, fixed temporal inter- oft2 aretighterthantheboundsderivedintermsoft1.8 Fi- valsbetweencauseandeffect,ratherthanvariableones—an nally, taking one step backward from t2, y is reached (Fig. interpretation which, at least to a first approximation, favors 2(d))andtheexactvalueforP(xy)canbederived,againus- | construingPLintermsofrelativeabsolutetime(seeDef.1).6 ingonlythesubmodelthus-farretrieved(Fig. 2(d)). Wearenowwell-positionedtopresentourproposedframe- 3 InformativeExample work,PLIF. Todevelopourintuition,andbeforeformallyarticulatingour 4 PL-basedInferenceFramework(PLIF) proposedframework, letuspresentasimpleyetinformative example which demonstrates: (i) how the retrieval process In this section, we intend to elaborate on how, equipped canbecarriedoutinalocal,bottom-upfashion,allowingfor with the notion of PL, a generic causal query of the form9 retrieving the relevant submodel incrementally, and (ii) how P(O=OE=E)canbederivedwhereOandEdenote, re- | adoptingPLallowsthereasonertoobtainboundsonagiven spectively, the disjoint sets of target (or objective) and ob- causalqueryatintermediatestagesoftheretrievalprocess. served (or evidence) variables. In other words, we intend to LetusassumethattheposedcausalqueryisP(xy)where formalize how inference over a CBN whose nodes are en- | x,y are two RVs in the CBN depicted in Fig. 2(a) with PLs p(x),p(y),andlet p(x)> p(y). Therelevantinformation 7Takingonestepbackwardsfromvariableqqqamountstoretriev- l l l l ingalltheparentsofqqq. for the derivation of the posed query (i.e., the relevant sub- 8Here we are implicitly making the assumption that the CPDs model)isdepictedinFig. 2(e). involved in the parameterization of the underlying CBN are non- degenerate. Dropping this assumption yields the following result: 6Therearecases,however,that,despitetheprecedenceofcause Theboundsderivedintermsoft2 areequally-tightortighterthan toeffect,quantifyingtheamountoftimebetweentheiroccurrences theboundsderivedintermsoft1. may bear no meaning, e.g., when dealing with hypothetical con- 9We do not consider interventions in this work. However, structs. In such cases, PL should be simply construed as a topo- with some modifications, the presented analysis/results can be ex- logicalordering. Fromapurelycomputationalperspective,PLisa tendedtohandleagenericcausalqueryoftheformP(O=OE= generalizationoftopologicalsortingincomputerscience. E,do(Z=Z))whereZdenotesthesetofintervenedvariables.| dowedwithPLasanattributeshouldbecarriedout. Before submodel (i.e., the submodel obtained during the identifica- wepresentthemainresult,afewdefinitionsareinorder. tionofR ). Thefollowingremarkbearsonthat. T Def. 2. (CriticalPotentialLevel(CPL))Thetargetvari- Remark 1. If for IT T, R satisfies either: (i) R E, T T ⊆ able with the least PL is denoted by o∗ and its PL is re- or (ii) for all r RT, pl(r)=T0, and mine Epl(e)>T, or ferredtoastheCPL.Moreformally, p∗l :(cid:44)mino Opl(o)and (iii)theloweran∈dupperboundgivenin(1)a∈reidentical,then o :(cid:44)argmin p(o).E.g.,forthesettinggiven∈inFig.2(a), the exact value of the posed query can be derived using the ∗ o O l o =x,and p∈= p(x). Viewedthroughthelensoftime,o submodelretrievedintheprocessofobtainingR . Fig. 2(d) ∗ ∗l l ∗ T isthefurthesttargetvariableintothepast,withPL p . showsasettingwhereinconditions(i)and(iii)arebothmet. ∗l There are two possibilities: (a) p∗l >T0, or (b) p∗l =T0, TherationalebehindRemark1isprovidedintheSupple- withT0 denotingtheoriginoftime;cf. Sec. 2. Inthesequel mentaryInformation. weassumethat(a)holds. Foradiscussiononthespecialcase (b),thereaderisreferredtotheSupplementaryInformation. 4.1 CaseStudy Def. 3. (InferenceThreshold(IT)andITRootSet(IT- Next, we intend to cast the Hidden Markov Model (HMM) RS)) To any real-valued quantity, T, corresponds a unique studied in (Icard & Goodman, 2015, p. 2) into our frame- set,R ,obtainedasfollows:Startateveryvariablex O E T ∈ ∪ work. ThesettingisshowninFig. 3(left). Weadheretothe with PL T and backtrack along all paths terminating at ≥ x. Backtracking along each path stops as soon as a node with PL less than T is encountered. Such nodes, together, −∞ 0.9 compose the set RT. It follows from the definition that: xt-3 pl(xt−3)=−5 0.8 mferaexnt∈cReTTphlr(ets)h<oldT(.ITT)aannddthReTITarReoteortmSeedt,(IrTe-sRpeSc)tfivoerlTy,.In- yt-3 xt-2 pl(xt−2)=−4 )t:100..67 pTinicsF=tteeoadrpdeli(xnotaf1,Fm)is−pgalsyeε.,,Tta2hn(=edb-sTdpe)lt(=oxafr)epv,la(trεthi2,ae)bw−lIeeTsε-cR,coriSureslcsdlpefehodcartaviTtveetshl=yae.idspN:tla(ofgxotee)rs−tadhneaεy-t, yytt--12 xxtt-1 ppll((xxtt−)1=)=−2−3 yx(P!1+tj000...345 − lTibe∈ra(tpesl(ut1s)f,rpolm(xh))a.viHngowtoeevxepr,reesxsptrheesmsinignIteTrsmisnotferinmtesrovfalεs yt xt+1 pl(xt+1)=−1 00..12 thereby simplifying the exposition in the sole hope that the +∞ -1-0 -2-0 -3-0 -4-0 -5-0 -6-0 -7-0 -8-0 -9-0 -10-0 reader finds it easier to follow the work. We would like to T emphasizethattheadoptednotationshouldnotbeconstrued Figure 3: Left: The infinite-sized HMM discussed in (Icard asimplyingthattheassignmentofvaluestoITsissuchasen- & Goodman, 2015) with parameterization: P(x x) = t+1 t sitive task that everything would have collapsed, had IT not P(x¯ x¯)=0.9, and P(y x)=P(y¯ x¯)=0.8. Right|: Ap- t+1 t t t t t beenchoseninsuchafine-tunedmanner. Torecap,insimple plying|PLIF on the HMM|shown in l|eft. Vertical and hori- terms, T bears on how far into the past a reasoner is con- zontalaxesdenote,respectively,thevalueoftheposedquery sultinghermentalmodelintheprocessofansweringaquery, P(x y ) and the adopted IT T. The vertical bars de- t+1 ∞:t and RT characterizes the furthest-into-the-past concepts en- pict the| i−ntervals within which the query lies due to Lemma tertainedbythereasonerinthatprocess. 1. The dotted curves—which connect the lower and upper Next,weformallypresentthemainideabehindPLIF,fol- boundsoftheintervals—showhowtheintervalsshrinkasIT lowedbyitsinterpretationinsimpleterms. T decreases. Lemma1. ForanychosenITT < p anditscorrespond- ∗l ingR ,defineS:(cid:44)R E. Thenthefollowingholds: sameparametrizationandqueryadoptedtherein. AllRVsin T T \ this section are binary, taking on values from the set 0,1 ; min P(OS,E) P(OE) max P(OS,E). (1) { } S Val(S) | ≤ | ≤S Val(S) | x=x indicates the event wherein x takes the value 1, and ∈ ∈ x = x¯ implies the event wherein x takes the value 0. We Crucially, the provided bounds can be computed using the assume p(x )=i 2.10 We should note that the assign- l t+i information encoded in the submodel retrieved in the very ment of the PLs for−the variables in y +∞ does not af- processofobtainingtheRT. (cid:3) fectthepresentedresultsinanyway. T{het−qi}uie=r0yofinterestis For a formal proof of Lemma 1, the reader is referred to P(x y ). Noticethatafterperformingthreestepsofthe t+1 ∞:t the Supplementary Information. Mathematical jargon aside, sortdis|cu−ssedintheexamplepresentedinSec. 3(fortheIT the message of Lemma 1 is quite simple: For any chosen T = 3 ε),thelowerboundontheposedqueryexceeds0.5 inferencethresholdT whichisfurtherintothepastthano , − − ∗ (shown by the red dashed line in Fig. 3(right)). This obser- Lemma 1 ensures that the reasoner can condition on S and vationhasthefollowingintriguingimplication. Assume,for obtainthereportedlowerandupperboundsonthequeryby usingonlytheinformationencodedintheretrievedsubmodel. 10Notethatthetrendoftheupper-andlower-boundcurveaswell It is natural to ask under what conditions the exact value asthesizeoftheintervalsshowninFig.3(right)areinsensitivewith totheposedquerycanbederivedusingthethus-farretrieved regardtothechoiceofPLsforvariables{xt−i}+i=∞−1. thesakeofargument,thatwewerepresentedwiththefollow- proachesstartwiththefullknowledgeoftheunderlyingCBN ingMaximumA-Posterior(MAP)inferenceproblem: Upon and,dependingontheposedquery,graduallyprunetheirrel- observingallthevariablesin y +∞ takingonthevalue1, evantpartsoftheCBN.Inthisrespect,top-downapproaches whatwouldbethemostlikely{stta−tie}if=o0rthevariablex ? In- areinevitablyfrom“god’seye”pointofview—acharacteris- t+1 terestingly, we would be able to answer this MAP inference ticwhichunderminestheircognitive-plausibility. Bottom-up problem simply after three backward moves (corresponding approaches,ontheotherhand,startatthevariablesinvolved totheITT = 3 ε). InFig. 3(right), theintervalswithin in the posed query and move backwards till the boundaries − − whichtheposedqueryfalls(duetoLemma1)intermsofthe of the underlying CBN are finally reached, only then they adoptedITT aredepicted. starttoprunethepartsoftheconstructedsubmodel—ifany— OuranalysisconfirmsIcardandGoodman’sinsight(2015) whichcanbesafelyremovedwithoutjeopardizingtheexact thatevenintheextremecaseofhavinginfinite-sizedrelevant computation of the posed query. It is important to note that submodel(Fig. 3(left)),theportionofwhichthereasonerhas bottom-upapproachescannotstopatintermediatestepsdur- toconsultsoastoobtaina“sufficientlygood”answertothe ingthebackwardmoveandruninferenceonthethus-farcon- posedquerycouldhappentobeverysmall(Fig. 3(right)). structedsubmodelwithoutrunningtheriskofcompromising someofthe(in)dependencerelationsstructurallyencodedin 5 Discussion theCBN,whichwouldyielderroneousinferences. Thisob- Toourknowledge,PLIFisthefirstinferenceframeworkpro- servation is due to the fact that there exists no local signal posedthatcapitalizesontimetoconstrainthescopeofcausal revealinghowthethus-farretrievednodesarepositionedrel- reasoningoverCBNs,wherethetermscopereferstothepor- ativetoeachotherandtotheto-be-retrievednodes—ashort- tionof aCBN onwhichinference iscarried out. PLIFdoes comingcircumventedinthecaseofPLIFbyintroducingPL. not restrict itself to any particular inference scheme. The Another pitfall shared by both top-down and bottom-up ap- claimofPLIFisthatinferenceshouldbeconfinedwithinand proachesistheirsequentialmethodologytowardsthetaskof carriedoutoverretrievedsubmodelsofthekindsuggestedby inference, according to which the relevant submodel for the Lemma1soastoobtainthereportedboundstherein. Inthis posedqueryshouldbefirstconstructed,andonlytheninfer- light,PLIFcanaccommodateallsortsofinferenceschemes, enceiscarriedouttocomputetheposedquery.13 Onthecon- including Belief Propagation (BP), and sample-based infer- trary,PLIFsubmitstowhatwecalltheconcurrent approach ence methods using Markov Chain Monte Carlo (MCMC), to reasoning, whereby retrieval and inference take place in as two prominent classes of inference schemes proposed in tandem. TheHMMexampleanalyzedinSec. 4.1,showsthe the literature.11 For example, to cast BP into PLIF amounts efficacyoftheconcurrentapproach. torestrictingBP’smessage-passingwithinsubmodelsofthe Work on causal judgment provides support for the so- kind suggested by Lemma 1. In other words, assuming that called alternative neglect, according to which subjects tend BPistobeadoptedastheinferencescheme,uponbeingpre- toneglectalternativecausestoamuchgreaterextentinpre- sentedwithacausalquery,anITaccordingtoLemma1will dictive reasoning than in diagnostic reasoning (Fernbach & beselected—atthemeta-level—bythereasonerandthecor- Rehder, 2013; Fernbach, Darlow, & Sloman, 2011). Alter- respondingsubmodel,assuggestedbyLemma1,willbere- native neglect, therefore, implies that subjects would tend trieved, over which inference will be carried out using BP. to ignore parts of the relevant submodel while constructing This will lead to obtaining lower and upper bounds on the it. Recent findings, however, seem to cast doubt on alterna- query,asreportedinLemma1. Iftimepermits,thereasoner tive neglect (Cummins, 2014; Meder, Mayrhofer, & Wald- buildsupincrementallyonthethus-farretrievedsubmodelso mann, 2014). Meder et al. (2014), Experiment 1 demon- astoobtaintighterboundsonthequery.12 MCMC-basedin- strates that subjects appropriately take into account alterna- ferencemethodscanbecast,inasimilarfashion,intoPLIF. tive causes in predictive reasoning. Also, Cummins (2014) TheproblemofwhatpartsofaCBNarerelevantandwhat substantiatesatwo-partexplanationofalternativeneglectac- areirrelevantforagivenquery,accordingto(Geiger,Verma, cording to which: (i) subjects interpret predictive queries as &Pearl,1989),wasfirstaddressedbyShachter(1988). The requests to estimate the probability of the effect when only approaches proposed for identifying the relevant submodel thefocalcauseispresent,aninterpretationwhichrendersal- for a given query fall into two broad categories (cf. (Ma- ternativecausesirrelevant,and(ii)theinfluenceofinhabitory honey&Laskey,1998)andreferencestherein): (i)top-down causes (i.e., disablers) on predictive judgment is underesti- approaches, and (ii) bottom-up approaches. Top-down ap- mated, and this underestimation is incorrectly interpreted as neglecting of alternative causes. Cummins (2014), Experi- 11MCMC-basedmethodshavebeensuccessfulinsimulatingim- ment 2 shows that when predictive inference is queried in a portantaspectsofawiderangeofcognitivephenomena,andgiving accountsformanycognitivebiases;cf. (Sanborn&Chater,2016). mannerthatmoreaccuratelyexpressesthemeaningofnoisy- Also, work in theoretical neuroscience has suggested mechanisms forhowBPandMCMC-basedmethodscouldberealizedinneural 13The computation can be carried out to obtain either the exact circuits;cf.(Gershman&Beck,2016;Lochmann&Deneve,2011). valueorsimplyanapproximationtothequery. Nonetheless, what 12Theverypropertythatthesubmodelgetsconstructedincremen- bothtop-downandbottom-upapproachesagreeonisthattherele- tallyinanestedfashionguaranteesthattheobtainedlowerandupper vantsubmodelistobefirstidentified,shouldthereasonerintendto boundsgettighterasthereasoneradoptssmallerITs;cf.Fig.3(left). computeexactlyorapproximatelytheposedquery. ORBayesnet(i.e.,thenormativemodeladoptedbyFernbach apredictiveoradiagnosticquery(i.e.,P(ec)andP(ce),re- | | etal.(2011))likelihoodestimatesapproachednormativeesti- spectively), subjectsshouldnotretrieveanyoftheeffectsof mates. Cummins(2014),Experiment4showsthattheimpact e. Introspectively,thispredictionseemsplausible,andcanbe of disablers on predictive judgments is far greater than that tested,usingasimilarapproachto(Cummins,2014;DeNeys, ofalternativecauses,whilehavinglittleimpactondiagnostic Schaeken,&d’Ydewalle,2003),byaskingsubjectsto“think judgments. PLIFcommitstotheretrievalofenablersaswell aloud”whileengaginginpredictiveordiagnosticreasoning. asdisablers. Asmentionedearlier,PLIFabstractsawayfrom Also, PL yields the following prediction: Upon intervening theinferencealgorithmoperatingontheretrievedsubmodel, oncausec,subjectsshouldbesensitivetowheneffectewill and,hence,leavesittotheinferencealgorithmtodecidehow occur, even in settings where they are not particularly in- the retrieved enablers and disablers should be integrated. In structedtoattendtosuchtemporalpatterns.Thispredictionis thislight,PLIFisconsistentwiththeresultsofExperiment4. supported by recent findings suggesting that people do have expectationsaboutthedelaylengthbetweencauseandeffect In an attempt to explain violations of screening-off re- (Greville&Buehner,2010;Buehner&May,2004). ported in the literature, Park and Sloman (2013) find strong There is a growing acknowledgment in the literature that, supportforthecontradictionhypothesisfollowedbytheme- notonlytimeandcausalityareintimatelylinked,butthatthey diatingmechanismhypothesis,andfinallyconcludethatpeo- mutuallyconstraineachotherinhumancognition(Buehner, pledoconformtoscreening-offoncethecausalstructurethey 2014). In line with this view, we see our work also as an areusingiscorrectlyspecified. PLIFisconsistentwiththese attempttoformallyarticulatehowtimecouldguideandcon- findings,asitadherestotheassumptionthatreasonerscarry strain causal reasoning in cognition. While many questions out inference on their internal causal model (including all remain open, we hope to have made some progress towards possible mediating variables and disablers), not the poten- betterunderstandingoftheCFPatthealgorithmiclevel. tially incomplete one presented in the cover story; see also (Rehder&Waldmann,2015;Sloman&Lagnado,2015). Acknowledgments Experiment 5 in (Cummins, 2014), consistent with WearegratefultoThomasIcardforvaluablediscussions.We (Fernbach&Rehder,2013),showsthatcausaljudgmentsare wouldalsoliketothankMarcelMontreyandPeterHelferfor stronglyinfluencedbymemoryretrieval/activationprocesses, helpfulcommentsonanearlierdraftofthiswork. Thiswork and that both number of disablers and order of disabler re- wassupportedinpartbytheNaturalSciencesandEngineer- trieval matter in causal judgments. These findings suggest ingResearchCouncilofCanadaundergrantRGPIN262017. that the CFP and memory retrieval/activation are intimately linked. In that light, next, we intend to elaborate on the ra- References tionale behind adopting the term “retrieve” and using it in- Baddeley,A. (2003). Workingmemory: lookingbackandlooking terchangeablywiththeterm“consult”throughoutthepaper; forward. NatureReviewNeuroscience,4(10),829–839. this is where we relate PLIF to the concepts of Long Term Baddeley,A.D.,&Hitch,G.(1974).Workingmemory.Psychology Memory(LTM)andWorkingMemory(WM)inpsychology ofLearningandMotivation,8,47–89. Bramley, N. R., Gerstenberg, T., & Lagnado, D. A. (2014). The andneurophysiology. Next,weelaborateonhowPLIFcould orderofthings:Inferringcausalstructurefromtemporalpatterns. beinterpretedthroughthelensesoftwoinfluentialmodelsof InProceedingsofthe36thannualconferenceofthecognitivesci- WM,namely,BaddeleyandHitch’s(1974)Multi-component encesociety(pp.236–241). Buehner,M.J. (2014). Timeandcausality: editorial. Frontiersin model of WM (M-WM) and Ericsson and Kintsch’s Long- Psychology,5,228. termWorkingMemory(LTWM)model(1995). TheM-WM Buehner, M. J., & May, J. (2004). Abolishing the effect of rein- postulates that “long-term information is downloaded into forcementdelayonhumancausallearning. QuarterlyJournalof ExperimentalPsychologySectionB,57(2),179–191. a separate temporary store, rather than simply activated in Cummins,D.D. (2014). Theimpactofdisablersonpredictivein- LTM”,amechanismwhichpermitsWMto“manipulateand ference.JournalofExperimentalPsychology:Learning,Memory, createnewrepresentations,ratherthansimplyactivatingold andCognition,40(6),1638. DeNeys, W., Schaeken, W.,&d’Ydewalle, G. (2003). Inference memories”(Baddeley, 2003). InterpretingPLIFthroughthe suppressionandsemanticmemoryretrieval:Everycounterexam- lens of the M-WM model amounts to the value for IT be- plecounts. Memory&Cognition,31(4),581–595. ing chosen (and, if time permits, updated so as to obtain Ericsson,K.A.,&Kintsch,W.(1995).Long-termworkingmemory. PsychologicalReview,102(2),211. tighterbounds)bythecentralexecutiveintheM-WMandthe Fernbach,P.M.,Darlow,A.,&Sloman,S.A.(2011).Asymmetries submodelbeingincrementally“retrieved”fromLTMintoM- inpredictiveanddiagnosticreasoning. JournalofExperimental WM’sepisodicbuffer. InterpretingPLIFthroughthelensof Psychology:General,140(2),168–185. Fernbach,P.M.,&Rehder,B.(2013).Cognitiveshortcutsincausal theLTWMmodelamountstohavingnoretrievalfromLTM inference. Argument&Computation,4(1),64–88. into WM and the submodel suggested by Lemma 1 being Fodor,J.A.(1987).Modules,frames,fridgeons,sleepingdogs,and merely “activated in LTM” and, in that sense, being simply themusicofthespheres. Geiger,D.,Verma,T.,&Pearl,J. (1989). d-separation: fromthe- “consulted”inLTM.Insum,PLIFiscompatiblewithbothof oremstoalgorithms. FifthWorkshoponUncertaintyinArtificial thenarrativesprovidedbytheM-WMandLTWMmodels. Intelligence,pp.118–125. Gershman,S.J.,&Beck,J.M.(2016).Complexprobabilisticinfer- AnumberofpredictionsfollowfromPLandPLIF.Forin- ence: Fromcognitiontoneuralcomputation. InComputational stance,PLIFmakesthefollowingprediction: Promptedwith ModelsofBrainandBehavior,edA.Moustafa. GlyTmheoRuro,bCo.t’(s1D98il7e)m.mAan:drTohideFepraismteemPorloobglyemanidntAheI.fprapm.6e5p–r7o5b.lem, a≥reTn,eaitnhderfiinnaElly,nEo−TridneEno+te(si.eth.,eEset:(cid:44)ofEvari(aEblesEin+E)).wNhoicthe Gopnik,A.,Glymour,C.,Sobel,D.M.,Schulz,L.E.,Kushnir,T.,& T T −T \ T∪ T Danks,D. (2004). Atheoryofcausallearninginchildren:causal that, byconstruction, thePLsofthevariablesinE−T areless mapsandbayesnets. PsychologicalReview,111(1),3-32. than the adopted IT T, hence the adopted notation. For ex- Greville,W.J.,&Buehner,M.J. (2010). Temporalpredictability ample,forthesettingdepictedinFig. 2(b)(correspondingto facilitatescausallearning. JournalofExperimentalPsychology: General,139(4),756. theITT =pl(x)−ε),ET =∅,E+T =∅,andE−T ={y}.Also, Hume,D. (1748/1975). Aninquiryconcerninghumanunderstand- forthesettingdepictedinFig. 2(d)(correspondingtotheIT ing. OxfordUniversityPress. T = p(t ) ε),E = y ,E+=∅,andE =∅. Next,we Icard, T. F., & Goodman, N. D. (2015). A resource-rational ap- l 2 − T { } T −T proach to the causal frame problem. Proc. of the 37th Annual presentakeyresultasalemma. MeetingoftheCognitiveScienceSociety. Lemma S.1. Let P(OE) denote the posed causal query. Lagnado, D. A., & Sloman, S. A. (2006). Time as a guide to ForanychosenITT < p| anditscorrespondingIT-RSR , cause. JournalofExperimentalPsychology: Learning,Memory, ∗l T andCognition,32(3),451–60. thefollowingconditionalindependencerelationholds: Lochmann, T.,&Deneve, S. (2011). Neuralprocessingascausal inference. CurrentOpinioninNeurobiology,21(5),774–781. (O E R E+). (S2) Mahoney, S. M., & Laskey, K. B. (1998). Constructing situation ⊥⊥ −T| T ∪ T specificbeliefnetworks. Proc.ofthe14thconferenceonUncer- Proof. The relations between the PLs of the variables in- taintyinArtificialIntelligence,pp.370–378. Marr,D. (1982). Vision:acomputationalapproach. volved in the statement (S2) ensures that, according to d- Meder, B., Mayrhofer, R.,&Waldmann, M.R. (2014). Structure separation criterion (Pearl, 1988), conditioning on the vari- inductionindiagnosticcausalreasoning. PsychologicalReview, ables in R E+ blocks all the paths between the variables 121(3),277. T ∪ T Park, J., & Sloman, S. A. (2013). Mechanistic beliefs determine inOandE ,hencefollows(S2). −T adherencetothemarkovpropertyincausalreasoning. Cognitive Thefollowingtwo-partargumentrespondstothequestion Psychology,67(4),186–216. Pearl,J. (1988). Probabilisticreasoninginintelligentsystems:net- posedin(Q.1)intheaffirmative. First,noticethat: worksofplausibleinference. MorganKaufmann. Rehder,B.,&Waldmann,M.R.(2015).Failuresofexplainingaway P(OS,E)= P(OS,E ,E ,E+) andscreeningoffindescribedversusexperiencedcausallearning | | T T− T Sanscbeonranr,iAos..NS.u,b&mCittheadtefro,rNp.u(b2l0ic1a6t)io.nB.ayesianbrainswithoutprob- = P(O|RT,ET−,ET+) abilities. TrendsinCognitiveSciences,20(12),883–893. (=S2) P(OR ,E+). (S3) Schlottmann, A. (1999). Seeing it happen and knowing how it | T T works: howchildrenunderstandtherelationbetweenperceptual causalityandunderlyingmechanism. DevPsychol,35(1),303. Second, note that the process of obtaining RT, namely, Shachter, R.D. (1988). Probabilisticinferenceandinfluencedia- moving backwards from the variables in O E+ until R grams. OperationsResearch,36(4),589–604. is reached, ensures that the submodel retriev∪ed Tin this proT- Simon,H.A. (1957). Modelsofman. Wiley. Sloman,S.A.,&Lagnado,D.(2015).Causalityinthought.Annual cess suffices for the derivations of P(O|RT,ET+). Using the ReviewofPsychology,66,223–247. approach introduced in (Geiger, Verma, & Pearl, 1989) for identifying the relevant information for the derivation of a Supplementary Information query in a Bayesian network, this follows from the following fact: Conditioned on R E+, the set O is d-separated S-I ProofofLemma1: T ∪ T fromallthenodesinthesetAn(O E) R whosePLsare T Simpleuseofthetotalprobabilitylemmayields: lessthantheadoptedITT. Noteth∪atAn\(O E)denotesthe ∪ ancesteral graph for the nodes in O E. This completes the P(OE)= ∑ P(OS,E)P(SE). (S1) proof. ∪ (cid:4) | | | S Val(S) ∈ S-II TheRationalebehindRemark1: Equation(S1)immediatelyrevealsasimplefact,namely,that P(OE) is a linear combination of the members of the set Case (i) and Case (iii) immediately follow from Lemma 1 P(O| S,E) ,anobservationwhichgrantsthevalidity in the main text. Case (ii) implies that all the ancestors of {ofthe|expre}sSs∈iVoanl(gSi)venin(1)inthemaintext. variablesinO∪Eareretrieved, hencethesufficiencyofthe retrievedsubmodelfortheexactderivationofthequery; see The key point which is left to be shown is the following: alsoSec. S-III. (Q.1) Why can the bounds given in (1) be computed using the submodel retrieved in the process of obtaining the cor- S-III OntheSpecialCaseofHaving p =T : ∗l 0 responding R for the adopted IT T < p ? This is where T ∗l In such circumstances, to derive P(OE), the set of all the thenotion ofPL comesinto play. Toarticulate theintended | ancestors of variables in O E should be retrieved and then line of reasoning let us introduce some notations first. Ac- ∪ inferenceshouldbecarriedoutontheretrievedsubmodel. cording to Def. 3, any chosen IT T induces an IT-RS R . T LetuspartitionthesetofevidencevariablesEintothreemu- tually disjoint sets E+,E , and E , where E denotes the T T −T T set of variables in E which belong to the IT-RS R (i.e., T E :(cid:44)E R ),E+denotesthesetofvariablesinEwithPLs T ∩ T T