Causally Appropriate Graphical Modelling
for Time Series with applications to
Economics, Ecology and Environmental
Science
A thesis submitted in partial fulfilment of the requirements for
the Degree of Master of Science in Mathematics and
Statistics in the University of Canterbury
By Carla Meurk
2005
Contents
1 Introduction
2 Introduction to Time Series and Graphical Modelling
2.1 Time Series
2.1.1 Stochastic Processes, Stationarity and Ergodicity
2.1.2 A Univariate Example: AR(p) Models
2.1.3 Finding p
2.1.4 Finding α: Likelihood Estimation
2.1.5 The Multivariate Case: Vector Autoregressive Models
2.1.6 Modelling VAR Processes
2.2 Graphical Models
2.2.1 Language
2.2.2 Conditional Independence Graphs
2.2.3 Turning CIGs into DAGs
2.2.4 Moralisation
2.3 Model Assessment, AIC, HIC and SIC
3 Causality as a Statistical Concern
3.1 Motivation
3.1.1 Does Every Statistician Need Causation?
3.2 The Mistake of Defining Causation
3.2.1 Different Job Descriptions for Different Jobs
3.2.2 Metaphysics and Counterfactuals
3.2.3 Counterfactual Accounts of Causation
3.2.4 Granger Causality
3.2.5 Why G-Causality is Causal
3.3 Causal Modelling of Multivariate Time Series
3.3.1 Contemporaneous Causality
3.4 Conditional Independence and Causal Sensitivity
3.5 The GMTS approach
3.6 Step 1: Conditional Independence Graph
3.7 Step 2: Creating Alternative Models
3.8 Step 3: Regression and Model Selection
3.9 Is GMTS Causally Appropriate?
3.10 Other Advantages of GMTS
4 Case Study I: Economics
4.1 The Data
4.2 Step 1: Deriving the CIG
4.3 Step 2: Creation of Alternative DAGs
4.3.1 Step 2(a): Finding Possible Contemporaneous Models
4.3.2 Step 2(b): Adding Edges with Definite Lagged Variables
4.3.3 Step 2(c): Identification of Possible Moral Edges
4.3.4 Alternative Models
4.4 Step 3: Regression and Best Model Selection
4.5 Interpretation
4.6 Conclusion
5 Case Study II: Hydrological Data
5.1 Data Description
5.2 Preliminary Data Analysis
5.2.1 Analysis by Plotting
5.2.2 Analysis of Correlation
5.2.3 Selection of Variables
5.3 Modelling the Data with GMTS
5.3.1 Specifying the Order of the VAR
5.4 Results for Data Spaced 30 Minutely
5.4.1 Interpretation
5.5 Aggregated Models
5.5.1 Data Aggregation
5.5.2 Best Model when Data Aggregated Hourly
5.5.3 Best Model For Data Aggregated 2 hourly
5.6 Models for Different Seasons
5.6.1 Autumn
5.6.2 Spring
5.6.3 Model Comparison Between Seasons
5.7 Causal Relevance of These Models
5.7.1 Moralisation in Practise
5.7.2 The Contemporaneous Boundary
5.8 A SINful Alternative
5.9 Conclusion and Problems for Future Research
6 Case Study III: Ecological Data
6.1 The Causal Question
6.2 About the Data
6.3 The Problem of Ecological Data
6.4 So What Can We Do?
6.5 Modelling with GMTS
6.6 A Possible Modification to the GMTS approach
6.7 Results
6.8 Conclusion
7 Summary
A List of MATLAB Code Used
A.1 cigts
A.2 DAGfit
A.3 Case Study I, Code For Generating Contemporaneous Models
B List of R Code Used
B.1 spacedata
B.2 CREATEX
C Full CIG for Chapter 3
1 Introduction
In 1967 Clive Granger proposed a definition to decide if a variable in a time series was
causal or not. This definition has come to be known as Granger-Causality, or G-Causality
for short. Over the years, there has been much debate over whether this procedure should
be deemed "causal" at all (Granger, C., 1988). About 30 years later, Marco Reale and
Granville Tunnicliffe-Wilson developed a method, Graphical Modelling for Time Series
(GMTS), for providing causal models of multivariate time series (Reale, M., 1998).
Originally, this method was applied to economics, and here I extend its application to
ecology and environmental science.
The aims of this thesis split roughly into two parts. The first aim relates to causality:
what place, if any, does causality have in statistics? In particular, I will
discuss the notion of G-Causality and whether I believe it can legitimately be deemed
causal. Finally, I will discuss GMTS and whether it provides a causally appropriate
modelling strategy.
The second aim relates to the practical application of graphical modelling in time series,
in particular its application to ecological and environmental time series. I explore whether
GMTS produces improved models of the data compared with a traditional approach.
Chapter 2 briefly outlines some common concepts used in time series analysis and
introduces some graphical modelling terms that will be referred to subsequently.
Chapter 3 aims to provide some background to the causal debate and to explain where I
stand vis-à-vis the relationship that should exist between statisticians and the topic of
causation. In addition, the GMTS modelling strategy to be adopted for subsequent analysis
is explained, along with some remarks on its suitability as a causally appropriate strategy.
Chapters 4, 5 and 6 provide three very different case studies from which to explore
and assess the GMTS methodology. The first is an example from economics, where
this methodology was originally applied. This example contains a number of variables
and hence posed a combinatorial challenge in terms of possible contemporaneous
relationships. Through this case study, however, not only was the combinatorial problem
resolved, but ways of possibly extending the automation of this process were discovered.
Chapter 5 contains an environmental example of hydrological processes where, due to
automated recording of data, there are vast numbers of observations (on the order of 10^4)
for a number of different variables. This chapter is interesting both because it provides
evidence that the GMTS approach consistently generates better models than a traditional
approach and because it provides some insight into the causal problem GMTS faces.
Finally, I adapted the newly developed SINful approach to graphical modelling, proposed
by Drton and Perlman (2004), to time series to see if it improved on GMTS.
Chapter 6 takes us to the other extreme: an ecological dataset built up over more than
20 years, which consists of only 90 observations. This dataset strongly violates all the
requirements a multivariate time series needs for sensible analysis and required some
manipulation of the original GMTS method in order to create a functioning model. This case
study indicated that, while GMTS was developed for time series, with a few adjustments
it may have relevance to other types of modelling.
Figure 1: A time series showing the number of serious drug offences committed in New Zealand
from 1981-2000.
2 Introduction to Time Series and Graphical Modelling
This thesis is concerned with the causally appropriate graphical modelling of multivariate
time series, vector autoregressive (VAR(p)) models in particular. In this chapter some
basic concepts are introduced. These are divided into three sections: time series modelling,
graphical modelling and model assessment methods.
2.1 Time Series
A time series is simply a set of data collected over time. For example, figure 1 gives a
time series showing the number of serious drug crimes committed in New Zealand from
1981-2000.
Time series can describe either discrete or continuous processes. In addition, the sampling
of the process can be either discrete or continuous. Figure 2 gives examples of the
different sorts of time series. The All Blacks win/loss record is an example of a time series
with discrete sample space and discrete state space, whereas a signal passing through a
cable has continuous sample space and discrete state space. The time series of temperature
is an example of a process with continuous state space sampled discretely, and the plot of
seismic activity is an example of a series with continuous state and sample space.
[Figure 2 panels: a) The All Blacks Win Loss Record from 2003-2005 (W/D/L by year);
b) Signal Passing Through a Cable (current on/off against seconds);
c) Hourly temperature and humidity readings; d) Seismic activity for one day in the UK]
Figure 2: Examples of different types of time series
In this thesis we will focus on time series with continuous state space and discrete
sample space. Hence, all definitions provided will be the discrete formulation.
2.1.1 Stochastic Processes, Stationarity and Ergodicity
Time series are finite realisations of stochastic processes; that is, the data are a
collection of random variables that are ordered in time. This means that each observed data
point is only one realisation from a distribution and hence each time series (a collection of
data points) is only one realisation of an infinite number of possible time series (Chatfield,
1989). As figure 3 shows, at each time point the observed value is drawn from a different
distribution.
Modelling, given just one observation from each random variable, might seem
like an impossible task. However, a few key assumptions have allowed a sophisticated
discipline of time series analysis to develop. These key assumptions are stationarity and
ergodicity.
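The idea that an observed series is only one of many possible realisations can be sketched by simulation. The Python sketch below is illustrative only: it assumes a zero-mean AR(1) process with Gaussian noise (an AR(p) model with p = 1), and the function name `simulate_ar1`, the coefficient 0.5 and the number of realisations are hypothetical choices, not taken from the thesis. It draws many realisations of the same process, treats the first as the "observed" series, and computes the mean and variance across realisations at each time point.

```python
import numpy as np

def simulate_ar1(alpha, n_steps, n_realisations, rng):
    """Simulate realisations of x_t = alpha * x_{t-1} + e_t with e_t ~ N(0, 1).

    Each row of the returned array is one realisation of the same process.
    The first value is drawn from the stationary distribution, which
    requires |alpha| < 1.
    """
    x = np.empty((n_realisations, n_steps))
    stationary_sd = 1.0 / np.sqrt(1.0 - alpha**2)
    x[:, 0] = rng.normal(scale=stationary_sd, size=n_realisations)
    for t in range(1, n_steps):
        x[:, t] = alpha * x[:, t - 1] + rng.normal(size=n_realisations)
    return x

rng = np.random.default_rng(0)
realisations = simulate_ar1(alpha=0.5, n_steps=100, n_realisations=2000, rng=rng)

# In practice only one row is ever observed.
observed = realisations[0]

# Averages across realisations at each fixed time point; these are what
# the mean and variance of X_t refer to, and cannot be computed from one row.
ensemble_mean = realisations.mean(axis=0)
ensemble_var = realisations.var(axis=0)
```

In this simulated setting the averages across realisations are roughly constant in time, which is the situation the stationarity assumption describes; with real data only `observed` is available, which is why the assumptions are needed.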
Figure 3: Diagram of time series thought of as a sequence of random variables each from
a different distribution
Stationarity
Stationarity is defined with respect to the parameters of interest, and in time series these
are usually the mean and variance. If a series is stationary, there is no change
in the mean and variance over time (Chatfield, C., 1989, 10).
There are two definitions of stationarity: a strict definition, which is very thorough but
difficult to assess, and a weaker one which is commonly used.
A series is said to be strictly stationary if:
the joint distributions of $X_t, X_{t+1}, \ldots, X_{t+n}$ and $X_{t+\tau}, \ldots, X_{t+n+\tau}$ are the same $\forall\, t, \tau, n$. (1)
A series is said to be weakly stationary if:
$E(X_t) = \mu \;\; \forall\, t$ and $\mathrm{Cov}(X_t, X_{t-\tau}) = \gamma(\tau) \;\; \forall\, t, \tau$ (2)
(Chatfield, C., 1989, 28-29), where $X_t$ refers to the observed variable of a series at time
$t$. If a series satisfies the condition of weak stationarity we say that its first two moments,
the mean, $E(X_t)$, and covariance, $\mathrm{Cov}(X_t, X_{t-\tau})$, are time invariant.
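Under weak stationarity the mean and the covariance at lag τ do not depend on t, so they can be estimated by averaging over time within a single series. The sketch below is a minimal illustration in Python, not code from the thesis: the function name `sample_autocovariance` is hypothetical, and dividing by n rather than n - τ is one common convention that I assume here.

```python
import numpy as np

def sample_autocovariance(x, lag):
    """Estimate gamma(lag) = Cov(X_t, X_{t-lag}) from one observed series.

    Averaging over t is only meaningful if the series is weakly stationary,
    so that the mean and gamma(lag) do not change with t.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = x.mean()
    # Pair each x_t with x_{t-lag} and average the products of deviations.
    return np.sum((x[lag:] - mean) * (x[: n - lag] - mean)) / n

# A small worked example with a deterministic input.
series = [1.0, 2.0, 3.0, 4.0]
gamma_0 = sample_autocovariance(series, 0)  # variance of the series
gamma_1 = sample_autocovariance(series, 1)  # covariance at lag 1
```

At lag 0 this estimator reduces to the sample variance computed with divisor n, matching the statement that the variance is the covariance at τ = 0.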