
AR, MA, ARMA, & All That (73 pages, 2011, English)

6 TIME SERIES MODELS: AR, MA, ARMA, & ALL THAT

Time series are ubiquitous in everyday manipulations of financial data. They are especially well suited to the nature of financial markets, and models and methods have been developed to capture time dependencies and produce forecasts. This is the main reason for their popularity. This chapter is devoted to a general introduction to the linear theory of time series, restricted to the univariate case. Later in the book, we will consider the multivariate case, and we will recast the analysis of time series data in the framework of state space models in order to consider and analyze nonlinear models.

6.1 NOTATION AND FIRST DEFINITIONS

The goal of time series analysis is to analyze data containing finite sequences of measurements, each coming with a time stamp, these time stamps being ordered in a natural fashion. The purpose of the analysis is to quantify the dependencies across time, and to take advantage of these correlations to explain the observations at hand and to infer properties of the unobserved values of the series.

We have already encountered many instances of time series (recall, for example, the coffee futures data as plotted in Figure 3.4). In most cases, we transformed the data to reduce the serial correlation to a minimum, and we used statistical techniques completely indifferent to the order of the data: in this way, we did not use any possible serial dependence in the data. It is now time to investigate the various ways one can model this dependence, and take advantage of the properties of these models. However, there was one instance where we had to deal with the effect of the dependencies over time of the data entries. This was the case of the utility indexes, whose data we reproduce now.

                ENRON.index  DUKE.index  UTILITY.index
    01/04/1993     135.0000    104.2857       99.94170
    01/05/1993     135.3714    103.5714       99.49463
    01/06/1993     132.8571    104.2857       99.86034
    01/07/1993     130.7143    103.5714       98.70023
    ......           ......      ......         ......
    12/28/1993     166.4571    125.0000      107.15202
    12/29/1993     170.7429    123.9429      106.75023
    12/30/1993     169.3143    124.2857      106.12351
    12/31/1993     165.7143    121.0857      104.95227

The first column gives a set of dates, some form of daily time stamps, while the next three columns contain the numerical values of the three indexes on these dates. This is the typical structure of the time series data which we consider in this chapter and the next.

6.1.1 Notation

Most statistical problems deal with data in the form

$$x_0, x_1, \ldots, x_n. \qquad (6.1)$$

In the regression applications considered so far, the order in which the observations were collected did not play any role. We are now interested in applications for which the order of the $x_i$'s plays a crucial role in the interpretation of the data, as well as in the definition of the inferential problems we consider.

In the applications we are considering now, the label $n$ of the observation $x_n$ corresponds to a time stamp, say $t_n$, giving the time at which the measurement was taken. As always, it is convenient to view the observations (6.1) as realizations of random variables $X_0, X_1, \ldots, X_n$, which we shall sometimes denote $X_{t_0}, X_{t_1}, \ldots, X_{t_n}$ when we want to emphasize the role of the time stamps. These $n+1$ random variables will most often be regarded as a subset of a (possibly infinite) sequence $\{X_t\}$ of random variables. The $x_i$'s (and hence the $X_i$'s) can be scalars, as in this chapter, in which case we talk about univariate time series, or vectors, as in the next chapter, in which case we talk about multivariate time series. As before, we try to use regular fonts for scalars and bold-face fonts for vectors.

Most of this chapter is devoted to the analysis of time series models. A model is a set of prescriptions for the joint distributions of the random variables (or random vectors in the multivariate case)

$$X_{i_1}, X_{i_2}, \ldots, X_{i_k}$$

for all possible choices of the finite ordered sequence $i_1 < i_2 < \cdots < i_k$ of time stamps. These joint distributions are completely determined by the model in some cases, while in other cases, only partial information is provided by the prescriptions of the model.
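To make the notion of a model concrete, here is a minimal sketch in R for one simple model chosen purely for illustration (it is not introduced in this passage): the Gaussian random walk $X_t = \epsilon_1 + \cdots + \epsilon_t$ with i.i.d. $N(0,1)$ increments. This model prescribes the joint distribution of every $(X_{i_1}, \ldots, X_{i_k})$ with $i_1 < \cdots < i_k$: it is Gaussian with mean zero and $\mathrm{Cov}(X_s, X_t) = \min(s, t)$. The helper name rw_cov is hypothetical, and the simulation only serves to check the formula.

```r
# Illustrative model (not from the text): Gaussian random walk
# X_t = e_1 + ... + e_t with e_i iid N(0, 1).
# For any i_1 < ... < i_k, (X_{i_1}, ..., X_{i_k}) is multivariate normal
# with mean 0 and Cov(X_s, X_t) = min(s, t).
rw_cov <- function(idx) {            # hypothetical helper name
  stopifnot(all(diff(idx) > 0))      # require i_1 < i_2 < ... < i_k
  outer(idx, idx, pmin)              # matrix of min(i_j, i_l)
}

idx <- c(2, 5, 9)                    # one choice of ordered time stamps
Sigma <- rw_cov(idx)
print(Sigma)

# Monte Carlo check: simulate many paths, keep only the chosen time stamps
set.seed(1)
paths <- replicate(20000, cumsum(rnorm(max(idx))))  # one path per column
sample_cov <- cov(t(paths[idx, ]))
print(round(sample_cov, 2))
```

Changing idx to any other increasing set of time stamps gives the corresponding joint law, which is exactly what "a prescription for all finite ordered sequences" means.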
6.1.2 Regular Time Series and Signals

Regular time series are sets of measurements taken at regular time intervals. In other words, the time stamps $\{t_j\}_{j=0,1,\ldots,n}$ are of the form $t_j = t_0 + j\Delta t$ for $j = 0, 1, \ldots, n$. Such a sequence of times is determined by its start $t_0$, its length $n+1$, and the time interval $\Delta t$ between two successive times. Note that, instead of giving the sampling interval $\Delta t$, one can equivalently give the sampling frequency $1/\Delta t$, or the time of the final measurement $t_n = t_0 + n\Delta t$. Once the time sequence has been defined, one can then give the sequence of corresponding measurements separately. This characterization of the sequence of time stamps of a regular time series by three characteristics is fundamental in the time series packages implemented in most statistical software.

Figure 6.1 gives an example of such a regular time series. It is a speech signal which we created by recording the short sentence "how are you", digitizing the sound file, and collecting the resulting numerical values in an R numerical vector which we called HOWAREYOU. Figure 6.1 was produced with the command:

    plot(HOWAREYOU, type="l", main="Speech Signal HOWAREYOU")

[Fig. 6.1. Plot of the sound "How Are You" digitized at 8000 Hz (panel title: Speech Signal HOWAREYOU; horizontal axis: Index).]

In such a plot, the time stamps used to label the elements of the signal are simply successive integers starting from one. They are referred to as Index in the plot. This should be contrasted with what comes next.

In part because of their frequent occurrence in applications to signal analysis (as traditionally performed by electrical engineers), regular time series are often called signals. The library stats of R provides objects of class ts to manipulate signals, but we shall not use them.
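The three characteristics of a regular time series can be seen at work in the ts class of the stats library mentioned above. The sketch below is only an illustration: since HOWAREYOU is not reproduced here, it uses a synthetic 440 Hz tone as a stand-in, sampled at the 8000 Hz rate of Figure 6.1, and recovers the time stamps $t_j = t_0 + j\Delta t$ and the time of the final measurement from the object.

```r
fs <- 8000                                  # sampling frequency (Hz), as in Figure 6.1
# Synthetic stand-in for HOWAREYOU: 5000 samples of a 440 Hz tone
x <- sin(2 * pi * 440 * (0:4999) / fs)

# A regular series is determined by its start t0, its length n + 1 and dt
sig <- ts(x, start = 0, frequency = fs)
n  <- length(sig) - 1
dt <- deltat(sig)                           # sampling interval 1 / fs
tt <- as.numeric(time(sig))                 # t_j = t0 + j * dt, j = 0, ..., n

# tsp() stores (start, end, frequency); the end is t_n = t0 + n * dt
print(tsp(sig))
print(all.equal(tsp(sig)[2], 0 + n * dt))   # TRUE
```

Giving the frequency, the interval, or the end time is interchangeable here: ts() derives the other two from whichever pair it receives.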
6.1.3 Calendar and Irregular Time Series

Most financial time series do not have the good taste to be regular in the sense given above. They differ from the regular time series discussed above in several ways, mostly by the fact that the time stamps are given by dates and times, hence their name: calendar time series. Even though calendar time series are particular cases of a larger class of irregular time series, they will be the only ones considered here. Oftentimes, these data are daily, and gaps due to weekends and holidays create irregularities. Figure 6.2 gives the daily closing prices of the S&P 500 index on the New York Stock Exchange (NYSE for short) for the period ranging from January 3, 1950 to August 20, 2010.

[Fig. 6.2. timeSeries plot of the daily closing values of the S&P 500 index produced by the command plot(DSP.ts) (panel title: Time Series Plot of DSP.ts; horizontal axis: 1960 to 2010).]

Here is a (very) small subset of the data used to produce the plot.

    Date          Open     High      Low    Close
    .....        .....    .....    .....    .....
    17-Sep-01  1092.54  1092.54  1037.46  1038.77
    10-Sep-01  1085.78  1096.94  1073.15  1092.54
    7-Sep-01   1106.40  1106.40  1082.12  1085.78
    6-Sep-01   1131.74  1131.74  1105.83  1106.40
    5-Sep-01   1132.94  1135.52  1114.86  1131.74
    4-Sep-01   1133.58  1155.40  1129.06  1132.94
    31-Aug-01  1129.03  1141.83  1126.38  1133.58
    30-Aug-01  1148.60  1151.75  1124.87  1129.03
    29-Aug-01  1161.51  1166.97  1147.38  1148.56
    28-Aug-01  1179.21  1179.66  1161.17  1161.51
    .....        .....    .....    .....    .....

As we can see, the time stamps can at times be very irregularly spaced. As illustrated in this snapshot, the regularity of the measurements can be affected by unexpected events: the jump from 10-Sep-01 to 17-Sep-01 reflects the closure of the NYSE in the days following September 11, 2001. However, the scale of a typical plot of a multi-year data series would not allow us to see the gaps due to weekends, holidays, and extraordinary market closures.

6.1.4 Creating and Plotting timeSeries Objects in R

The manipulation of calendar time series in R is done via objects of class timeSeries. These objects contain a slot positions for the time stamps, and a slot data for the actual values of the numerical measurements.
Typically, positions is a vector of dates, or of dates and times, while data is a numerical matrix with one row for each entry of the vector positions. One creates an object of class timeSeries with the constructor function timeSeries, whose use is illustrated below. The numeric vector of the weekly values of the S&P 500 index was used earlier in Chapter 2 when we computed the weekly returns on the index. We make it into a timeSeries object by attaching the time stamps giving the weeks of the measurements. We first create the vector SPWEEKS of dates with the function timeSequence, and we bundle this vector of dates with the vector WSP of closing values into a timeSeries object WSP.ts, which we then plot with the command plot. All this is done with the following commands:

    SPWEEKS <- timeSequence(from = "1950-01-03", by = "week", length.out = 3163)
    WSP.ts <- timeSeries(positions = SPWEEKS, data = WSP)
    plot(WSP.ts)

The generic method plot can be used with timeSeries objects. The resulting plot is given in Figure 6.3. At this scale, it is difficult to differentiate this plot from the plot of the daily closing values of the index.

[Fig. 6.3. timeSeries plot of the weekly closes of the S&P 500 index produced by the command plot(WSP.ts) (panel title: Time Series Plot of WSP.ts; horizontal axis: 1960 to 2010).]

6.1.5 High Frequency Data

The widespread availability of high frequency data changed the landscape of financial data analysis, and spurred, for better or worse, the development of high frequency trading. Our discussion of the morning and afternoon indicators in Chapter 5 on nonparametric regression was a first example involving high frequency data. Here, we consider another example more in line with the current discussion of time series.

High frequency data are the result of a different data collection process: a record is added to the data file each time a new transaction takes place. These data are also called transaction data, or tick data.
They offer a unique insight into actual trading processes and market microstructure. In this subsection, we study the most important features of high frequency time series data, identifying striking differences with lower frequency data, and introducing new tools and methods tailored to the challenges presented by this type of data.

Tick-by-tick data are available for liquid futures contracts, and in this section, we consider the example of a futures contract on the S&P 500 index for the sake of illustration. Here is what the data of the September 1998 contract look like.

    "date"    "time"     "close"
    19971008  14:53:38   1013.20
    19971017  10:59:16    986.00
    19971027  10:02:13    960.00
    19971103  10:28:51    968.00
    19971105  09:08:44    975.00
    19971106  10:59:21    969.00
    19971124  12:52:52    986.90
    19971209  10:58:18   1015.00
    19971210  09:22:05   1005.70
    19971224  09:27:21    968.00
    ...       ...        ...

We notice that the time stamps can be very sparse. This is due to the fact that these trades occurred on days very far from the maturity of the contract: speculators are actively trading contracts closer to delivery! However, the situation changes dramatically when we look at the data later in the life of the contract. Indeed, one sees not only that the transactions are more frequent, but also that a large number of transactions appear to happen simultaneously, since they have the same time stamp. Given the fact that each row contains only one number besides the date and time, we shall assume that this number is the price at which the transaction was settled, not a bid or ask price. This information is not always given by the data provider, and the data analyst may be forced to make this sort of assumption. One of the unexpected surprises with high frequency financial data is the fact that the notion of price is not clearly defined. Strangely enough, there are many reasons for that. The first one is clear from the data reproduced below: different values can be quoted with the same time stamp, so what is the price at that time?

    "date"    "time"     "close"
    ...       ...        ...
    19980804  11:05:00   1103.50
    19980804  11:05:00   1103.00
    19980804  11:06:00   1102.80
    19980804  11:06:00   1102.60
    19980804  11:06:00   1102.50
    19980804  11:06:00   1102.40
    19980804  11:06:00   1102.20
    19980804  11:06:00   1102.00
    ...       ...        ...
    19980804  11:06:00   1102.50
    19980804  11:06:00   1102.40
    19980804  11:06:00   1102.20
    19980804  11:06:00   1102.00
    19980804  11:07:00   1101.70
    ...       ...        ...

Whether or not one looks at this particular portion of the data set, it happens very often that many seconds do not appear because there is no transaction at these times. Another idiosyncrasy of high frequency data is the fact that the bid and ask prices do not make sense all the time. Indeed, when the frequency is high enough, the time interval between two transactions is so small that the price cannot move out of the bid-ask spread, muddying both the definition of the notion of price and that of the bid-ask spread at the same time. We avoid this issue by considering that the tick value given by the data provider is the settlement price.

The data discussed above are included in the library Rsafd. They are contained in the data frame SPsep98, whose top we reproduce below:

    head(SPsep98)
          date     time  close
    1 19971008 14:53:38 1013.2
    2 19971017 10:59:16  986.0
    3 19971027 10:02:13  960.0
    4 19971103 10:28:51  968.0
    5 19971105 09:08:44  975.0
    6 19971106 10:59:21  969.0

This data frame has three columns. The first one, named date, gives the date of the quote. Notice the format! The second column, named time, gives the time of the day at which the quote was provided, and finally, the third column, named close, gives the actual quote (so we assume).
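Before turning to the construction of a timeSeries object, the date format and the duplicate time stamps can be handled with base R alone. The sketch below re-types the first block of 4 August 1998 ticks reproduced above in the layout of SPsep98, parses date and time with as.POSIXct (the time zone "UTC" is an assumption, since the data do not state one), and collapses the ticks sharing a time stamp with two conventions, last trade and average trade. Neither convention is the book's: the text simply treats each tick as a settlement price.

```r
# First block of 1998-08-04 ticks reproduced above, in the raw SPsep98 layout
raw <- data.frame(
  date  = rep(19980804, 8),
  time  = c("11:05:00", "11:05:00", rep("11:06:00", 6)),
  close = c(1103.50, 1103.00, 1102.80, 1102.60, 1102.50, 1102.40, 1102.20, 1102.00)
)

# Parse "YYYYmmdd HH:MM:SS" (%d is the day of the month); tz "UTC" is assumed
pos <- as.POSIXct(paste(raw$date, raw$time),
                  format = "%Y%m%d %H:%M:%S", tz = "UTC")
stopifnot(!anyNA(pos))                      # every row parsed

# Collapse ticks sharing a time stamp; tapply groups by sorted unique keys
key <- format(pos, "%Y-%m-%d %H:%M:%S")
out <- data.frame(
  stamp    = sort(unique(key)),
  n_trades = as.vector(tapply(raw$close, key, length)),
  last     = as.vector(tapply(raw$close, key, function(p) p[length(p)])),
  mean     = round(as.vector(tapply(raw$close, key, mean)), 4)
)
print(out)
```

The two conventions already disagree on this tiny sample, which is one concrete way of seeing why the price at a given time is ill-defined.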
In order to create an object of class timeSeries with these data, we first create a vector of positions with the days and times given in the first two columns of the data frame. We first use the function makeDate with the column date, specifying the format from which the date should be read in the parameter in.format. In this case, the uppercase "Y" indicates that the first four characters should be understood as the year, the lowercase "m" indicates that the next two characters should be understood as an integer giving the month of the year, and the lowercase "d" indicates that the last two characters should be understood as an integer giving the day of the month. The command using the function timeDate gives another example of formatted reading, more in the Unix style this time. In any case, its output is a vector which can be used as a vector of positions for an object of class timeSeries. The extra parameter units in the constructor timeSeries is used to specify a name for the numerical values of the data component of the time series.

    SPsep98day <- makeDate(SPsep98[,1], in.format="Ymd")
    SPsep98POS <- timeDate(paste(SPsep98day, SPsep98[,2]),
                           format = "%Y-%m-%d %H:%M:%S")
    SPsep98.ts <- timeSeries(positions=SPsep98POS, data=SPsep98[,3],
                             units="Sep98")
    head(SPsep98.ts)
                         Sep98
    1997-10-08 14:53:38 1013.2
    1997-10-17 10:59:16  986.0
    1997-10-27 10:02:13  960.0
    1997-11-03 10:28:51  968.0
    1997-11-05 09:08:44  975.0
    1997-11-06 10:59:21  969.0
    plot(SPsep98.ts)

The corresponding timeSeries plot is reproduced in Figure 6.4.

[Fig. 6.4. timeSeries plot of the high frequency quotes of the S&P 500 futures contract maturing in September 1998, as produced by the command plot(SPsep98.ts) (panel title: Time Series Plot of SPsep98.ts; horizontal axis: Nov to Sep).]

This plot clearly shows the specific features of the data which we identified earlier. The left part of the plot contains only a few transactions, and because the plotting program interpolates linearly between points, we see an artificial piecewise linear pattern for the price of the futures contract. The right part of the plot is more typical of volatile financial time series.

Remark on Quantized Ticks.
Price changes from one transaction to the next are quoted in multiples of the tick size. This tick size varies from one exchange to another. Typical values are (or used to be) one eighth and one sixteenth of a dollar. This practice is now obsolete on a number of exchanges. For example, all New York Stock Exchange (NYSE) and New York Mercantile Exchange (NYMEX) stocks have been traded in decimals since January 29, 2001. Nevertheless, practitioners should be aware of the fact that high frequency data, and especially historical high frequency raw data which have not been pre-processed, quite often take only discrete values: this can introduce numerical artifacts, and in particular spurious correlation.

[Fig. 6.5. Histogram of the fractional part of the June 15, 1998 quotes of the S&P 500 September 1998 futures contract (panel title: Histogram of FRACT; axes: FRACT, Frequency). The discrete nature of the data shows clearly.]

We illustrate this fact with the S&P 500 data considered in this subsection. We computed the fractional parts of the transaction prices (obtained by subtracting the integer parts from the actual prices), and we plotted their histogram in Figure 6.5. The quantization effect appears clearly.

6.1.6 TimeDate Manipulations

It is very easy to develop specific functions satisfying the needs of most time series data analyses. In particular, the library Rsafd contains a certain number of "homegrown" functions which we wrote to make timeDate manipulations easy. Among them, the functions begday and noon extract the beginning of a given day and noon of the same day. In other words, given a timeDate, the first function extracts the day and returns a timeDate including the hours, minutes and seconds of the beginning of that same day.

The following commands illustrate the algebraic manipulations on timeDate objects, and show how one can extract subsets of a time series.

    DAY <- timeDate("08/12/1998")
    SP081298 <- SPsep98.ts[seriesPositions(SPsep98.ts) >= DAY &
                           seriesPositions(SPsep98.ts) < DAY + 24*3600]
    plot(SP081298)

The resulting plot is reproduced in Figure 6.6.
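Both computations of this passage, the fractional parts behind Figure 6.5 and the one-day window behind Figure 6.6, can be sketched in base R on a few ticks reproduced earlier in this section. Since only the 4 August 1998 quotes are available here, the sketch uses that day instead of 12 August 1998. It works with POSIXct time stamps in an assumed "UTC" time zone, so that adding 24*3600 seconds always moves exactly one day, and it uses round() to remove floating-point noise from the fractional parts.

```r
# A few ticks reproduced in this section: one from 1997-12-24 and the
# first block of 1998-08-04 quotes (time zone "UTC" is an assumption)
stamp <- as.POSIXct(c("1997-12-24 09:27:21",
                      paste("1998-08-04",
                            c("11:05:00", "11:05:00", rep("11:06:00", 6)))),
                    tz = "UTC")
price <- c(968.00, 1103.50, 1103.00, 1102.80, 1102.60,
           1102.50, 1102.40, 1102.20, 1102.00)

# Quantization: fractional part = price minus its integer part
frac <- round(price - floor(price), 2)   # rounding removes floating-point noise
print(table(frac))

# One-day window [DAY, DAY + 24 h), mirroring the subsetting command above
DAY <- as.POSIXct("1998-08-04", tz = "UTC")
in_day <- stamp >= DAY & stamp < DAY + 24 * 3600
print(price[in_day])

# Three consecutive days: widen the window to DAY + 3 * 24 * 3600
in_3days <- stamp >= DAY & stamp < DAY + 3 * 24 * 3600
print(sum(in_3days))
```

On this sample every fractional part is a multiple of 0.1, a small-scale version of the discreteness visible in Figure 6.5.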
[Fig. 6.6. Plot of one day extracted from the timeSeries object SPsep98.ts (panel title: Time Series Plot of SP081298; horizontal axis: 10:00 to 14:00).]

One could just as well extract three consecutive days instead of one.

    SPDAYS <- SPsep98.ts[seriesPositions(SPsep98.ts) >= DAY &
                         seriesPositions(SPsep98.ts) < DAY + 3*24*3600]
    plot(SPDAYS)

The resulting plot is given in Figure 6.7.

[Fig. 6.7. Plot of the short timeSeries object obtained by extracting three consecutive days from the timeSeries object SPsep98.ts (panel title: Time Series Plot of SPDAYS; horizontal axis: Thu, Fri).]

As explained earlier, the plotting function performs a linear interpolation to join the last quote of one day to the first quote of the following day. However, the scale of the horizontal axis is uniform throughout the time domain and no

