See the Near Future: A Short-Term Predictive Methodology to Traffic Load in ITS

Xun Zhou¹, Changle Li¹,²,∗, Zhe Liu¹, Tom H. Luan³, Zhifang Miao¹, Lina Zhu¹, and Lei Xiong²
¹State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, Shaanxi, 710071 China
²State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, 100044 China
³School of Information Technology, Deakin University, Melbourne, VIC, 3125 Australia
∗[email protected]

arXiv:1701.01917v1 [cs.LG] 8 Jan 2017

Abstract—The Intelligent Transportation System (ITS) targets a coordinated traffic system by applying advanced wireless communication technologies to road traffic scheduling. Towards accurate road traffic control, short-term traffic forecasting, which predicts the road traffic at a particular site over a short period, is often useful and important. In existing works, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is a popular approach. The scheme, however, encounters two challenges: 1) the analysis of the related data is insufficient, so some important features of the data may be neglected; and 2) with data presenting different features, it is unlikely that one predictive model can fit all situations. To tackle the above issues, in this work we develop a hybrid model to improve the accuracy of SARIMA. Specifically, we first explore the autocorrelation and distribution features existing in traffic flow to revise the structure of the time series model. Based on the Gaussian distribution of traffic flow, a hybrid model with a Bayesian learning algorithm is developed, which can effectively expand the application scenarios of SARIMA. We show the efficiency and accuracy of our proposal using both analysis and experimental studies. Using real-world trace data, we show that the proposed prediction approach can achieve satisfactory performance in practice.

Index Terms—Intelligent Transportation System, time series, learning algorithm, short-time traffic forecasting.

I. INTRODUCTION

Predicting the future road traffic at a particular site is fundamental to plenty of Intelligent Transportation System (ITS) applications [1]–[6], such as traffic management [7], communication resource allocation [8] and road-related infotainment applications [9]. For instance, Börjesson [10] applies the Swedish official long-distance model to estimate information about traffic flow and predict the demand for High Speed Rail (HSR) in order to guide investment in the construction of new HSR. Gu et al. [11] show that short-term and very short-term traffic load forecasting are essential to commitment scheduling and transmission loss estimation. Vlahogianni et al. [12] have summarized relevant works from three decades and indicated that the development of short-term traffic forecasting is promoting friendly applications which can both provide accurate information to drivers and be used for signal optimization. In a nutshell, accurate traffic prediction is important to the performance of many advanced ITS applications.

These existing prediction schemes typically explore correlated historical data, and can be classified as linear methods and non-linear methods based on the prediction functions adopted. The linear methods include the grey forecasting GM(1,1) [13], the error component model [14] and the Autoregressive Integrated Moving Average (ARIMA) model [15]. They use fixed prediction functions and exploit the assumption of linearity and stationarity of the prediction function to infer future transportation trends. Compared with the linear methods, which only adjust the related coefficients according to historical data, the non-linear methods can better learn data features from huge samples and therefore build adaptive prediction functions. The corresponding prediction models can describe non-linear characteristics and achieve more accurate forecasting performance in transportation systems. Currently, typical non-linear methods such as machine learning models [16], [17], [19] have been applied in several application fields, from traffic forecasting to communication allocation [19].

However, in the short-term traffic forecasting field, SARIMA [20], which is improved from ARIMA, is also a popular linear method that can outperform many non-linear methods. In contrast with most learning models, Marco et al. [19] show that the typical SARIMA coupled with a Kalman filter works better in traffic forecasting than other learning models under the same conditions. There are two major reasons. Firstly, SARIMA, being based on regression analysis, possesses machine learning features, which makes it effective at applying history data to establish a prediction model. Secondly, whereas most non-linear models apply history data to obtain a globally optimal result, SARIMA can utilize the periodicity existing in the data to prevent uncorrelated samples from influencing the performance of the model. Considering the remarkable periodicity of traffic flow, SARIMA is the most appropriate model for traffic forecasting.

SARIMA encounters two challenges, which will be investigated in this work. The first one is that the data used in the models usually lacks analysis. Analyzing traffic data features beforehand can be useful for adapting SARIMA to traffic flow. The second one is that different data sensitivity makes it unreasonable to apply only one predictive model in all situations. Although there are some fixed hybrid models, employing machine learning to combine models has the advantage that the hybrid structure can be revised dynamically, reducing the effect of unexpected incidents.

We address the above challenges in two aspects.
First of all, we extract the stable part of traffic flow as a constant, i.e., the traffic flow constant; the remainder is the fluctuant part, which is shown to follow a Gaussian distribution. Combining this with the autocorrelation property of traffic flow, we revise the structure of SARIMA so that it predicts only the fluctuant part. Furthermore, applying the Gaussian distribution feature, a hybrid model is proposed based on the revised SARIMA to adjust the forecasted results dynamically. In particular, we highlight our main contributions in this paper as follows:

• Based on the analysis of traffic flow, we define the traffic flow constant and calculate the residuals between the real data and the constant as the fluctuant part, in order to revise the structure of the original SARIMA. Its accuracy can thereby be improved.
• We establish a Bayesian learning algorithm to combine the classical SARIMA with the revised SARIMA, obtaining a hybrid model that expands its application range and improves its stability.
• Real data is used to verify the performance improvement of the hybrid model and to analyze this model thoroughly.

The rest of this paper is organized as follows. Section II introduces preliminary works and describes the data preprocessing. Section III gives the details of the revision process and the hybrid model. Furthermore, in Section IV, simulation results are discussed to examine the performance of the hybrid model based on real data of subway traffic flow. Finally, Section V concludes this paper and discusses future works.

II. PRELIMINARY

This part is divided into two subsections. In the first one, we summarize the SARIMA processes in order to introduce the notation used in the remainder of the paper. In the second one, we describe the data used in the simulation. Such a part is common in the literature on data processing. Although it may be seen as non-technical, it is important for the work that follows.

A. SARIMA Model

The SARIMA model was proposed by George et al. based on ARIMA. It is suited to time series that exhibit an s-periodic behavior, where s denotes that similarities in the series occur after s basic time intervals. For example, the seasonality existing in daily models [21] is 5 days. The model is often written as SARIMA(p,d,q)(P,D,Q)_s and the function is

$$\phi_p(B)\,\varphi_P(B^s)\,\nabla^d \nabla_s^D z_t = \theta_q(B)\,\Theta_Q(B^s)\,a_t, \tag{1}$$

where $a_t$ is white noise, $B$ is the backward-shift operator, i.e., $Bz_t = z_{t-1}$, and $\nabla$ is the differencing operator, i.e., $\nabla = 1 - B$. $\phi(B)$, $\varphi(B^s)$, $\theta(B)$ and $\Theta(B^s)$ are polynomials in $B$ and $B^s$ respectively. $p, d, q, P, D$, and $Q$ are the degrees of the corresponding polynomials, i.e.,

$$\phi_p(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p, \tag{2}$$

$$\varphi_P(B^s) = 1 - \varphi_1 B^s - \varphi_2 B^{2s} - \dots - \varphi_P B^{Ps}, \tag{3}$$

$$\nabla^d = (1 - B)^d, \tag{4}$$

$$\nabla_s^D = (1 - B^s)^D. \tag{5}$$

Equation (1) can be transformed into the more common form (6):

$$z_t = \alpha_1 z_{t-1} + \alpha_2 z_{t-2} + \dots + \alpha_n z_{t-p-d-Ps-Ds} + a_t + \beta_1 a_{t-1} + \beta_2 a_{t-2} + \dots + \beta_m a_{t-q-Qs}. \tag{6}$$

The index $t-n$ denotes a lag of $n$ time intervals, and the predicted result can be denoted as (7):

$$\tilde{z}_t = \alpha_1 z_{t-1} + \alpha_2 z_{t-2} + \dots + \alpha_n z_{t-p-d-Ps-Ds} + \beta_1 a_{t-1} + \beta_2 a_{t-2} + \dots + \beta_m a_{t-q-Qs}. \tag{7}$$

Therefore, $a_{t-n} = z_{t-n} - \tilde{z}_{t-n}$ is the residual between the historical measured data and the predicted data. The least squares method is used on the training samples to calculate the coefficient of each term.

B. Data Processing

In this paper, the real data is downloaded from the New York State Home, a public data source established by the USA government. More detailed information can be found in [22]. The data is real and is collected by devices on turnstiles, which count the number of people entering subway stations; the data is usually uploaded every 4 hours. Attributes such as subway station number, time stamp and the number of people entering are included.

We choose the raw data sets collected from January to March 2016 as samples. Firstly, the raw data sets are classified based on the attribute station ID. We randomly extract the data sets of one subway station and exclude from them the data collected on public holidays, weekends and days with bad weather, which may influence the people flow seriously. Then, the data sets are cleaned. Data collected on special devices, for example devices that are faulty or unavailable, is ignored, and redundant records that were recorded many times are deleted. Missing values are interpolated with the mean value. The data used in this paper is the number of people entering the station in six time segments: 03:00-07:00, 07:00-11:00, 11:00-15:00, 15:00-19:00, 19:00-23:00, and from 23:00 to 03:00 of the next day (23:00-03:00). Finally, six data sets are acquired. These data sets include peak and off-peak people flow. In this paper, data of two months is used as the training sets to establish the related models and the Bayesian learning algorithm. After this, data of one week is used as the testing sets to evaluate the models mentioned above.

III. BAYESIAN SEASONAL AUTOREGRESSIVE INTEGRATED MOVING AVERAGE

In this part, we present our scheme, Bayesian Seasonal Autoregressive Integrated Moving Average (BARIMA), including the theoretical analysis, model confirmation and optimization respectively.

A. Model Revision

Generally speaking, it is feasible to predict the traffic flow based on the hourly model in [21], whose time interval in (7) is one hour. It can be written as (8):

$$z(t) = f\{z(t-1), z(t-2), \dots, z(t-n)\}, \tag{8}$$

where $f\{\cdot\}$ denotes the forecast algorithm in (7). A time interval of one hour is often the upper bound for guaranteeing the efficiency of the model. The major reason is that, when the time interval is too large, the information obtained from adjacent time intervals becomes independent. This follows from the Burke theory [23], which states that the leaving flow is independent of the number of people in the system.

Fig. 1: Autocorrelation and Partial Autocorrelation of the Traffic Flow Collected from January to March (7:00-11:00)
Thus, it is necessary to establish a prediction model that is more reliable for different data sets.

Under normal conditions, it is reasonable to assume that the people flow in each time segment can be defined as (9):

$$z_i = d_i + \varepsilon_i, \tag{9}$$

where $z_i$ denotes the traffic flow in the $i$th time segment. We define $d_i$ in this equation as the traffic flow constant, which is the size of the constant part of the flow, consisting of, for example, office workers and students. $\varepsilon_i$ is the fluctuation around the constant $d_i$. It results from many minor factors such as weather and ticket price, so it is reasonable to consider $\varepsilon_i$ as Gaussian distributed with zero mean. All of this can be verified based on the law of large numbers. Equation (9) can be converted to (10):

$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} z_i^{j} = \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n}\left(d_i^{j} + \varepsilon_i^{j}\right), \qquad \lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} z_i^{j} = d_i, \tag{10}$$

where $n$ denotes the number of samples used in the statistical algorithm and $j$ denotes the index of a sample rather than a power. Based on the assumption that the expectation of $\varepsilon_i$ is zero, the constant $d_i$ can be obtained by calculating the mean of $n$ samples. In order to verify our assumption, samples collected over two months are used to calculate $d_i$ and $\varepsilon_i$. In the next part, based on the Kolmogorov-Smirnov (K-S) test [24], we verify the Gaussian distribution.

Now, two important conclusions can be summarized. First, the people flow in the same time segment on different days may be correlated, because the flows share the same base value $d_i$ and a random variable following the same distribution. Second, applying the constant $d_i$ and studying the variation of $\varepsilon_i$ can serve as a forecasting approach. Thus, the daily model [21], whose time interval is one day, can be seen as a better choice. At the same time, it is advisable to assume that seasonality also exists in the data. For example, on each Monday the majority of companies and schools hold meetings to summarize the work of the previous week, so people arrive earlier than on other days and the peak also arrives earlier. So (11) can be used as the forecast algorithm:

$$z_n(t) = f\{z_{n-1}(t), z_{n-2}(t), \dots, z_{n-j}(t)\}. \tag{11}$$

It means that the traffic flow on the $n$th day at time $t$ can be estimated from the data collected on the days before the $n$th day. Considering the second conclusion, we can divide the traffic flow into fluctuant and stable parts. Only the fluctuant part is predicted in (11). Then, combining the predicted result with the constant gives the traffic flow in the next time interval. Reducing the content to be predicted in this way is an effective means of decreasing the prediction error. So (11) can be revised as

$$z_n(t) = f\{\varepsilon_{n-1}(t), \varepsilon_{n-2}(t), \dots, \varepsilon_{n-j}(t)\} + d(t), \tag{12}$$

and the common form revised from (7) for the prediction task can be written as (13):

$$z_n(t) = \alpha_1\varepsilon_{n-1}(t) + \alpha_2\varepsilon_{n-2}(t) + \dots + \alpha_j\varepsilon_{n-j}(t) + \beta_1 a_{n-1}(t) + \beta_2 a_{n-2}(t) + \dots + \beta_m a_{n-m} + d(t), \tag{13}$$

with $j \in [1, p+d+Ps+Ds]$ and $m \in [1, q+Qs]$. The model described by (12) and (13) is named Residual Seasonal Autoregressive Integrated Moving Average (RARIMA). In order to determine the values of $p, d, q, P, D$, and $Q$, the least squares method is used to fit for the minimum Mean Absolute Error (MAE), and the goodness of fit $R^2$ of the different order combinations is checked to choose the largest one. However, in (1), because of the polynomials in $B$, the SARIMA model always includes several terms such as $B^{s-1}$ and $B^{s-2}$. Although $z_t$ is strongly correlated with the $s$th term, i.e., $B^s$, due to seasonality, the adjacent terms around the $s$th term are in fact often irrelevant. This influences the accuracy of SARIMA. An instance is used to explain this.

At first, the correlation among the samples can be calculated based on (14), proposed by George et al. [15]:

$$p_l = \frac{\sum_{t=l+1}^{T}(z_t - \bar{z})(z_{t-l} - \bar{z})}{\sum_{t=1}^{T}(z_t - \bar{z})^2}, \tag{14}$$

where $z_t$ is the real data at time $t$, $\bar{z}$ is the mean of the training set, $l$ is the lag, $T$ is the size of the training set, and $p_l$ is the value of the correlation between $z_t$ and $z_{t-l}$. The greater the value of $p_l$, the more similar $z_t$ is to $z_{t-l}$. Fig. 1 shows the autocorrelation and partial autocorrelation of the traffic flow collected from January to March at the peak 07:00 to 11:00. When the value of the correlation is around the double standard line (the dashed line in the picture), the current data is strongly related to the corresponding lag [15]. From the picture we can see that lags 5 and 7 are strongly related to the current traffic flow. SARIMA(2,0,2)(1,0,0)_5 is the best model and it can be written as

$$z_n(t) = \alpha_1 z_{n-1}(t) + \alpha_2 z_{n-2}(t) + \alpha_5 z_{n-5}(t) + \alpha_6 z_{n-6}(t) + \alpha_7 z_{n-7}(t) + \beta_1 a_{n-1}(t) + \beta_2 a_{n-2}(t). \tag{15}$$

In order to cover the relevant terms 5 and 7, the two irrelevant terms 1 and 6 are also included. At the same time, the equation contains so many lags that it may lead to overfitting. In order to reject irrelevant terms and avoid overfitting, at most three relevant terms should be used in autoregression models [15]. We only use the three terms with the highest correlation values to establish the model. Equation (15) can be transformed as

$$z_n(t) = \alpha_2\varepsilon_{n-2}(t) + \alpha_5\varepsilon_{n-5}(t) + \alpha_7\varepsilon_{n-7}(t) + \beta_2 a_{n-2}(t) + \beta_5 a_{n-5}(t) + \beta_7 a_{n-7} + d(t). \tag{16}$$

The simple form of the transformation of (15) is defined as S-ARIMA(top.1, top.2, top.3), where top.1, top.2, and top.3 denote the orders of the three relevant terms whose correlation values are in the top three. So (16) is written as S-ARIMA(2,7,5). The model defined in (13) is denoted as RARIMA(top.1, top.2, top.3). The final function used to describe RARIMA can then be written as (17):

$$z_n(t) = \alpha_{top.1}\varepsilon_{n-top.1}(t) + \alpha_{top.2}\varepsilon_{n-top.2}(t) + \alpha_{top.3}\varepsilon_{n-top.3}(t) + \beta_{top.1} a_{n-top.1}(t) + \beta_{top.2} a_{n-top.2}(t) + \beta_{top.3} a_{n-top.3} + d(t). \tag{17}$$

The data sets used in the two revised models must be stable (stationary); if not, differencing should first be performed on the data sets to make them stable. Then, after the adjusted $R^2$ is checked to determine the order combination of the model, the least squares method is used to determine the coefficient of each term so as to obtain the minimum MAE.

B. Model Optimization

In this part, we optimize the model mentioned above. Although the inspiration was actually derived from the compared results in the simulation part, the theory is introduced now.

One model may be more accurate than other models in terms of MAE over the sample sets. However, this does not mean it is better than the others at each step. For example, the RARIMA model does well in tracking the peak of the traffic flow; however, when the people flow is stable over several time segments, the S-ARIMA model is more effective. If we can combine the two models to predict traffic flow, a more accurate result can be gained. In other words, before predicting the traffic flow, we should estimate which model will perform better.

In order to solve this question, the classification function in machine learning is considered. Based on the history data, the corresponding information can be used to establish functions that describe the features of the models. Combining the features of the different models with the correlated information about the current data, it is possible to choose the model with the best performance for this prediction. In addition, the learning algorithm can refine these functions as the size of the history data sets increases, and revise the hybrid structure dynamically to reduce the effect of unexpected incidents. In this paper, Bayesian decision theory is used to generate the hybrid model, because it can rely on fewer data features to achieve the desired result. The detailed steps are as follows:

1) Step 1: In Bayesian decision theory, the information about the classes should be clear at first, such as the number of classes. In this paper, each model is seen as one class and denoted $C_i$, $i \in \{A, B, C, \dots\}$. Only two models are combined in this paper, namely S-ARIMA and RARIMA, so they are defined as $C_A$ and $C_B$ respectively.

2) Step 2: The attributes of each class should be clear. This is difficult, because first the type of these attributes should be the same for each class. Then, it should be possible to calculate the conditional probability of these attributes. Finally, and most importantly, these attributes should distinguish each member. In this paper, the residual between the predicted value and the mean value of the history is chosen as the attribute, as shown in (18):

$$\varepsilon_i = \tilde{z}_i - d_i. \tag{18}$$

Since $\varepsilon_i$ can reasonably be seen to follow a Gaussian distribution, it is convenient for calculating the conditional probability.

3) Step 3: Bayesian decision theory uses the posterior probability to determine which class a member should be allocated to. If

$$P(C_A \mid x) > P(C_B \mid x), \tag{19}$$

which can be expressed as (20),

$$\frac{P(C_A)}{P(x)}\prod_{i=1}^{d} P(x_i \mid C_A) > \frac{P(C_B)}{P(x)}\prod_{i=1}^{d} P(x_i \mid C_B), \tag{20}$$

the member is allocated to class $C_A$. Here $x$ and $x_i$ represent the attribute vector and the $i$th attribute in the vector, and $d$ denotes the number of attributes.

4) Step 4: In our method, only one attribute is considered, namely $\varepsilon_i$, the residual between the predicted value and the mean of the history data. $P(C_i)$ can be obtained by (21):

$$P(C_i) = \frac{N_i}{N}, \tag{21}$$

where $N$ and $N_i$ represent the number of all samples and the number of samples belonging to class $C_i$. $N_i$ can be obtained from the history data: if the $\varepsilon_i$ of a sample is the smallest, the sample is assigned to class $C_i$. Equation (20) can be converted as follows:

$$P(C_A)\,P(\varepsilon_A \mid C_A) > P(C_B)\,P(\varepsilon_B \mid C_B), \tag{22}$$

with $P(\varepsilon_A \mid C_A) \sim N(\mu_A, \sigma_A^2)$ and $P(\varepsilon_B \mid C_B) \sim N(\mu_B, \sigma_B^2)$, where $\mu$ and $\sigma$ are the mean and the standard deviation of the samples. Then we can acquire the final equation. When
$$\frac{P(C_A)}{\sqrt{2\pi}\,\sigma_A}\exp\!\left(-\frac{(\varepsilon_{A,i} - \mu_A)^2}{2\sigma_A^2}\right) > \frac{P(C_B)}{\sqrt{2\pi}\,\sigma_B}\exp\!\left(-\frac{(\varepsilon_{B,i} - \mu_B)^2}{2\sigma_B^2}\right), \tag{23}$$

it means that in the next step the prediction results from model A will be better; otherwise, model B is the better choice. The hybrid output model is named BARIMA in this paper.

IV. PERFORMANCE EVALUATION

In this part, we apply the real data introduced in part II to verify and analyse our algorithm on the SPSS and EViews platforms. The Gaussian distribution of the fluctuant part, the stationarity of the real data and the reduction of error are also discussed. In order to make the result more convincing, data collected over the two preceding months is used in the learning models to establish the S-ARIMA, RARIMA and BARIMA models, and data of one week is used to test their performance.

A. Gaussian Distribution

Based on the law of large numbers, the constant $d_i$ in the six time segments can be calculated, and the K-S test is applied to check the Gaussian distribution of the residual $\varepsilon_i$. If the value of Statistical Significance is more than 0.1, the Gaussian distribution can be seen as correct. The result is shown in Table I. From it, the assumption that the residual $\varepsilon_i$ follows a Gaussian distribution with zero mean is credible.

TABLE I: Kolmogorov-Smirnov Test for Six Data Sets

| Time | Sample Mean | Statistical Significance |
|---|---|---|
| 03:00-07:00 | 0 | 0.412 |
| 07:00-11:00 | 0 | 0.611 |
| 11:00-15:00 | 0 | 0.609 |
| 15:00-19:00 | 0 | 0.266 |
| 19:00-23:00 | 0 | 0.612 |
| 23:00-03:00 | 0 | 0.951 |

B. Stationarity of Real Data

In the last subsection, $\varepsilon_i$ was shown to follow stable Gaussian distributions. In order to use these models to forecast, the data sets used should be stable. Using EViews, we have checked the stationarity and the result is in Table II. From the table, all the absolute values of Test Statistic are greater than the Test Critical Value at the 1% level. It means all six data sets are stable and can be used in the time series model directly. The relevant constant $d_i$ is also shown in Table II.

TABLE II: Stationarity of Real Data

| Time | Test Critical Value (1% level) | Test Statistic | Constant |
|---|---|---|---|
| 03:00-07:00 | -4.199 | -4.881 | 244.865 |
| 07:00-11:00 | -4.199 | -5.318 | 3334.73 |
| 11:00-15:00 | -4.199 | -5.270 | 2412.30 |
| 15:00-19:00 | -4.199 | -6.166 | 7718.78 |
| 19:00-23:00 | -4.199 | -6.425 | 3427.81 |
| 23:00-03:00 | -4.199 | -4.782 | 528.430 |

C. Reduction of Error

In this subsection, we apply the MAE (whose unit is number of people) to the predicted results to estimate the performance of each model. Based on the last subsection, $\varepsilon_i$ and the real data are stable, so we can feed them into these models directly. At first, SARIMA and the revised S-ARIMA are compared with each other to show that the transformation is applicable. The result is in Table III. From the table we can see that the transformation of the SARIMA model is effective in reducing error. It may reduce the MAE of SARIMA by about 7%. So, the following work is based on the revised S-ARIMA model.

TABLE III: Mean Absolute Error of SARIMA and S-ARIMA

| Time | SARIMA | S-ARIMA |
|---|---|---|
| 03:00-07:00 | (2,0,2)(1,0,0)_5 = 15.09 | (1,5,7) = 14.01 |
| 07:00-11:00 | (2,0,2)(1,0,0)_5 = 85.05 | (2,5,7) = 79.82 |
| 11:00-15:00 | (3,0,3)(1,0,0)_5 = 162.11 | (3,5,9) = 140.62 |
| 15:00-19:00 | (3,0,2)(1,0,0)_5 = 159.40 | (5,8) = 131.98 |
| 19:00-23:00 | (2,0,2)(1,0,0)_5 = 129.04 | (2,5) = 117.04 |
| 23:00-03:00 | (1,0,1)(1,0,0)_5 = 57.24 | (1,4,5) = 66.42 |

Based on (17) and the constant $d_i$ in Table II, the complete RARIMA model can be obtained. In particular, the RARIMA model should be revised from S-ARIMA rather than SARIMA. The predicted results are compared in Table IV, from which we can see that the RARIMA model is better than the S-ARIMA model in most situations.

TABLE IV: Mean Absolute Error of S-ARIMA and RARIMA

| Time | S-ARIMA | RARIMA |
|---|---|---|
| 03:00-07:00 | (1,5,7) = 14.01 | (1,5,7) = 12.13 |
| 07:00-11:00 | (2,5,7) = 79.82 | (2,5,7) = 66.09 |
| 11:00-15:00 | (3,5,9) = 140.62 | (3,5,9) = 125.78 |
| 15:00-19:00 | (5,8) = 131.98 | (5,8) = 142.95 |
| 19:00-23:00 | (2,5) = 117.04 | (2,5) = 113.77 |
| 23:00-03:00 | (1,4,5) = 66.42 | (1,4,5) = 47.3 |

Now, we analyse the two models using a picture that shows their predicted results in 19:00-23:00. From Fig. 2, we can see that RARIMA is better at tracking the real data, especially at the peak. However, S-ARIMA is more stable when the real data is increasing and decreasing smoothly. So, if we use the RARIMA model to track the peak and use S-ARIMA to track other situations, the results are better. The inspiration for constructing a hybrid model is derived from this analysis, and the detailed theory has been explained in the Model Optimization subsection in part III.

Fig. 2: Predicted Results of Two Models in 19:00-23:00 (x-axis: Time/day; y-axis: MAE/number of people; series: Real data, S-ARIMA, RARIMA)

We use the Bayesian learning algorithm to choose the best model for the next prediction. The corresponding results are shown in Table V. From the table we can see that, compared with SARIMA, the performance of the hybrid model BARIMA is clearly better. It reduces the MAE of SARIMA by 20%. In order to explain the improvement, Fig. 3 is shown, and the minimum error of BARIMA is provided to analyze our model thoroughly.

TABLE V: Performance of BARIMA

| Time | SARIMA | RARIMA | BARIMA | Minimum error |
|---|---|---|---|---|
| 03:00-07:00 | 15.09 | 12.13 | 11.25 | 9.08 |
| 07:00-11:00 | 85.05 | 66.09 | 69.06 | 42.05 |
| 11:00-15:00 | 140.62 | 125.78 | 112.65 | 86.36 |
| 15:00-19:00 | 162.11 | 142.95 | 124.53 | 87.44 |
| 19:00-23:00 | 129.04 | 110.42 | 113.77 | 99.75 |
| 23:00-03:00 | 57.24 | 47.30 | 45.79 | 37.33 |

Fig. 3: Performance of BARIMA (x-axis: Time/hour; y-axis: MAE/number of people; series: SARIMA, RARIMA, BARIMA, Minimum error)

From Fig. 3, the performance of BARIMA is better than that of SARIMA. Combining two models to generate a hybrid model is a sound method to improve the accuracy. Nevertheless, it is hard to choose exactly the best model for every forecast. In our paper, the Bayesian learning algorithm is effective most of the time. When $P(\varepsilon_A \mid C_A)$ is similar to $P(\varepsilon_B \mid C_B)$, the algorithm cannot work well, such as at 07:00-11:00 and 19:00-23:00. The major reason is that only one attribute is used in the Bayesian learning algorithm.

D. Performance Test of BARIMA

Based on the training sets, the training results have shown that the BARIMA model can reduce the prediction error of SARIMA. However, it is not reliable to evaluate any model from training results alone. The more rigorous method is to use the complete model obtained from the training sets to predict the remaining data and test the performance of these models. In this paper, data of one week after the two months is used for this task. In order to compare our model with some models in [19], the MAE is normalized and the mean absolute percentage error (MAPE) is used to evaluate each model. The MAPE of the minimum error of BARIMA is also provided. Detailed information is shown in Table VI. From the testing result, the BARIMA model performs better than the classical SARIMA, RW and SM, where RW is a simple baseline that predicts future traffic as equal to the current conditions, and SM predicts the average in the training set for a given time of the day. In addition, BARIMA can be further improved by refining the Bayesian learning algorithm to approach the minimum error.

TABLE VI: MAPE of Prediction

| Model | SARIMA | SM | RW | ANN | BARIMA |
|---|---|---|---|---|---|
| MAPE (%) | 6.28 | 6.76 | 6.30 | 5.80 | 4.93 |

V. CONCLUSION AND FUTURE WORKS

In this paper, by analyzing traffic flow features and the structure of the SARIMA model, we propose a hybrid model to improve the accuracy of short-time traffic forecasting. Firstly, based on the autocorrelation of traffic flow, the classical SARIMA is revised to prevent irrelevant data from being used to predict traffic flow. Secondly, the traffic flow is divided into fluctuant and stable parts to reduce the content of prediction. Finally, according to the Gaussian distribution of the residuals, a Bayesian learning algorithm is applied to build our proposed scheme, i.e., BARIMA. Extensive simulation results show that the proposed scheme performs better in comparison with existing schemes.

In the future, we will combine abundant non-transport data sets to refine the proposed Bayesian learning algorithm, and attempt to improve the performance of our scheme.

REFERENCES

[1] X. Ta, G. Mao, and B. D. O. Anderson, "On the giant component of wireless multihop networks in the presence of shadowing," in IEEE Trans. Veh. Technol., vol. 58, no. 9, pp. 5152-5163, 2009.
[2] A. A. Kannan, B. Fidan, and G. Mao, "Robust distributed sensor network localization based on analysis of flip ambiguities," in IEEE GLOBECOM, pp. 1-6, 2008.
[3] G. Mao, B. D. O. Anderson, and B. Fidan, "Wsn06-4: Online calibration of path loss exponent in wireless sensor networks," in IEEE Globecom, pp. 1-6, 2006.
[4] X. Ge, S. Tu, T. Han, Q. Li, and G. Mao, "Energy efficiency of small cell backhaul networks based on Gauss-Markov mobile models," in IET Networks, vol. 4, no. 2, pp. 158-167, 2015.
[5] R. Mao and G. Mao, "Road traffic density estimation in vehicular networks," in IEEE Wireless Communications and Networking Conference, pp. 4653-4658, 2013.
[6] G. Mao and B. D. O. Anderson, "Graph theoretic models and tools for the analysis of dynamic wireless multihop networks," in IEEE Wireless Communications and Networking Conference, pp. 1-6, 2009.
[7] G. Al-Kubati, A. Al-Dubai, L. Mackenzie, and D. P. Pezaros, "Stable infrastructure-based routing for Intelligent Transportation Systems," in Proc. of IEEE ICC, pp. 3394-3399, Jun. 2015.
[8] N. Ekedebe, C. Lu, and W. Yu, "Towards experimental evaluation of intelligent Transportation System safety and traffic efficiency," in Proc. of IEEE ICC, pp. 3757-3762, 2015.
[9] J. Qin, H. Zhu, Y. Zhu, L. Lu, G. Xue, and M. Li, "POST: Exploiting Dynamic Sociality for Mobile Advertising in Vehicular Networks," in Proc. of IEEE INFOCOM, pp. 1761-1769, May 2014.
[10] M. Sh. Levin, A. Andrushevich, R. Kistler, and A. Klapproth, "Combinatorial evolution of ZigBee protocol," IEEE Region 8 Int. Conf. Sibircon 2010, vol. 1, pp. 314-319, 2010.
[11] C. Gu, D. Yang, P. Jirutitijaroen, W. M. Walsh, and T. Reindl, "Spatial load forecasting with communication failure using time-forward kriging," IEEE Power Systems., vol. 29, no. 6, pp. 2875-2882, 2014.
[12] E. I. Vlahogianni, M. G. Karlaftis, and J. C. Golias, "Short-term traffic forecasting: where we are and where we're going," Transp. Res. Part C Emerg. Technol., vol. 43, pp. 3-19, 2014.
[13] J. Deng, "Control problems of grey systems," Syst. Control Lett., vol. 1, no. 5, pp. 288-294, 1982.
[14] E. Frejinger and M. Bierlaire, "Capturing correlation with subnetworks in route choice models," Transp. Res. Part B, vol. 41, no. 3, pp. 363-378, 2007.
[15] G. Box, G. Jenkins, and C. Reinsel, Time Series Analysis: Forecasting and Control, 3rd ed., Englewood Cliffs, NJ: Prentice-Hall, 1994.
[16] Y. H. Lin, P. C. Lee, and T. P. Chang, "Adaptive and high-precision grey forecasting model," Expert Systems with Applications, vol. 36, no. 6, pp. 9658-9662, 2009.
[17] M. Jun and M. Ying, "Research of traffic flow forecasting based on neural network," in Proc. Workshop Intell. Inf. Technol, vol. 2, pp. 104-108, 2008.
[18] H. Su, L. Zhang, and S. Yu, "Short-term traffic flow prediction based on incremental support vector regression," in Proc. Int. Conf. Natural Comput., vol. 1, pp. 640-645, 2007.
[19] M. Lippi, M. Bertini, and P. Frasconi, "Short-term traffic flow forecasting: An experimental comparison of time-series analysis and supervised learning," IEEE Trans. Intell. Transp. Syst., vol. 14, no. 2, pp. 871-882, 2013.
[20] B. Smith, B. Williams, and R. Oswald, "Comparison of parametric and nonparametric models for traffic flow forecasting," Transp. Res. Part C, vol. 10, no. 4, pp. 257-321, 2002.
[21] Z. Ma, J. Xing, M. Mesbah, and L. Ferreira, "Predicting short-term bus passenger demand using a pattern hybrid approach," Transp. Res. Part C Emerg. Technol., vol. 39, pp. 148-163, 2014.
[22] New York State Home, available: http://data.ny.gov.
[23] A. A. Borovkov, Asymptotic methods in queuing theory, John Wiley and Sons, 1984.
[24] H. W. Lilliefors, "On the Kolmogorov-Smirnov test for normality with mean and variance unknown," Journal of the American Statistical Association, vol. 62, no. 318, pp. 399-402, 1967.
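As a supplementary illustration of the model selection rule in (21)-(23) of Section III-B, the following is a minimal Python sketch under stated assumptions. The function names are hypothetical, the training attribute values are invented purely for illustration (they are not the paper's data), and it is assumed that each training sample has already been labeled with the model that predicted it best. Class A stands for S-ARIMA and class B for RARIMA.

```python
import math
from statistics import mean, pstdev


def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, the likelihood term in (23)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (
        math.sqrt(2.0 * math.pi) * sigma
    )


def fit_class(eps_samples, n_total):
    """Estimate the prior (21) and the Gaussian parameters for one class."""
    return {
        "prior": len(eps_samples) / n_total,
        "mu": mean(eps_samples),
        "sigma": pstdev(eps_samples),  # population standard deviation
    }


def choose_model(eps_a, eps_b, class_a, class_b):
    """Apply (22)/(23): return "A" if model A's score is larger, else "B"."""
    score_a = class_a["prior"] * gaussian_pdf(eps_a, class_a["mu"], class_a["sigma"])
    score_b = class_b["prior"] * gaussian_pdf(eps_b, class_b["mu"], class_b["sigma"])
    return "A" if score_a > score_b else "B"


# Hypothetical attribute values (predicted value minus traffic flow constant),
# grouped by which model was more accurate on that training sample.
eps_where_a_won = [-20.0, -5.0, 0.0, 10.0, 15.0]
eps_where_b_won = [80.0, 120.0, 150.0]
n_total = len(eps_where_a_won) + len(eps_where_b_won)

class_a = fit_class(eps_where_a_won, n_total)
class_b = fit_class(eps_where_b_won, n_total)

# Attribute of each model's forecast for the upcoming step.
print(choose_model(5.0, 30.0, class_a, class_b))    # → A
print(choose_model(60.0, 110.0, class_a, class_b))  # → B
```

Because only one attribute is used, the rule reduces to comparing two prior-weighted Gaussian densities; when the two densities are close, the choice becomes unreliable, which is consistent with the limitation noted in Section IV-C.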
