Hindawi Publishing Corporation
Advances in Artificial Intelligence
Volume 2014, Article ID 717803, 11 pages
http://dx.doi.org/10.1155/2014/717803

Research Article
Hybrid Wavelet-Postfix-GP Model for Rainfall Prediction of Anand Region of India

Vipul K. Dabhi (1) and Sanjay Chaudhary (2)

(1) Information Technology Department, Dharmsinh Desai University, Nadiad 387001, India
(2) IICT, Ahmedabad University, Ahmedabad 380009, India

Correspondence should be addressed to Vipul K. Dabhi; [email protected]

Received 11 January 2014; Accepted 15 May 2014; Published 2 June 2014

Academic Editor: Djamel Bouchaffra

Copyright © 2014 V. K. Dabhi and S. Chaudhary. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

An accurate prediction of rainfall is crucial for the national economy and the management of water resources. The variability of rainfall in both time and space makes rainfall prediction a challenging task. The present work investigates the applicability of a hybrid wavelet-postfix-GP model for daily rainfall prediction of the Anand region using meteorological variables. Wavelet analysis is used as a data preprocessing technique to remove the stochastic (noise) component from the original time series of each meteorological variable. Postfix-GP, a GP variant, and ANN are then employed to develop models for rainfall using the newly generated subseries of meteorological variables. The developed models are then used for rainfall prediction. The out-of-sample prediction performance of the Postfix-GP and ANN models is compared using statistical measures. The results are comparable and suggest that Postfix-GP could be explored as an alternative tool for rainfall prediction.

1. Introduction

An accurate prediction of rainfall is crucial for the agriculture-based Indian economy. Moreover, it helps in the prevention of floods, the management of water resources, and the generation of crop-related recommendations for farmers [1]. The variability of rainfall in both time and space makes rainfall prediction a challenging task. Moreover, the meteorological parameters needed for rainfall prediction are complex and nonlinear in nature. Practitioners have applied numerical [2, 3] and statistical [4, 5] models for rainfall prediction. Numerical Weather Prediction (NWP) models are deterministic models that approximate complex physical processes for weather prediction. However, these models are not useful for prediction at smaller scales because of their inherent limitations with respect to initial conditions and model parameterization. Practitioners have also used autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) techniques for developing a model for rainfall [6]. However, these approaches were developed on the assumption of stationarity of the given time series and independence of the residuals. Moreover, these approaches lack the ability to identify nonlinear patterns and irregularity in the time series.

Hence, in recent years, the use of different machine learning techniques for modeling and prediction of rainfall has received much attention from practitioners [7]. Hung et al. [8] developed a neural network for 1 to 3 hours ahead forecasting of rainfall for Bangkok. They used meteorological parameters (air pressure, relative humidity, wet bulb temperature, and cloudiness) and the rainfall registered at the neighboring stations as inputs to the neural network. They used hydrometeorological data covering both rainy and nonrainy periods for training the neural network. Moustris et al. [9] used a neural network approach for forecasting the monthly minimum, maximum, mean, and cumulative precipitation for four meteorological stations of Greece. They noted that the ANN was not able to predict the peaks in all cases and suggested increasing the size of the training dataset to overcome this problem [9]. Lin and Chen [10] used a neural network for forecasting typhoon rainfall. Their ANN takes typhoon characteristics (direction of typhoon movement, latitude and longitude of the typhoon centre, the maximum wind speed, and atmospheric pressure of the centre) and the spatial rainfall information of nearby rain gauges as inputs and gives a 1 hour ahead forecast of typhoon rainfall. Practitioners have applied ANN [11, 12] and an approach based on chaos theory [13] for numerical model error prediction in hydrology. ANN and concepts of chaos theory are used to adjust the value of the outputs produced by the numerical model.

A hydrometeorological time series can be considered as a composition of stochastic (noise or fluctuations) and structured components. The stochastic component obscures the modeling of the time series. The structured component can be extracted by removing the stochastic component from the time series. A deterministic model can then be developed for the structured component. Practitioners have used the following data preprocessing techniques for cleaning hydrological time series: wavelet analysis (WA), principal component analysis (PCA), and singular spectrum analysis (SSA). In recent years, wavelet analysis has become an effective tool for analyzing nonstationary time series.

Partal and Cigizoglu [14] proposed a wavelet-ANN approach for predicting the daily precipitation of 12 meteorological stations of Turkey. They used meteorological data for precipitation prediction. Nasseri et al. [15] applied a combination of the back propagation algorithm and a genetic algorithm (GA) for rainfall forecasting in the western suburbs of Sydney. They used the GA to train and optimize a feed-forward neural network and concluded that the combined approach outperformed an approach that uses a neural network alone. A modular artificial neural network (MANN) [16] has been combined with three data preprocessing techniques, moving average (MA), PCA, and SSA, for prediction of two monthly and two daily precipitation series.

The symbolic regression technique can be used for developing a model (a mathematical model in symbolic form) that can explain the relationship between rainfall and meteorological parameters. The advantage of symbolic regression over traditional regression techniques is that it searches for both the structure and the appropriate numeric coefficients of the model. Symbolic regression can be performed by means of genetic programming (GP) [17]. Moreover, the GP approach is preferred over other approaches for symbolic regression because it produces an explicit mathematical expression as a solution (model) [18]. The produced model can provide insight into the process that gives rise to the data [19]. Moreover, the interpretation of the produced model allows us to combine evolved knowledge with already existing knowledge [20, 21]. An exhaustive survey of open issues in symbolic regression through GP, and of the approaches used by practitioners to deal with them, is presented in [22]. Kisi and Shiri [23] suggested that the use of GP is preferred in the following situations: (i) the relationships between the relevant variables are poorly understood, (ii) determination of the optimal solution is difficult, (iii) an approximate solution is acceptable, and (iv) there is a large amount of training data to be modeled.

Considering the mentioned advantages, many practitioners have applied GP for rainfall prediction. GP is used in [24] to develop a model that can explain the cause and effect relationship between rainfall and runoff processes at a catchment in Singapore. Babovic and Keijzer [25] employed GP for developing rainfall-runoff models using hydrometeorological data and the available domain knowledge. Khu et al. [26] applied GP for real-time runoff forecasting at the Orgeval catchment in France. There, GP is used as an error updating strategy to accompany a rainfall-runoff model, and the results produced by GP are compared with those obtained using autoregression and a Kalman filter. The results suggest that GP outperforms the other strategies for real-time flow forecasting [26].

The objective of the present work is to explore the applicability of a hybrid wavelet-postfix-GP model for prediction of daily rainfall at Anand station of Gujarat, India, using meteorological variables. Wavelet analysis is used to decompose the original series of each meteorological variable into various discrete wavelet (DW) subseries. The decomposition is useful for identifying the DW subseries that have high correlation with the original rainfall series. The effective DW subseries for each meteorological variable are added to generate a final subseries. The objective behind the addition of DW subseries is to increase the correlation between the final subseries and the original rainfall series. Postfix-GP, a GP variant, is then used to develop a model that can explain the relationship between the final subseries of the meteorological variables and the original rainfall series. Postfix-GP uses a linear individual representation and stack based evaluation. This helps Postfix-GP to minimize both the memory requirement and the evaluation time compared to the conventional tree based representation. The evolved Postfix-GP models are then used for out-of-sample predictions. The performance of the evolved Postfix-GP model is measured using mean absolute error (MAE), mean squared error (MSE), and correlation coefficient (CC).

The rest of the paper is organized as follows. The next section presents the homogeneity analysis of the collected meteorological data. Section 3 presents the proposed hybrid wavelet-Postfix-GP model for rainfall prediction. Section 4 presents the experimental settings. The evolved Postfix-GP model that represents rainfall as a function of meteorological subseries is presented in Section 5. That section also compares the out-of-sample predictive performance of the wavelet-Postfix-GP model and the wavelet-ANN model. Conclusions are presented in Section 6.

2. Data and Homogeneity Analysis

Meteorological time series of the Anand region for 12 years (from 1991 to 2002) were collected from Anand Agriculture University, Anand, Gujarat, India. We have used 10 years of data for training and 2 years of data for predictions. We have collected time series data for the following meteorological variables: (i) minimum temperature (T_Min), (ii) maximum temperature (T_Max), (iii) mean temperature (T_Mean), (iv) relative humidity (RH), (v) evaporation (EP), (vi) 1-day previous rainfall (RF1), and (vii) 2-day previous rainfall (RF2). The objective is to evolve a model (as a function of the mentioned meteorological variables) for daily rainfall using a hybrid wavelet-Postfix-GP approach.

Table 1: Results of different homogeneity tests (P values).

Test      T_Min   T_Max   T_Mean  RH      EP      RF
Pettitt   0.204   0.241   0.238   0.354   0.481   0.093
SNHT      0.372   0.466   0.500   0.528   0.225   0.151
BR        0.287   0.438   0.395   0.508   0.455   0.115
VNR       0.495   0.146   0.595   0.004   0.304   0.127
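The input layout described in this section can be sketched in a few lines. The following Python sketch is illustrative only and is not the code used in this work: the function names (`add_lagged_rainfall`, `chronological_split`) and the toy rainfall values are assumptions. It builds the 1-day and 2-day previous rainfall inputs (RF1, RF2) from a daily rainfall series and splits the records chronologically in a 10:2 ratio, mirroring the 10 training years and 2 prediction years.

```python
def add_lagged_rainfall(rainfall):
    """Pair each day's rainfall (RF) with the rainfall of the previous two days.

    The first two days are skipped because they have no complete history.
    """
    records = []
    for t in range(2, len(rainfall)):
        records.append({
            "RF1": rainfall[t - 1],  # 1-day previous rainfall
            "RF2": rainfall[t - 2],  # 2-day previous rainfall
            "RF": rainfall[t],       # target: rainfall on day t
        })
    return records


def chronological_split(records, train_parts=10, total_parts=12):
    """Split records in time order (no shuffling): first part trains, rest tests."""
    cut = len(records) * train_parts // total_parts
    return records[:cut], records[cut:]


# Toy daily rainfall values in mm (hypothetical, not the Anand data).
toy_rainfall = [0.0, 5.0, 2.0, 0.0, 8.0, 1.0, 0.0, 0.0, 3.0, 4.0, 0.0, 6.0, 2.0, 0.0]
records = add_lagged_rainfall(toy_rainfall)
train, test = chronological_split(records)
```

Keeping the split chronological matters for out-of-sample evaluation: the test records all lie after the training records in time.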
Homogeneity tests are applied to detect the variability of the meteorological data. Several factors can affect the quality of meteorological data. The main sources of inhomogeneity are station relocation, changes in measurement techniques and observational procedures, and changes in instruments. Homogeneity tests are useful for detecting a break (shift in the mean) in the given time series. As many meteorological parameters are highly variable in time and space, most homogeneity tests are designed for monthly or yearly data and not for daily data. We applied the following four homogeneity tests to the meteorological time series [27]: the standard normal homogeneity test (SNHT), the Buishand range (BR) test, the Pettitt test, and the von Neumann ratio (VNR) test. The null hypothesis for all the mentioned tests is that the annual values X_i of the test variable X are independent and identically distributed, so that the series can be viewed as homogeneous [27]. The alternative hypothesis for the BR test, SNHT, and Pettitt test presumes that there is a break in the mean of the series, so that the series can be viewed as inhomogeneous. The reason for applying more than one test is that these tests differ in their sensitivity to the location of a break. For example, SNHT is sensitive to a break near the start or the end of the series, whereas the Pettitt and Buishand tests are sensitive to a break in the middle of the series.

The significance level is set to 1% for all tests. If the obtained P value is lower than the selected significance level, then we reject the null hypothesis. Table 1 presents the results of the homogeneity tests for the annual mean values of the meteorological parameters. With one exception, all the tests give P values greater than 0.01, so the null hypothesis cannot be rejected (the time series is homogeneous). The exception is relative humidity, for which the VNR test gives a P value of 0.004, less than 0.01. However, as the P values of the other three tests for relative humidity are greater than 0.01, we accept the null hypothesis for this variable as well.

3. A Hybrid Wavelet Postfix-GP Model

The architecture that integrates the discrete wavelet transform (DWT) and Postfix-GP for modeling and prediction of rainfall time series is presented in Figure 1. The architecture comprises the following main steps: (i) apply DWT on the meteorological data and generate DW subseries at every level, (ii) calculate the correlation coefficient between the generated DW subseries and the original rainfall series, (iii) identify the significant DW subseries, (iv) add the significant DW subseries to generate a new subseries for each meteorological variable, (v) normalize the values of the newly generated subseries in the range (0, 1), (vi) divide the normalized dataset into (a) training and (b) test datasets, (vii) evolve the model for the training data using Postfix-GP, and (viii) apply the evolved model for out-of-sample rainfall prediction. The objective is to evolve a model (as a function of the mentioned meteorological variables) for rainfall using a hybrid wavelet-Postfix-GP approach. In the following subsections, we discuss the wavelet transform and Postfix-GP in brief.

Figure 1: Hybrid Wavelet-Postfix-GP architecture for rainfall modeling and prediction. [Flowchart. Wavelet stage, repeated for every meteorological variable: raw data (time series); check homogeneity of annual data; selection of mother wavelet and decomposition level; apply DWT on raw data to generate DW subseries at every level; calculate CC between every DW subseries and the original rainfall series; identify significant DW subseries (components); add significant components and calculate CC between the summed series and the original rainfall series; divide the data set into (i) training and (ii) test data sets. Postfix-GP stage on the training data: start; generate semantically diverse initial population; evaluate fitness of individuals of the population on training data; if the termination criterion is not satisfied, apply genetic operators to produce semantically diverse offspring and repeat; otherwise return the best solution; end. The best solution and the test data are used to perform out-of-sample predictions.]

3.1. Wavelet Transform. The properties of irregularity in shape and compactness make wavelets an ideal tool for the analysis of nonstationary signals. Fourier analysis decomposes a signal into sine and cosine waves of various frequencies, whereas wavelet analysis decomposes a signal into shifted and scaled versions of the mother wavelet. Shifting (delaying) the mother wavelet provides local information about the signal in the time domain, whereas scaling (stretching or compressing) the mother wavelet provides local information about the signal in the frequency domain [28]. The scaling and shifting operations applied to the mother wavelet are used to calculate wavelet coefficients that give the correlation between the wavelet and a local portion of the signal. From the calculated wavelet coefficients, we can extract two types of components: approximate coefficients and detail coefficients. The approximate coefficients represent the high scale, low frequency component of the original signal, whereas the detail coefficients represent the low scale, high frequency component.

The continuous wavelet transform (CWT) operates at every scale from that of the original signal up to some maximum scale. This distinguishes CWT from DWT (which operates at dyadic scales only). CWT is also continuous in terms of shifting: during computation, the analyzing wavelet is shifted smoothly over the full domain of the signal. The results of the CWT are wavelet coefficients, which are a function of scale and position. Multiplying each coefficient by the appropriately scaled and shifted wavelet gives the constituent wavelets of the original signal.

The computation of wavelet coefficients at every scale requires a large computational time. To reduce the time, it is preferred to calculate wavelet coefficients for a selected subset of scales and positions. If the scales and positions are selected based on powers of two (dyadic scales and positions), then the analysis is efficient and just as accurate; this is the discrete wavelet transform (DWT) [29]. The process of decomposition can be iterated, with successive approximations being decomposed in turn (discarding detail coefficients), so that the original signal is broken down into many lower-resolution components. This process is referred to as multiresolution analysis.
We have selected the Db4 (length-4 Daubechies) [28] wavelet as the mother wavelet because it is one of the commonly used wavelets for separating fluctuations from a given time series. The smoothness of different wavelets depends on the number of vanishing moments [29]. The Db4 wavelet has four vanishing moments and is the smallest length wavelet with the smoothness property. We have set the maximum resolution level to 10 for the decomposition of every meteorological time series.

The forward discrete wavelet transform is employed to decompose the original time series of every meteorological variable at different scales (maximum level n = 10). The wavelet transform produces high-pass (detail) coefficients at every level and one set of low-pass (approximation) coefficients. The inverse discrete wavelet transform is applied on the produced coefficients at every level to generate DW subseries, which are of the same length as the original series. Thus, we obtain n + 1 (n detail and one approximate) DW subseries for the original time series of every meteorological variable. The correlation coefficient between the generated DW subseries at each level and the original rainfall series is calculated. The DW subseries that have high correlation with the original rainfall series are identified and summed up to generate a new (final) subseries for that meteorological variable. The objective behind adding the DW subseries that have high correlation with the original rainfall series is to reduce the number of variables (dimensions or inputs) and to increase the correlation between the newly generated subseries and the original rainfall series. This process is repeated for every meteorological time series.

3.2. Postfix Genetic Programming. Postfix-GP [30], a GP variant, adopts postfix notation for individual representation. Individuals represented in the form of postfix strings can be easily evaluated using a stack. Moreover, the individual representation without pointers helps Postfix-GP to minimize the fitness evaluation time and the memory required to store an individual. Each Postfix-GP individual contains the following three attributes: MinLength, MaxLength, and ValidLength. The MinLength and MaxLength attributes define the range of syntactically valid Postfix-GP individuals. The ValidLength attribute refers to the index of the last element of an individual that forms a valid postfix expression. An individual with ValidLength greater than MinLength and less than MaxLength is considered valid [30].

Postfix-GP employs the idea of stack count, introduced by Keith and Martin [31], to find the ValidLength of an individual. The stack count of an element is calculated as the number of arguments it pushes on the stack minus the number of arguments it pops off the stack. For example, the stack count of an operand is 1, whereas it is 0 for a unary operator and −1 for a binary operator. Furthermore, the running sum of the stack count values must be 1 at the ValidLength position of an individual. A Postfix-GP individual with its syntactically valid portion and the corresponding tree representation is depicted in Figure 2. It should be noted that the transformation of the syntactically valid portion to a tree representation is not required for fitness calculation; it is shown for better comprehension of the reader.

Figure 2: Postfix-GP individual representation. Terminal set = {a, b}; function set = {+, −, ∗, /, Q, S}; MinLength = 5, MaxLength = 12, ValidLength = 8.

Element position         1  2  3  4  5  6  7  8  9  10  11  12
Elements of individual   b  b  ∗  b  a  S  +  ∗  a  b   b   Q
Sum of stack count       1  2  1  2  3  3  2  1  2  3   4   4

Valid portion of the individual: b b ∗ b a S + ∗. In the tree representation of the valid portion, the root ∗ has two children: ∗ (with leaves b and b) and + (with children b and S, where S has the leaf a).
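The stack count rule and the stack based evaluation can be sketched for the individual of Figure 2. This is a minimal Python sketch under stated assumptions, not the Postfix-GP implementation (which the paper reports as written for .NET): the function names are hypothetical, S and Q are taken to be sine and square root, division is left unprotected, and the input values for a and b are arbitrary.

```python
import math

ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1, "S": 1}  # function set
TERMINALS = {"a", "b"}                                    # terminal set


def stack_count(element):
    """Arguments pushed minus arguments popped: +1 operand, 0 unary, -1 binary."""
    return 1 if element in TERMINALS else 1 - ARITY[element]


def running_stack_counts(individual):
    total, sums = 0, []
    for element in individual:
        total += stack_count(element)
        sums.append(total)
    return sums


def valid_length(individual):
    """1-based index of the last element where the running sum equals 1 (0 if none)."""
    last = 0
    for position, total in enumerate(running_stack_counts(individual), start=1):
        if total < 1:  # an operator found too few arguments on the stack
            break
        if total == 1:
            last = position
    return last


def evaluate(individual, values):
    """Evaluate the valid portion of a postfix individual with an explicit stack."""
    stack = []
    for element in individual[:valid_length(individual)]:
        if element in TERMINALS:
            stack.append(values[element])
        elif ARITY[element] == 1:
            x = stack.pop()
            stack.append(math.sin(x) if element == "S" else math.sqrt(x))
        else:
            right, left = stack.pop(), stack.pop()
            if element == "+":
                stack.append(left + right)
            elif element == "-":
                stack.append(left - right)
            elif element == "*":
                stack.append(left * right)
            else:
                stack.append(left / right)  # unprotected division (assumption)
    return stack[0]


individual = "b b * b a S + * a b b Q".split()
sums = running_stack_counts(individual)
length = valid_length(individual)
value = evaluate(individual, {"a": 0.5, "b": 2.0})  # (b*b) * (b + sin(a))
```

No tree is built at any point: the linear string and one stack are enough to find the valid portion and to compute its value, which is the property the representation relies on.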
Table 2 presents the differences between the individual representation schemes of Postfix-GP and other evolutionary approaches: GA [32], GP [17], and gene expression programming (GEP) [33].

Table 2: Difference between individual representation schemes of Postfix-GP and other evolutionary approaches.

Evolutionary algorithm   Genotype                        Phenotype           Mapping
GA                       Linear string of fixed length   Decoding genotype   Problem dependent
GP                       Parse tree                      Parse tree          Traversing tree
GEP                      Linear string of fixed length   Parse tree          Karva notation
Postfix-GP               Linear string of fixed length   Linear string       Postfix notation

The standard GA [32] represents an individual using a fixed length binary string. The individual representation scheme of GA does not permit the model structure to vary during the evolutionary process. Standard GP [17] employs a variable length tree structure for individual representation. The representation scheme of GP is more general and flexible than that of GA since it allows the model structure to vary during evolution. GEP [33] is a GP variant that represents an individual using multiple genes, where each gene represents a small subexpression. Each GEP gene is composed of two different domains: a head and a tail. GEP applies Karva notation to transform a linear representation into a tree representation. Grammatical evolution (GE) [34] employs a string of integers to represent an individual (genotype). The string of integers is used to determine the sequence of production rules in a context-free grammar. By following these sequences, an individual can be converted into an expression tree (phenotype). Adaptive logic programming (ALP) [35] also uses a string of integers for individual representation. However, the string of integers is used to select clauses in a logic program instead of production rules of a context-free grammar.

Postfix-GP produces the initial population of individuals at random. The ValidLength of these individuals lies between MinLength and MaxLength. Postfix-GP employs semantic aware subtree crossover [36] to improve population diversity. The operator checks the semantic equivalence [37, 38] of the two subtrees to be swapped while performing the crossover operation. Moreover, the operator selects behaviorally different parents for generating offspring, which is useful for minimizing the "no change to fitness" events [39]. Crossover of two dissimilar parents is likely to produce a change in offspring (solution) quality.

Postfix-GP extracts all subtrees of an individual having ValidLength greater than MinLength and treats the extracted subtrees as separate solutions during the evolutionary process. Postfix-GP employs a one-point mutation operator, where the chosen element of an individual is interchanged with a different element of the same arity. An archive is used to store the "best-so-far" solutions, which is useful for exploiting good solutions over a number of generations [40]. Postfix-GP uses MAE as the standardized fitness measure. However, MAE is not range bound. Therefore, we have normalized the fitness value of an individual between 0 and 1 using (1). The normalized fitness is referred to as the adjusted fitness of an individual and is useful for differentiating between individuals having close standardized fitness values, which may happen in generations near the end of a Postfix-GP run. Postfix-GP is developed using the .NET [41] framework on the Windows XP operating system. Zedgraph [42], an open source graph library, is used for plotting charts.

\[
\text{Adjusted Fitness} = \frac{1}{1 + \text{Standardized Fitness}} \tag{1}
\]

4. Experimental Settings

We set the range of the solution search space by specifying the minimum and maximum number of elements (nodes) that a solution can have during an evolutionary run.
The MinLength and MaxLength parameters of Postfix-GP, which bound the number of elements (nodes) in a solution, are used for this purpose and are set to 10 and 40. The function set includes both arithmetic and trigonometric operators: {+, −, /, ∗, S, C, E, L, K, Q}, where S, C, E, L, and Q represent sine, cosine, exponential, logarithmic, and square root functions. The terminal set includes the final subseries for {T_Min, T_Max, T_Mean, RH, EP, RF1, RF2} and a list of constants in the range [−10, ..., 10]. The population size and the number of generations are set to 500 and 100. The crossover and mutation rates are set to 0.9 and 0.1. Postfix-GP uses archive based roulette wheel selection [40]. The crossover operation is performed between parents chosen from the archive and the current population.

We report the evolved Postfix-GP solutions with their mean absolute error (MAE), mean squared error (MSE), and correlation coefficient (CC). The equations in (2) are used to calculate these statistical measures:

\[
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2, \qquad
r = \frac{\operatorname{cov}(Y, \hat{Y})}{\sigma_Y \sigma_{\hat{Y}}} \tag{2}
\]

where y_i, ŷ_i, Y, Ŷ, and cov(Y, Ŷ) stand for the given ith observation, the corresponding ith value calculated by the evolved model, the actual observed series, the series estimated by the evolved model, and the covariance between the observed and the estimated series; σ_Y and σ_Ŷ are the standard deviations of the two series. A value of r close to zero indicates no correlation between the predicted and observed values. We have used MSE to measure the closeness between the given set of points and those generated by the evolved solution.

5. Results

5.1. Identification of Important Components. The discrete wavelet transform (DWT) is applied on the daily time series of every meteorological variable to decompose the series into several DW subseries [14]. Each time series is decomposed up to 10 resolution levels. We have selected the resolution level of 10 because we needed to address both annual and seasonal factors. The levels DW2, DW3, DW4, DW5, DW6, DW7, DW8, DW9, and DW10 correspond to temporal scales of 4, 8, 16, 32, 64, 128, 256, 512, and 1024 days. The DW8 subseries corresponds to a temporal scale of 256 days (approximately one year). The correlation coefficient between the generated DW subseries at different levels and the original rainfall series is calculated. Table 3 presents the values of the correlation coefficient between each DW subseries and the original rainfall series. These correlation values are used to determine the effective DW components (subseries) for rainfall prediction.

Table 3: Correlation coefficient between each subseries and the original rainfall series.

Subseries     T_Min    T_Max    T_Mean   RH      EP       RF1      RF2
DW1           −0.079   −0.029   −0.076   0.033   −0.127   −0.304   −0.013
DW2           −0.068   −0.093   −0.103   0.110   −0.124   0.130    −0.249
DW3           −0.060   −0.098   −0.097   0.101   −0.156   0.355    0.187
DW4           −0.033   −0.098   −0.087   0.114   −0.177   0.302    0.258
DW5           −0.027   −0.115   −0.094   0.132   −0.165   0.273    0.259
DW6           −0.043   −0.053   −0.056   0.052   −0.065   0.138    0.135
DW7           −0.030   −0.145   −0.102   0.160   −0.153   0.211    0.210
DW8           0.230    0.117    0.211    0.225   0.025    0.251    0.250
DW9           0.013    −0.045   −0.013   0.072   −0.044   0.081    0.081
DW10          −0.016   −0.018   −0.017   0.019   −0.026   0.019    0.019
Approximate   0.018    0.004    0.014    0.034   0.008    0.044    0.044
Summed        0.230    −0.236   0.211    0.356   −0.347   0.613    0.519

According to Table 3, for minimum and mean temperature, the correlation between the DW8 subseries and the original rainfall series is high. This indicates that the annual component of the mean and minimum temperature series influences the amount of rainfall. The DW7 subseries of maximum temperature has a high correlation compared to the other DW subseries. Moreover, the DW2, DW3, DW4, and DW5 subseries of maximum temperature have a moderately high correlation compared to the other DW subseries. Similarly, the DW1, DW2, DW3, DW4, DW5, and DW7 subseries of evaporation have a high correlation compared to the other DW subseries. This suggests that the shorter time period components of maximum temperature and evaporation are correlated with the original rainfall series. A high correlation value is noticed for the DW8 subseries of relative humidity. Moreover, the correlation between the original rainfall and the DW2, DW3, DW4, DW5, and DW7 subseries of relative humidity is moderately high compared to the correlation between the rainfall and the remaining DW subseries of relative humidity. This suggests that both the annual and the shorter time period components of relative humidity are correlated with the rainfall.

The high correlation between both the shorter and the yearly components of relative humidity and rainfall is observed because of the direct relation between these two variables. The humidity and rainfall data of the examined region show similar meteorological characteristics. The DW3 subseries of the one-day previous rainfall shows a high correlation with the original rainfall series. The DW2, DW4, DW5, DW6, DW7, and DW8 subseries also show a moderately high correlation compared to the other DW subseries. The DW subseries that have high correlation with the original rainfall series are identified and summed up to generate a new (final) subseries for that meteorological variable. The selected DW components for the different meteorological variables are presented in Table 4. However, the number of DW components (subseries) selected depends on the user. Partal and Cigizoglu [14] suggested that applying a threshold on the correlation value would be helpful for determining the number of DW components (subseries).

Table 4: Selected DW components for generating final subseries.

Variable   Selected DW components
T_Min      DW8
T_Max      DW2 + DW3 + DW4 + DW5 + DW7
T_Mean     DW8
RH         DW2 + DW3 + DW4 + DW5 + DW7 + DW8
EP         DW1 + DW2 + DW3 + DW4 + DW5 + DW7
RF1        DW2 + DW3 + DW4 + DW5 + DW6 + DW7 + DW8
RF2        DW3 + DW4 + DW5 + DW6 + DW7 + DW8

Using each selected DW subseries as a separate input increases the dimension of the solution search space. Moreover, it also increases the complexity of the final model. Therefore, the selected DW subseries, having high correlation with the original rainfall series, are added together to form a new (final) subseries. The final subseries are generated for every meteorological variable. The addition of the selected DW components (subseries) helps to improve the correlation between the final subseries and the original rainfall series. For example, the DW8 subseries of relative humidity has a correlation value of 0.225. However, for the added (DW2 + DW3 + DW4 + DW5 + DW7 + DW8) subseries, the correlation value increases to 0.355.

The added subseries of the relative humidity, evaporation, and mean temperature are shown in Figure 3. The summed series of the maximum temperature, minimum temperature, and previous day rainfall are presented in Figure 4. We normalized the data of the added subseries in the range (0, 1). The data normalization ensures that large value variables do not overwhelm small value variables.

Figure 3: Summed subseries: (a) relative humidity (components DW2 + DW3 + DW4 + DW5 + DW7 + DW8), (b) evaporation (components DW1 + DW2 + DW3 + DW4 + DW5 + DW7), and (c) mean temperature (component DW8). [Each panel plots the summed subseries against time in days.]

Figure 4: Summed subseries: (a) minimum temperature (component DW8), (b) maximum temperature (components DW2 + DW3 + DW4 + DW5 + DW7), and (c) 1-day previous rainfall (components DW2 + DW3 + DW4 + DW5 + DW6 + DW7 + DW8). [Each panel plots the summed subseries against time in days.]

5.2. Evolved Postfix-GP Solutions. The evolved Postfix-GP solution (for terminal set {T_Min, T_Max, T_Mean, EP, RH, RF1}) is shown in (3). It is noted that the solution contains significant meteorological variables (evaporation, minimum temperature, relative humidity, and maximum temperature) which are useful for rainfall prediction.
We got MSE = 48.2860, 0 adjustedfitness=0.3926,and𝑟=0.7469forthetrainingdata 0 100 200 300 400 500 600 700 Time (day) 𝑒(cos(cos(−10.6EP𝑇𝑖))−(𝑋𝑖/(sin(RF1RH)∗RH))+3.7292), (3) Observed points Postfix-GP predicted points where𝑋𝑖 =EP∗𝑒Log10(4.3EP+0.4941)/(𝑇𝑥+RH). TheobservedandPostfix-GPmodelpredictedvaluesfor Figure5:DailyrainfallpredictionbyPostfix-GPModelforthetest the testing period are presented in Figure5. The model has period(2years). predictedthegeneralbehavioroftheobservedrainfalldata. It has accurately predicted the summer days and estimated zero rainfall for these days. However, the model performs satisfactoryforestimatingmaximumrainfallvalues. 5.3.ComparisonofResultObtainedbyWavelet-Postfix-GPand Wavelet-ANN Models. We have compared the performance ofwavelet-Postfix-GPandwavelet-ANNmodelsforthedaily rainfallprediction.Theselectionofarchitecture(numberof layers and number of nodes in each layer) and the training Thedatanormalizationassuresthatlargevaluevariablesdo algorithmisimportantdesignparametersofANN.Thereis AdvancesinArtificialIntelligence 9 Table5:Statisticalmeasuresforthebestwavelet-Postfix-GPandwavelet-ANNmodelsfordifferentinputcombinationsfortestperiod. Wavelet-Postfix-GP Wavelet-ANN Modelinputs MSE Adj 𝑅 ANNstructure MSE Adj 𝑅 Fit Fit 𝑇Min,𝑇Max,𝑇Mean,EP,RH,RF1,RF2 49.2494 0.4002 0.6661 7,5,1 52.6614 0.2804 0.6919 𝑇Min,𝑇Max,𝑇Mean,EP,RH,RF1 47.5766 0.4017 0.6794 6,6,1 50.4010 0.3545 0.7641 120 suggestthatPostfix-GPproducescomparableresultsandcan 100 beanalternativetoANNapproach. m) 80 m 60 nfall ( 40 Rai 20 0 6.Conclusion −20 0 100 200 300 400 500 600 700 We applied four homogeneity tests (SNHT, BR, Pettit, and Time (day) VNR) to detect the variability of the meteorological (mini- mum temperature, mean temperature, maximum tempera- Observed points ture,evaporation,andrelativehumidity)variablesrecorded WANN predicted points at Anand station. 
The results of homogeneity tests suggest Figure6:DailyrainfallpredictionbyANNmodelforthetestperiod that the time series of the meteorological variables are (2years). homogeneous.Theseriesofeverymeteorologicalvariableis decomposedupto10resolutionlevelusingdiscretewavelet transform.Thecorrelationcoefficientbetweenthegenerated noruleavailabletofindoutthenumberofhiddenlayersand DW subseries at different levels and the original rainfall appropriatenumberofnodesforeachhiddenlayer.However, seriesiscalculated.ThenumberofDWsubserieswhichhave practitioners[43]foundthatANNwithonehiddenlayeris correlation with the original rainfall series is identified and complex enough to model the nonlinear properties of the summeduptogenerateanewsubseriesforeverymeteoro- hydrologic processes. We investigated different architecture logicalvariable.It isobservedthatbothannualandshorter withdifferentactivationfunctionsandnumberofneuronsfor timeperiodcomponentsofrelativehumidityhavecorrelation thehiddenlayerandselectedtheonewhichgivesanoptimal withrainfall.Theannualcomponentsofminimumandmean result.Amultilayerfeed-forwardANNthatcomprisesseven temperatureshowcorrelationwithrainfallseries.Thenewly nodes in the input layer, one hidden layer with five nodes, generatedmeteorologicalsubseriesareregardedasinputsand and one node in the output layer is selected for the rainfall therainfallseriesasoutput. prediction. Moreover, the input and hidden layers have ThePostfix-GPisthenemployedtodevelopamodelthat an additional bias neuron. The ANN is trained with back canexplainrelationshipbetweentheinputsandtheoutput. propagation (BP) learning algorithm. We set log-sigmoid Thedevelopedmodelisthenusedfortherainfallprediction. transferfunctionbetweeninputandhiddenlayersandpure Thepredictionperformanceoftheevolvedmodelwasgood, lineartransferfunctionbetweenhiddenandoutputlayers. 
giving low values for MAE and MSE and high value for Figure6presentstheobservedandwavelet-ANNmodel thecorrelationcoefficient,suggestingthatPostfix-GPisable predictedrainfallvalues.Themodelhaspredictedthegeneral to evolve an accurate and reliable model. The advantage of behavior of the observed rainfall data. It is observed that Postfix-GPoverANNisthatitgivesanexplicitmathematical wavelet-ANN model has produced negative prediction for nonlinear equation that describes the relationship between some days having zero rainfall, which are not practically inputs and output. The predictive performance of Postfix- possible.PartalandCigizoglu[14]foundthesimilarbehavior GPandANNmodelsiscompared.Theresultsshowthatthe whileapplyingANNforpredictingdailyprecipitation.They Postfix-GPisafaircompetitorofANNapproach.Moreover, noted that the negative prediction problem occurred due thepredictiveperformanceoftheevolvedPostfix-GPmodel to extrapolation ability of feed-forward back-propagation for the nonrainy (zero rainfall) periods and rainy (highest mechanism.Moreover,similartoPostfix-GP,theANNfailed rainfall)periodsissatisfactoryascomparedtoANNmodel, inaccuratelypredictingthepeakvaluesofrainfallduringthe which produces negative prediction for some days of sum- testperiod. merseason.Weconcludethatwavelet-Postfix-GPapproach Table5 presents the statistical results for both wavelet- obtained good quality solutions for the tested rainfall pre- Postfix-GP and wavelet-ANN hybrid models for different diction series and could be explored as an alternative tool inputcombinationsforthetestperiod.ThePostfix-GPmodel forpredictingthehydrometeorologicalvariables.Ourfuture hasproducedslightlybetterresultsthanANNfromtheMSE planistoapplyPostfix-GPfordevelopingmoreaccurateand andfitnessviewpoint.TheANNmodelslightlyoutperforms reliablemodelsusingdifferentcombinationoffunctionand Postfix-GP in terms of correlation coefficient. The results terminalsets. 
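The subseries selection step summarized above (correlate each DW subseries with the rainfall series, sum the selected ones, and normalize the result) can be illustrated with a short Python sketch. This is not the authors' implementation: the function names, the toy data, and the threshold value of 0.5 are hypothetical, the selection uses the signed Pearson correlation as an assumption, and min-max scaling maps onto the closed interval [0, 1] as one possible reading of normalizing "in the range (0, 1)". The DW subseries are assumed to have been computed beforehand by a discrete wavelet transform.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant series has no defined correlation; treat it as 0 here.
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def select_and_sum(subseries: Dict[str, List[float]],
                   rainfall: Sequence[float],
                   threshold: float) -> Tuple[List[str], List[float]]:
    """Keep subseries whose correlation with rainfall exceeds the
    threshold (an assumed selection rule) and add them point-wise."""
    chosen = [name for name, series in subseries.items()
              if pearson(series, rainfall) > threshold]
    summed = [sum(subseries[name][i] for name in chosen)
              for i in range(len(rainfall))]
    return chosen, summed


def min_max(series: Sequence[float]) -> List[float]:
    """Scale a series linearly so its minimum is 0 and its maximum is 1."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


# Toy data (hypothetical): three subseries of one variable, six days.
rainfall = [0.0, 2.0, 8.0, 3.0, 0.0, 1.0]
subseries = {
    "DW1": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],  # high-frequency, noise-like
    "DW2": [0.0, 1.0, 4.0, 1.5, 0.0, 0.5],
    "DW3": [-1.0, 0.0, 2.0, 1.0, -1.0, 0.0],
}

chosen, summed = select_and_sum(subseries, rainfall, threshold=0.5)
normalized = min_max(summed)
print(chosen)
print(normalized)
```

In this toy case the noise-like subseries is discarded and the remaining two are summed into a single input series. Raising or lowering the threshold changes how many components are kept, which is the user-dependent choice noted in the text.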
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors would like to acknowledge Krishibhavan, Gandhinagar, and Anand Agricultural University, Anand, for their support in terms of meteorological data.

References

[1] V. Y. Jain, A. Sharma, S. Chaudhary, and V. K. Tyagi, "Spatial analysis for generating recommendations for agricultural crop production," in Proceedings of the India Conference on Geospatial Technologies and Applications (ICGTA '12), 2012.
[2] L. R. Nayagam, R. Janardanan, and H. S. R. Mohan, "An empirical model for the seasonal prediction of southwest monsoon rainfall over Kerala, a meteorological subdivision of India," International Journal of Climatology, vol. 28, no. 6, pp. 823–831, 2008.
[3] T. DelSole and J. Shukla, "Linear prediction of Indian monsoon rainfall," Journal of Climate, vol. 15, no. 24, pp. 3645–3658, 2002.
[4] A. R. Ganguly and R. L. Bras, "Distributed quantitative precipitation forecasting using information from radar and numerical weather prediction models," Journal of Hydrometeorology, vol. 4, no. 6, pp. 1168–1180, 2003.
[5] T. Diomede, S. Davolio, C. Marsigli et al., "Discharge prediction based on multi-model precipitation forecasts," Meteorology and Atmospheric Physics, vol. 101, no. 3-4, pp. 245–265, 2008.
[6] C. Chatfield, The Analysis of Time Series: An Introduction, CRC Press, New York, NY, USA, 2003.
[7] V. Babovic, "Data mining in hydrology," Hydrological Processes, vol. 19, no. 7, pp. 1511–1515, 2005.
[8] N. Q. Hung, M. S. Babel, S. Weesakul, and N. K. Tripathi, "An artificial neural network model for rainfall forecasting in Bangkok, Thailand," Hydrology and Earth System Sciences, vol. 13, no. 8, pp. 1413–1425, 2009.
[9] K. P. Moustris, I. K. Larissi, P. T. Nastos, and A. G. Paliatsos, "Precipitation forecast using artificial neural networks in specific regions of Greece," Water Resources Management, vol. 25, no. 8, pp. 1979–1993, 2011.
[10] G.-F. Lin and L.-H. Chen, "Application of an artificial neural network to typhoon rainfall forecasting," Hydrological Processes, vol. 19, no. 9, pp. 1825–1837, 2005.
[11] V. Babovic, R. Cañizares, H. R. Jensen, and A. Klinting, "Neural networks as routine for error updating of numerical models," Journal of Hydraulic Engineering, vol. 127, no. 3, pp. 181–193, 2001.
[12] Y. Sun, V. Babovic, and E. S. Chan, "Multi-step-ahead model error prediction using time-delay neural networks combined with chaos theory," Journal of Hydrology, vol. 395, no. 1-2, pp. 109–116, 2010.
[13] V. Babovic, S. A. Sannasiraj, and E. S. Chan, "Error correction of a predictive ocean wave model using local model approximation," Journal of Marine Systems, vol. 53, no. 1–4, pp. 1–17, 2005.
[14] T. Partal and H. K. Cigizoglu, "Prediction of daily precipitation using wavelet-neural networks," Hydrological Sciences Journal, vol. 54, no. 2, pp. 234–246, 2009.
[15] M. Nasseri, K. Asghari, and M. J. Abedini, "Optimized scenario for rainfall forecasting using genetic algorithm coupled with artificial neural network," Expert Systems with Applications, vol. 35, no. 3, pp. 1415–1421, 2008.
[16] C. Wu, K. Chau, and C. Fan, "Prediction of rainfall time series using modular artificial neural networks coupled with data-preprocessing techniques," Journal of Hydrology, vol. 389, no. 1-2, pp. 146–167, 2010.
[17] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, Cambridge, Mass, USA, 1992.
[18] V. Babovic, "Introducing knowledge into learning based on genetic programming," Journal of Hydroinformatics, vol. 11, no. 3-4, pp. 181–193, 2009.
[19] M. Keijzer and V. Babovic, "Declarative and preferential bias in GP-based scientific discovery," Genetic Programming and Evolvable Machines, vol. 3, no. 1, pp. 41–79, 2002.
[20] V. Babovic and M. B. Abbott, "The evolution of equations from hydraulic data. Part I: theory," Journal of Hydraulic Research, vol. 35, no. 3, pp. 397–430, 1997.
[21] V. Babovic and M. B. Abbott, "Evolution of equations from hydraulic data. Part II: applications," Journal of Hydraulic Research, vol. 35, no. 3, pp. 411–430, 1997.
[22] V. K. Dabhi and S. Chaudhary, "Empirical modeling using genetic programming: a survey of issues and approaches," Natural Computing, 2014.
[23] O. Kisi and J. Shiri, "Precipitation forecasting using wavelet-genetic programming and wavelet-neuro-fuzzy conjunction models," Water Resources Management, vol. 25, no. 13, pp. 3135–3152, 2011.
[24] S.-Y. Liong, T. R. Gautam, T. K. Soon, V. Babovic, M. Keijzer, and N. Muttil, "Genetic Programming: a new paradigm in rainfall runoff modeling," Journal of the American Water Resources Association, vol. 38, no. 3, pp. 705–718, 2002.
[25] V. Babovic and M. Keijzer, "Rainfall-runoff modelling based on genetic programming," Nordic Hydrology, vol. 33, no. 5, pp. 331–346, 2002.
[26] S. T. Khu, S.-Y. Liong, V. Babovic, H. Madsen, and N. Muttil, "Genetic programming and its application in real-time runoff forecasting," Journal of the American Water Resources Association, vol. 37, no. 2, pp. 439–451, 2001.
[27] J. B. Wijngaard, A. M. G. Klein Tank, and G. P. Können, "Homogeneity of 20th century European daily temperature and precipitation series," International Journal of Climatology, vol. 23, no. 6, pp. 679–692, 2003.
[28] I. Daubechies, Ten Lectures on Wavelets, vol. 61, SIAM, Philadelphia, Pa, USA, 1992.
[29] S. Mallat, "A wavelet tour of signal processing," A Wavelet Tour of Signal Processing, 2009.
[30] V. K. Dabhi and S. K. Vij, "Empirical modeling using symbolic regression via postfix genetic programming," in Proceedings of the International Conference on Image Information Processing (ICIIP '11), pp. 1–6, November 2011.
[31] M. J. Keith and M. C. Martin, "Genetic programming in c++: implementation issues," in Advances in Genetic Programming, pp. 285–310, 1994.
[32] J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence, MIT Press, Cambridge, Mass, USA, 1992.
[33] C. Ferreira, "Gene expression programming: a new adaptive algorithm for solving problems," Complex Systems, vol. 13, no. 2, pp. 87–129, 2001.
