Stochastic Gradient Based Extreme Learning Machines For Online Learning of Advanced Combustion Engines

Vijay Manikandan Janakiraman, XuanLong Nguyen, and Dennis Assanis

(Vijay Manikandan Janakiraman is currently with UARC @ NASA Ames Research Center, Moffett Field, CA, USA. He was previously with the Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI, USA. E-mail: [email protected]. XuanLong Nguyen is with the Department of Statistics, University of Michigan, Ann Arbor, MI, USA. Dennis Assanis is with Stony Brook University, NY, USA.)

Abstract—In this article, a stochastic gradient based online learning algorithm for Extreme Learning Machines (ELM) is developed (SG-ELM). A stability criterion based on a Lyapunov approach is used to prove both asymptotic stability of the estimation error and stability of the estimated parameters, making the algorithm suitable for identification of nonlinear dynamic systems. The developed algorithm not only guarantees stability, but also reduces the computational demand compared to the OS-ELM approach [1] based on recursive least squares. In order to demonstrate the effectiveness of the algorithm in a real-world scenario, an advanced combustion engine identification problem is considered. The algorithm is applied to two case studies: an online regression learning problem for system identification of a Homogeneous Charge Compression Ignition (HCCI) engine, and an online classification learning problem (with class imbalance) for identifying the dynamic operating envelope of the HCCI engine. The results indicate that the accuracy of the proposed SG-ELM is comparable to that of the state of the art while adding stability and a reduction in computational effort.

Index Terms—Stochastic Gradient, Extreme Learning Machines, Online Learning, Online Classification, System Identification, Class Imbalance Learning, Lyapunov Stability, Homogeneous Charge Compression Ignition, Operating Envelope Model, Misfire Prediction, Engine Diagnostics, Engine Control.

I. INTRODUCTION

Homogeneous Charge Compression Ignition (HCCI) engines are of significant interest to the automotive industry owing to their ability to reduce emissions and fuel consumption significantly compared to traditional spark ignition and compression ignition engines [2], [3], [4]. The highly efficient operation of HCCI is achieved using advanced control strategies such as exhaust gas recirculation (EGR) [5], variable valve timings (VVT) [6] and intake charge heating [7], among others. Such complex manipulations of the system result in highly nonlinear behavior [8] with a narrow region of stable operation [9], [10].

Control of HCCI combustion is a major challenge for automotive application. Several factors contribute to the challenge, including the absence of a direct trigger for combustion, a narrow operating range and high sensitivity to disturbances. To address these issues, advanced model based control methods are common, where the control actions are often made using a predictive model of the engine [11], [6], [12]. As alternatives to physics based modeling, which might involve significant development time and associated costs, data based approaches were introduced [13], [14], [15] that take advantage of the extensive experimentation performed during the engine calibration process.

The key requirement for model based control of an HCCI engine is the ability to accurately predict the engine state variables for several operating cycles ahead of time, so that a control action with a known effect can be applied to the engine. Further, in order to be vigilant against the engine drifting towards instabilities such as misfire, ringing and knock [16], [17], knowledge of the operating limits of the engine, particularly in transients, is required. In order to develop controllers and operate the engine in a stable manner, both models of the engine operating envelope and models of the engine state variables are necessary.

The state variables of an engine can be defined as the fundamental quantities that represent the state of operation of the engine. As a consequence, these variables also influence the performance of the engine, such as fuel efficiency, emissions and stability, and are required to be monitored/regulated. For this work, the net mean effective pressure (NMEP) and the phasing of the combustion event (CA50) with respect to the engine's top dead center [13] are considered representative states that reflect the quality of engine operation.
More fundamental state variables such as in-cylinder temperature, pressure and the chemical composition of the combustion mixture could be considered, but these variables cannot be measured feasibly on a production engine.

The HCCI engine has a narrow region of stable operation defined by an operating envelope. The dynamic operating envelope of an engine can be defined as a stable region in the operating space of the engine. The significance of the operating envelope and data based approaches for modeling it were recently introduced by the authors [15]. Knowledge of the operating envelope is crucial for designing efficient controllers for the following reasons. The developer can get insights on the actuator extremes [18], such as the minimum and maximum quantity of fuel to be injected into the engine at given speed and load conditions. The actuator extremes can then be used to enforce constraints on the control variables for desired engine operation. Furthermore, an operating envelope model could enable the design of efficient engine diagnostic systems based on predictive analytics. For instance, a misfire event is a lack of combustion, which produces no work output from the engine. The misfired fuel enters the exhaust system, increasing emissions of hydrocarbons and carbon monoxide [19], [20]. When the engine misfires, pollutant levels may be higher than normal. Real time monitoring of the exhaust emission control system and engine misfire detection are essential to meet the requirements of On-Board Diagnostic (OBD) regulations. The envelope model can be used to alarm the onboard diagnostics if the engine is about to misfire owing to changes in the system or operating conditions.

Data based modeling approaches for the HCCI engine state variables and dynamic operating envelope were demonstrated using neural networks [13], support vector machines [14] and extreme learning machines [21] by the authors. However, the previous research considered an offline approach, where the data collected from engine experiments were taken offline and models were developed using computer workstations with high processing power and memory. A key requirement for advancing the capabilities of data based HCCI modeling, however, is to perform online learning, for the following reasons. The models developed offline are valid only under the controlled experimental conditions. For instance, the experiments are performed at controlled ambient temperature, pressure and humidity conditions. As a result, the models are valid for the specified conditions, and when they are implemented, for instance, on a vehicle, the expectation is that the model works over the wide range of climatic conditions to which the vehicle is exposed, possibly including conditions that were never tested. Hence, an online adaptation to learn the behavior of the system in new/unfamiliar situations is required. Also, since the offline models are developed directly from experimental data, they may perform poorly in certain operating regions where the density of experimental data is low. As more data becomes available in such regions, an online mechanism can be used to adapt to such data. In addition, the engine produces high velocity streaming data; operating at about 2500 revolutions per minute, an in-cylinder pressure sensor can produce about 1.8 million data observations per day. It becomes infeasible to store this data for offline model development. Thus, an online learning framework that processes every data observation, updates the model and discards the data is required for advanced engines like HCCI.

Online learning algorithms exist for linear and nonlinear models. For combustion engine applications, algorithms involving linear models are common in adaptive control. However, for a system like the HCCI engine, linear models may be insufficient to capture the complex dynamics, and the authors showed that nonlinear identification models outperformed linear models, particularly for predicting several steps ahead in time [13]. While numerous techniques for online learning exist in the machine learning literature, a complete survey is beyond the scope of this article. The recent paper on online sequential extreme learning machines (OS-ELM) [1] surveys popular online learning algorithms in the context of classification and regression and develops an efficient algorithm based on recursive least squares. The OS-ELM algorithm appears to be the present state of the art for classification/regression problems, achieving high generalization accuracies and a globally optimal solution in quick time.

In spite of these known advantages, an over-parameterized ELM suffers from an ill-conditioning problem when a recursive least squares type update is performed (as in OS-ELM). This sometimes results in poor regularization behavior [22], [23], [24], [25], which leads to unbounded growth of the model parameters and unbounded model predictions. If decisions are made as the model is updated (as in the case of adaptive control, for instance [26]), it is vital for the parameter estimation to be stable so that model based decisions are valid. Hence a guarantee of stability and boundedness is of extreme importance.
To variables and dynamic operating envelope were demonstrated address this issue, a stable online learning algorithm based on using neural networks [13], support vector machines [14], stochasticgradientdescentisdevelopedandstabilityisproved extreme learning machines [21] by the authors. However, the using Lyapunov stability theory. Although Lyapunov based previous research considered an offline approach where the approaches are popular in control theory, notable prior work datacollectedfromengineexperimentsweretakenofflineand for online learning include a Lyapunov approach applied for models were developed using computer workstations that had identification using radial basis function neural networks [27] high processing and memory. However, a key requirement and GLO-MAP models [28]. The parameter update in such in advancing the capabilities of data based HCCI modeling methods involves complex gradient calculation in real time or task is to perform online learning for the following reasons. firstestimatingalinearmodelandthenestimatinganonlinear The models developed offline are valid only in the controlled difference using orthonormal polynomial basis functions. The experimental conditions. For instance, the experiments are approach proposed in this paper aims to retain the simplicity performed at a controlled ambient temperature, pressure and and generalization power of ELM and OS-ELM algorithms, humidity conditions. As a result, the models developed are and introduce stability in parameter estimation so that such valid for the specified conditions and a when the models are online models could be used for real-time control purposes. implemented, for instance, on a vehicle, the expectation is The objective of this article is to develop a stable online that the model works on a wide range of climatic conditions learningalgorithmforELMmodelsusingstochasticgradients that the vehicle is exposed to, possibly conditions that were and apply to the HCCI engine modeling problem. The contri- not experimented. Hence, an online adaptation to learn the butions of the paper are as follows. A novel online learning behaviorofthesystematnew/unfamiliarsituationsisrequired. algorithm based on stochastic gradient descent for extreme Also, since the offline models are developed directly from ex- learning machines is developed. The stability of parameter perimental data, they may perform poorly in certain operating estimation for dynamic systems is proved using a Lyapunov regionswherethedensityofexperimentaldataislow.Asmore stbility approach. The application of the stochastic gradient data becomes available in such regions, an online mechanism ELM algorithm to the complex HCCI engine identification is can be used to adapt to such data. In addition, the engine thefirstapplication(toourbestknowledge)ofonlinelearning produceshighvelocitystreamingdata;operatingatabout2500 schemes to HCCI engines. This includes both the online state revolutionsperminute,anin-cylinderpressuresensorcanpro- estimation problem as well as the online operating boundary duce about 1.8 million data observations per day. It becomes estimation problem. infeasible to store this data for offline model development. The remainder of the article is organized as follows. The Thus, an online learning framework that processes every data ELMmodelingapproachisdescribedinSectionIIalongwith observation, updates the model and throws away the data is algorithm details on batch (offline) learning as well as the required for advanced engines like HCCI. 
II. EXTREME LEARNING MACHINES

Extreme Learning Machine (ELM) is an emerging learning paradigm for multi-class classification and regression problems [29], [30]. An advantage of the ELM method is that the training speed is extremely fast, thanks to the random assignment of the input layer parameters, which do not require adaptation to the data. In such a setup, the output layer parameters can be determined analytically using a least squares approach. Some of the attractive features of ELM [29] include its universal approximation capability, a convex optimization problem resulting in the smallest training error without getting trapped in local minima, a closed form solution eliminating iterative training, and better generalization capability [30].

Consider the following data set

{(x_1, y_1), ..., (x_N, y_N)} \in (X, Y),    (1)

where N denotes the number of training samples, X denotes the space of the input features and Y denotes the labels, whose nature differentiates the learning problem at hand. For instance, if Y takes integer values {1, 2, 3, ...} the problem is referred to as classification, and if Y takes real values, it becomes a regression problem. ELMs are well suited for solving both regression and classification problems faster than state of the art algorithms [30]. A further distinction can be made depending on the availability of the training data during the learning process: offline learning (or batch learning) versus online learning (or sequential learning). Offline learning can make use of all the training data simultaneously, as all of it is available to the algorithm. In addition, as the models are developed offline, efficient use of the available computational resources can be made, enabling offline algorithms to solve complex optimization problems. Typically, the accuracy of the modeling task takes priority over both computational demand and training time. On the other hand, in situations where data is available as a high velocity stream and it is not feasible to store all the data and make inferences in quick time, or where inference is made simultaneously with adapting the model to incoming data, online learning is preferred. In an online learning setting, data is available one-by-one and needs to be processed with limited computational effort and storage. Further, inference is required with each newly available observation, along with the ones recorded in the past. In this work, the online setting is considered: a stable online learning algorithm is proposed and compared with the offline approach and an existing online learning method.
A. Batch (Offline) ELM

When the entire training data is available and a model is required to be learned using all of it, batch learning is adopted. In this case, the ELM algorithm involves solving the following optimization problem:

min_W { \|HW - Y\|^2 + \lambda \|W\|^2 }    (2)

H^T = \psi(W_r^T x(k) + b_r) \in R^{n_h \times 1},    (3)

where \lambda represents the regularization coefficient, Y represents the vector of outputs or targets, \psi represents the hidden layer activation function (sigmoidal, sinusoidal, radial basis, etc. [30]), and W_r \in R^{n \times n_h}, W \in R^{n_h \times y_d} represent the input and output layer parameters respectively. Here, n represents the dimension of the inputs x(k), n_h represents the number of hidden neurons of the ELM model, H represents the hidden layer output matrix and y_d represents the dimension of the outputs Y. The matrix W_r consists of randomly assigned elements that map the input vector to a high dimensional feature space, while b_r \in R^{n_h} is a bias component assigned in a random manner similar to W_r. The number of hidden neurons determines the expressive power of the transformed feature space. The elements can be assigned based on any continuous random distribution [30] and remain fixed during the learning process. Hence the training reduces to the single step calculation given by equation (4). The ELM decision hypothesis can be expressed as in equation (5) for classification and equation (6) for regression. It should be noted that the hidden layer and the corresponding activation functions give a nonlinear mapping of the data; if this mapping is eliminated, the model becomes a linear least squares (Linear LS) model, which is considered as one of the baseline models in this study.

W^* = (H^T H + \lambda I)^{-1} H^T Y    (4)

f(x) = sgn(W^T [\psi(W_r^T x + b_r)])    (5)

f(x) = W^T [\psi(W_r^T x + b_r)]    (6)

Since training involves a linear least squares solution with a convex objective function, the solution obtained by ELM is extremely fast and is a global optimum for the chosen n_h, W_r and b_r. The above formulation for classification (5) is not designed to handle imbalanced or skewed data sets. As a modification to weigh the minority class data more, a simple weighting method can be incorporated in the ELM objective function (2) as

min_W { (HW - Y)^T \Gamma (HW - Y) + \lambda W^T W }    (7)

\Gamma = diag(\gamma_1, \gamma_2, ..., \gamma_N),  with  \gamma_i = 1 for majority class data and \gamma_i = r \times f_s for minority class data,    (8)

where \Gamma represents the weight matrix, r represents the ratio of the number of majority class data to the number of minority class data, and f_s represents a scaling factor to be tuned for a given data set [15]. This results in the training step given by equation (9), and the decision hypothesis takes the same form as in equation (5):

W^* = (H^T \Gamma H + \lambda I)^{-1} H^T \Gamma Y.    (9)
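To make the batch training step concrete, the following minimal sketch implements equations (2)-(9) with NumPy. It is an illustrative sketch, not the authors' implementation: the function and variable names (hidden_features, elm_batch, n_hidden) are invented here, tanh stands in for the sigmoidal activation \psi, and the optional per-sample weights realize the class-imbalance formulation of equations (7)-(9).

import numpy as np

def hidden_features(X, Wr, br):
    # Random-feature hidden layer of equation (3); rows of X are samples.
    return np.tanh(X @ Wr + br)

def elm_batch(X, Y, n_hidden, lam=1e-3, gamma=None, rng=np.random.default_rng(0)):
    # Batch ELM: W* = (H^T Gamma H + lam I)^{-1} H^T Gamma Y, equations (4)/(9).
    Wr = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # random, fixed input weights
    br = rng.uniform(-1.0, 1.0, n_hidden)                # random, fixed biases
    H = hidden_features(X, Wr, br)
    if gamma is None:
        gamma = np.ones(len(Y))                          # unweighted case, equation (4)
    HtG = H.T * gamma                                    # H^T Gamma without forming Gamma
    W = np.linalg.solve(HtG @ H + lam * np.eye(n_hidden), HtG @ Y)
    return Wr, br, W

# Usage on toy data: f(x) = W^T psi(.) for regression; its sign for classification.
X = np.random.randn(200, 3)
Y = np.sin(X[:, :1])
Wr, br, W = elm_batch(X, Y, n_hidden=50)
Y_hat = hidden_features(X, Wr, br) @ W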
B. Online Sequential ELM (OS-ELM)

The OS-ELM [1] is a recursive version of the batch ELM algorithm. This version of the algorithm is used for online learning purposes, where data is processed one-by-one or chunk-by-chunk and the model parameters are updated after each step, so that the used data need not be stored. In this process, training involves two steps: an initialization step and a sequential learning step. During the initialization step, a set of data observations (N_0) is required to initialize H_0 and W_0 by solving the following optimization problem:

min_{W_0} { \|H_0 W_0 - Y_0\|^2 + \lambda \|W_0\|^2 }    (10)

H_0 = [\psi(W_r^T x_0 + b_r)]^T \in R^{N_0 \times n_h}.    (11)

The solution W_0 is given by

W_0 = K_0^{-1} H_0^T Y_0    (12)

where K_0 = H_0^T H_0 + \lambda I. Suppose another new data observation x_1 is given; the problem becomes

min_{W_1} \| [H_0; H_1] W_1 - [Y_0; Y_1] \|^2,    (13)

where the rows of H_0 and H_1 (and of Y_0 and Y_1) are stacked. The solution can be derived as

W_1 = W_0 + K_1^{-1} H_1^T (Y_1 - H_1 W_0)
K_1 = K_0 + H_1^T H_1.

Based on the above, a generalized recursive algorithm for updating the least squares solution can be computed as follows:

M_{k+1} = M_k - M_k H_{k+1}^T (I + H_{k+1} M_k H_{k+1}^T)^{-1} H_{k+1} M_k    (14)

W_{k+1} = W_k + M_{k+1} H_{k+1}^T (Y_{k+1} - H_{k+1} W_k)    (15)

where M represents the covariance of the parameter estimate.
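The recursion of equations (14)-(15) can be written compactly for the common one-observation-at-a-time case, where the matrix inverse in (14) collapses to a scalar division. The sketch below is illustrative (names invented; W and M are assumed to come from the batch initialization of equations (10)-(12), with M_0 = K_0^{-1}); for chunk-by-chunk updates the full matrix form of (14) applies.

import numpy as np

def os_elm_step(W, M, h, y):
    # One OS-ELM update, equations (14)-(15), for a single new observation.
    # W: (n_h, y_d) output layer, M: (n_h, n_h) covariance, h: (n_h,) hidden
    # layer output of the new sample, y: (y_d,) new target.
    h = h.reshape(-1, 1)
    Mh = M @ h
    M = M - (Mh @ Mh.T) / (1.0 + h.T @ Mh)        # equation (14), scalar inverse
    W = W + M @ h @ (y.reshape(1, -1) - h.T @ W)  # equation (15)
    return W, M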
III. STOCHASTIC GRADIENT BASED ELM ALGORITHM

In this section, the proposed online learning algorithm using stochastic gradient descent (SGD) is developed for extreme learning machine models for both classification and regression problems. SGD methods have been popular for several decades for performing online learning, although with the known limitations of poor optimization behavior and slow convergence rates. Only recently has the asymptotic behavior of SGD methods been analyzed, indicating that SGD methods can be very powerful for learning large data sets [31], [32]. SGD based algorithms have been developed for Adaline networks, perceptron models, K-means, SVM and Lasso [31]. In this work, the SGD algorithm is developed for extreme learning machines, showing good potential for online learning of high velocity (streaming) data.

The justification of SGD based algorithms in machine learning can be briefly discussed as follows. In any learning problem, three types of errors are encountered, namely the approximation error, the estimation error and the optimization error [31]. The expected risk E_{exp}(f) and the empirical risk E_{emp}(f) for a supervised learning problem can be given by

E_{exp}(f) = \int l(f(x), y) \, dP(x, y)

E_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} l(f(x_i), y_i).

Let f^* = argmin_f E_{exp}(f) be the best possible prediction function. In practice, the prediction function is chosen from a family of parametric functions denoted by F. Let f^*_F = argmin_{f \in F} E_{exp}(f) be the best prediction function chosen from the parameterized family of functions F. When a training data set becomes available, the empirical risk becomes a proxy for the expected risk for the learning problem [33]. Let \bar{f}^*_F = argmin_{f \in F} E_{emp}(f) be the solution that minimizes the empirical risk. However, the global solution is not typically obtained because of computational limitations, and hence the solution of the learning problem is reduced to finding the approximate empirical minimizer \bar{f}_F returned by the optimization procedure.

Using the above setup, the approximation error (E_{app}) is the error introduced in approximating the true function space with the family of functions F, the estimation error (E_{est}) is the error introduced in optimizing over E_{emp}(f) instead of E_{exp}(f), and the optimization error (E_{opt}) is the error induced as a result of stopping the optimization at \bar{f}_F. The total error E_{tot} can be expressed as

E_{app} = E_{exp}(f^*) - E_{exp}(f^*_F)
E_{est} = E_{exp}(f^*_F) - E_{emp}(\bar{f}^*_F)
E_{opt} = E_{emp}(\bar{f}^*_F) - E_{emp}(\bar{f}_F)
E_{tot} = E_{app} + E_{est} + E_{opt}.

The following observations are taken from the asymptotic analysis of SGD algorithms [31], [34].

1) The empirical risk E_{emp}(f) is only a surrogate for the expected risk E_{exp}(f), and hence an increased effort to minimize E_{opt} may not translate to better learning. In fact, if E_{opt} is very low, there is a good chance that the prediction function will over-fit the training data.
2) SGD methods are among the worst optimization algorithms (in terms of reducing E_{opt}), yet they minimize the expected risk relatively quickly. Therefore, in the large scale setup, when the limiting factor is computational time rather than the number of examples, SGD algorithms perform asymptotically better.
3) SGD results in faster convergence when the loss function has strong convexity properties.

The last observation is key in developing the algorithm based on ELM models. The ELM models have a squared loss function, and when the hidden neurons are randomly assigned and fixed, the training translates to solving a convex optimization problem. Hence the ELM model is a good candidate for SGD type learning, which is the motivation for this study. The SGD based algorithm can be derived for the ELM models as follows.

A. Algorithm Formulation

Let (x_i, y_i), i = 1, 2, ..., N, be the streaming data in consideration. The data can be considered to be available to the algorithm from a one-by-one continuous stream, or artificially sampled one-by-one from a very large data set. Let the ELM empirical risk be defined as follows:

J(W) = min_W \frac{1}{2} \sum_{i=1}^{N} \|y_i - \phi_i^T W\|^2
     = min_W { \frac{1}{2}\|y_1 - \phi_1^T W\|^2 + ... + \frac{1}{2}\|y_N - \phi_N^T W\|^2 }
     = min_W { J_1(W) + J_2(W) + ... + J_N(W) },    (16)

where W \in R^{n_h \times y_d}, y_i \in R^{1 \times y_d} and \phi_i \in R^{n_h \times 1} is the hidden layer output for observation i (see H^T in equation (3)). If the error e_i \in R^{1 \times y_d} is defined as (y_i - \phi_i^T W), the learning objective for a data observation i can be given by

J_i(W) = \frac{1}{2} e_i^T e_i
       = \frac{1}{2} (y_i - \phi_i^T W)^T (y_i - \phi_i^T W)
       = \frac{1}{2} y_i^T y_i + \frac{1}{2} W^T \phi_i \phi_i^T W - y_i \phi_i^T W

\frac{\partial J_i}{\partial W} = \phi_i \phi_i^T W - \phi_i y_i = \phi_i (\phi_i^T W - y_i) = -\phi_i e_i.    (17)

In a regular gradient descent (GD) algorithm, the gradient of J(W) is used to update the model parameters as follows:

\frac{\partial J}{\partial W} = \frac{\partial J_1}{\partial W} + \frac{\partial J_2}{\partial W} + ... + \frac{\partial J_N}{\partial W} = -\phi_1 e_1 - \phi_2 e_2 - ... - \phi_N e_N

W_{k+1} = W_k - \Gamma_{SG} \frac{\partial J}{\partial W} = W_k + \Gamma_{SG}(\phi_1 e_1) + ... + \Gamma_{SG}(\phi_N e_N)    (18)

where k is the iteration count and \Gamma_{SG} \in R^{n_h \times n_h} represents the step size or update gain matrix for the GD algorithm.
i+1 SG i+1 i SG i It can be seen from equation (18) that the parameter matrix = tr((W˜i−ΓSGφieTi )TΓ−SG1(W˜i−ΓSGφieTi )) W is updated based on gradients calculated from all the −tr(W˜TΓ−1W˜ ) i SG i availableexamples.Ifthenumberofdataobservationsislarge, = tr(−2W˜Tφ eT +e φTΓ φ eT) the gradient calculation can take enormous computational i i i i i SG i i = tr(−2e eT +e φTΓ φ eT) effort.Thestochasticgradientdescentalgorithmconsidersone i i i i SG i i exampleatatimeandupdatesW basedongradientscalculated = −2eTi ei+eTi eiφTi ΓSGφi from (xi,yi) as shown in = −2eTi ei+eTi φTi ΓSGφiei Wi+1 =Wi+ΓSG(φiei). (19) = −eTi MSGei (24) Fromequation(18),itisclearthattheoptimalW isafunction whereM =2−φTΓ φ .ItcanbeseenthatV −V ≤0 SG i SG i i+1 i of gradients calculated from all the examples. As a result, if M >0 or 2−φTΓ φ >0 or SG i SG i as more data becomes available, W converges close to its 0<λ (Γ )<2 (25) optimal value in SGD algorithm. Processing data one-by-one max SG significantly reduces the computational requirement and the When (25) is satisfied, V(W˜)≥0 is non-increasing in i and algorithm is scalable to large data sets. More importantly, for the limit theonlinelearningofHCCIenginedynamicconsideredinthis lim V(W˜)=V (26) ∞ work, the SGD algorithm becomes a strong candidate. k→∞ In order to handle class imbalance learning, the algorithm exists. From (24), in (19) can be modified by weighting the minority class data more. The modified algorithm can be expressed as Vi+1−Vi = −eTi MSGei ∞ ∞ Wi+1 =Wi+ΓimbΓSG(φiei) (20) (cid:88)(Vi+1−Vi) = −(cid:88)eTi MSGei where Γ =r×f , r and f represent the imbalance ratio i=0 i=0 imb s s ∞ (a running count of majority class data to minority class data ⇒(cid:88)eTM e = V(0)−V <∞ (27) until that instant) and the scaling factor that needs to be tuned i SG i ∞ i=0 to obtain tradeoffs between high false positives and missed (28) detections for a given application. Also, ∞ ∞ B. Stability Analysis (cid:88)eTIe ≤(cid:88)eTM e <∞ (29) i i i SG i ThestabilityanalysisoftheSGDbasedELMalgorithmcan i=0 i=0 be derived as follows. The ELM structure makes the analysis when M >I or when SG simple and similar to that of a linear gradient based algorithm [35]. λ (Γ )<1. (30) max SG IEEETRANSACTIONSONNEURALNETWORKSANDLEARNINGSYSTEMS,VOL.XX,NO.XX,XXXX2014 6 Hence, when (30) is satisfied, e ∈ L . From (19), (W − physical variables that influence the performance of HCCI i 2 i+1 W )∈L ∩L . Using discrete time Barbalat’s lemma [36], combustion include intake manifold temperature T , intake i 2 ∞ in manifold pressure P , mass flow rate of air at intake m˙ , lim e = 0 (31) in in i exhaust gas temperature T , exhaust manifold pressure P , i→∞ ex ex lim W = W (32) coolant temperature T , fuel to air ratio (FA) etc. The en- i+1 i c i→∞ gine performance metrics are given by combustion phasing Hence, the SGD learning law in (19) guarantees that the indicated by the crank angle at 50% mass fraction burned estimated output yˆ converges to the actual output y and the i i (CA50),combustion workoutput givenbynet indicatedmean model parameters W converge to some constant values. The effective pressure (NMEP, sometimes abbreviated as IMEP). parameters converge to the true parameters W only under ∗ The combustion features calculated using in-cylinder pressure conditions of persistence of excitation [35] in input signals of such as CA50, NMEP are determined from the high speed in- the system (amplitude and frequency richness of x). Further, cylinder pressure measurements. 
IV. HOMOGENEOUS CHARGE COMPRESSION IGNITION ENGINE

The algorithms discussed in Section II are applied to streaming sensory data from a gasoline HCCI engine to demonstrate an online learning framework for HCCI engine modeling. The engine specifications are listed in Table I [13]. A schematic of the experimental setup and instrumentation is shown in Fig. 1. HCCI is achieved by auto-ignition of the gas mixture in the cylinder. The fuel is injected early in the intake stroke and given sufficient time to mix with air, forming a homogeneous mixture. A large fraction of exhaust gas from the previous cycle is retained to elevate the temperature, and hence the reaction rates, of the fuel and air mixture. The variable valve timing capability of the engine enables trapping suitable quantities of exhaust gas in the cylinder.

TABLE I: Specifications of the experimental HCCI engine

Engine Type: 4-stroke In-line
Fuel: Gasoline
Displacement: 2.0 L
Bore/Stroke: 86/86 mm
Compression Ratio: 11.25:1
Injection Type: Direct Injection
Valvetrain: Variable valve timing with hydraulic cam phaser, 119 degree constant duration defined at 0.25 mm lift, 3.5 mm peak lift and 50 degree crank angle phasing authority
HCCI Strategy: Exhaust recompression using negative valve overlap

[Fig. 1: A schematic of the HCCI engine setup and instrumentation (only relevant instrumentation shown).]

The engine can be controlled using precalculated inputs such as injected fuel mass (FM in mg/cyc), crank angle at intake valve opening (IVO), crank angle at exhaust valve closing (EVC) and crank angle at start of fuel injection (SOI). The valve events are measured in degrees after exhaust top dead center (deg eTDC), while SOI is measured in degrees after combustion top dead center (deg cTDC). Other important physical variables that influence the performance of HCCI combustion include the intake manifold temperature T_in, intake manifold pressure P_in, mass flow rate of air at intake \dot{m}_in, exhaust gas temperature T_ex, exhaust manifold pressure P_ex, coolant temperature T_c and fuel to air ratio (FA). The engine performance metrics are the combustion phasing, indicated by the crank angle at 50% mass fraction burned (CA50), and the combustion work output, given by the net indicated mean effective pressure (NMEP, sometimes abbreviated as IMEP). The combustion features such as CA50 and NMEP are determined from high speed in-cylinder pressure measurements. For further reading on HCCI combustion and the related variables, please refer to [37].

A. Experiment Design

In order to identify both models for the HCCI state variables and models for the dynamic operating boundary in transient operation, appropriate experiment design to obtain transient data from the engine is required. The modeled variables, such as the engine states and operating envelope, are dynamic variables, and in order to capture both transient and steady state behavior, a set of dynamic experiments is conducted at constant rotational speeds and naturally aspirated conditions (no supercharging/turbocharging) by varying FM, IVO, EVC and SOI in a uniformly random manner. Every input step involves the engine making a transition between two set conditions, and the transition (transients or dynamics) is recorded as temporal data. In order to capture several such transients, an amplitude modulated pseudo-random binary sequence (A-PRBS) has been used to design the excitation signals. A-PRBS enables exciting the engine at different amplitudes and frequencies suitable for the identification problem considered in this work. The data is sampled using the AVL Indiset acquisition system, where the in-cylinder pressure is sensed every crank angle, and from it the combustion features NMEP and CA50 are determined on a per-combustion-cycle basis. More details on HCCI combustion and experiments can be found in [13], [14], [15].
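For illustration, an A-PRBS excitation of the kind described above can be generated as follows; this is a sketch with placeholder dwell times and amplitude ranges, not the settings of the actual engine experiments.

import numpy as np

def aprbs(n_samples, hold_min, hold_max, low, high, rng=np.random.default_rng(0)):
    # Amplitude modulated PRBS: random switching times, random held amplitudes.
    signal = np.empty(n_samples)
    i = 0
    while i < n_samples:
        hold = int(rng.integers(hold_min, hold_max + 1))  # random dwell (samples)
        signal[i:i + hold] = rng.uniform(low, high)       # random amplitude level
        i += hold
    return signal

# e.g., a hypothetical fuel-mass excitation held between 5 and 50 cycles per level
fm = aprbs(11000, 5, 50, 7.0, 11.0)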
B. HCCI Instabilities

A subset of the data collected from the engine is shown in Fig. 2, where it can be observed that for some combinations of the inputs (left figures), the HCCI engine misfires (seen in the right figures, where NMEP drops below 0 bar). HCCI operation is limited by several phenomena that lead to undesirable engine behavior. As described in [38], the HCCI operating range is conceptually constrained to a small region of permissible unburned (pre-combustion) and burned (post-combustion) charge temperature states. As previously noted, sufficiently high unburned gas temperatures are required to achieve ignition in the HCCI operating range, without which complete misfire will occur. If the resulting combustion cannot achieve sufficiently high burned gas temperatures, as commonly occurs in conditions with low fuel to diluent ratios or late combustion phasing, various degrees of quenching can occur, resulting in reduced work output and increased hydrocarbon and carbon monoxide emissions. Under some conditions, this may lead to high cyclic variation due to the positive feedback loop existing through the trapped residual gas [16], [17]. Operation with high burned gas temperature, although stable and commonly reached at higher fueling rates where the fuel to diluent ratio is also high, yields high heat release and thus pressure rise rates that may pose challenges for engine noise and durability constraints. A discussion of the temperatures at which these phenomena occur may be found in [38].

[Fig. 2: A subset of the HCCI engine experimental data showing A-PRBS inputs and engine outputs. The misfire regions are shown in dotted rectangles. The data is indexed by combustion cycles.]

C. Learning The HCCI Engine Data

In the HCCI modeling problem, both the inputs and the outputs of the engine are available as sensor measurements, and hence supervised learning can be employed. The HCCI engine is a nonlinear dynamic system, and the sensor measurements represent discrete time sequences. The input-output behavior can be modeled using a nonlinear auto regressive model with exogenous input (NARX) [39] as follows:

y(k) = f_{NARX}[u(k-1), ..., u(k-n_u), y(k-1), ..., y(k-n_y)]    (33)

where u(k) \in R^{u_d} and y(k) \in R^{y_d} represent the inputs and outputs of the system respectively, k represents the discrete time index, f_{NARX}(.) represents the nonlinear function mapping specified by the model, n_u and n_y represent the number of past input and output samples required (the order of the system), and u_d and y_d represent the dimension of the inputs and outputs respectively. Let x represent the augmented input vector obtained by appending the input and output measurements from the system:

x = [u(k-1), ..., u(k-n_u), y(k-1), ..., y(k-n_y)]^T.    (34)

The measurement sequence can then be converted to the form of training data

{(x_1, y_1), ..., (x_N, y_N)} \in (X, Y)    (35)

where N denotes the number of training samples and X denotes the space of the input features (here X = R^{u_d n_u + y_d n_y}, and Y = R for regression and Y = {+1, -1} for binary classification).
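Constructing the training pairs of equations (34)-(35) from logged input-output sequences is a simple windowing operation. The sketch below (illustrative names, assuming equal-length row-wise time series) builds the regressor x(k) and target y(k):

import numpy as np

def narx_dataset(u, y, nu, ny):
    # Build x(k) = [u(k-1)..u(k-nu), y(k-1)..y(k-ny)]^T with target y(k),
    # following equations (34)-(35). u: (N, u_d), y: (N, y_d).
    lag = max(nu, ny)
    X, Y = [], []
    for k in range(lag, len(y)):
        past_u = u[k - nu:k][::-1].ravel()  # u(k-1), ..., u(k-nu)
        past_y = y[k - ny:k][::-1].ravel()  # y(k-1), ..., y(k-ny)
        X.append(np.concatenate([past_u, past_y]))
        Y.append(y[k])
    return np.asarray(X), np.asarray(Y)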
The above conversion of system measurements to training data is a natural definition for a series-parallel model architecture, and the models can be used for one-step ahead prediction (OSAP), i.e., given a set of measurements until time index k, the model predicts the output at time k+1 (see equation (36)). A parallel architecture, on the other hand, can be used to perform multiple step ahead predictions (MSAP) by feeding back the predictions of the OSAP model in a recurrent manner (see equation (37)). The series-parallel and parallel architectures are well explained in [40].

\hat{y}(k+1) = \hat{f}_{NARX}[u(k), ..., u(k-n_u+1), y(k), ..., y(k-n_y+1)]    (36)

\hat{y}(k+n_{pred}) = \hat{f}_{NARX}[u(k+n_{pred}-1), ..., u(k-n_u+n_{pred}), \hat{y}(k+n_{pred}-1), ..., \hat{y}(k-n_y+n_{pred})]    (37)

The OSAP model is used for training, as existing simple training algorithms can be used, and once the model becomes accurate for OSAP, it can be converted to an MSAP model in a straightforward manner. The MSAP model can be used for making the long term predictions useful for predictive control [6], [41], [21].
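The MSAP evaluation of equation (37) simply feeds the model's own outputs back into the regressor. A sketch for the n_u = n_y = 1 case used in the case study below (f_osap is assumed to be any trained one-step model mapping the regressor of equation (34) to \hat{y}(k+1)):

import numpy as np

def msap_predict(f_osap, u_future, y0, n_pred):
    # Recurrent multi-step prediction, equation (37), for nu = ny = 1.
    # u_future: (n_pred, u_d) known future inputs; y0: (y_d,) initial output.
    y_hat = [np.asarray(y0)]
    for k in range(n_pred):
        x = np.concatenate([u_future[k], y_hat[-1]])  # feed back the prediction
        y_hat.append(np.asarray(f_osap(x)))
    return np.vstack(y_hat[1:])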
V. APPLICATION CASE STUDY 1: ONLINE REGRESSION LEARNING FOR SYSTEM IDENTIFICATION OF AN HCCI ENGINE

As mentioned earlier, a key requirement for model based control of the HCCI engine is the ability to accurately predict the engine state variables for several operating cycles ahead of time, so that a control action with a known impact can be applied to the engine. The state variables of an engine are the fundamental quantities that represent the engine's state of operation. As a consequence, these variables also influence the performance of the engine, such as fuel efficiency, emissions and stability, and are required to be monitored/regulated. In this section, NMEP and CA50 are considered indicative of the engine state variables and are estimated based on the control inputs alone, so that the resulting models can be used for predictive control. This section details the experiments, model training and validation of online learning for identification of the HCCI engine state variables; the simplicity of the stochastic gradient based update, combined with the nonlinear identification capability of ELM, makes the approach suitable for the complex HCCI engine problem in hand.

A. Model Structure and Evaluation Metric

For the purpose of this work, NMEP and CA50 are considered the output variables, while the control variables fuel mass (FM), exhaust valve closing (EVC) and fuel injection timing (SOI) are considered the inputs. Transient data from the HCCI engine at a constant speed of 1800 RPM and naturally aspirated conditions is used. A NARX model as shown in Section IV-C is considered, where u = [FM, EVC, SOI]^T and y = [NMEP, CA50]^T, with n_u and n_y chosen as 1 (tuned by trial and error). The nonlinear model approximating f_{NARX} is initialized to an extreme learning machine model with random input layer weights and random values for the covariance matrices and output layer weights. Four different models are considered, including the state of the art OS-ELM algorithm, the proposed SG-ELM algorithm, a baseline offline (batch) ELM (O-ELM) and a baseline linear system identification model. The purpose of the baseline offline ELM algorithm is to evaluate whether the online learning models can learn the HCCI behavior as completely as an offline ELM model would. The offline ELM model is expected to produce an accurate model, as it has sufficient time, computation and the use of all the training data simultaneously to learn the HCCI behavior sufficiently well. The purpose of the linear baseline model is to justify the use of a nonlinear model for the HCCI dynamics.

All the nonlinear models consist of 100 hidden units with fixed randomized input layer parameters. About 11000 cycles of data are considered one-by-one, as sampled by the engine ECU, and the model parameters are updated in a sequential manner. After the training phase, the parameter update is switched off and the models are evaluated on the next 5100 cycles of data for one step ahead predictions. Further, to evaluate whether the learned models represent the actual HCCI dynamics, the multi-step ahead predictions of the models are compared using about 600 cycles of data. It should be noted that both the one-step ahead and multi-step ahead evaluations were done using data unseen during the training phase.

The parameters of each of the models are tuned to accurately represent the given dataset. As recommended for OS-ELM [1], about 800 cycles of data were used for initializing the output layer parameters W_0 and the covariance matrix M_0 (see equations (14) and (15)). The initialization was performed using the batch ELM algorithm [30]. In order to have a fair comparison, W_0 is used as the initial condition for both OS-ELM and SG-ELM. The only parameter of SG-ELM, namely the gradient step size, was tuned to \Gamma_{SG} = 0.0008 I_{100} for best accuracy. This was determined by trial and error, and the value of \Gamma_{SG} had a significant impact on the prediction accuracy. A detailed analysis of the robustness of \Gamma_{SG} is outside the scope of this paper and will be considered in future work.

The performance of the models is measured using the normalized root mean squared error (RMSE) given by

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{y_d} (y_j^i - \hat{y}_j^i)^2 }    (38)

where both y_j^i and \hat{y}_j^i are normalized to lie between -1 and +1.
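For reference, equation (38) can be computed as in the sketch below (illustrative; the scaling of each output channel to [-1, 1] is assumed to use fixed ranges, e.g., from the training data).

import numpy as np

def normalized_rmse(Y, Y_hat, y_min, y_max):
    # Equation (38): channels scaled to [-1, 1], summed over outputs,
    # averaged over samples, then square-rooted.
    scale = lambda Z: 2.0 * (Z - y_min) / (y_max - y_min) - 1.0
    E = scale(Y) - scale(Y_hat)
    return np.sqrt(np.mean(np.sum(E ** 2, axis=1)))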
B. Results and Discussion

On performing online learning, it can be observed from Fig. 3 that the parameters of OS-ELM grow more aggressively than those of SG-ELM. In spite of both models having the same initial conditions, the step size parameter \Gamma_{SG} for SG-ELM gives additional control over the parameter growth and keeps the parameters bounded, as proved in Section III-B. The OS-ELM, on the other hand, has no control over the parameter evolution; it is governed by the evolution of the covariance matrix M (14). The covariance matrix M would be expected to add stability to the parameter evolution, but in practice it tends to be more aggressive, leading to potential instabilities, as reported in [22], [23], [24], [25]. As a consequence, the parameter values for SG-ELM remain small compared to OS-ELM (the norm of the estimated parameters is 16.64 for OS-ELM and 3.71 for SG-ELM). This has a significant implication in statistical learning theory [42]: a small norm of the model parameters implies a simpler model, which results in good generalization. Although this effect is slightly reflected in the prediction results summarized in Table II (see the MSAP RMSE for SG-ELM being the lowest), it is not significantly better for this problem, possibly because of incomplete convergence. The value of \Gamma_{SG} has to be tuned correctly, along with sufficient training data, in order to ensure parameter convergence. Ultimately, the online learning mechanism is aimed to run along with the engine, and hence slow convergence may not be an issue in a vehicle application.

[Fig. 3: Comparison of parameter evolution for the OS-ELM and SG-ELM algorithms during online learning. A zoomed-in plot shows that the parameter update for OS-ELM is more aggressive compared to SG-ELM. Both OS-ELM and SG-ELM are initialized to the same parameters. The less aggressive and slow variation of the SG-ELM parameters, along with the stability bounds, results in better regularization compared to OS-ELM.]

TABLE II: Performance comparison of OS-ELM and SG-ELM for the HCCI online regression learning problem. A baseline linear model and an offline trained ELM model (O-ELM) are also included for comparison.

          Training Time (s)   OSAP RMSE   MSAP RMSE
Linear    0.3523              0.2004      0.1664
OS-ELM    3.3812              0.0957      0.1024
SG-ELM    0.7269              0.1047      0.0939
O-ELM     -                   0.1015      0.1003

The prediction results as well as the training times for the online models are compared in Table II. It can be observed that the computational time for SG-ELM is significantly lower (by a factor of about 4.6) than for OS-ELM, indicating that SG-ELM features faster learning. The reduction in computation is expected to be more pronounced as the dimension and complexity of the data increase. It can be seen from Table II that the one-step ahead prediction accuracies (OSAP RMSE) of the nonlinear models are similar, with OS-ELM winning marginally. On the other hand, the multi-step prediction accuracies (MSAP RMSE) are similar for the nonlinear models, with SG-ELM performing marginally better. The MSAP accuracy reflects the generalization performance of the model and is more crucial for the modeling problem, as the models ultimately feed their predictions to a predictive control framework that requires accurate and robust predictions of the engine several steps ahead of time. From our understanding of model complexity and generalization error, a model that is less complex (indicated by a minimum norm of parameters [30], [33]) tends to generalize better, which is again demonstrated by SG-ELM. The performance of the linear baseline model is significantly lower than that of the nonlinear models, justifying the adoption of nonlinear identification for the HCCI engine problem.

The MSAP predictions of the models are summarized in Figures 4a-4d, where the model predictions for NMEP and CA50 are compared against real experimental data. Here the model is initialized using the experimental data at the first instant and allowed to make predictions recursively for several steps ahead. It can be seen that the nonlinear models outperform the linear model, and at the same time the online learning models perform similarly to the offline trained models, indicating that online learning can fully identify the engine behavior at the operating condition where the data is collected. It should be noted that this task is a case of multi-input multi-output modeling, which adds some limitations to the SG-ELM method. When the model complexity increases, SG-ELM requires more excitations for convergence, as opposed to OS-ELM, which converges more aggressively (although at the loss of stability). Further, the tuning of the gradient step size \Gamma_{SG} may be time-consuming for systems predicting multiple outputs with different noise characteristics. The OS-ELM, on the other hand, is more elegant in this respect, as there are no parameters to be tuned once it is properly initialized.
VI. APPLICATION CASE STUDY 2: ONLINE CLASSIFICATION LEARNING (WITH CLASS IMBALANCE) FOR IDENTIFYING THE DYNAMIC OPERATING ENVELOPE OF AN HCCI ENGINE

The problem considered in this case study is to develop a predictive model of the dynamic operating envelope of the HCCI engine. For developing stable model based controllers for HCCI engines, it is necessary to prevent the engine from drifting towards instabilities such as misfire, ringing and knock [16], [17]. To this end, a dynamic operating envelope of the HCCI engine was developed using machine learning models [15]. However, that modeling was performed offline. In this paper, an online learning framework for modeling the operating envelope of the HCCI engine is developed using both the OS-ELM and SG-ELM algorithms.

In this paper, the operating envelope defined by two common HCCI unstable modes, a complete misfire and high variability combustion (a more detailed description is given in Section IV-B), is studied. The problem of identifying the HCCI operating envelope from experimental data can be posed as a classification problem. The engine sensor data can be manually labeled as stable or unstable using engine based heuristics. Further, the engine dynamic data consists of a large number of stable class data compared to unstable class data, which introduces an imbalance in the class proportions. As a result, the problem can be posed as class imbalance learning of a binary classification decision boundary. For class imbalance learning, a cost-sensitive approach that modifies the objective function of the learning system to weight the minority class data more heavily is preferred over under-sampling and over-sampling approaches [15].

The online learning algorithms OS-ELM and SG-ELM are compared for classification performance. These nonlinear models are compared against a baseline linear classification model and an offline trained nonlinear ELM model, to make justifications similar to those in the previous case study. The linear baseline model is included to justify the benefits of adopting a nonlinear model, while the offline trained model is included to show the effectiveness of the online algorithms in capturing the underlying behavior.

A. Model Structure and Evaluation Metric

The HCCI operating envelope is a function of the engine control inputs and engine physical variables such as temperature, pressure and flow rate. Also, the envelope is a dynamic system, and so a predictive model requires the measurement history up to an order of N_h. The dynamic classifier model can be given by

\hat{y}_{k+1} = sgn(f(x_k))    (39)

where sgn(.) represents the sign function, \hat{y}_{k+1} indicates the model prediction for the future cycle k+1, f(.) can take any structure depending on the learning algorithm, and x_k is given by

x_k = [IVO, EVC, FM, SOI, T_{in}, P_{in}, \dot{m}_{in}, T_{ex}, P_{ex}, T_c, FA, NMEP, CA50]^T    (40)

at cycle k, up to cycle k - N_h + 1.
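To make the classifier structure concrete, the sketch below assembles the lagged feature vector of equation (40) and produces the stability prediction of equation (39) from a trained ELM output layer. The names are invented for illustration, and the measurement matrix is assumed to contain the 13 channels of equation (40) as columns.

import numpy as np

def envelope_predict(Z, k, n_lag, Wr, br, W):
    # Predict the stability label for cycle k+1, equations (39)-(40).
    # Z: (N, 13) per-cycle measurements; n_lag: cycles of history appended
    # to the current cycle; Wr, br, W: trained ELM parameters.
    x = Z[k - n_lag:k + 1][::-1].ravel()  # x_k from cycle k back to k - N_h + 1
    phi = np.tanh(Wr.T @ x + br)          # random hidden layer, equation (3)
    return np.sign(W.T @ phi)             # y_hat_{k+1} = sgn(f(x_k)), equation (39)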
In the following sections, the function f(.) is learned from the available engine experimental data using the two online ELM algorithms. The engine measurements and their time histories (defined by x_k) are considered the inputs to the model, while the stability labels are considered the outputs. The feature vector, of dimension n = 39, includes sensor measurements such as FM, IVO, EVC, SOI, T_c, T_in, P_in, \dot{m}_in, T_ex, P_ex, NMEP, CA50 and FA, along with N_h = 1 cycles of history (see (40)). The engine