Bayesian inference in non-Markovian state-space models with applications to fractional order systems

Pierre E. Jacob, S.M. Mahdi Alavi, Adam Mahdi, Stephen J. Payne and David A. Howey

arXiv:1601.07622v1 [math.OC] 28 Jan 2016

Abstract—Battery impedance spectroscopy models are given by fractional order (FO) differential equations. In the discrete-time domain, they give rise to state-space models where the latent process is not Markovian. Parameter estimation for these models is therefore challenging, especially for non-commensurate FO models. In this paper, we propose a Bayesian approach to identify the parameters of generic FO systems. The computational challenge is tackled with particle Markov chain Monte Carlo methods, with an implementation specifically designed for the non-Markovian setting. The approach is then applied to estimate the parameters of a battery non-commensurate FO equivalent circuit model. Extensive simulations are provided to study the practical identifiability of model parameters and their sensitivity to the choice of prior distributions, the number of observations, the magnitude of the input signal and the measurement noise.

Index Terms—Parameter estimation, System identification, Bayesian inference, Fractional order systems, Batteries.

P.E. Jacob is with the Department of Statistics, Harvard University, Science Center 7th floor, 1 Oxford Street, Cambridge, MA 02138-2901, USA. Email: [email protected]
S.M.M. Alavi was with the Energy and Power Group, Department of Engineering Science, University of Oxford. He is now with the Brain Stimulation Engineering Laboratory, Department of Psychiatry & Behavioral Sciences, Duke University, Durham, NC 27710, USA. Email: [email protected]
A. Mahdi and S.J. Payne are with the Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, United Kingdom. Emails: {adam.mahdi, stephen.payne}@eng.ox.ac.uk
D.A. Howey is with the Energy and Power Group, Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, United Kingdom. Email: [email protected]
The source code is available online at https://github.com/pierrejacob/BatteryMCMC

I. Introduction

Fractional Order (FO) models are important in the study of electrochemical and biological systems; see [1] and references therein. Certain features of FO models make their identification challenging. Before reviewing specific issues, we recall that a continuous-time state-space model of FO systems is given by [2]:

$$\frac{d^{\alpha}x(t)}{dt^{\alpha}} = \bar{A}(\beta)\,x(t) + \bar{B}(\beta)\,u(t), \qquad y(t) = M(\beta)\,x(t) + D(\beta)\,u(t), \qquad (1)$$

where $x \in \mathbb{R}^n$ is the state vector; $u \in \mathbb{R}$ and $y \in \mathbb{R}$ are the input and output signals, respectively; and $\bar{A}(\beta) \in \mathbb{R}^{n\times n}$, $\bar{B}(\beta) \in \mathbb{R}^{n\times 1}$, $M(\beta) \in \mathbb{R}^{1\times n}$ and $D(\beta) \in \mathbb{R}$ are system matrices which depend on the parameter vector $\beta \in \mathbb{R}^q$ to be identified. Moreover,

$$\frac{d^{\alpha}x(t)}{dt^{\alpha}} = \left[\frac{d^{\alpha_1}x_1(t)}{dt^{\alpha_1}}, \ldots, \frac{d^{\alpha_n}x_n(t)}{dt^{\alpha_n}}\right] \qquad (2)$$

is the vector of FO derivatives, with unknown FOs denoted by $\alpha_i \in (0,1)$, $i = 1,\ldots,n$.

By defining the parameter vector as

$$\theta = \begin{bmatrix} \alpha_1 & \cdots & \alpha_n & \beta \end{bmatrix}, \qquad (3)$$

the corresponding model in the discrete-time domain is given by [2]:

$$x_{k+1} = \sum_{j=0}^{k} A_j(\theta)\,x_{k-j} + B(\theta)\,u_k, \qquad y_k = M(\theta)\,x_k + D(\theta)\,u_k, \qquad (4)$$

with

$$A_0(\theta) = \operatorname{diag}\{\alpha_1,\cdots,\alpha_n\} + \operatorname{diag}\{T_s^{\alpha_1},\cdots,T_s^{\alpha_n}\}\,\bar{A}(\beta),$$
$$A_j(\theta) = (-1)^j \operatorname{diag}\left\{\binom{\alpha_1}{j+1},\cdots,\binom{\alpha_n}{j+1}\right\} \quad \text{for } 1 \le j, \qquad (5)$$
$$B(\theta) = \operatorname{diag}\{T_s^{\alpha_1},\cdots,T_s^{\alpha_n}\}\,\bar{B}(\beta),$$

where $T_s$ is the sample time, $k \in \mathbb{N}$ is the time index, $\operatorname{diag}\{\cdot\}$ denotes the diagonal matrix and $\binom{\alpha_i}{j}$ is the binomial coefficient given by

$$\binom{\alpha_i}{j} = \frac{\Gamma(\alpha_i+1)}{\Gamma(j+1)\,\Gamma(\alpha_i+1-j)}, \qquad (6)$$

where $\Gamma(\cdot)$ denotes the gamma function,

$$\Gamma(\alpha_i) = \int_0^{\infty} z^{\alpha_i-1} e^{-z}\,dz, \quad \text{for } \alpha_i \in \mathbb{C} \text{ with } \Re(\alpha_i) > 0,$$

and $\Re$ denotes the real part of a complex number.
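As a concrete illustration of the discrete-time model (4)–(6), the following sketch simulates the noise-free recursion directly. This is not the paper's released implementation (which is in the BatteryMCMC repository); it is a minimal NumPy version, and the generalized binomial coefficients of Eq. (6) are computed with the standard recurrence $\binom{\alpha}{j} = \binom{\alpha}{j-1}(\alpha-j+1)/j$ rather than via the gamma function, to avoid evaluating $\Gamma$ near its poles.

```python
import numpy as np

def binom_alpha(alpha, jmax):
    # Generalized binomial coefficients binom(alpha, j) of Eq. (6),
    # computed by the stable recurrence
    #   binom(alpha, 0) = 1,
    #   binom(alpha, j) = binom(alpha, j-1) * (alpha - j + 1) / j.
    c = np.empty(jmax + 1)
    c[0] = 1.0
    for j in range(1, jmax + 1):
        c[j] = c[j - 1] * (alpha - j + 1) / j
    return c

def simulate_fo(alphas, Abar, Bbar, M, D, u, Ts):
    # Noise-free simulation of the discrete-time FO model (4)-(5):
    #   x_{k+1} = sum_{j=0}^{k} A_j x_{k-j} + B u_k,   y_k = M x_k + D u_k,
    # with A_0 = diag{alpha_i} + diag{Ts^alpha_i} Abar,
    #      A_j = (-1)^j diag{binom(alpha_i, j+1)} for j >= 1,
    #      B   = diag{Ts^alpha_i} Bbar.
    al = np.asarray(alphas, dtype=float)
    n, T = al.size, len(u)
    C = np.array([binom_alpha(a, T + 1) for a in al])
    A0 = np.diag(al) + np.diag(Ts ** al) @ Abar
    B = np.diag(Ts ** al) @ Bbar
    x = np.zeros((T + 1, n))
    y = np.zeros(T)
    for k in range(T):
        y[k] = M @ x[k] + D * u[k]
        acc = A0 @ x[k] + B * u[k]
        for j in range(1, k + 1):   # non-Markovian part: all past states enter
            acc += (-1) ** j * (C[:, j + 1] * x[k - j])
        x[k + 1] = acc
    return x, y
```

Note how the inner loop over $j$ grows with $k$: each step touches the entire past trajectory, which is exactly the non-Markovian feature, and the source of the $N \times T^2$ cost discussed later for the particle filter.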
Definition 1: An FO system is said to be commensurate if for all $i \in \{1,\ldots,n\}$ there exists $\rho \in \mathbb{N}$ such that $\alpha_i = \rho\alpha$, where $\alpha \in \mathbb{R}$; otherwise it is said to be non-commensurate [1]. □

Now, the main issues associated with parameter estimation in FO systems are more evident: 1) the state vector $x_{k+1}$ depends on all the past states $x_0$ up to $x_k$, therefore the FO system is non-Markovian¹; and 2) the model is highly nonlinear with respect to the parameters.

¹ In Markovian systems, $x_{k+1}$ can be written as a function of $x_k$ and the inputs.

Several least-squares estimation methods have been proposed in [3]–[5] for the identification of continuous-time FO transfer functions of the form

$$F(s) = \frac{Y(s)}{U(s)} = \frac{b_m s^{\beta_m} + \cdots + b_1 s^{\beta_1} + b_0 s^{\beta_0}}{a_n s^{\alpha_n} + \cdots + a_1 s^{\alpha_1} + 1}, \qquad (7)$$

where $U$ and $Y$ are the Laplace transforms of the input and output (observation) signals, respectively. These methods have been modified in [6], [7], and developed into the Crone toolbox [8], [9]. The Crone toolbox is mainly based on the instrumental variable state variable filter (ivsvf) method and the simplified refined instrumental variable for continuous-time fractional (srivcf) method, which are both based on instrumental variable concepts [10]–[12]. In these methods, the FO model is linearised by approximating all fractional differentiation operators $s^{\alpha_i}$ and $s^{\beta_j}$ by higher-order rational transfer functions [6]. The coefficients $b_j$ and $a_i$ are identified by using the coefficient map that exists between the original and approximated models. Manual search [6], gradient descent [7], or interior-point [13] optimizations are combined with the ivsvf and srivcf functions for the estimation of the fractional orders. In the non-commensurate case, the approximate model is of high order, so that the coefficient map between the original and approximated models becomes very complex and may be intractable. This issue highlights the need for novel tools to directly identify general FO models.

In [14], an identification method based on swarm optimization has been proposed to identify a battery non-commensurate FO model.

In this paper, a novel method based on Bayesian inference is presented. Bayesian inference assumes a prior distribution on the parameters, which is then updated, using the observations, to yield the posterior distribution. By investigating the posterior distribution, parameters can be estimated, along with associated credible regions reflecting uncertainty. By comparing the prior and posterior distributions, the identifiability of the model can be assessed from a practical perspective, by answering questions such as: do the data inform the parameters? Are some parameters more easily identifiable than others?

Markov chain Monte Carlo methods provide an approach to approximate the posterior distribution, provided that the associated probability density function can be evaluated point-wise. More recently, particle Markov chain Monte Carlo (PMCMC) [15] methods have been proposed for Bayesian inference in state-space models, where the posterior density can only be estimated point-wise. For FO models, the added difficulty lies in the fact that the latent process is non-Markovian. Here we will use PMCMC, with an implementation that leads to practical approximations in the case of non-Markovian models. The method enables Bayesian inference without requiring any model simplifications, such as linearizations or Gaussian assumptions, and is applicable to any state-space model with non-Markovian latent processes.

Recently, there has been significant interest in the design of model-based battery systems to improve the efficiency and reliability of electric vehicles and renewable energies [16]. Among the employed models, the battery Electrochemical Impedance Spectroscopy (EIS) models with FO dynamics have received much attention. They are more accurate than the conventional lumped models and are computationally less expensive than the electrochemical models defined by highly coupled partial differential equations. A comprehensive survey of battery models has been given in [16]. A general schematic of the battery impedance FO models is shown in Figure 1, where $i$ and $v$ denote the battery current and voltage, respectively. $R_\infty$ represents the battery ohmic resistance at high frequencies. Each parallel pair is employed to model the battery processes over a certain frequency range. The number of parallel pairs depends on the required accuracy for the frequency domain fitting of impedance spectra. The terms $C_i s^{\alpha_i}$, $i = 1,\ldots,n$, called constant phase elements (CPEs), model diffusion processes (e.g. charge transfer resistance and double layer capacitance) more accurately compared to the lumped models, as shown in [17] and [13]. In low frequency ranges, the impedance frequency response may show constant phase behaviour, such that the associated parallel resistor can be considered as an open circuit. This is referred to as the Warburg term in the literature [17]. Reference [17] provides more information on battery EIS and associated FO models.

Fig. 1. The general battery electrochemical impedance spectroscopy model: $R_\infty$ in series with $n$ parallel pairs $R_i \,\|\, 1/(C_i s^{\alpha_i})$, with voltages $v_1(t),\ldots,v_n(t)$ across the pairs.

By defining $u = i$ and $y = v$, and the voltages across the CPEs as the state variables,

$$x \triangleq \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}, \qquad (8)$$

it is easy to show that $A_j$, $B$, $M$ and $D$ in the state-space model (4) are given by

$$A_j(\theta) = \operatorname{diag}\{a_{1,j}(\theta),\cdots,a_{n,j}(\theta)\}, \quad B(\theta) = \begin{bmatrix} b_1(\theta) & \cdots & b_n(\theta) \end{bmatrix}, \quad M(\theta) = \begin{bmatrix} m_1 & \cdots & m_n \end{bmatrix}, \quad D(\theta) = d(\theta), \qquad (9)$$

with

$$a_{i,0}(\theta) = \alpha_i - \frac{T_s^{\alpha_i}}{R_i C_i}, \quad a_{i,j}(\theta) = (-1)^j \binom{\alpha_i}{j+1}, \quad b_i(\theta) = \frac{T_s^{\alpha_i}}{C_i}, \quad m_i = 1, \quad d(\theta) = R_\infty, \qquad (10)$$

for $i = 1,\ldots,n$ and $j = 1,2,\ldots,T$, where $T$ is the data length.

In order to estimate the parameter vector $\theta$ from $v$ and $i$, the battery FO models are currently fitted to frequency domain impedance spectra obtained through EIS [18]–[20]. This requires a data conversion that may introduce bias in the estimation [13]. Thus, parameter estimation of the battery FO models directly from time-domain data is very appealing.

In [2] we showed that the FO model with just a single CPE (where $n = 1$ in Figure 1) is globally identifiable. Here we study the identifiability of the model with more than one CPE using the proposed Bayesian approach. Extensive simulations on a battery FO system are provided to illustrate how the proposed method enables the study of various effects on parameter identification, such as the data length, the magnitude of the input signal, the choice of prior, and the measurement noise. We also show that the parameters of the Warburg term in the non-commensurate FO battery model are not identifiable. The source code is available online at https://github.com/pierrejacob/BatteryMCMC.

The main contributions of the paper are:
- Introduction of a Bayesian framework for the identification of general FO systems.
- Reduction of the computational and memory cost for the non-Markovian particle filter.
- Application to the identifiability of non-commensurate FO battery models.

II. Bayesian Inference

In this section, we describe Bayesian inference in the context of non-Markovian state-space models. The vector $(v_s,\ldots,v_t)$ is denoted by $v_{s:t}$. The notation $a \sim p(\cdot \mid b)$ means that $a$ is a random variable distributed according to a distribution $p$ which depends on $b$. The Gaussian distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{N}(\mu,\sigma^2)$.

A. Inference in state-space models

Given the measurements $y_{0:T}$, parameter inference refers to the task of estimating the parameter vector $\theta$ in the general state-space model

$$x_0 \sim \mu(\cdot \mid \theta),$$
$$\forall k \in \{1,\ldots,T\} \quad x_k \sim f_k(\cdot \mid x_{0:k-1},\theta), \qquad (11)$$
$$\forall k \in \{0,\ldots,T\} \quad y_k \sim g_k(\cdot \mid x_k,\theta),$$

where $T \in \mathbb{N}$ denotes the total number of observations [21], [22]. Note that the notation reflects that $f_k$ potentially depends on all the past states $x_{0:k-1}$, and thus $\{x_0,\cdots,x_k\}$ is not necessarily a Markov chain. Both $f_k$ and $g_k$ could be nonlinear functions. Therefore, the FO model (4) fits into the Bayesian framework (11), upon specifying a state noise and an observation noise. For simplicity, we will consider additive Gaussian noises, that is, we consider

$$x_{k+1} = \sum_{j=0}^{k} A_j(\theta)\,x_{k-j} + B(\theta)\,u_k + \sigma_x \varepsilon_k, \qquad y_k = M(\theta)\,x_k + D(\theta)\,u_k + \sigma_y \eta_k, \qquad (12)$$

where $(\varepsilon_k)$ and $(\eta_k)$ are sequences of independent standard Gaussian variables, and $\sigma_x$, $\sigma_y$ are positive values.

Statistical inference classically relies on the likelihood function, defined by $\theta \mapsto p(y_{0:T} \mid \theta)$, where $p(y_{0:T} \mid \theta)$ is the density of the observations evaluated with the dataset $y_{0:T}$ for the parameter $\theta$. For state-space models the likelihood can be written, for any $\theta$, in terms of $\mu$, $f_k$ and $g_k$ as follows:

$$p(y_{0:T} \mid \theta) = \int \mu(x_0 \mid \theta)\, g_0(y_0 \mid x_0,\theta) \prod_{k=1}^{T} f_k(x_k \mid x_{0:k-1},\theta)\, g_k(y_k \mid x_k,\theta)\, dx_{0:T}. \qquad (13)$$

One way to estimate the parameters is to maximise the likelihood function $\theta \mapsto p(y_{0:T} \mid \theta)$ with respect to $\theta$, yielding the maximum likelihood estimator. Confidence intervals can be constructed by assuming that the maximum likelihood estimator is asymptotically normal; other ways to obtain confidence intervals, such as the bootstrap, are not readily applicable to state-space model settings. Alternatively, Bayesian inference relies on a probability distribution defined on the space $\Theta$, representing our knowledge about the parameters given the observations [23]. We first specify a prior distribution $p(\theta)$, representing the knowledge of $\theta$ before observing the data (for instance, based on past experiments, intuition, literature search, etc). Then, the posterior distribution of the parameters given the observations is given by Bayes' formula:

$$p(\theta \mid y_{0:T}) = \frac{p(\theta)\, p(y_{0:T} \mid \theta)}{\int p(\vartheta)\, p(y_{0:T} \mid \vartheta)\, d\vartheta}. \qquad (14)$$

In the numerical experiments, we will be particularly interested in the changes between the prior and the posterior, which inform us about how much has been learned from the data, and thus give a practical notion of identifiability. The posterior distribution $p(\theta \mid y_{0:T})$ can also be used to obtain point estimates, such as the posterior mean, and uncertainty is typically measured using credible regions [23].

Since the posterior distribution typically does not belong to a standard class of parametric probability distributions, it is approximated using Monte Carlo methods, which provide samples approximately distributed according to the posterior distribution. For general state-space models, that is, generic choices of $\mu$, $f_k$ and $g_k$, the integral in Eq. (13) cannot be evaluated exactly, and has to be numerically approximated. Therefore, exact posterior density evaluations are not available either. We will describe how to construct flexible and efficient approximations of the posterior distribution using a combination of Markov chain Monte Carlo (MCMC) algorithms [24] and particle filters [25], called particle Markov chain Monte Carlo (PMCMC) [15].

B. Likelihood estimation using particle filters

For our purposes, particle filters [25]–[27] will provide approximations of the likelihood in Eq. (13), for any $\theta \in \Theta$, up to a multiplicative constant. The user sets a number of particles $N \in \mathbb{N}$. Larger values of $N$ yield better precision, but the computational cost of the method increases linearly with $N$. A population of $N$ particles is first sampled from the initial distribution $\mu(\cdot \mid \theta)$. These particles are denoted by $x_0^{1:N} = (x_0^1,\ldots,x_0^N)$. Then, each particle is weighted using the first observation $y_0$. That is, for each $i \in \{1,\ldots,N\}$, $w_0^i = g_0(y_0 \mid x_0^i,\theta)$. These weights $w_0^{1:N}$ are such that the weighted sample $(w_0^i,x_0^i)_{i=1}^{N}$ is approximately distributed according to $p(x_0 \mid y_0,\theta)$. This weighting is commonly called importance sampling [24] and concludes the initialization of the algorithm.

The next step consists in selecting some particles and discarding others, according to their weights. This is the resampling step, which comes in various flavors. The most standard resampling scheme, called multinomial resampling, consists in drawing, independently for each $i \in \{1,\ldots,N\}$, an ancestor variable $a_0^i \in \{1,\ldots,N\}$ according to a categorical distribution with parameters $(w_0^1/\sum_{j=1}^{N} w_0^j,\ldots,w_0^N/\sum_{j=1}^{N} w_0^j)$. The variable $a_0^i$ is interpreted as the index of the parent of particle $i$ at time 1, among the $N$ particles at time 0. Once the parents $a_0^{1:N}$ are chosen, the new particles are sampled and weighted according to $x_1^i \sim q_1(\cdot \mid x_0^{a_0^i},\theta)$ and

$$w_1^i = \frac{f_1(x_1^i \mid x_0^{a_0^i},\theta)\, g_1(y_1 \mid x_1^i,\theta)}{q_1(x_1^i \mid x_0^{a_0^i},\theta)}.$$

This is another importance sampling step, where the proposal distribution is $q_1$, and the weights are computed such that $(x_1^i,w_1^i)_{i=1}^{N}$ is approximately distributed according to $p(x_1 \mid y_0,y_1,\theta)$. Let us introduce the paths $\bar{x}_{0:1}^i = (x_0^{a_0^i},x_1^i)$, for each $i \in \{1,\ldots,N\}$. Then the weighted paths $(w_1^i,\bar{x}_{0:1}^i)_{i=1}^{N}$ are approximately distributed according to the path distribution $p(x_0,x_1 \mid y_0,y_1,\theta)$. The algorithm then proceeds in a similar fashion for the subsequent steps, resampling, sampling and weighting the particles until all the data have been assimilated. A pseudo-code description of the algorithm is given in Section V-A. The algorithm provides at every time $k$ a weighted sample $(w_k^i,\bar{x}_{0:k}^i)_{i=1}^{N}$ that approximates the path distribution $p(x_{0:k} \mid y_{0:k},\theta)$. More importantly for us, the quantity

$$\hat{p}(y_{0:T} \mid \theta) = \prod_{k=0}^{T} \left( \frac{1}{N} \sum_{i=1}^{N} w_k^i \right) \qquad (15)$$

is an estimator of the likelihood $p(y_{0:T} \mid \theta)$. A rich literature is devoted to the theoretical study of the algorithm and the properties of this type of estimator [28]–[30], at least in the setting of Markovian latent processes. These results indicate that, as the number of observations $T$ goes to infinity, the relative variance of the likelihood estimator $\hat{p}(y_{0:T} \mid \theta)$ can be bounded independently of $T$ if one chooses $N$ proportionally to $T$.

The user has to choose the proposal distributions $q_k(\cdot \mid x_{0:k-1},\theta)$ for all $k \ge 1$. The default choice of proposal is to use $q_k = f_k$, the model transition. This choice leads to a simplified expression of the weight, $w_k^i = g_k(y_k \mid x_k^i,\theta)$. Another choice, called the locally optimal proposal, consists in using

$$q_k(x_k \mid x_{0:k-1},\theta) = p(x_k \mid y_k,x_{0:k-1},\theta), \qquad (16)$$

which leads to the weight $w_k^i = p(y_k \mid \bar{x}_{0:k-1}^i,\theta)$. The locally optimal proposal takes the next observation $y_k$ into account when propagating the particles from time $k-1$ to time $k$. Appendix V-B details the calculation leading to the locally optimal proposal for the battery models of Section III. The choice of proposal distributions has a direct impact on the variance of the estimator $\hat{p}(y_{0:T} \mid \theta)$.

C. Special features of non-Markovian models

We consider the following implementation of the standard particle filter. Given the non-Markovianity of the latent process, sampling each particle $x_k^i$ at time $k$ requires computing a function $\psi_k$ of a trajectory $(x_0^i,\ldots,x_{k-1}^i)$. In the models considered in the article, such as (12), this function takes the form of a weighted sum: $\psi_k(\bar{x}_{0:k}^i) = \sum_{t=0}^{k} \alpha_{k,t}\, \bar{x}_{k-t}^i$, where $(\alpha_{k,t})_{t \ge 0}$, for each $k$, are coefficients that can be computed given the parameter $\theta$. Naively computing this weighted sum for each particle would yield a cost of order $k$ at time $k$, and thus an overall computational cost of order $N \times T^2$, where $T$ is the number of observations to assimilate and $N$ the number of particles. Furthermore, the naive memory cost of storing all the trajectories would be of order $N \times T$.

However, we can reduce both the memory cost and the computational cost by representing the trajectories as branches of a tree. At each step of the particle filter, new leaves are added to the tree, corresponding to the new generation of particles. The ancestors give the list of new branches. At the same time, some existing branches of the tree can be cut, corresponding to particles that have been discarded in the resampling step. As a result, it was shown in [31] that the number of nodes in the tree is of order $k + CN\log N$ at time $k$, in expectation. The constant $C$ does not depend on $N$ and $T$. Efficient algorithms to implement this tree structure are given in [31]. Therefore, the memory cost can be reduced from $N \times T$ to $T + CN\log N$. Furthermore, computing the $N$ sums at time $k$ only requires browsing the whole tree once, from the root to the leaves, for a cost of order $k + CN\log N$ at step $k$. Performing this operation for each $k$ in $\{1,\ldots,T\}$ yields an overall computational cost of order $T^2 + CTN\log N$, instead of $N \times T^2$ with the standard implementation.

Thus, the memory cost of particle filtering for the paths of a non-Markovian model is of order $T + CN\log N$, and the computational cost of the algorithm is of order $T^2 + CTN\log N$, when the transition involves sums as in Eq. (12). The same would be true for any calculation that only requires parsing the tree structure once. This is to be compared with a computational cost of $NT$ and a memory cost of $N$ for Markovian models. Since $N$ is typically comparable to $T$, and $\log N$ is a small value, the tree structure gives an order of magnitude improvement, both computationally and in terms of memory, over the standard implementation.

D. Parameter estimation using particle Markov chain Monte Carlo

We next describe briefly how point-wise likelihood estimates $\hat{p}(y_{0:T} \mid \theta)$, such as the ones produced by particle filters, can be used to obtain samples from the posterior distribution. We consider MCMC algorithms to produce $\theta_{1:M} = (\theta_1,\ldots,\theta_M)$, the realization of a Markov chain approximately following the posterior distribution of Eq. (14) [24]. Standard MCMC requires point-wise evaluations of the posterior density, and thus is not applicable here. Instead, we have access to estimators provided by particle filters, which leads to particle MCMC [15]. These methods have been thoroughly analyzed in [32]–[35].
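The particle filter of Section II-B, with the default bootstrap proposal $q_k = f_k$ and multinomial resampling, can be sketched as follows. This is a minimal illustrative version, not the paper's implementation: it stores full paths naively (the $N \times T$ cost) rather than using the tree structure of Section II-C, it works with the log of the estimator (15) for numerical stability, and all function names and signatures are assumptions.

```python
import numpy as np

def particle_filter_loglik(mu_sample, f_sample, g_logdensity, y, N, rng):
    # Bootstrap particle filter (proposal q_k = f_k) for a non-Markovian
    # state-space model, returning the log of the likelihood estimator (15).
    #   mu_sample(N, rng)         -> N draws from mu(. | theta)
    #   f_sample(paths, k, rng)   -> x_k^i sampled given the whole past x_{0:k-1}^i
    #   g_logdensity(x_k, y_k, k) -> log g_k(y_k | x_k) for each particle
    paths = [mu_sample(N, rng)]     # naive O(N*T) path storage; the paper
    loglik = 0.0                    # reduces this with an ancestor tree [31]
    for k in range(len(y)):
        if k > 0:
            # multinomial resampling: ancestors drawn proportional to weights
            anc = rng.choice(N, size=N, p=w / w.sum())
            paths = [xs[anc] for xs in paths]       # prune/duplicate whole paths
            paths.append(f_sample(paths, k, rng))   # non-Markovian transition
        logw = g_logdensity(paths[-1], y[k], k)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())   # log of (1/N) sum_i w_k^i
    return loglik
```

Because `f_sample` receives the entire list of past particle generations, the same skeleton covers Markovian and non-Markovian transitions; for the FO model (12), `f_sample` would evaluate the weighted sum over all past states.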
We will use in particular the Particle Marginal Metropolis–Hastings (PMMH) algorithm, based on the standard Metropolis–Hastings algorithm [24], [36]. Remarkably, the Markov chain $\theta_{1:M}$ produced by PMMH still converges to the target distribution when $M$ goes to infinity, as if exact density evaluations were used. The PMMH algorithm is described in pseudo-code in Section V-C, along with details on algorithmic tuning.

Remark 1: The literature on particle filters and PMCMC methods usually considers the case where the latent process $(x_k)_{k \in \mathbb{N}}$ is a Markov chain, but we see that all the above algorithms are directly implementable in the non-Markovian setting. Few articles have considered the non-Markovian case [37]–[39]. □

III. Bayesian Inference in Battery Systems

We consider the battery FO model represented in Figure 2, which includes two CPEs. The resistance associated with $C_2$ is open-circuit, to model the Warburg term. The model is given by (12) with system matrices (9) and (10) for $n = 2$ and $R_2 = \infty$. Figure 3 shows the typical frequency response of the circuit, which is evident in a large number of electrochemical systems and may be measured using EIS [17], [40]. The relation between the elements and the frequency response has been annotated in the figure. The effect of an additional $R_2$ in parallel with $C_2$ is also investigated later in this study. The initial value of the state vector is set to zero: $x_0 = [0\ 0]$. The parameter vector is

$$\theta = \begin{bmatrix} R_\infty & R_1 & C_1 & C_2 & \alpha_1 & \alpha_2 \end{bmatrix}.$$

Fig. 2. The $R_\infty - R_1 \| \frac{1}{C_1 s^{\alpha_1}} - \frac{1}{C_2 s^{\alpha_2}}$ circuit.

We generate synthetic data using the parameter set

$$\theta^\star = \begin{bmatrix} 0.01 & 0.2 & 3.0 & 400 & 0.8 & 0.5 \end{bmatrix}. \qquad (17)$$

Figure 3 shows the frequency response of the circuit for the above "true" values from 0.1 mHz to 2 kHz.

Fig. 3. Frequency response of the circuit of Figure 2 for the parameter values given in Eq. (17), plotted as $Z_{\mathrm{imag}}$ against $Z_{\mathrm{real}}$ (Ω). The annotations indicate the ohmic resistance $R_\infty$ at high frequencies, the charge transfer resistance and double layer capacitance $R_1$ and $1/(C_1 s^{\alpha_1})$, and the Warburg term $1/(C_2 s^{\alpha_2})$ at low frequencies.

The data length is set to $T = 930$ samples. The standard deviations are set to $\sigma_x = 0.002$ and $\sigma_y = 0.02$. A pseudo-random binary sequence (PRBS) signal between −1 and +1 was generated for $(u_k)_{k \in \mathbb{N}}$, with sampling time $T_s = 0.5$ ms. The output voltage $(y_k)_{k \in \mathbb{N}}$ is then generated using the model of Eq. (4) with the parameter value of Eq. (17). Figure 4 shows the input-output data for the base scenario.

Fig. 4. Input sequence (left) and observations (right), of length T = 930, generated from the model of Eq. (4).

The prior is set to be the uniform distribution over the ranges given in Table I.

TABLE I
Range for each parameter.

Parameter                                   Min     Max
$R_\infty$ (Ω)                              0.005   0.10
$R_1$ (Ω)                                   0.050   0.50
$C_1$ (F cm$^{-2}$ s$^{-\alpha_1}$)         1.00    5.00
$C_2$ (F cm$^{-2}$ s$^{-\alpha_2}$)         300     500
$\alpha_1$, $\alpha_2$                      0.40    1.00

The PMMH is tuned with $N = 128$ particles per iteration, and $M = 20{,}000$ iterations. The proposal distribution on $\theta$ is a Gaussian distribution tuned using preliminary runs, in order to match the covariance structure of the posterior distribution (details on this tuning phase are given in Section V-C). We first present the results of the base scenario (Section III-A), and then consider various modifications: the number of observations (Section III-B), the magnitude of the input data (Section III-C), the prior distribution (Section III-D) and the state-output noise ratio of the generated data (Section III-E). For each of the following experiments, five independent runs are performed. The resulting density estimators of each run are overlaid, in order to check the consistency of the method across multiple runs.

Remark 2: The ability to identify model parameters depends on the quantity and quality (uncertainties) of the available data. Although there is a rich literature on experiment design [41]–[43], it remains unclear how to generate informative data for parameter estimation of FO systems. In [44], we applied the persistent excitation concept to the battery Randles circuit models, which are given by ordinary differential equations (ODEs). This method is adopted here and three different persistently exciting PRBS inputs are applied. The authors in [45] propose a method for the design of periodic excitation signals to identify battery models given by ODEs. Further investigations in this regard are left open for future research. □

A. Posterior samples in the base scenario

The results of the base scenario, computed on five independent runs, are shown in Figure 5. The graph shows the approximation of each marginal posterior distribution using kernel density estimators, computed on each run (diagonals), as well as pairwise histograms with hexagonal binning (lower triangle) of all the runs combined. The pairwise correlations are indicated in the upper triangle.

On the diagonal, we see that the five independent runs are consistent, indicating that the PMCMC method approximates the posterior distribution in a satisfactory way. We can then comment on the posterior itself. We see that for some parameters, such as $R_\infty$, the posterior is significantly different from the prior, represented by horizontal red lines. This indicates that the data are informative on these parameters. Therefore, we can expect that the marginal posterior distributions of these parameters would concentrate around the corresponding components of the data-generating parameter $\theta^\star$ of Eq. (17), when the number of observations goes to infinity. On the other hand, nothing seems to be learned on some other parameters, such as $C_2$, for which the posterior resembles the prior.

Recall that the prior distribution is set to be uniform over the intervals defined by Table I. As a result, the posterior distribution is simply proportional to the likelihood of Eq. (14), and thus the mode of the posterior would be precisely the maximum likelihood estimate, under the constraints of Table I. However, since the posterior distribution is flat on some parameters (such as $C_2$), the mode could be anywhere in the admissible interval for these parameters. Thus, a numerical procedure giving only the maximum likelihood estimate would return any value in that range. In [2], we observed a similar result: the estimation error of the Warburg term, using the battery Randles model with standard capacitors, is large. Therefore, the identification of the Warburg term in the battery FO model is similar to the identification of the integral term. The Bayesian approach is closer in spirit to integrated likelihood approaches, which are alternatives to profile likelihood. Another advantage of sampling from the posterior distribution is the possibility to investigate correlations between parameters, which are particularly large between $R_\infty$ and $\alpha_1$, and between $C_1$ and $\alpha_1$, according to Figure 5.

Fig. 5. Results of the PMMH algorithm for the base scenario, sampling approximately from the posterior distribution for the model (4). On the diagonal, the posterior density function of each run is overlaid with the prior distribution (red horizontal lines), which is uniform. On the lower triangle, the five runs are pooled together in a histogram with hexagonal binning. On the upper triangle, the correlation coefficients between pairs of parameters are displayed.

B. Effect of the number of observations

We now proceed to investigate whether variations in the experiments change the identification of the parameters. We first consider the number of observations. We consider three sets of data of sizes $T = 635$, $T = 930$ and $T = 1890$, generated from three input sequences of these lengths. Instead of showing all the marginal distributions as in Figure 5, we focus on two parameters, $R_\infty$ and $C_2$, which are respectively easy and challenging to identify (according to Figure 5).

The results are shown in Figure 6. We see that adding more data makes the posterior distribution more concentrated for $R_\infty$, and closer to the true value of Eq. (17), whereas it does not seem to have an effect on the distribution of $C_2$.

Fig. 6. Results of the PMMH algorithm, comparing the posterior distributions obtained for T = 635, T = 930 and T = 1890 data points. For some parameters (such as R∞ on the left), adding more data helps the concentration of the posterior distribution, while for others (C2 on the right), increasing data does not seem to have any effect.

C. Effect of the input magnitude

We study the effect of the magnitude of the input. On top of the base scenario, with input data of magnitude 1, we consider an input sequence of magnitude 5 (thus, oscillating between −5 and +5). The results are shown in Figure 7. Increasing the magnitude of the input helps identifying $R_\infty$, but still does not seem to impact the posterior distribution of $C_2$.

Fig. 7. Results of the PMMH algorithm, comparing the posterior distributions obtained for inputs of magnitude 1 (top panels) and of magnitude 5 (bottom panels). For some parameters (such as R∞ on the left), an input with higher magnitude helps the concentration of the posterior distribution, while for others (C2 on the right), changing the magnitude does not seem to have any effect.

D. Effect of the prior distribution

We study the effect of the prior distribution. We consider a Gaussian prior, centered in the middle of the range indicated in Table I, and with standard deviation one fourth of the range, so that roughly 95% of the Gaussian mass lands in the range; we truncate the Gaussian distribution outside the range. Since some of the parameters are barely identifiable, we expect the choice of prior to have an impact. Figure 8 shows that indeed, for some parameters such as $C_2$, the posterior is essentially equal to the prior, and hence the choice of prior has a huge impact. For other parameters such as $R_\infty$, the choice of prior distribution has no noticeable effect, indicating that the posterior distribution is highly driven by the observations.

Fig. 8. Results of the PMMH algorithm, comparing the posterior distributions obtained for two prior distributions: a uniform prior (top panels) and a normal prior (bottom panels). For some parameters (such as R∞ on the left), the prior distribution has no noticeable effect on the posterior distribution, which is very concentrated with respect to both priors. On some others (C2 on the right), the posterior looks very much like the prior and thus the choice of prior has a huge effect on the results.

E. Effect of the state-output noise ratio

We study the effect of the state to output noise ratio. More precisely, we generate data using $\sigma_y = 0.002$, instead of $\sigma_y = 0.02$ as in the base scenario. We refer to these state-output noise ratios (i.e. signal to noise ratios) as 1.0 and 0.1, respectively. The results are shown in Figure 9. We see that increasing the state-output noise ratio has more of an effect on some parameters than others. Note that, this time, the posterior distribution of $C_2$ seems to deviate from the prior distribution, with a slight inclination of the posterior toward the right-hand side of the interval [300, 500].

In our recent work [2] we showed that non-commensurate FO models are structurally (i.e. assuming noise-free, perfect data) locally identifiable. For the ohmic resistor $R_\infty$ we showed that it can be globally structurally identifiable. In the present paper we study practical identifiability (i.e. assuming real, noisy data) using the framework of Bayesian inference. In particular we note that the term $R_\infty$ is also practically globally identifiable (see the figures above).

We believe that identification of the FO Warburg term is similar to the identification of integral terms in ODE systems. Closed-loop identification [46] is a common method for the identification of the integral term. Further research is required to investigate whether the closed-loop identification approach could be applied to the battery non-commensurate FO models.

In the context of Bayesian inference, we suggest that other particle MCMC methods such as Particle Gibbs [15], [38] or SMC² [47] could also be envisioned.
