ebook img

Assessment of description quality of models by information theoretical criteria based on Akaike and Schwarz-Bayes applied with stability data of energetic materials PDF

23 Pages·0.595 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Assessment of description quality of models by information theoretical criteria based on Akaike and Schwarz-Bayes applied with stability data of energetic materials

6 - 1 Assessment of description quality of models by information theoretical criteria based on Akaike and Schwarz-Bayes applied with stability data of energetic materials Manfred A. Bohn Fraunhofer Institut fuer Chemische Technologie, ICT, D-76318 Pfinztal, Germany Abstract To describe any measurement data by models or parametric equations is mostly a neces- sity for the interpretation and further evaluations of the data. There is often the case that several models may be applicable and the question arises, which model is the better one. Besides many criteria to argue for one or another model there are objective methods for the assessment of the description quality for models in comparison. Such methods are based on information theoretical conclusions and two criteria have proven helpful. One has been developed by Hirot(s)ugu Akaike and it is called Akaike Information Criterion (AIC). The other method is based on the early work of Thomas Bayes, which was adapted by Gideon Schwarz to the framework of an information criterion and is named Bayes In- formation Criterion (BIC). With several data sets obtained with energetic materials and some models applied on them the usefulness but also limitations of these two information criteria are demonstrated and discussed. The data comprise stabilizer consumption and molar mass degradation of nitrocellulose in a gun propellant as well as the degradation of cellulose in electrical transformer insulation paper. Keywords: model selection, minimization of information loss, AIC, BIC 1. Introduction The selection of a model to describe a set of data is at first controlled by the type of data and their meaning in physical, chemical, biological or any meaning. In most cases one can construct a description scheme with the knowledge one has about the inherence and the origin of the data. For example, if one determines a series of concentrations of substances, which are changing with time, one orients typically to chemical kinetics and select a description or one builds an adapted one. But there may be the case that one does not know what the underlying description of data-time course is. Then one applies parametric descriptions and try to find out, which one is the best. Surely one has the general correlation coefficient R2 and normalized sum of squared deviations between model and data, maybe in the form of squared standard deviation SD2. The can help to find the suitable model. However, there is a lack of some objective evaluation of the description quality in the sense that they do not take in account weighting of the number of para- meters used. Here the so-named information criteria based on probability and statistics have a supportive role. There are a lot of methods to evaluate model description quality. Some key terms are: deviance information criterion (DIC) ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯ Paper 6, pages 6-1 to 6-23 in Proc. of the 46th International Annual Conference of ICT on ‘Energetic Materials – Per- formance, Safety and System Applications’, June 23 to 26, 2015, Karlsruhe, Germany. ISSN 0722-4087. Fraunhofer- Institut fuer Chemische Technologie (ICT), D-76318 Pfinztal. Germany. 6 - 2 model selection design of experiments (DoE) Occam's razor (Ockham’s razor) Akaike information criterion (AIC) Bayes factor Bayes information criterion (BIC) Minimum description length (Algorithmic information theory) Minimum message length Kullback–Leibler divergence (KL-divergence), KL entropy, KL-distance Here the concentration is on three terms: Occam's razor Akaike information criterion (AIC) Bayesian information criterion (BIC) also named Schwarz-Bayes IC (SBIC) The Bayes information criterion (BIC) or Schwarz criterion (also SBC, SBIC) as well as the Akaike information criterion (AIC) are criteria for model selection among a finite set of models. The selection method is based on the likelihood function. When fitting models, it is possible to increase the likelihood (description quality) by adding parameters. But doing so may result in overfitting. Both BIC and AIC resolve this problem by introducing a penalty term for the number of parameters in the model; the penalty term is larger in BIC than in AIC. Overfitting is against the principle of Occam’s razor (Ockham’s razor or Ockham’s principle, always powerful declaimed by William of Ockham), which states that only the really necessary amount of variables (model parameters) may be used to describe a data set. The imperative is to shave away unnecessary assumptions and cutting apart two similar models. In other word: > If there is more than one theoretical explanation for the same subject the simplest theory has to be preferred; > A theory is simple if it contains as less variables and hypotheses as possible, which are in reasonable relation to another, which can explain the subject also reasonably. But Ockham’s principle is only one criterion for the quality of a theory and it allows no real assessment on the validity of the theory and the explanations 2. Formulation of the information criteria AIC and BIC Quantities which are used in AIC and BIC with equal meaning are: n number of data points p number of model parameters to be fitted (without constraint ones) y measured value of dependent variable at data point i i f(x) by model calculated value of dependent variable at data point i i σ2 true variance of the data σ 2 estimated variance (biased variance) e L maximized value of likelihood function (maximum likelihood) n (y −f(x ))2 χ2 squared standard deviation or error variance, generallyχ2 =∑ i i 2 σ i=1 i 6 - 3 n FQS Fehlerquadradsumme, sum of squared deviations (SSD), FQS=∑(y −f(x ))2 i i i=1 2.1 The Akaike information criterion (AIC) The general formulation based on Akaike is formulated with the likelihood function and is given in Eq.(1). AIC=2⋅p−2⋅ln(L)=2⋅p−2⋅C+χ2 (1) C is a constant, which cancels in relative model evaluations and one obtains Eq.(2) AIC=2⋅p+χ2 (2) The quantity χ2 denotes the fit of models according to minimization of the sum of squared deviations (SSD) or Fehlerquadratsumme (FQS). The maximum likelihood L is given by Eq.(3), in the used logarithmic form. ⎜⎛ n ⎛ 1 ⎞1/2⎟⎞ 1 n (y −f(x ))2 1 ln(L)=ln ∏⎜ ⎟ − ∑ i i =C− ⋅χ2 (3) ⎜⎜⎝ i=1 ⎜⎝2⋅π⋅οi2 ⎟⎠ ⎟⎟⎠ 2 i=1 οi2 2 Case 1 All σ are identical, σ = σ , and σ is unknown and is taken as inherent fit parameter. Then i i e e Eq.(4) results, whereby the number of fit parameters has been increased by 1, because of use of a priori unknown σ . By expansion the Eq.(6) results for the maximum likelihood e function, used mostly in logarithmic form. n/2 ⎛ 1 ⎞ 1 1 n ln(L)=ln⎜ ⎟ − ⋅ ⋅∑(y −f(x ))2 (4) ⎜⎝2⋅π⋅ο2e ⎟⎠ 2 ο2e i=1 i i 1 n FQS ο2 = ⋅∑(y −f(x ))2 = (5) e i i n n i=1 n ⎛ FQS⎞ n n n n ⎛FQS⎞ n ⎛FQS⎞ ln(L)=− ⋅ln⎜2π⋅ ⎟− =− ⋅ln(2π)− − ⋅ln⎜ ⎟=C− ⋅ln⎜ ⎟ (6) 2 ⎝ n ⎠ 2 2 2 2 ⎝ n ⎠ 2 ⎝ n ⎠ By omitting C in relative use of AIC by using the statistical weights (see later) the Eq.(7) follows. There a short illustration is given on the meaning of the two terms. ⎛FQS⎞ AIC=n⋅ln⎜ ⎟+2⋅(p+1) ⎝ n ⎠ (7) ⎛FQS⎞ AIC=n⋅ln⎜ ⎟ + 2⋅(p+1) ⎝ n ⎠ likelihood term penalty term The lower the AIC value the better is the assessment of the model. The more fit para- meters are used the higher the AIC value. This means formulated in information content 6 - 4 about the data: The higher the AIC of a model, the higher the information loss about the data. The often used form of AIC is advantageous with models using to much fit parameters. Means the penalty term is considered to small. Therefore a modified version was proposed and formulated /1/, which should or even must be used if n is relatively small or n/p is << 1000. The AIC corrected for small sample sizes is named AICc and has an additional penalty term formed from p and n, Eq.(8). ⎛FQS⎞ 2⋅p⋅(p+1) AICc =n⋅ln⎜ ⎟+2⋅(p+1)+ (8) ⎝ n ⎠ n−p−1 The normalized statistical AIC weight wA or wAc of a model i among a set of u models to i i fit the data best, means which minimizes most the information loss by the description of the data, is obtained by Eq.(9). It uses the ‘distance’ ΔA or ΔAc between the best model i i with smallest AICc and the AICc of model i, Eq.(10). The quantity exp(-ΔA /2) or exp(-ΔAc i i i /2) is the relative maximum likelihood of model i in comparison with the other models. ( ) exp −ΔAc /2 wAc = i (9) i u ∑exp(−ΔAc /2) i i=1 ΔA = AICc −AICc (10) i i min Case 2 All σ are different. i AIC=χ2 +2⋅p (11) n (y −f(x ))2 χ2 =∑ i i (12) 2 σ i=1 i The evaluation with the relative likelihood function and the application of statistical weights is the same as above. 2.1 The Bayes (Schwarz-Bayes) information criterion (BIC, SBIC) Schwarz has formulated the information criterion in the ‘modern’ way, but based on the ideas of Bayes /2/. The start of the BIC is Eq.(13). ( ) BIC=−2⋅ln(L)+p⋅ ln(n)−ln(2π) (13) Case 1 All σ are identical, σ = σ, then Eq.(14) follows. i i ( ) 2 BIC=n⋅lnσ +p⋅(ln(n)−ln(2π)) (14) e With n large ln(2π) = 1.83: n >1000.000, Eq.(15) can be used. Introducing Eq.(16) results in Eq,(17), the mostly used form of BIC. , ( ) BIC=n⋅lnσ 2 +p⋅ln(n) (15) e 6 - 5 1 n σ 2 = ⋅∑(y −f(x ))2 (16) e i i n i=1 ⎛FQS⎞ BIC=n⋅ln⎜ ⎟+p⋅(ln(n)−ln(2π)) (17) ⎝ n ⎠ Case 2 All σ are different then again the general squared standard deviation χ2 from Eq.(12) is i used. BIC=χ2 +p⋅(ln(n)−ln(2π)) (18) For both cases the following evaluation holds as above the assessment with the normalized statistical weight wB of a model i in comparison in a set of u models. i ( ) exp −ΔB /2 wB = i (19) i u ∑exp(−ΔB /2) i i=1 ΔB =BIC −BIC (20) i i min Notes Finally the difference between AIC and BIC is the different penalty term. Sometimes the part ln(2π) in BIC penalty term is omitted, which increases the penalty. AIC and BIC can tell nothing about the quality of the model in an absolute sense. If all candidate models fit poorly, AIC and BIC will not give any warning about this. Further on, there is no selection of a possible ‘true’ model. The general assumption about the selected models to be tested is that none of them is a ‘true’ model. The only assessment result is in terms of inform- ation loss in using a description (model) for the experimental data. This means that the entity of the experimental data itself has the highest degree of information. Any regression or data reduction or model description reduces this amount of information (information is in the context of thermodynamics an extensive quantity). AIC and BIC help to select the description, which provides with the least information loss about the data. 3. Examples of model evaluations with AIC and BIC The experimental data used have been compiled in Appendix A1. 3.1 Reaction kinetic models used with the examples The following five reaction kinetic models to describe stabilizer consumption /3, 4/ and molar mass decrease /5/ have been tested with several data sets. Model ‘S: linear’, zero order consumption of stabilizer. The reaction rate constant has the unit mass-%1/time. S(t,T)=S(0)−k (T)⋅t (21) 0 6 - 6 Model ‘S: exponential’, first order consumption of stabilizer. The reaction rate cons tant has the unit 1/time. S(t,T)=S(0)⋅exp(−k (T)⋅t) (22) 1 Model ‘S: exponential + linear’, combination of fist and zero order. The reaction rate constant k1 has the unit 1/time the other one mass-%/time. ⎛ k (T)⎞ k (T) S(t,T)=⎜S(0)+ 0 ⎟⋅exp(−k (T)⋅t)− 0 (23) ⎜ ⎟ 1 ⎝ k1(T)⎠ k1(T) Model ‘S: nth order’, stabilizer consumption according to nth order. The reaction rate constant has the unit 1/time. ⎛ 1 ⎞ ⎜ ⎟ S(t,T)=S(0)⋅[1−(1−n)⋅k(T)⋅t] ⎝1−n⎠ (24) Model ‘S: extended’, stabilizer consumption according to a complete but still basic reaction scheme assuming a zero order (approximated first order) NC degradation by k and stabilizer reaction by k . NE(0) the part of NC in the formulation in mass N S parts, which has no unit. The two reaction rate constants have the unit 1/time. ( ) exp−NE(0)⋅k ⋅k ⋅t2 N S S(t)= (25) 1 π k ⎛ NE(0)⋅k ⋅k ⎞ + ⋅ S ⋅erf⎜ N S ⋅t⎟ S(0) 2 NE(0)⋅k ⎜ 2 ⎟ N ⎝ ⎠ Model for molar mass decrease; all models are based on random chain splitting, but by two different mechanisms: bond scission between chain elements and chain element de- composition. In the last version, the chain elements decrease one has mass loss and in the very final end all material is decomposed /5/. Model ‘M: CS - BS’, chain splitting (CS) by bond scission (BS) between chain ele- ments. The reaction rate constant has the unit 1/time. ( ) exp +k ⋅t Mn(t)=m⋅ B1 (26) m ( ( ) ) + exp +k ⋅t −1 B1 Mn(0) Model ‘M: CS - ED’, chain splitting (CS) by chain element decomposition (ED). The reaction rate constant has the unit 1/time. ( ) Mn(t) exp −k ⋅t = M1 (27) m m ( ( )) + 1−exp −k ⋅t M1 Mn(0) Model ‘M: CS - BS + CR’, chain splitting (CS) by bond scission (BS) between chain elements and chain recombination (CR). The two reaction rate constants have the unit 1/time. ( ) Mn(t) exp +(k +k )⋅t = B1 B2 (28) m m k +k ⋅ m B1 B2 Mn(0) ( ( ) ) + ⋅ exp +(k +k )⋅t −1 B1 B2 Mn(0) k +k B1 B2 6 - 7 Model ‘M: CS - ED + CR’, chain splitting (CS) by chain element decomposition (ED) and chain recombination (CR). The two reaction rate constants have the unit 1/time. ( ) exp −k ⋅t Mn(t)=m⋅ M1 (29) m k + M1 ⋅(exp(−k ⋅t)−exp(−k ⋅t)) M2 M1 Mn(0) k −k M1 M2 Till now only isothermal formulations of the models have been given. Sometimes non- isothermal determinations of data are made. Then non-isothermal formulations of the models must be used /5/. Mostly one performs the measurement with constant heating rate, but in principle also time or temperature dependent heating rates can be used with the following model formulations, here done for the two models for molar mass decrease with chain recombination. Model ‘M: niso CS - BS + CR’, non-isothermal (niso) chain splitting (CS) by bond scis- sion (BS) between chain elements and chain recombination (CR). The two reaction rate constants have the unit 1/time. ⎛Mn(T)⎞ d⎜ ⎟ 2 ⎝ m ⎠ 1 ⎛ m ⎞ ⎛Mn(T)⎞ 1 ( ) ⎛Mn(T)⎞ =− ⋅⎜k (T)+k (T)⋅ ⎟⋅⎜ ⎟ + ⋅ k (T)+k (T) ⋅⎜ ⎟ (30) ⎜ B1 B2 ⎟ B1 B2 dT h ⎝ Mn(0)⎠ ⎝ m ⎠ h ⎝ m ⎠ Model ‘M: niso CS - ED + CR’, non-isothermal (niso) chain splitting (CS) by chain element decomposition (ED) and chain recombination (CR). The two reaction rate constants have the unit 1/time. ⎛Mn(T)⎞ d⎜ ⎟ 2 ⎝ m ⎠ =−1⎜⎛k (T)+k (T)exp(+k (T)⋅1⋅(T−Ta)) m ⎟⎞⎜⎛Mn(T)⎟⎞ −1(k (T)−k (T))⎜⎛Mn(T)⎟⎞ ⎜ M1 M2 M1 ⎟ M1 M2 dT h⎝ h Mn(0)⎠⎝ m ⎠ h ⎝ m ⎠ (31) The non-isothermal models for molar mass decrease, Eq.(30) und Eq.(31) could be inte- grated but the resulting equations are quite cumbersome. It is therefore much easier, to use the differential forms given above and apply them via combined Runge-Kutta numeri- cal integration and non-linear parameter fitting to the data. 3.2 Description of DPA decrease in M10 flake propellant From the five models the model ‘S: extended’ describes the data best. With a weight of 54% in AICc. Iit surpasses the next best model ‘S: nth order’, which achieved 21 %. 6 - 8 Fig. 1: Graphical presentation of the fit qualities of the five models for stabilizer consump- tion. DPA decrease at 65.5°C. Data from US-Army ARDEC, Picatinny Arsenal. Table 1: Results of the model assessments of the data of M10 flake propellant. DPA de- crease at 65.5°C row Model → linear expon. expn. + linear, nth order extended summa- ↓ Quantity k k k , k k, n k , k 1 0 O N S tions n 9 9 9 9 9 p 1 1 2 2 2 SD2 0.02909 0.00107 0.00066 0.00064 0.00052 R2 0.708 0.98925 0.99411 0.99429 0.99539 0.00429 ± 0.012 ± 0.0103 ± 0.01014 ± 8.10E-3 ± k / k / k [1/d] 1 N 0.0003 0.0005 7.20E-4 6.45E-4 0.0017 0.763 ± n [-] - - - - O 0.081 6.45E-4 ± 1.21E-2 ± k [m.-%] / k [1/d] - - - 0 S 2.38E-4 0.0012 FQS 0.23272 0.00856 0.00462 0.00448 0.00364 AICc -28.325 -58.049 -60.171 -60.448 -62.317 - exp(-ΔAci/2) 4.157E-08 0.118 0.342 0.393 1 1.853 wAci 2.243E-08 0.064 0.185 0.212 0.54 1 AIC -28.896 -58.621 -62.171 -62.448 -64.317 - exp(-ΔAi/2) 2.034E-08 0.058 0.342 0.393 1 1.793 wAi 1.135E-08 0.032 0.191 0.219 0.558 1 BIC -32.537 -62.262 -67.453 -67.73 -69.598 - 6 - 9 exp(-ΔBi/2) 8.960E-09 0.026 0.342 0.393 1 1.761 wBi 5.088E-09 0.015 0.194 0.223 0.568 1 BICoπ -30.699 -60.424 -63.777 -64.054 -65.923 - exp(-ΔBoπi/2) 2.245E-08 0.064 0.342 0.393 1 1.799 wBoπi 1.248E-08 0.036 0.19 0.218 0.556 1 3.3 Description of DPA decrease in single base GP From the five models the model ‘S: extended’ describes the data best. With a weight of 61% in AICc. It surpasses the next best model ‘S: nth order’, which achieved 39%. In the case of using a reduced data set up to 7 days, the result changes: the best model is ‘S: exponential + linear’ with 49% followed by model ‘S: nth order’ with 36%; model ‘S: extended’ reaches 15%. Fig. 2: Graphical presentation of the fit qualities of the five models for stabilizer consump- tion in a single base GP. DPA decrease at 90°C over 19 days. Table 2: Results of the model assessments of the data of single base GP GS, DPA decrease at 90°C. Data measured over 19 days. row Model → linear expon. expn. + linear nth order extended summa- ↓ Quantity k k k , k k, n k , k 1 0 O N S tions n 26 26 26 26 26 p 1 1 2 2 2 6 - 10 SD2 0.13955 0.00254 0.00204 0.00067258 0.00065 R2 0.02824 0.98229 0.98639 0.99563 0.1097, 0.3155, 0.2879, 0.013 0.4792, k / k / k [1/d] 1 N 0.008 0.01 0.059 n [-] O 0.0140, 0.1882, k [m.-%] / k [1/d] 0 S 0.0051 0.013 FQS 3.48875 0.0635 0.04896 0.016141919 0.0156 AICc -48.056 -152.218 -156.624 -185.473 -186.361 exp(-ΔAci/2) 9.275E-31 3.854E-08 3.489E-07 0.642 1 1.642 wAci 5.650E-31 2.348E-08 2.125E-07 0.391 0.609 1 AIC -48.222 -152.385 -157.146 -185.995 -186.883 exp(-ΔAi/2) 7.766E-31 3.227E-08 3.489E-07 0.642 1 1.642 wAi 4.731E-31 1.966E-08 2.125E-07 0.391 0.609 1 BIC -50.802 -154.965 -160.306 -189.155 -190.043 exp(-ΔBi/2) 5.812E-31 2.415E-08 3.489E-07 0.642 1 1.642 wBi 3.541E-31 1.471E-08 2.125E-07 0.391 0.609 1 Fig. 3: Graphical presentation of the fit qualities of the five models for stabilizer consump- tion in a single base GP. DPA decrease at 90°C over 7 days. Table 3: Results of the model assessments of the data of single base GP GS, DPA decrease at 90°C. Data measured over 19 days, but only used up to 7 days. expn. + lin- row Model → linear expon. nth order extended ear, summa- ↓ Quantity k k k, n k , k k , k O N S tions 1 0

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.