GENERALIZED SYMMETRIC DIVERGENCE MEASURES AND INEQUALITIES

INDER JEET TANEJA

Abstract. In this paper we study the following two divergence measures of type s:

\mathcal{V}_s(P||Q) =
\begin{cases}
J_s(P||Q) = [s(s-1)]^{-1}\left[\sum_{i=1}^{n}\left(p_i^{s}q_i^{1-s} + p_i^{1-s}q_i^{s}\right) - 2\right], & s \neq 0,1,\\[2mm]
J(P||Q) = \sum_{i=1}^{n}(p_i - q_i)\ln\left(\dfrac{p_i}{q_i}\right), & s = 0,1,
\end{cases}

and

\mathcal{W}_s(P||Q) =
\begin{cases}
IT_s(P||Q) = [s(s-1)]^{-1}\left[\sum_{i=1}^{n}\left(\dfrac{p_i^{1-s}+q_i^{1-s}}{2}\right)\left(\dfrac{p_i+q_i}{2}\right)^{s} - 1\right], & s \neq 0,1,\\[2mm]
I(P||Q) = \dfrac{1}{2}\left[\sum_{i=1}^{n} p_i\ln\left(\dfrac{2p_i}{p_i+q_i}\right) + \sum_{i=1}^{n} q_i\ln\left(\dfrac{2q_i}{p_i+q_i}\right)\right], & s = 0,\\[2mm]
T(P||Q) = \sum_{i=1}^{n}\left(\dfrac{p_i+q_i}{2}\right)\ln\left(\dfrac{p_i+q_i}{2\sqrt{p_i q_i}}\right), & s = 1.
\end{cases}

The first measure generalizes the well-known J-divergence due to Jeffreys [16] and Kullback and Leibler [17]. The second measure gives a unified generalization of the Jensen-Shannon divergence due to Sibson [22] and Burbea and Rao [2, 3] and the arithmetic-geometric mean divergence due to Taneja [27]. These two measures contain in particular some well-known divergences such as Hellinger's discrimination, triangular discrimination and the symmetric chi-square divergence. In this paper we study the properties of the above two measures and derive some inequalities among them.

2000 Mathematics Subject Classification. 94A17; 26D15.
Key words and phrases. J-divergence; Jensen-Shannon divergence; Arithmetic-Geometric divergence; Triangular discrimination; Symmetric chi-square divergence; Hellinger discrimination; Csiszár's f-divergence; Information inequalities.

1. Introduction

Let

\Gamma_n = \left\{ P = (p_1, p_2, \ldots, p_n) \,\middle|\, p_i > 0,\ \sum_{i=1}^{n} p_i = 1 \right\}, \quad n \geq 2,

be the set of all complete finite discrete probability distributions. For all P, Q \in \Gamma_n, the following measures are well known in the literature on information theory and statistics:

• Hellinger Discrimination (Hellinger [15])

(1)    h(P||Q) = 1 - B(P||Q) = \frac{1}{2}\sum_{i=1}^{n}\left(\sqrt{p_i} - \sqrt{q_i}\right)^2,

where

(2)    B(P||Q) = \sum_{i=1}^{n}\sqrt{p_i q_i}

is the well-known Bhattacharyya [1] coefficient.

• Triangular Discrimination

(3)    \Delta(P||Q) = 2\left[1 - W(P||Q)\right] = \sum_{i=1}^{n}\frac{(p_i - q_i)^2}{p_i + q_i},

where

(4)    W(P||Q) = \sum_{i=1}^{n}\frac{2\,p_i q_i}{p_i + q_i}

is the well-known harmonic mean divergence.

• Symmetric Chi-square Divergence (Dragomir et al. [14])

(5)    \Psi(P||Q) = \chi^2(P||Q) + \chi^2(Q||P) = \sum_{i=1}^{n}\frac{(p_i - q_i)^2(p_i + q_i)}{p_i q_i},

where

(6)    \chi^2(P||Q) = \sum_{i=1}^{n}\frac{(p_i - q_i)^2}{q_i} = \sum_{i=1}^{n}\frac{p_i^2}{q_i} - 1

is the well-known \chi^2-divergence (Pearson [21]).

• J-Divergence (Jeffreys [16]; Kullback and Leibler [17])

(7)    J(P||Q) = \sum_{i=1}^{n}(p_i - q_i)\ln\left(\frac{p_i}{q_i}\right).

• Jensen-Shannon Divergence (Sibson [22]; Burbea and Rao [2, 3])

(8)    I(P||Q) = \frac{1}{2}\left[\sum_{i=1}^{n} p_i\ln\left(\frac{2p_i}{p_i+q_i}\right) + \sum_{i=1}^{n} q_i\ln\left(\frac{2q_i}{p_i+q_i}\right)\right].

• Arithmetic-Geometric Divergence (Taneja [27])

(9)    T(P||Q) = \sum_{i=1}^{n}\left(\frac{p_i+q_i}{2}\right)\ln\left(\frac{p_i+q_i}{2\sqrt{p_i q_i}}\right).

After simplification, we can write

(10)    J(P||Q) = 4\left[I(P||Q) + T(P||Q)\right].

The measures J(P||Q), I(P||Q) and T(P||Q) can also be written as

(11)    J(P||Q) = K(P||Q) + K(Q||P),

(12)    I(P||Q) = \frac{1}{2}\left[K\left(P\,\Big\|\,\frac{P+Q}{2}\right) + K\left(Q\,\Big\|\,\frac{P+Q}{2}\right)\right]

and

(13)    T(P||Q) = \frac{1}{2}\left[K\left(\frac{P+Q}{2}\,\Big\|\,P\right) + K\left(\frac{P+Q}{2}\,\Big\|\,Q\right)\right],

where

(14)    K(P||Q) = \sum_{i=1}^{n} p_i\ln\left(\frac{p_i}{q_i}\right)

is the well-known Kullback-Leibler [17] relative information.

The measure (8) is also known as the Jensen difference divergence measure (Burbea and Rao [2, 3]). The measure (9) is new in the literature; it was studied for the first time by Taneja [27] and is called the arithmetic and geometric mean divergence measure. For simplicity, we shall call the three measures appearing in (7), (8) and (9) the J-divergence, the JS-divergence and the AG-divergence respectively. More details on these divergence measures can be found in the on-line book by Taneja [28].

We call the measures given in (1), (3), (5), (7), (8) and (9) symmetric divergence measures, since they are symmetric with respect to the probability distributions P and Q. The measures (6) and (14), on the other hand, are not symmetric with respect to the probability distributions.
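The measures (1)-(9) and the identity (10) can be checked numerically. The short Python sketch below is illustrative only and not part of the original paper; the function names (hellinger, js_div, ag_div, etc.) are our own choices. It evaluates the defining sums directly for a pair of three-point distributions and confirms that J(P||Q) = 4[I(P||Q) + T(P||Q)] up to floating-point error.

    import numpy as np

    def hellinger(p, q):                     # (1)  h(P||Q)
        return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

    def triangular(p, q):                    # (3)  Delta(P||Q)
        return np.sum((p - q) ** 2 / (p + q))

    def sym_chi2(p, q):                      # (5)  Psi(P||Q)
        return np.sum((p - q) ** 2 * (p + q) / (p * q))

    def j_div(p, q):                         # (7)  J(P||Q)
        return np.sum((p - q) * np.log(p / q))

    def js_div(p, q):                        # (8)  I(P||Q)
        m = (p + q) / 2
        return 0.5 * (np.sum(p * np.log(p / m)) + np.sum(q * np.log(q / m)))

    def ag_div(p, q):                        # (9)  T(P||Q)
        m = (p + q) / 2
        return np.sum(m * np.log(m / np.sqrt(p * q)))

    p = np.array([0.1, 0.5, 0.4])
    q = np.array([0.4, 0.4, 0.2])
    print(hellinger(p, q), triangular(p, q), sym_chi2(p, q))
    print(j_div(p, q), 4 * (js_div(p, q) + ag_div(p, q)))   # identity (10): the two values agree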
2. Generalizations of Symmetric Divergence Measures

In this section we present new generalizations of the symmetric divergence measures given in Section 1. Before that, we first recall a well-known generalization of the Kullback-Leibler relative information.

• Relative Information of Type s

(15)    \Phi_s(P||Q) =
\begin{cases}
K_s(P||Q) = [s(s-1)]^{-1}\left[\sum_{i=1}^{n} p_i^{s} q_i^{1-s} - 1\right], & s \neq 0,1,\\[2mm]
K(Q||P) = \sum_{i=1}^{n} q_i\ln\left(\dfrac{q_i}{p_i}\right), & s = 0,\\[2mm]
K(P||Q) = \sum_{i=1}^{n} p_i\ln\left(\dfrac{p_i}{q_i}\right), & s = 1,
\end{cases}

for all s \in \mathbb{R}.

The measure (15) is due to Cressie and Read [7]. For more studies on this measure refer to Taneja [29], Taneja and Kumar [33, 18] and the references therein. The measure (15) admits the following particular cases:

(i) \Phi_{-1}(P||Q) = \frac{1}{2}\chi^2(Q||P).
(ii) \Phi_0(P||Q) = K(Q||P).
(iii) \Phi_{1/2}(P||Q) = 4\left[1 - B(P||Q)\right] = 4\,h(P||Q).
(iv) \Phi_1(P||Q) = K(P||Q).
(v) \Phi_2(P||Q) = \frac{1}{2}\chi^2(P||Q).

Here we observe that \Phi_{-1}(P||Q) = \Phi_2(Q||P) and \Phi_0(P||Q) = \Phi_1(Q||P).

2.1. J-Divergence of Type s. Replacing K(P||Q) by \Phi_s(P||Q) in the relation (11), we get

(16)    \mathcal{V}_s(P||Q) = \Phi_s(P||Q) + \Phi_s(Q||P) =
\begin{cases}
J_s(P||Q) = [s(s-1)]^{-1}\left[\sum_{i=1}^{n}\left(p_i^{s} q_i^{1-s} + p_i^{1-s} q_i^{s}\right) - 2\right], & s \neq 0,1,\\[2mm]
J(P||Q) = \sum_{i=1}^{n}(p_i - q_i)\ln\left(\dfrac{p_i}{q_i}\right), & s = 0,1.
\end{cases}

The expression (16) admits the following particular cases:

(i) \mathcal{V}_{-1}(P||Q) = \mathcal{V}_2(P||Q) = \frac{1}{2}\Psi(P||Q).
(ii) \mathcal{V}_0(P||Q) = \mathcal{V}_1(P||Q) = J(P||Q).
(iii) \mathcal{V}_{1/2}(P||Q) = 8\,h(P||Q).

Remark 1. The expression (16) is a modified form of a measure already known in the literature:

(17)    \mathcal{V}_s^{1}(P||Q) =
\begin{cases}
J_s(P||Q) = (s-1)^{-1}\left[\sum_{i=1}^{n}\left(p_i^{s} q_i^{1-s} + p_i^{1-s} q_i^{s}\right) - 2\right], & s \neq 1,\ s > 0,\\[2mm]
J(P||Q) = \sum_{i=1}^{n}(p_i - q_i)\ln\left(\dfrac{p_i}{q_i}\right), & s = 1.
\end{cases}

For the properties of the measure (17) refer to Burbea and Rao [2, 3], Taneja [26, 27, 28], etc. For the axiomatic characterization of this measure refer to Rathie and Sheng [23] and Taneja [25]. The measure (16) considered here differs only in the normalizing constant, and it permits negative values of the parameter s.
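As a numerical illustration (not taken from the paper; the names phi_s and v_s are ours), the following Python sketch implements (15) and (16), using the logarithmic limits at s = 0, 1, and checks the particular case (iii), \mathcal{V}_{1/2}(P||Q) = 8\,h(P||Q), together with the continuity of (16) as s \to 0.

    import numpy as np

    def phi_s(p, q, s):                      # (15)  Phi_s(P||Q); s = 0, 1 use the log limits
        if s == 0:
            return np.sum(q * np.log(q / p))             # K(Q||P)
        if s == 1:
            return np.sum(p * np.log(p / q))             # K(P||Q)
        return (np.sum(p ** s * q ** (1 - s)) - 1) / (s * (s - 1))

    def v_s(p, q, s):                        # (16)  V_s(P||Q) = Phi_s(P||Q) + Phi_s(Q||P)
        return phi_s(p, q, s) + phi_s(q, p, s)

    p = np.array([0.1, 0.5, 0.4])
    q = np.array([0.4, 0.4, 0.2])
    h = 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)     # Hellinger discrimination (1)
    print(v_s(p, q, 0.5), 8 * h)             # particular case (iii): V_{1/2} = 8 h(P||Q)
    print(v_s(p, q, 1e-6), v_s(p, q, 0))     # continuity of (16) as s -> 0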
2.2. Unified AG and JS-Divergence of Type s. Replacing K(P||Q) by \Phi_s(P||Q) in the relation (13), we interestingly obtain a unified generalization of the AG- and JS-divergences, given by

(18)    \mathcal{W}_s(P||Q) = \frac{1}{2}\left[\Phi_s\left(\frac{P+Q}{2}\,\Big\|\,P\right) + \Phi_s\left(\frac{P+Q}{2}\,\Big\|\,Q\right)\right] =
\begin{cases}
IT_s(P||Q) = [s(s-1)]^{-1}\left[\sum_{i=1}^{n}\left(\dfrac{p_i^{1-s}+q_i^{1-s}}{2}\right)\left(\dfrac{p_i+q_i}{2}\right)^{s} - 1\right], & s \neq 0,1,\\[2mm]
I(P||Q) = \dfrac{1}{2}\left[\sum_{i=1}^{n} p_i\ln\left(\dfrac{2p_i}{p_i+q_i}\right) + \sum_{i=1}^{n} q_i\ln\left(\dfrac{2q_i}{p_i+q_i}\right)\right], & s = 0,\\[2mm]
T(P||Q) = \sum_{i=1}^{n}\left(\dfrac{p_i+q_i}{2}\right)\ln\left(\dfrac{p_i+q_i}{2\sqrt{p_i q_i}}\right), & s = 1.
\end{cases}

The measure (18) admits the following particular cases:

(i) \mathcal{W}_{-1}(P||Q) = \frac{1}{4}\Delta(P||Q).
(ii) \mathcal{W}_0(P||Q) = I(P||Q).
(iii) \mathcal{W}_{1/2}(P||Q) = 4\,d(P||Q).
(iv) \mathcal{W}_1(P||Q) = T(P||Q).
(v) \mathcal{W}_2(P||Q) = \frac{1}{16}\Psi(P||Q).

The measure d(P||Q) appearing in part (iii) has not been studied elsewhere and is given by

(19)    d(P||Q) = 1 - \sum_{i=1}^{n}\left(\frac{\sqrt{p_i}+\sqrt{q_i}}{2}\right)\sqrt{\frac{p_i+q_i}{2}}.

A relation of the measure (19) with Hellinger's discrimination is given in the last section. Connections of the measure (19) with mean divergence measures can be seen in Taneja [32]. We can also write

(20)    \mathcal{W}_{1-s}(P||Q) = \frac{1}{2}\left[\Phi_s\left(P\,\Big\|\,\frac{P+Q}{2}\right) + \Phi_s\left(Q\,\Big\|\,\frac{P+Q}{2}\right)\right].

Thus we have two symmetric divergences of type s, given by (16) and (18), generalizing the six symmetric divergence measures given in Section 1. Our aim in this paper is to study the symmetric divergences of type s and to find inequalities among them. We shall do so by making use of the properties of Csiszár's f-divergence.
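A small numerical sketch (again illustrative and not from the paper; w_s is our own name) implements (18) with its logarithmic limits and checks the particular cases (i) and (v), namely \mathcal{W}_{-1} = \frac{1}{4}\Delta and \mathcal{W}_2 = \frac{1}{16}\Psi.

    import numpy as np

    def w_s(p, q, s):                        # (18)  W_s(P||Q); s = 0, 1 use the log limits
        m = (p + q) / 2
        if s == 0:                           # I(P||Q)
            return 0.5 * (np.sum(p * np.log(p / m)) + np.sum(q * np.log(q / m)))
        if s == 1:                           # T(P||Q)
            return np.sum(m * np.log(m / np.sqrt(p * q)))
        return (np.sum(((p ** (1 - s) + q ** (1 - s)) / 2) * m ** s) - 1) / (s * (s - 1))

    p = np.array([0.1, 0.5, 0.4])
    q = np.array([0.4, 0.4, 0.2])
    delta = np.sum((p - q) ** 2 / (p + q))               # Delta(P||Q), eq. (3)
    psi = np.sum((p - q) ** 2 * (p + q) / (p * q))       # Psi(P||Q), eq. (5)
    print(w_s(p, q, -1), delta / 4)          # case (i):  W_{-1} = Delta/4
    print(w_s(p, q, 2), psi / 16)            # case (v):  W_2 = Psi/16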
3. Csiszár's f-Divergence and Its Properties

Given a function f : (0,\infty) \to \mathbb{R}, the f-divergence measure introduced by Csiszár [5] is given by

(21)    C_f(P||Q) = \sum_{i=1}^{n} q_i\, f\left(\frac{p_i}{q_i}\right),

for all P, Q \in \Gamma_n.

The following theorem is well known in the literature [5, 6].

Theorem 1. If the function f is convex and normalized, i.e., f(1) = 0, then the f-divergence C_f(P||Q) is nonnegative and convex in the pair of probability distributions (P,Q) \in \Gamma_n \times \Gamma_n.

The theorem given below gives bounds on the measure (21).

Theorem 2. Let f : \mathbb{R}_+ \to \mathbb{R} be differentiable, convex and normalized, i.e., f(1) = 0. If P, Q \in \Gamma_n are such that 0 < r \leq p_i/q_i \leq R < \infty for all i \in \{1,2,\ldots,n\}, for some r and R with 0 < r \leq 1 \leq R < \infty, r \neq R, then we have

(22)    0 \leq C_f(P||Q) \leq E_{C_f}(P||Q) \leq A_{C_f}(r,R)

and

(23)    0 \leq C_f(P||Q) \leq B_{C_f}(r,R) \leq A_{C_f}(r,R),

where

(24)    E_{C_f}(P||Q) = \sum_{i=1}^{n}(p_i - q_i)\, f'\left(\frac{p_i}{q_i}\right),

(25)    A_{C_f}(r,R) = \frac{1}{4}(R-r)\left(f'(R) - f'(r)\right)

and

(26)    B_{C_f}(r,R) = \frac{(R-1)f(r) + (1-r)f(R)}{R-r}.

The proof is based on the following lemma due to Dragomir [8].

Lemma 1. Let f : I \subset \mathbb{R}_+ \to \mathbb{R} be a differentiable convex function on the interval I, x_i \in I^{\circ} (the interior of I), \lambda_i \geq 0 (i = 1,2,\ldots,n) with \sum_{i=1}^{n}\lambda_i = 1. If m, M \in I^{\circ} and m \leq x_i \leq M for all i = 1,2,\ldots,n, then we have the inequalities

(27)    0 \leq \sum_{i=1}^{n}\lambda_i f(x_i) - f\left(\sum_{i=1}^{n}\lambda_i x_i\right)
        \leq \sum_{i=1}^{n}\lambda_i x_i f'(x_i) - \left(\sum_{i=1}^{n}\lambda_i x_i\right)\left(\sum_{i=1}^{n}\lambda_i f'(x_i)\right)
        \leq \frac{1}{4}(M-m)\left(f'(M) - f'(m)\right).

As a consequence of the above lemma we have the following corollary.

Corollary 1. For all a, b, \upsilon, \omega \in (0,\infty), the following inequalities hold:

(28)    0 \leq \frac{\upsilon f(a) + \omega f(b)}{\upsilon+\omega} - f\left(\frac{\upsilon a + \omega b}{\upsilon+\omega}\right)
        \leq \frac{\upsilon a f'(a) + \omega b f'(b)}{\upsilon+\omega} - \left(\frac{\upsilon a + \omega b}{\upsilon+\omega}\right)\left(\frac{\upsilon f'(a) + \omega f'(b)}{\upsilon+\omega}\right)
        \leq \frac{1}{4}(b-a)\left(f'(b) - f'(a)\right).

Proof. It follows from Lemma 1 by taking \lambda_1 = \upsilon/(\upsilon+\omega), \lambda_2 = \omega/(\upsilon+\omega), \lambda_3 = \cdots = \lambda_n = 0, x_1 = a and x_2 = b. □

Proof of Theorem 2. For all P, Q \in \Gamma_n, take x_i = p_i/q_i and \lambda_i = q_i in (27) and sum over all i = 1,2,\ldots,n; we get the inequalities (22).

Again, taking \upsilon = R-x, \omega = x-r, a = r and b = R in (28), we get

(29)    0 \leq \frac{(R-x)f(r) + (x-r)f(R)}{R-r} - f(x)
        \leq \frac{(R-x)(x-r)}{R-r}\left[f'(R) - f'(r)\right]
        \leq \frac{1}{4}(R-r)\left(f'(R) - f'(r)\right).

From the first part of the inequalities (29), we get

(30)    f(x) \leq \frac{(R-x)f(r) + (x-r)f(R)}{R-r}.

For all P, Q \in \Gamma_n, take x = p_i/q_i in (29) and (30), multiply by q_i and sum over all i = 1,2,\ldots,n; we get

(31)    0 \leq B_{C_f}(r,R) - C_f(P||Q) \leq \frac{(R-1)(1-r)\left(f'(R) - f'(r)\right)}{R-r} \leq A_{C_f}(r,R)

and

(32)    0 \leq C_f(P||Q) \leq B_{C_f}(r,R),

respectively. The expression (32) establishes the l.h.s. of the inequalities (23). In order to prove the r.h.s. of the inequalities (23), let us take x = 1 in (29) and use the fact that f(1) = 0; we get

(33)    0 \leq B_{C_f}(r,R) \leq \frac{(R-1)(1-r)\left(f'(R) - f'(r)\right)}{R-r} \leq A_{C_f}(r,R).

From (33) we conclude the r.h.s. of the inequalities (23). □

Remark 2. We observe that the inequalities (22) and (23) are an improvement over Dragomir's [9, 10] work. From the inequalities (23) and (33) we observe that B_{C_f}(r,R) provides a sharper bound than A_{C_f}(r,R). From the r.h.s. of the inequalities (33) we conclude the following inequality between r and R:

(34)    (R-1)(1-r) \leq \frac{1}{4}(R-r)^2.

Theorem 3 (Dragomir et al. [11, 12]). (i) Let P, Q \in \Gamma_n be such that 0 < r \leq p_i/q_i \leq R < \infty for all i \in \{1,2,\ldots,n\}, for some r and R with 0 < r \leq 1 \leq R < \infty, r \neq R. Let f : [0,\infty) \to \mathbb{R} be a normalized mapping, i.e., f(1) = 0, such that f' is locally absolutely continuous on [r,R] and there exist \alpha, \beta satisfying

(35)    \alpha \leq f''(x) \leq \beta, \quad \forall x \in (r,R).

Then

(36)    \left| C_f(P||Q) - \frac{1}{2} E_{C_f}(P||Q) \right| \leq \frac{1}{8}(\beta-\alpha)\,\chi^2(P||Q)

and

(37)    \left| C_f(P||Q) - E^{*}_{C_f}(P||Q) \right| \leq \frac{1}{8}(\beta-\alpha)\,\chi^2(P||Q),

where E_{C_f}(P||Q) is as given by (24), \chi^2(P||Q) is as given by (6) and

(38)    E^{*}_{C_f}(P||Q) = 2\,E_{C_f}\left(\frac{P+Q}{2}\,\Big\|\,Q\right) = \sum_{i=1}^{n}(p_i - q_i)\, f'\left(\frac{p_i+q_i}{2q_i}\right).

(ii) Additionally, if f'' is absolutely continuous on [r,R] and f''' \in L_\infty[r,R], then

(39)    \left| C_f(P||Q) - \frac{1}{2} E_{C_f}(P||Q) \right| \leq \frac{1}{12}\,\|f'''\|_\infty\, |\chi|^3(P||Q)

and

(40)    \left| C_f(P||Q) - E^{*}_{C_f}(P||Q) \right| \leq \frac{1}{24}\,\|f'''\|_\infty\, |\chi|^3(P||Q),

where

(41)    |\chi|^3(P||Q) = \sum_{i=1}^{n}\frac{|p_i - q_i|^3}{q_i^2}

and

(42)    \|f'''\|_\infty = \operatorname*{ess\,sup}_{x \in [r,R]} |f'''(x)|.

Theorem 4 (Dragomir et al. [13]). Suppose f : [r,R] \to \mathbb{R} is differentiable and f' is of bounded variation, i.e., \bigvee_r^R(f') = \int_r^R |f''(t)|\,dt < \infty. Let the constants r, R satisfy the conditions:

(i) 0 < r < 1 < R < \infty;
(ii) 0 < r \leq p_i/q_i \leq R < \infty, for i = 1,2,\ldots,n.

Then

(43)    \left| C_f(P||Q) - \frac{1}{2} E_{C_f}(P||Q) \right| \leq \bigvee_r^R(f')\, V(P||Q)

and

(44)    \left| C_f(P||Q) - E^{*}_{C_f}(P||Q) \right| \leq \frac{1}{2}\bigvee_r^R(f')\, V(P||Q),

where

(45)    V(P||Q) = \sum_{i=1}^{n} |p_i - q_i|.
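The bounds above are easy to test numerically. The following Python sketch (illustrative only; the variable names are ours) takes f(x) = x ln x, for which C_f(P||Q) is the Kullback-Leibler measure (14) and E_{C_f}(P||Q) reduces to the J-divergence (7), and verifies the inequality chains (22)-(23) of Theorem 2 together with the bound (36) of Theorem 3 for a concrete pair of distributions.

    import numpy as np

    p = np.array([0.1, 0.5, 0.4])
    q = np.array([0.4, 0.4, 0.2])
    x = p / q
    r, R = x.min(), x.max()                  # 0 < r <= p_i/q_i <= R

    f  = lambda t: t * np.log(t)             # convex, normalized: f(1) = 0
    f1 = lambda t: np.log(t) + 1.0           # f'
    f2 = lambda t: 1.0 / t                   # f''

    C = np.sum(q * f(x))                     # (21), here K(P||Q)
    E = np.sum((p - q) * f1(x))              # (24), here J(P||Q)
    A = 0.25 * (R - r) * (f1(R) - f1(r))     # (25)
    B = ((R - 1) * f(r) + (1 - r) * f(R)) / (R - r)     # (26)
    chi2 = np.sum((p - q) ** 2 / q)          # (6)

    print(0 <= C <= E <= A)                  # inequality chain (22)
    print(0 <= C <= B <= A)                  # inequality chain (23)
    alpha, beta = f2(R), f2(r)               # (35): f'' decreasing, so alpha = f''(R), beta = f''(r)
    print(abs(C - 0.5 * E) <= (beta - alpha) * chi2 / 8)    # bound (36)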
Remark 3. (i) Suppose the third-order derivative of f exists and is either positive or negative throughout. Then the function f'' is either monotonically increasing or monotonically decreasing. In view of this we can write

(46)    \beta - \alpha = k(f)\left[f''(R) - f''(r)\right],

where

(47)    k(f) =
\begin{cases}
-1, & \text{if } f'' \text{ is monotonically decreasing},\\
\phantom{-}1, & \text{if } f'' \text{ is monotonically increasing}.
\end{cases}

(ii) Let the function f(x) considered in Theorem 4 be convex in (0,\infty); then f''(x) \geq 0. This gives

(48)    \bigvee_r^R(f') = \int_r^R |f''(t)|\,dt = \int_r^R f''(t)\,dt = f'(R) - f'(r) = \frac{4}{R-r}\,A_{C_f}(r,R).

Under these considerations, the bounds (43) and (44) can be rewritten as

(49)    \left| C_f(P||Q) - \frac{1}{2} E_{C_f}(P||Q) \right| \leq \frac{4}{R-r}\,A_{C_f}(r,R)\, V(P||Q)

and

(50)    \left| C_f(P||Q) - E^{*}_{C_f}(P||Q) \right| \leq \frac{2}{R-r}\,A_{C_f}(r,R)\, V(P||Q).

Based on the above remarks we can restate and combine Theorems 3 and 4.

Theorem 5. Let P, Q \in \Gamma_n be such that 0 < r \leq p_i/q_i \leq R < \infty for all i \in \{1,2,\ldots,n\}, for some r and R with 0 < r < 1 < R < \infty. Let f : \mathbb{R}_+ \to \mathbb{R} be differentiable, convex and normalized, with f' of bounded variation and f'' monotonic, and with f'' absolutely continuous on [r,R] and f''' \in L_\infty[r,R]. Then

(51)    \left| C_f(P||Q) - \frac{1}{2} E_{C_f}(P||Q) \right|
        \leq \min\left\{ \frac{1}{8}\,k(f)\left[f''(R) - f''(r)\right]\chi^2(P||Q),\ \ \frac{1}{12}\,\|f'''\|_\infty\,|\chi|^3(P||Q),\ \ \left[f'(R) - f'(r)\right] V(P||Q) \right\}

and

(52)    \left| C_f(P||Q) - E^{*}_{C_f}(P||Q) \right|
        \leq \min\left\{ \frac{1}{8}\,k(f)\left[f''(R) - f''(r)\right]\chi^2(P||Q),\ \ \frac{1}{24}\,\|f'''\|_\infty\,|\chi|^3(P||Q),\ \ \frac{1}{2}\left[f'(R) - f'(r)\right] V(P||Q) \right\},

where k(f) is as given by (47).

Remark 4. The measures (6), (41) and (45) are particular cases of Vajda's [34] |\chi|^m-divergence, given by

(53)    |\chi|^m(P||Q) = \sum_{i=1}^{n}\frac{|p_i - q_i|^m}{q_i^{m-1}}, \quad m \geq 1.

The measure (53) satisfies the following properties [4, 11]:

(54)    |\chi|^m(P||Q) \leq \frac{(1-r)(R-1)}{R-r}\left[(1-r)^{m-1} + (R-1)^{m-1}\right] \leq \left(\frac{R-r}{2}\right)^m, \quad m \geq 1,

and

(55)    \left(\frac{1-r^m}{1-r}\right) V(P||Q) \leq |\chi|^m(P||Q) \leq \left(\frac{R^m-1}{R-1}\right) V(P||Q), \quad m \geq 1.

Taking m = 2, 3 and 1 in (54), we get

(56)    \chi^2(P||Q) \leq (R-1)(1-r) \leq \frac{(R-r)^2}{4},

(57)    |\chi|^3(P||Q) \leq \frac{(R-1)(1-r)}{R-r}\left[(1-r)^2 + (R-1)^2\right] \leq \frac{1}{8}(R-r)^3

and

(58)    V(P||Q) \leq \frac{2(R-1)(1-r)}{R-r} \leq \frac{1}{2}(R-r),

respectively.
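The particular cases (56)-(58) can also be checked numerically. The Python sketch below (illustrative, with names of our own choosing) computes the |\chi|^m-divergence (53) for m = 2, 3, 1 and verifies the corresponding inequality chains for a concrete pair of distributions.

    import numpy as np

    p = np.array([0.1, 0.5, 0.4])
    q = np.array([0.4, 0.4, 0.2])
    x = p / q
    r, R = x.min(), x.max()

    def chi_m(p, q, m):                      # (53)  |chi|^m(P||Q)
        return np.sum(np.abs(p - q) ** m / q ** (m - 1))

    chi2, chi3, V = chi_m(p, q, 2), chi_m(p, q, 3), chi_m(p, q, 1)   # (6), (41), (45)

    print(chi2 <= (R - 1) * (1 - r) <= (R - r) ** 2 / 4)                      # (56)
    print(chi3 <= (R - 1) * (1 - r) / (R - r) * ((1 - r) ** 2 + (R - 1) ** 2)
               <= (R - r) ** 3 / 8)                                           # (57)
    print(V <= 2 * (R - 1) * (1 - r) / (R - r) <= (R - r) / 2)                # (58)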
