On Uncertainty and Information Properties of Ranked Set Samples

Mohammad Jafari Jozani^{a,*} and Jafar Ahmadi^{b}

^{a} Department of Statistics, University of Manitoba, Winnipeg, MB, Canada, R3T 2N2
^{b} Department of Statistics, Ferdowsi University of Mashhad, P.O. Box 91775-1159, Mashhad, Iran

* Corresponding author. E-mail addresses: m_jafari_[email protected] (M. Jafari Jozani), [email protected] (J. Ahmadi).

Abstract

Ranked set sampling is a sampling design with a wide range of applications in industrial statistics, environmental and ecological studies, and other areas. It is well known that ranked set samples provide more Fisher information than simple random samples of the same size about the unknown parameters of the underlying distribution in parametric inference. In this paper, we consider the uncertainty and information content of ranked set samples, under both perfect and imperfect ranking scenarios, in terms of the Shannon entropy, Rényi and Kullback-Leibler (KL) information measures. It is proved that, under these information measures, the ranked set sampling design performs better than its simple random sampling counterpart of the same size. The information content is also a monotone function of the set size in ranked set sampling. Moreover, the effect of ranking error on the information content of the data is investigated.

AMS 2010 Subject Classification: 62B10, 62D05.

Keywords: Ranked set sampling; Kullback-Leibler; Order statistics; Imperfect ranking; Rényi information; Shannon entropy.

1. Introduction and Preliminaries

During the past few years, ranked set sampling has emerged as a powerful tool in statistical inference, and it is now regarded as a serious alternative to the commonly used simple random sampling design. Ranked set sampling and some of its variants have been applied successfully in different areas such as industrial statistics, environmental and ecological studies, biostatistics and statistical genetics. The feature of ranked set sampling is that it combines simple random sampling with other sources of information such as professional knowledge, auxiliary information and judgement, which are assumed to be inexpensive and easily obtained. This extra information helps to increase the chance that the collected sample yields more representative measurements (i.e., measurements that span the range of values of the variable of interest in the underlying population). In its original form, ranked set sampling involves randomly drawing k units (called a set of size k) from the underlying population for which an estimate of an unknown parameter of interest is required. The units of this set are ranked by means of an auxiliary variable or some other ranking process such as judgemental ranking. For this ranked set, the unit ranked lowest is chosen for actual measurement of the variable of interest. A second set of size k is then drawn and ranked, and the unit in the second lowest position is chosen and the variable of interest for this unit is quantified. Sampling is continued until, from the kth set, the kth ranked unit is measured. This entire process may be repeated m times (or cycles) to obtain a ranked set sample of size n = mk from the underlying population.
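As a concrete illustration of the sampling scheme just described, the following sketch (not part of the original paper) generates a balanced ranked set sample under perfect ranking; the exponential parent, the set size k = 3 and the number of cycles m = 10 are arbitrary illustrative choices.

```python
# Minimal sketch of (perfect) ranked set sampling, assuming an exponential parent.
import numpy as np

def ranked_set_sample(draw, k, m, rng):
    """Return a ranked set sample of size n = m*k using the sampler `draw(size, rng)`."""
    measurements = []
    for _ in range(m):                    # m cycles
        for i in range(k):                # within a cycle: k sets of k units each
            candidate_set = draw(k, rng)  # draw one set of k units
            candidate_set.sort()          # rank the set (perfect ranking assumed)
            measurements.append(candidate_set[i])  # quantify only the (i+1)-th ranked unit
    return np.array(measurements)

rng = np.random.default_rng(1)
draw_exp = lambda size, rng: rng.exponential(scale=1.0, size=size)

x_rss = ranked_set_sample(draw_exp, k=3, m=10, rng=rng)  # RSS of size n = 30
x_srs = draw_exp(30, rng)                                # SRS of the same size
print(x_rss.mean(), x_srs.mean())
```

Each quantified value is therefore the ith order statistic of an independent set of size k, which is exactly the structure exploited in the derivations below.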
Let $X_{\mathrm{SRS}}=\{X_i,\ i=1,\ldots,n\}$ be a simple random sample (SRS) of size $n\geq 1$ from a continuous distribution with probability density function (pdf) $f(x)$. Let $F(x)$ denote the cumulative distribution function (cdf) of the random variable $X$ and define $\bar F(x)=1-F(x)$ as the survival function of $X$ with support $S_X$. Also assume that $X_{\mathrm{RSS}}=\{X_{(i)j},\ i=1,\ldots,k,\ j=1,\ldots,m\}$ denotes a ranked set sample (RSS) of size $n=mk$ from $f(x)$, where $k$ is the set size and $m$ is the cycle size. Here $X_{(i)j}$ is the $i$th order statistic in a set of size $k$ obtained in cycle $j$, with pdf

f_{(i)}(x) = \frac{k!}{(i-1)!\,(k-i)!}\, F^{i-1}(x)\,\bar F^{k-i}(x)\, f(x), \qquad x \in S_X.

When ranking is imperfect, we use $X^*_{\mathrm{RSS}}=\{X_{[i]j},\ i=1,\ldots,k,\ j=1,\ldots,m\}$ to denote an imperfect RSS of size $n=mk$ from $f(x)$. We also use $f_{[i]}(x)$ to denote the pdf of the judgemental order statistic $X_{[i]}$, which is given by

f_{[i]}(x) = \sum_{r=1}^{k} p_{i,r}\, f_{(r)}(x), \qquad (1)

where $p_{i,r}=P(X_{[i]}=X_{(r)})$ denotes the probability with which the $r$th order statistic is judged as having rank $i$, with $\sum_{i=1}^{k}p_{i,r}=\sum_{r=1}^{k}p_{i,r}=1$. Readers are referred to Wolfe (2004, 2010), Chen et al. (2004) and references therein for further details. A small simulation sketch of this imperfect-ranking mechanism is given at the end of this section.

The Fisher information plays a central role in statistical inference and in information-theoretic studies. It is well known that an RSS provides more Fisher information than an SRS of the same size about the unknown parameters of the underlying distribution in parametric inference (e.g., Chen, 2000, Chapter 3). Park and Lim (2012) studied the effect of imperfect rankings on the amount of Fisher information in ranked set samples. Frey (2013) showed by example that the Fisher information in an imperfect ranked set sample may even be higher than the Fisher information in a perfect ranked set sample. The concept of information is so rich that no single definition can quantify the information content of a sample properly. For example, from an engineering perspective, the Shannon entropy or the Rényi information might be more suitable measures of the information content of a sample than the Fisher information. In this paper, we study the notions of uncertainty and information content of RSS data, under both perfect and imperfect ranking scenarios, in terms of the Shannon entropy, Rényi and Kullback-Leibler (KL) information measures, and compare them with their counterparts based on SRS data. These measures are increasingly being used in various contexts: for order statistics by Wong and Chen (1990), Park (1995), Ebrahimi et al. (2004) and Baratpour et al. (2007a, b); for censored data by Abo-Eleneen (2011); for record data and in reliability and life-testing contexts by Raqab and Awad (2000, 2001), Zahedi and Shakil (2006) and Ahmadi and Fashandi (2008); and in hypothesis testing by Park (2005), Balakrishnan et al. (2007) and Habibi Rad et al. (2011). It is therefore of interest to use these measures to calculate the information content of RSS data and compare it with its counterpart based on SRS data.

To this end, in Section 2 we obtain the Shannon entropies of RSS and SRS data of the same size. We show that the difference between the Shannon entropies of $X_{\mathrm{RSS}}$ and $X_{\mathrm{SRS}}$ is distribution-free and is a monotone function of the set size in ranked set sampling. In Section 3, similar results are obtained under the Rényi information. Section 4 is devoted to the Kullback-Leibler information of RSS data and its comparison with its counterpart under SRS data. We show that the Kullback-Leibler information between the distribution of $X_{\mathrm{SRS}}$ and the distribution of $X_{\mathrm{RSS}}$ is distribution-free and increases as the set size increases. Finally, in Section 5, we provide some concluding remarks.
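Before turning to the information measures themselves, the following sketch (not part of the original paper) illustrates the imperfect-ranking mechanism behind (1): ranking is carried out through a noisy concomitant, and the misclassification probabilities $p_{i,r}=P(X_{[i]}=X_{(r)})$ are estimated empirically. The normal parent, the error scale 0.5 and the set size k = 3 are illustrative assumptions.

```python
# Sketch: imperfect ranking through a noisy concomitant Y = X + error, and an
# empirical estimate of the probabilities p_{i,r} appearing in equation (1).
import numpy as np

rng = np.random.default_rng(0)
k, reps = 3, 200_000
p = np.zeros((k, k))                       # p[i, r] ~ P(judged rank i+1 has true rank r+1)

for _ in range(reps):
    x = rng.normal(size=k)                 # one set of k units
    y = x + rng.normal(scale=0.5, size=k)  # ranking variable observed with error
    judged = np.argsort(np.argsort(y))     # judged rank of each unit (0-based)
    true = np.argsort(np.argsort(x))       # true rank of each unit (0-based)
    for unit in range(k):
        p[judged[unit], true[unit]] += 1

p /= reps
print(np.round(p, 3))
```

The estimated matrix is doubly stochastic, in agreement with the constraints $\sum_{i=1}^{k}p_{i,r}=\sum_{r=1}^{k}p_{i,r}=1$ stated below (1).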
2. Shannon Entropy of Ranked Set Samples

The Shannon entropy, or simply the entropy, of a continuous random variable $X$ is defined by

H(X) = -\int f(x)\log f(x)\,dx, \qquad (2)

provided the integral exists. The Shannon entropy is extensively used in the literature as a quantitative measure of the uncertainty associated with a random phenomenon. The development of the idea of entropy by Shannon (1948) initiated a separate branch of learning named the "Theory of Information". The Shannon entropy provides an excellent tool for quantifying the amount of information (or uncertainty) contained in a sample regarding its parent distribution. Indeed, the amount of information we gain when we observe the result of a random experiment can be taken to be equal to the amount of uncertainty concerning the outcome of the experiment before carrying it out. In practice, smaller values of the Shannon entropy are more desirable. We refer the reader to Cover and Thomas (1991) and references therein for more details.

In this section, we compare the Shannon entropy of SRS data with its counterparts under both perfect and imperfect RSS data of the same size. Without loss of generality, we take m = 1 throughout the paper. From (2), the Shannon entropy of $X_{\mathrm{SRS}}$ is given by

H(X_{\mathrm{SRS}}) = -\sum_{i=1}^{n}\int f(x_i)\log f(x_i)\,dx_i = n\,H(X_1).

Under the perfect ranking assumption, it is easy to see that

H(X_{\mathrm{RSS}}) = -\sum_{i=1}^{n}\int f_{(i)}(x)\log f_{(i)}(x)\,dx = \sum_{i=1}^{n} H(X_{(i)}), \qquad (3)

where $H(X_{(i)})$ is the entropy of the $i$th order statistic in a sample of size $n$. Ebrahimi et al. (2004) explored some properties of the Shannon entropy of the usual order statistics (see also Park, 1995; Wong and Chen, 1990). Using (2) and the transformation $X_{(i)}=F^{-1}(U_{(i)})$, it is easy to prove the following representation for the Shannon entropy of order statistics (see Ebrahimi et al., 2004, page 177):

H(X_{(i)}) = H(U_{(i)}) - E\left[\log f\!\left(F^{-1}(W_i)\right)\right], \qquad (4)

where $W_i$ has the beta distribution with parameters $i$ and $n-i+1$, and $U_{(i)}$ stands for the $i$th order statistic of a random sample of size $n$ from the Uniform(0,1) distribution. In the following result, we show that the Shannon entropy of RSS data is smaller than its SRS counterpart when ranking is perfect.

Lemma 1. $H(X_{\mathrm{RSS}}) \leq H(X_{\mathrm{SRS}})$ for every set size $n \in \mathbb{N}$, with equality when $n = 1$.

Proof. To show the result we use the fact that $f(x)=\frac{1}{n}\sum_{i=1}^{n}f_{(i)}(x)$ (see Chen et al., 2004). Using the convexity of $g(t)=t\log t$ as a function of $t>0$, we have

\frac{1}{n}\sum_{i=1}^{n} f_{(i)}(x)\log f_{(i)}(x) \;\geq\; \left(\frac{1}{n}\sum_{i=1}^{n} f_{(i)}(x)\right)\log\left(\frac{1}{n}\sum_{i=1}^{n} f_{(i)}(x)\right) = f(x)\log f(x). \qquad (5)

Now, the result follows by the use of (3) and (5).

In the sequel, we quantify the difference between $H(X_{\mathrm{RSS}})$ and $H(X_{\mathrm{SRS}})$. To this end, by (4), we first get

H(X_{\mathrm{RSS}}) = \sum_{i=1}^{n} H(U_{(i)}) - \sum_{i=1}^{n}\int f_{(i)}(x)\log f(x)\,dx = \sum_{i=1}^{n} H(U_{(i)}) + H(X_{\mathrm{SRS}}).

Note that since $H(X_{\mathrm{RSS}})\leq H(X_{\mathrm{SRS}})$, we must have $\sum_{i=1}^{n}H(U_{(i)})\leq 0$ for all $n\in\mathbb{N}$. Also, $H(X_{\mathrm{RSS}})-H(X_{\mathrm{SRS}})=\sum_{i=1}^{n}H(U_{(i)})$ is distribution-free (it does not depend on the parent distribution). Ebrahimi et al. (2004) obtained an expression for $H(U_{(i)})$, namely

H(U_{(i)}) = \log B(i,n-i+1) - (i-1)\left[\psi(i)-\psi(n+1)\right] - (n-i)\left[\psi(n-i+1)-\psi(n+1)\right],

where $\psi(z)=\frac{d}{dz}\log\Gamma(z)$ is the digamma function and $B(a,b)$ stands for the complete beta function. Hence, we have

H(X_{\mathrm{RSS}}) - H(X_{\mathrm{SRS}}) = 2\sum_{j=1}^{n-1}(n-2j)\log j - n\log n - 2\sum_{i=1}^{n}(i-1)\psi(i) + n(n-1)\psi(n+1) = k(n), \text{ say.}
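As a quick numerical check (not part of the original paper), the following sketch evaluates $k(n)=\sum_{i=1}^{n}H(U_{(i)})$ directly from the closed-form expression for $H(U_{(i)})$ given above; it should reproduce, up to rounding, the values reported in Table 1 below.

```python
# Sketch: compute k(n) = sum_i H(U_(i)) from the entropy of uniform order statistics.
from scipy.special import betaln, digamma

def H_uniform_order_stat(i, n):
    """Shannon entropy of the i-th order statistic of n Uniform(0,1) variables."""
    return (betaln(i, n - i + 1)
            - (i - 1) * (digamma(i) - digamma(n + 1))
            - (n - i) * (digamma(n - i + 1) - digamma(n + 1)))

def k(n):
    return sum(H_uniform_order_stat(i, n) for i in range(1, n + 1))

print([round(k(n), 3) for n in range(2, 11)])  # expected to match Table 1 below
```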
By noting that $\psi(n+1)=\psi(n)+1/n$ for $n\geq 2$, we can easily find the following recursive formula for calculating $k(n)$:

k(n+1) = k(n) + n + \log\Gamma(n+1) - (n+1)\log(n+1).

Table 1 shows the numerical values of $H(X_{\mathrm{RSS}})-H(X_{\mathrm{SRS}})$ for $n\in\{2,\ldots,10\}$. From Table 1, it is observed that the difference between the Shannon entropy of RSS data and its SRS counterpart grows in magnitude as the set size increases. Intuitively, this can be explained by the fact that ranked set sampling provides more structure to the observed data than simple random sampling, and the amount of uncertainty in the more structured RSS data set is less than that of SRS.

Table 1: Numerical values of k(n) for n = 2 up to 10.

  n     |    2      3      4      5      6      7      8      9      10
  k(n)  | -0.386 -0.989 -1.742 -2.611 -3.574 -4.616 -5.727 -6.897 -8.121

Now, assume that $X^*_{\mathrm{RSS}}=\{X_{[i]},\ i=1,\ldots,n\}$ is an imperfect RSS of size $n$ from $f(x)$. Similarly to the perfect RSS case, we can easily show that

H(X^*_{\mathrm{RSS}}) = \sum_{i=1}^{n} H(X_{[i]}), \qquad (6)

where we assume that the cycle size is equal to one and $k=n$. Also $H(X_{[i]})=-\int f_{[i]}(x)\log f_{[i]}(x)\,dx$, or equivalently

H(X_{[i]}) = -\int \left(\sum_{r=1}^{n} p_{i,r} f_{(r)}(x)\right)\log\left(\sum_{r=1}^{n} p_{i,r} f_{(r)}(x)\right) dx.

Again, using the convexity of $g(t)=t\log t$ and the equalities $\sum_{r=1}^{n}p_{i,r}=\sum_{i=1}^{n}p_{i,r}=1$, we find

H(X^*_{\mathrm{RSS}}) = \sum_{i=1}^{n} H(X_{[i]})
 \;\leq\; -n\int \left(\frac{1}{n}\sum_{r=1}^{n}\Big(\sum_{i=1}^{n}p_{i,r}\Big) f_{(r)}(x)\right)\log\left(\frac{1}{n}\sum_{r=1}^{n}\Big(\sum_{i=1}^{n}p_{i,r}\Big) f_{(r)}(x)\right) dx
 \;=\; -n\int f(x)\log f(x)\,dx = H(X_{\mathrm{SRS}}).

So, we have the following result.

Lemma 2. $H(X^*_{\mathrm{RSS}})\leq H(X_{\mathrm{SRS}})$ for every set size $n\in\mathbb{N}$, with equality when the ranking is done completely at random, i.e., $p_{i,r}=\frac{1}{n}$ for all $i,r\in\{1,\ldots,n\}$.

In the following result we compare the Shannon entropies of perfect and imperfect RSS data. We observe that the Shannon entropy of $X_{\mathrm{RSS}}$ is less than the Shannon entropy of $X^*_{\mathrm{RSS}}$.

Lemma 3. $H(X_{\mathrm{RSS}})\leq H(X^*_{\mathrm{RSS}})$ for every set size $n\in\mathbb{N}$, with equality when the ranking is perfect.

Proof. Using the inequality $f_{[i]}(x)\log f_{[i]}(x) \leq \sum_{r=1}^{n} p_{i,r} f_{(r)}(x)\log f_{(r)}(x)$, we have

H(X_{[i]}) \geq -\sum_{r=1}^{n}\int p_{i,r} f_{(r)}(x)\log f_{(r)}(x)\,dx = \sum_{r=1}^{n} p_{i,r} H(X_{(r)}).

Now, the result follows from (6) upon changing the order of summation and using $\sum_{i=1}^{n}p_{i,r}=1$.

Summing up, we have the following ordering relationship among the Shannon entropies of $X^*_{\mathrm{RSS}}$, $X_{\mathrm{RSS}}$ and $X_{\mathrm{SRS}}$:

H(X_{\mathrm{RSS}}) \leq H(X^*_{\mathrm{RSS}}) \leq H(X_{\mathrm{SRS}}).

Example 1. Suppose $X$ has an exponential distribution with pdf $f(x)=\lambda e^{-\lambda x}$, $x>0$, where $\lambda>0$ is the unknown parameter of interest. We consider the case where $n=2$. For an imperfect RSS of size $n=2$, we use the ranking error probability matrix

P = \begin{pmatrix} p_{1,1} & p_{1,2} \\ p_{2,1} & p_{2,2} \end{pmatrix}.

Using (1), we have $f_{[i]}(x)=2\lambda e^{-\lambda x}\left[(p_{i,1}-p_{i,2})e^{-\lambda x}+p_{i,2}\right]$, $i=1,2$. Straightforward calculations show that $H(X_{\mathrm{SRS}})=2-2\log\lambda$, $H(X_{\mathrm{RSS}})=3-2\log(2\lambda)$, and

H(X^*_{\mathrm{RSS}}) = 2 - 2\log(2\lambda) + (p_{2,2}-p_{1,1}) + \eta(p_{1,1}) + \eta(p_{2,2}),

where

\eta(a) = -\frac{2}{1-2a}\int_{a}^{1-a} u\log u\,du = \frac{1}{2} - \frac{1}{1-2a}\left[(1-a)^2\log(1-a) - a^2\log a\right],

with $0<a<1$, $a\neq\tfrac12$, and $\eta(\tfrac12)=\log 2$ by continuity. Since $P$ is doubly stochastic, in the $2\times 2$ case we have $p_{2,2}=p_{1,1}=1-p_{1,2}$ and $p_{2,1}=p_{1,2}$; moreover $\eta(a)=\eta(1-a)$. It is then easy to show that

H(X_{\mathrm{RSS}}) - H(X_{\mathrm{SRS}}) = 1 - 2\log 2 \approx -0.3863 < 0,
H(X^*_{\mathrm{RSS}}) - H(X_{\mathrm{SRS}}) = \eta(p_{1,2}) + \eta(1-p_{1,2}) - 2\log 2 = 2\eta(p_{1,2}) - 2\log 2 \leq 0,
H(X_{\mathrm{RSS}}) - H(X^*_{\mathrm{RSS}}) = 1 - \eta(p_{1,2}) - \eta(1-p_{1,2}) = 1 - 2\eta(p_{1,2}) < 0.
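The closed-form expressions of Example 1 can be checked numerically. The sketch below (not part of the original paper) evaluates $H(X^*_{\mathrm{RSS}})$ by numerical integration for $\lambda=1$, assuming a doubly stochastic error matrix with $p_{1,2}=q$ (so $p_{1,1}=p_{2,2}=1-q$), and compares the result with the formula as reconstructed above.

```python
# Sketch: numerical check of Example 1 for lambda = 1 and p_{1,2} = q = 0.2.
import numpy as np
from scipy.integrate import quad

lam, q = 1.0, 0.2

def f_bracket(x, i):
    """pdf f_[i](x) = 2*lam*exp(-lam*x)*((p_{i,1}-p_{i,2})*exp(-lam*x) + p_{i,2})."""
    p_i1, p_i2 = (1 - q, q) if i == 1 else (q, 1 - q)
    return 2 * lam * np.exp(-lam * x) * ((p_i1 - p_i2) * np.exp(-lam * x) + p_i2)

H_star_numeric = sum(
    quad(lambda x: -f_bracket(x, i) * np.log(f_bracket(x, i)), 0, np.inf)[0]
    for i in (1, 2)
)

eta = 0.5 - ((1 - q) ** 2 * np.log(1 - q) - q ** 2 * np.log(q)) / (1 - 2 * q)
H_star_formula = 2 - 2 * np.log(2 * lam) + 2 * eta

print(H_star_numeric, H_star_formula)  # should agree up to integration error
```

For q = 0.2 both values lie between $H(X_{\mathrm{RSS}})=3-2\log 2\approx 1.614$ and $H(X_{\mathrm{SRS}})=2$, in line with the ordering established in Lemmas 2 and 3.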
Figure 1 shows the differences of $H(X^*_{\mathrm{RSS}})$ and $H(X_{\mathrm{RSS}})$ from $H(X_{\mathrm{SRS}})$. It also presents the effect of ranking error on the Shannon entropy of the resulting RSS data by comparing $H(X^*_{\mathrm{RSS}})$ with $H(X_{\mathrm{RSS}})$. It is observed that the maximum difference occurs at $p_{1,2}=0.5$.

Figure 1: Computed values of the differences between the Shannon entropies of $X_{\mathrm{RSS}}$ and $X^*_{\mathrm{RSS}}$ and that of $X_{\mathrm{SRS}}$ of the same size, as functions of the ranking error probability $p_{1,2}$; the three curves are $H(X^*_{\mathrm{RSS}})-H(X_{\mathrm{RSS}})$, $H(X_{\mathrm{RSS}})-H(X_{\mathrm{SRS}})$ and $H(X^*_{\mathrm{RSS}})-H(X_{\mathrm{SRS}})$.

3. Rényi Information of Ranked Set Samples

A more general measure of entropy, with the same meaning and similar properties as the Shannon entropy, was defined by Rényi (1961) as

H_\alpha(X) = \frac{1}{1-\alpha}\log\int f^{\alpha}(x)\,d\nu(x) = \frac{1}{1-\alpha}\log E\left[f^{\alpha-1}(X)\right], \qquad (7)

where $\alpha>0$, $\alpha\neq 1$, and $\nu$ is Lebesgue measure ($d\nu(x)=dx$) in the continuous case and counting measure in the discrete case. It is well known that

\lim_{\alpha\to 1} H_\alpha(X) = -\int f(x)\log f(x)\,dx = H(X).

The Rényi information is much more flexible than the Shannon entropy due to the parameter $\alpha$. It is an important measure in various applied sciences such as statistics, ecology, engineering and economics. In this section, we obtain the Rényi information of $X_{\mathrm{RSS}}$ and $X^*_{\mathrm{RSS}}$ and compare them with the Rényi information of $X_{\mathrm{SRS}}$. To this end, from (7), it is easy to show that the Rényi information of an SRS of size $n$ from $f$ is given by

H_\alpha(X_{\mathrm{SRS}}) = \sum_{i=1}^{n} H_\alpha(X_i) = n\,H_\alpha(X_1). \qquad (8)

Also, for an RSS of size $n$, we have

H_\alpha(X_{\mathrm{RSS}}) = \sum_{i=1}^{n} H_\alpha(X_{(i)}). \qquad (9)

To compare $H_\alpha(X_{\mathrm{SRS}})$ with $H_\alpha(X_{\mathrm{RSS}})$ and $H_\alpha(X^*_{\mathrm{RSS}})$, we consider two cases, namely $0<\alpha<1$ and $\alpha>1$. The results for $0<\alpha<1$ are stated in the next lemma.

Lemma 4. For any $0<\alpha<1$ and all $n\in\mathbb{N}$, we have $H_\alpha(X_{\mathrm{RSS}}) \leq H_\alpha(X^*_{\mathrm{RSS}}) \leq H_\alpha(X_{\mathrm{SRS}})$.

Proof. We first show that $H_\alpha(X_{\mathrm{RSS}}) \leq H_\alpha(X^*_{\mathrm{RSS}})$ for any $0<\alpha<1$. To this end, using

H_\alpha(X^*_{\mathrm{RSS}}) = \frac{1}{1-\alpha}\sum_{i=1}^{n}\log\int\left(\sum_{j=1}^{n} p_{i,j}\, f_{(j)}(x)\right)^{\alpha} dx, \qquad (10)

and the concavity of $h_1(t)=t^{\alpha}$ for $0<\alpha<1$, $t>0$, we have

H_\alpha(X^*_{\mathrm{RSS}}) \geq \frac{1}{1-\alpha}\sum_{i=1}^{n}\log\int \sum_{j=1}^{n} p_{i,j}\, f^{\alpha}_{(j)}(x)\,dx
 \;\geq\; \frac{1}{1-\alpha}\sum_{i=1}^{n}\sum_{j=1}^{n} p_{i,j}\log\int f^{\alpha}_{(j)}(x)\,dx
 \;=\; \frac{1}{1-\alpha}\sum_{j=1}^{n}\log\int f^{\alpha}_{(j)}(x)\,dx = H_\alpha(X_{\mathrm{RSS}}),

where the second inequality is obtained by using the concavity of $h_2(t)=\log t$ for $t>0$. This establishes the first inequality. To complete the proof, we show that $H_\alpha(X^*_{\mathrm{RSS}})\leq H_\alpha(X_{\mathrm{SRS}})$ for any $0<\alpha<1$ and all $n\in\mathbb{N}$. To this end, from (10) and using $f(x)=\frac{1}{n}\sum_{i=1}^{n}f_{[i]}(x)$, we have

H_\alpha(X^*_{\mathrm{RSS}}) = \frac{1}{1-\alpha}\sum_{i=1}^{n}\log\int f^{\alpha}_{[i]}(x)\,dx
 \;\leq\; \frac{n}{1-\alpha}\log\int \frac{1}{n}\sum_{i=1}^{n} f^{\alpha}_{[i]}(x)\,dx
 \;\leq\; \frac{n}{1-\alpha}\log\int\left(\frac{1}{n}\sum_{i=1}^{n} f_{[i]}(x)\right)^{\alpha} dx
 \;=\; \frac{n}{1-\alpha}\log\int f^{\alpha}(x)\,dx = H_\alpha(X_{\mathrm{SRS}}).

In Lemma 4 we were able to establish analytically an ordering relationship among the Rényi information of $X^*_{\mathrm{RSS}}$, $X_{\mathrm{RSS}}$ and $X_{\mathrm{SRS}}$ when $0<\alpha<1$. It would naturally be of interest to extend this relationship to the case $\alpha>1$. It appears that a similar relationship holds when $\alpha>1$; however, we do not have an analytical proof.

Conjecture 1. For any $\alpha>1$ and all $n\in\mathbb{N}$, we have $H_\alpha(X_{\mathrm{RSS}}) \leq H_\alpha(X^*_{\mathrm{RSS}}) \leq H_\alpha(X_{\mathrm{SRS}})$.

In Example 2 we compare the Rényi information of $X^*_{\mathrm{RSS}}$, $X_{\mathrm{RSS}}$ and $X_{\mathrm{SRS}}$ as a function of $\alpha$ for an exponential distribution. The results are presented in Figure 2 and support Conjecture 1. A complementary numerical check, for a different parent distribution, is sketched below.
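The sketch below (not part of the original paper) checks the outer inequality of Conjecture 1 numerically for a standard normal parent (an arbitrary illustrative choice), set size n = 3 and perfect ranking, by evaluating the Rényi information through numerical integration.

```python
# Sketch: numerical check of H_alpha(X_RSS) <= H_alpha(X_SRS) for alpha > 1,
# standard normal parent, set size n = 3, perfect ranking.
import numpy as np
from math import comb
from scipy.integrate import quad
from scipy.stats import norm

n = 3

def f_order(x, i):
    """pdf of the i-th order statistic in a set of size n."""
    F = norm.cdf(x)
    return n * comb(n - 1, i - 1) * F ** (i - 1) * (1 - F) ** (n - i) * norm.pdf(x)

def renyi(pdf, alpha):
    integral = quad(lambda x: pdf(x) ** alpha, -np.inf, np.inf)[0]
    return np.log(integral) / (1 - alpha)

for alpha in (1.5, 2.0, 3.0):
    H_rss = sum(renyi(lambda x, i=i: f_order(x, i), alpha) for i in range(1, n + 1))
    H_srs = n * renyi(norm.pdf, alpha)
    print(alpha, H_rss, H_srs, H_rss <= H_srs)  # reports whether the conjectured ordering holds
```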
Example 2. Suppose the assumptions of Example 1 hold. Then the Rényi information of an SRS of size $n=2$ is given by

H_\alpha(X_{\mathrm{SRS}}) = -2\log\lambda - \frac{2}{1-\alpha}\log\alpha, \qquad \alpha\neq 1.

Straightforward calculations show that

H_\alpha(X_{(1)}) = -\log\lambda - \log 2 - \frac{1}{1-\alpha}\log\alpha,
H_\alpha(X_{(2)}) = -\log\lambda + \frac{\alpha}{1-\alpha}\log 2 + \frac{1}{1-\alpha}\log\left\{\frac{\Gamma(\alpha+1)\Gamma(\alpha)}{\Gamma(2\alpha+1)}\right\},

and so the Rényi information of $X_{\mathrm{RSS}}$ is given by $H_\alpha(X_{\mathrm{RSS}})=H_\alpha(X_{(1)})+H_\alpha(X_{(2)})$. Now,

H_\alpha(X_{\mathrm{RSS}}) - H_\alpha(X_{\mathrm{SRS}}) = \frac{2\alpha-1}{1-\alpha}\log 2 + \frac{1}{1-\alpha}\log\left\{\frac{\alpha\,\Gamma(\alpha+1)\Gamma(\alpha)}{\Gamma(2\alpha+1)}\right\}.

To obtain $H_\alpha(X^*_{\mathrm{RSS}})$, let

U_{i,\lambda}(x,t) = a_i(t)\,e^{-\lambda x} + b_i(t), \qquad i=1,2,

where $a_i(t)=(-1)^i(1-2t)$ and $b_i(t)=t^{\,i-1}(1-t)^{\,2-i}$, and $p_{1,1}=P(X_{[1]}=X_{(1)})$ is as defined in Example 1, so that $f_{[i]}(x)=2\lambda e^{-\lambda x}U_{i,\lambda}(x,p_{1,1})$. The Rényi information of $X^*_{\mathrm{RSS}}$ is then obtained as

H_\alpha(X^*_{\mathrm{RSS}}) = \frac{2\alpha}{1-\alpha}\log(2\lambda) + \frac{1}{1-\alpha}\sum_{i=1}^{2}\log\int_{0}^{\infty} e^{-\alpha\lambda x}\, U^{\alpha}_{i,\lambda}(x,p_{1,1})\,dx, \qquad \alpha\neq 1,

which can be calculated numerically. Figure 2(a) shows the values of $H_\alpha(X^*_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}})$ as a function of $\alpha$ for $p_{1,1}\in\{0.8, 0.9, 0.95, 1\}$. When $p_{1,1}=1$, $H_\alpha(X^*_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}}) = H_\alpha(X_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}})$. In Figure 2(b) we show the effect of the ranking error on the Rényi information of $X_{\mathrm{RSS}}$ by comparing $H_\alpha(X^*_{\mathrm{RSS}})$ and $H_\alpha(X_{\mathrm{RSS}})$ as functions of $\alpha$ for different values of $p_{1,1}$.

Figure 2: Comparison of the Rényi information of $X_{\mathrm{RSS}}$ and $X^*_{\mathrm{RSS}}$ with that of $X_{\mathrm{SRS}}$ as a function of $\alpha$, for $p_{1,1}\in\{0.8, 0.9, 0.95, 1\}$. The values of $H_\alpha(X^*_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}})$ are presented in panel (a), while those of $H_\alpha(X^*_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{RSS}})$ are given in panel (b).

Note that for $\alpha>1$ the difference between the Rényi information of $X_{\mathrm{RSS}}$ and its counterpart under SRS can be written as follows:

H_\alpha(X_{\mathrm{RSS}}) - H_\alpha(X_{\mathrm{SRS}}) = \frac{1}{1-\alpha}\sum_{i=1}^{n}\log\int f^{\alpha}_{(i)}(x)\,dx - \frac{n}{1-\alpha}\log\int f^{\alpha}(x)\,dx
 = \frac{1}{1-\alpha}\sum_{i=1}^{n}\log\left(\frac{\int f^{\alpha}_{(i)}(x)\,dx}{\int f^{\alpha}(x)\,dx}\right)
 = \frac{\alpha\, n\log n}{1-\alpha} + \frac{1}{1-\alpha}\sum_{i=1}^{n}\log E\left[\{P_{F(W)}(T=i-1)\}^{\alpha}\right], \qquad (11)

where $T\,|\,W=w \sim \mathrm{Bin}(n-1, F(w))$ and $W$ has density proportional to $f^{\alpha}(w)$, i.e., $g(w)=f^{\alpha}(w)\big/\int f^{\alpha}(w)\,dw$. Since $P_{F(w)}(T=i-1)\leq 1$ for all $i=1,\ldots,n$ and fixed $w$, we have

\log E\left[\{P_{F(W)}(T=i-1)\}^{\alpha}\right] \leq 0

for all $\alpha>1$. This yields a lower bound on the difference between the Rényi information of $X_{\mathrm{RSS}}$ and $X_{\mathrm{SRS}}$, namely $H_\alpha(X_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}}) \geq \frac{n\alpha}{1-\alpha}\log n$. In the following result, we find a sharper lower bound for $H_\alpha(X_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}})$ when $\alpha>1$.

Lemma 5. For any $\alpha>1$ and all $n\geq 2$, we have $H_\alpha(X_{\mathrm{RSS}})-H_\alpha(X_{\mathrm{SRS}}) \geq \Psi(\alpha,n)$, with

\Psi(\alpha,n) = \frac{\alpha}{1-\alpha}\sum_{i=1}^{n}\log\left\{ n\binom{n-1}{i-1}\left(\frac{i-1}{n-1}\right)^{i-1}\left(\frac{n-i}{n-1}\right)^{n-i}\right\},

where $\Psi(\alpha,n)\in\left[\frac{n\alpha}{1-\alpha}\log n,\; 0\right)$.

Proof. Using (9), the pdf of $X_{(i)}$ and the transformation $F(X)=U$, we have

H_\alpha(X_{\mathrm{RSS}}) = \frac{1}{1-\alpha}\sum_{i=1}^{n}\log\int_{0}^{1}\left\{f^{*}_{i,n-i+1}(u)\right\}^{\alpha} f^{\alpha-1}\!\left(F^{-1}(u)\right) du,

where $f^{*}_{i,n-i+1}(u)$ is the pdf of a Beta$(i,n-i+1)$ random variable, whose mode is at $u^{*}=\frac{i-1}{n-1}$. Now, since $f^{*}_{i,n-i+1}(u)\leq f^{*}_{i,n-i+1}\!\left(\frac{i-1}{n-1}\right)$ and $\frac{1}{1-\alpha}<0$, we have

H_\alpha(X_{\mathrm{RSS}}) \geq \frac{\alpha}{1-\alpha}\sum_{i=1}^{n}\log\left\{ n\binom{n-1}{i-1}\left(\frac{i-1}{n-1}\right)^{i-1}\left(\frac{n-i}{n-1}\right)^{n-i}\right\} + \frac{n}{1-\alpha}\log\int_{0}^{1} f^{\alpha-1}\!\left(F^{-1}(u)\right) du
 = \Psi(\alpha,n) + H_\alpha(X_{\mathrm{SRS}}),

where

\Psi(\alpha,n) = \frac{\alpha}{1-\alpha}\sum_{i=1}^{n}\log\left\{ n\binom{n-1}{i-1}\left(\frac{i-1}{n-1}\right)^{i-1}\left(\frac{n-i}{n-1}\right)^{n-i}\right\}. \qquad (12)

It is easy to show that $\Psi(\alpha,n)<0$ for all $n\in\mathbb{N}$ and $\alpha>1$.
To show this, one can easily check that $\Psi(\alpha,n+1)\leq\Psi(\alpha,n)$ for all $n\geq 2$, with $\Psi(\alpha,2)=\frac{2\alpha}{1-\alpha}\log 2<0$. Also, since $\binom{n-1}{i-1}\left(\frac{i-1}{n-1}\right)^{i-1}\left(\frac{n-i}{n-1}\right)^{n-i}\leq 1$ for all $i=1,\ldots,n$, we have $\Psi(\alpha,n) \geq \frac{\alpha}{1-\alpha}\sum_{i=1}^{n}\log n = \frac{n\alpha}{1-\alpha}\log n$.

4. Kullback-Leibler Information of Ranked Set Samples

In 1951, Kullback and Leibler introduced a measure of information from the statistical point of view by comparing two probability distributions associated with the same experiment. The Kullback-Leibler (KL) divergence is a measure of how different two probability distributions (over the same sample space) are. The KL divergence between two random variables $X$ and $Y$ with cdfs $F$ and $G$ and pdfs $f$ and $g$, respectively, is given by

K(X,Y) = \int f(t)\log\left(\frac{f(t)}{g(t)}\right) dt. \qquad (13)

Using the same idea, we define the KL discrimination information between $X_{\mathrm{SRS}}$ and $X_{\mathrm{RSS}}$ as

K(X_{\mathrm{SRS}}, X_{\mathrm{RSS}}) = \int_{\mathcal{X}^n} f(\mathbf{x}_{\mathrm{SRS}})\log\left(\frac{f(\mathbf{x}_{\mathrm{SRS}})}{f(\mathbf{x}_{\mathrm{RSS}})}\right) d\mathbf{x}_{\mathrm{SRS}},

where $f(\mathbf{x}_{\mathrm{SRS}})=\prod_{i=1}^{n} f(x_i)$ and $f(\mathbf{x}_{\mathrm{RSS}})=\prod_{i=1}^{n} f_{(i)}(x_i)$ are the joint pdfs of the SRS and the (perfect) RSS data, evaluated at the same point $\mathbf{x}=(x_1,\ldots,x_n)$. It is easy to see that

K(X_{\mathrm{SRS}}, X_{\mathrm{RSS}}) = \sum_{i=1}^{n}\int f(x)\log\left(\frac{f(x)}{f_{(i)}(x)}\right) dx = \sum_{i=1}^{n} K(X, X_{(i)}). \qquad (14)
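As an illustration of (14), the following sketch (not part of the original paper) evaluates $K(X_{\mathrm{SRS}},X_{\mathrm{RSS}})$ by numerical integration for two different parent distributions and several set sizes; the identical values obtained for the two parents reflect the distribution-free nature of this quantity noted in the Introduction.

```python
# Sketch: K(X_SRS, X_RSS) via equation (14), for normal and exponential parents.
import numpy as np
from math import comb
from scipy.integrate import quad
from scipy.stats import norm, expon

def kl_srs_rss(dist, n):
    total = 0.0
    for i in range(1, n + 1):
        def integrand(x, i=i):
            f, F = dist.pdf(x), dist.cdf(x)
            f_i = n * comb(n - 1, i - 1) * F ** (i - 1) * (1 - F) ** (n - i) * f
            return f * np.log(f / f_i)
        total += quad(integrand, *dist.support())[0]
    return total

for n in (2, 3, 5):
    print(n, kl_srs_rss(norm, n), kl_srs_rss(expon, n))  # same value for both parents
```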