arXiv:cond-mat/0010074v2 [cond-mat.stat-mech] 17 Jan 2001

Information Theory based on Non-additive Information Content

Takuya Yamano
Department of Applied Physics, Faculty of Science, Tokyo Institute of Technology, Oh-okayama, Meguro-ku, Tokyo 152-8551, Japan

We generalize Shannon's information theory in a nonadditive way by focusing on the source coding theorem. The nonadditive information content we adopt is consistent with the concept of the form-invariance structure of the nonextensive entropy. Some general properties of the nonadditive information entropy are studied; in addition, the relation between the nonadditivity q and the codeword length is pointed out.

PACS numbers: 05.20.-y, 89.70.+c

I. INTRODUCTION

The intuitive notion of what a quantitative expression for information should be was addressed during the development of the transmission of information, which led to information theory (IT). IT today is considered to be a most fundamental field, connecting various other fields such as physics (thermodynamics), electrical engineering (communication theory), mathematics (probability theory and statistics), computer science (Kolmogorov complexity) and so on [1]. Accordingly, the selection of the information measure becomes an influential matter. The introduction of the logarithmic form of the information measure dates back to Hartley [2], who defined the practical measure of information as the logarithm of the number of possible symbol sequences. After that, Shannon established the logarithm-based IT for the following reasons: (a) practical usefulness, (b) closeness to our intuitive feeling, (c) ease of mathematical manipulation [3,4].

On the other hand, a non-logarithmic (nonextensive) form of entropy is currently considered a useful measure for describing the thermostatistical properties of a certain class of physical systems which entail long-range interactions, long-time memories and (multi)fractal structures. The form of the nonextensive entropy proposed by Tsallis [5] has been intensively applied to many such systems [6]. Why a formalism violating the extensivity of the statistical mechanical entropy seems to be essential for a convincing description of these systems has not yet been sufficiently clarified. Nevertheless, the successful application to some physical systems leads us to investigate the possibility of a nonadditive IT, since Shannon's information entropy has the same form as the logarithmic statistical mechanical entropy.

It is desirable to employ a nonadditive information content such that the associated IT contains Shannon's IT as a special case. The concept of form invariance of the structure of nonextensive entropies was considered to provide a guiding principle and a clear basis for generalizations of the logarithmic entropy [7]. This structure seems to give a hint for the selection of the nonadditive information content. The form-invariant structure requires normalization of the original Tsallis entropy by \sum_i p_i^q, where p_i is the probability of event i and q is a real number residing in the interval (0,1), as required by the preservation of concavity of the entropy. In addition, the Kullback-Leibler (KL) relative entropy, which measures the distance between two probability distributions, is also modified [7].

This paper explores the consequences of adopting a nonadditive information content in the sense that the associated average information, i.e. the entropy, takes the form of the modified Tsallis entropy. The use of the modified form of the Tsallis entropy is in conformity with the appropriate definition of the expectation value (the normalized q-expectation value [8]) of the nonadditive information content. Since the information-theoretical entropy is defined as the average of the information content, it is desirable to unify the meaning of the average as the normalized q-expectation value throughout the nonadditive IT. Moreover, we shall later see how Shannon's additive IT is extended to the nonadditive one by addressing the source coding theorem, which is one of the most fundamental theorems in IT.

The organization of this paper is as follows. In Sec. II, we present the mathematical preliminaries of the nonadditive entropy and the generalized KL entropy. Sec. III addresses an optimal codeword within the nonadditive framework; there we shall attempt to give a possible meaning of the nonadditive index q in terms of the codeword length. Sec. IV deals with the extension of Fano's inequality, which gives an upper bound on the conditional entropy in terms of an error probability in a channel. The last section is devoted to concluding remarks.

II. NONADDITIVE ENTROPY AND THE GENERALIZED KL ENTROPY

A. Properties of the nonadditive entropy of information

For a discrete set of states with probability mass function p(x), where x belongs to an alphabet \mathcal{H}, we consider the following nonadditive information content I_q(p),

    I_q(p) \equiv -\ln_q p(x),    (1)

where \ln_q x is the q-logarithm function defined as \ln_q x = (x^{1-q}-1)/(1-q). In the limit q \to 1, \ln_q x recovers the standard logarithm \ln x. In Shannon's additive IT, the information content is expressed as -\ln p(x) in nat units, which is a monotonically decreasing function of p(x). This behavior matches our intuition in that we gain more information when a least-probable event occurs and less information when an event with high probability occurs. It is worth noting that this property remains qualitatively valid for the nonadditive information content for all q, except that for q < 1 there exists an upper limit 1/(1-q) at p(x) = 0. Therefore Shannon's reason (b) referred to in Sec. I is not a crucial element for singling out the logarithmic form.
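As a quick numerical illustration of the content defined in Eq. (1) — a minimal Python sketch of ours, not part of the original development; the function names are chosen only for this example — the following code evaluates the q-logarithm and I_q(p) for a few probabilities, showing that the q → 1 limit reproduces −ln p(x) and that for q < 1 the content saturates at 1/(1−q) as p(x) → 0.

```python
import numpy as np

def ln_q(x, q):
    """q-logarithm ln_q(x) = (x**(1-q) - 1)/(1-q); reduces to ln(x) as q -> 1."""
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x**(1.0 - q) - 1.0) / (1.0 - q)

def info_content(p, q):
    """Nonadditive information content I_q(p) = -ln_q p(x), Eq. (1)."""
    return -ln_q(p, q)

for q in (0.5, 0.999, 1.5):
    for p in (0.9, 0.5, 0.1, 1e-6):
        print(f"q={q:5.3f}  p={p:7.1e}  I_q={info_content(p, q):9.4f}  -ln p={-np.log(p):8.4f}")

# For q = 0.5 the content stays below 1/(1-q) = 2 even as p -> 0,
# whereas the Shannon content -ln p diverges.
print("upper limit for q = 0.5:", 1.0 / (1.0 - 0.5))
```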
Moreover, it is easy to see that the Rényi information of order q [9], H_q^R = \ln\sum_{x\in\mathcal{H}} p^q(x)/(1-q), which is an additive information measure, can be written in terms of this nonadditive information content as

    H_q^R = \frac{1}{1-q}\,\ln\sum_{x\in\mathcal{H}} \left[1-(1-q)I_q(p(x))\right]^{\frac{q}{1-q}}.    (2)

The entropy H_q(X) of a discrete random variable X is defined as the average of the information content, where the average means the normalized q-expectation value [8],

    H_q(X) = \frac{-\sum_{x\in\mathcal{H}} p^q(x)\,\ln_q p(x)}{\sum_{x\in\mathcal{H}} p^q(x)} = \frac{1-\sum_{x\in\mathcal{H}} p^q(x)}{(q-1)\sum_{x\in\mathcal{H}} p^q(x)},    (3)

where we have used the normalization of probability \sum_{x\in\mathcal{H}} p(x) = 1. In a similar way, we define the nonadditive conditional information content and the joint one as

    I_q(y|x) = \frac{p^{1-q}(y|x)-1}{q-1},    (4)

    I_q(x,y) = \frac{p^{1-q}(x,y)-1}{q-1},    (5)

where y belongs to a different alphabet \mathcal{Y}. The corresponding entropy conditioned on x and the joint entropy of X and Y become

    H_q(Y|x) = \frac{\sum_{y\in\mathcal{Y}} p^q(y|x)\,I_q(y|x)}{\sum_{y\in\mathcal{Y}} p^q(y|x)} = \frac{1-\sum_{y\in\mathcal{Y}} p^q(y|x)}{(q-1)\sum_{y\in\mathcal{Y}} p^q(y|x)}    (6)

and

    H_q(X,Y) = \frac{\sum_{x\in\mathcal{H},\,y\in\mathcal{Y}} p^q(x,y)\,I_q(x,y)}{\sum_{x\in\mathcal{H},\,y\in\mathcal{Y}} p^q(x,y)} = \frac{1-\sum_{x\in\mathcal{H},\,y\in\mathcal{Y}} p^q(x,y)}{(q-1)\sum_{x\in\mathcal{H},\,y\in\mathcal{Y}} p^q(x,y)},    (7)

respectively. Then we have the following theorem.

Theorem 1. The joint entropy satisfies

    H_q(X,Y) = H_q(X) + H_q(Y|X) + (q-1)\,H_q(X)\,H_q(Y|X).    (8)

Proof. From Eq. (7) we can rewrite H_q(X,Y) with the relation p(x,y) = p(x)p(y|x) as

    H_q(X,Y) = \frac{1}{q-1}\left[\frac{1}{\sum_{x\in\mathcal{H}} p^q(x)\sum_{y\in\mathcal{Y}} p^q(y|x)}-1\right].    (9)

Since Eq. (6) gives \sum_{y\in\mathcal{Y}} p^q(y|x) = [1+(q-1)H_q(Y|x)]^{-1}, we get

    H_q(X,Y) = \frac{1}{q-1}\left[\frac{1}{\sum_{x\in\mathcal{H}} p^q(x)\,[1+(q-1)H_q(Y|x)]^{-1}}-1\right].    (10)

Here we introduce the following definition [10],

    \left\langle\frac{1}{1+(q-1)H_q(Y|x)}\right\rangle_q^{(X)} \equiv \frac{1}{1+(q-1)H_q(Y|X)},    (11)

where the bracket denotes the normalized q-expectation value with respect to p(x). Then we have from Eq. (10)

    H_q(X,Y) = \frac{1}{q-1}\left[\frac{1+(q-1)H_q(Y|X)}{\sum_{x\in\mathcal{H}} p^q(x)}-1\right].    (12)

Putting \sum_{x\in\mathcal{H}} p^q(x) = [1+(q-1)H_q(X)]^{-1} into this yields the theorem. □
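The composition rule (8) can be checked numerically. The sketch below (ours, using an arbitrary small joint distribution) evaluates H_q(X) from Eq. (3), H_q(Y|X) through the normalized q-expectation of Eqs. (6) and (11), and H_q(X,Y) from Eq. (7), and confirms Eq. (8) up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
q = 1.3                                        # any q > 0, q != 1
pxy = rng.random((3, 4)); pxy /= pxy.sum()     # joint distribution p(x, y)
px = pxy.sum(axis=1)                           # marginal p(x)

def H_q(p, q):
    """Normalized Tsallis-type entropy, Eqs. (3) and (7): (1 - sum p^q)/((q-1) sum p^q)."""
    S = np.sum(p**q)
    return (1.0 - S) / ((q - 1.0) * S)

# H_q(Y|X): normalized q-average over x of [1 + (q-1) H_q(Y|x)]^{-1}, Eqs. (6) and (11)
cond_rows = pxy / px[:, None]                  # p(y|x) for each x
inv_terms = np.array([np.sum(row**q) for row in cond_rows])   # = [1+(q-1)H_q(Y|x)]^{-1}
avg_inv = np.sum(px**q * inv_terms) / np.sum(px**q)
H_Y_given_X = (1.0 / avg_inv - 1.0) / (q - 1.0)

lhs = H_q(pxy.ravel(), q)                      # H_q(X, Y)
rhs = H_q(px, q) + H_Y_given_X + (q - 1.0) * H_q(px, q) * H_Y_given_X
print(lhs, rhs)                                # agree to machine precision
```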
This theorem bears a remarkable similarity to the relation satisfied by the Jackson basic number in q-deformation theory, which was pointed out in Ref. [7]. That is, [X]_q \equiv (q^X-1)/(q-1) is the Jackson basic number of a quantity X; for the sum of two quantities X and Y, the associated basic number [X+Y]_q becomes [X]_q + [Y]_q + (q-1)[X]_q[Y]_q. Obviously this theorem recovers the ordinary relation H(X,Y) = H(X) + H(Y|X) in the limit q → 1. In this modified Tsallis formalism, q appears as q−1 instead of 1−q [11,12]. When X and Y are independent events, Eq. (8) gives the pseudoadditivity relation [8]. However, it is converse to the case of the original Tsallis entropy in that q > 1 yields superadditivity and q < 1 subadditivity. It is worth mentioning that the concept of nonextensive conditional entropy in the framework of the original Tsallis entropy was first introduced for discussing quantum entanglement in Ref. [11]. From this theorem, we immediately have the following corollary concerning the equivocation.

Corollary.

    H_q(Y|X) = \frac{H_q(Y,Z|X)-H_q(Z|Y,X)+(q-1)\{H_q(X)H_q(Y,Z|X)-H_q(X,Y)H_q(Z|Y,X)\}}{1+(q-1)H_q(X)}.    (13)

Proof. In Eq. (8), regarding Y as (Y,Z) we have

    H_q(X,Y,Z) = H_q(X) + H_q(Y,Z|X) + (q-1)H_q(X)H_q(Y,Z|X),    (14)

while regarding X as (Y,X) and Y as Z we get

    H_q(X,Y,Z) = H_q(X,Y) + H_q(Z|Y,X) + (q-1)H_q(X,Y)H_q(Z|Y,X).    (15)

Subtracting the two equations and rearranging with respect to H_q(Y|X) using Eq. (8), we obtain the corollary. □

Remark: In the additive limit (q → 1), we recover the relation H(Y|X) = H(Y,Z|X) − H(Z|Y,X).

Moreover, with the help of Eq. (13), we have the following theorem.

Theorem 2. Hierarchical structure of the entropy H_q
The joint entropy of n random variables X_1, X_2, ..., X_n satisfies

    H_q(X_1,X_2,\cdots,X_n) = \sum_{i=1}^{n}\left[1+(q-1)H_q(X_{i-1},\cdots,X_1)\right]H_q(X_i|X_{i-1},\cdots,X_1),    (16)

where for i = 1 the conditioning set is empty and the prefactor is unity.

Proof. From Eq. (8), H_q(X_1,X_2) = H_q(X_1) + [1+(q-1)H_q(X_1)]H_q(X_2|X_1) holds. Next, from Eq. (14), we have

    H_q(X_1,X_2,X_3) = H_q(X_1) + H_q(X_2,X_3|X_1) + (q-1)H_q(X_1)H_q(X_2,X_3|X_1).    (17)

Since H_q(X_2,X_3|X_1) is written using Eq. (13) as

    H_q(X_2,X_3|X_1) = H_q(X_2|X_1) + \frac{1+(q-1)H_q(X_1,X_2)}{1+(q-1)H_q(X_1)}\,H_q(X_3|X_2,X_1),    (18)

Eq. (17) can be rewritten as

    H_q(X_1,X_2,X_3) = H_q(X_1) + [1+(q-1)H_q(X_1)]H_q(X_2|X_1) + [1+(q-1)H_q(X_1,X_2)]H_q(X_3|X_2,X_1).    (19)

Repeated application of the corollary in the same way gives the theorem. □

Remark: In the additive limit, we get H(X_1,X_2,\cdots,X_n) = \sum_{i=1}^{n} H(X_i|X_{i-1},\cdots,X_1), which states that the entropy of n variables is the sum of the conditional entropies (chain rule).

From the relation Eq. (16), we need all the joint entropies below the level of n random variables in order to obtain the joint entropy H_q(X_1,...,X_n); the situation is similar to the BBGKY hierarchy for the N-body distribution function.
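The three-variable case Eq. (19) of the hierarchy can likewise be verified numerically. In the sketch below (ours), conditional entropies are computed directly from the normalized q-expectation definition, which for this form reduces to H_q(A|B) = (S_B/S_{AB} − 1)/(q−1), where S denotes the sum of the q-th powers of the corresponding joint probabilities.

```python
import numpy as np

rng = np.random.default_rng(1)
q = 0.7
p = rng.random((2, 3, 2)); p /= p.sum()        # joint distribution p(x1, x2, x3)

S = lambda arr: np.sum(arr**q)                 # sum of q-th powers of the probabilities
H = lambda arr: (1.0 - S(arr)) / ((q - 1.0) * S(arr))            # Eq. (3)
H_cond = lambda S_joint, S_parent: (S_parent / S_joint - 1.0) / (q - 1.0)

p1 = p.sum(axis=(1, 2))                        # p(x1)
p12 = p.sum(axis=2)                            # p(x1, x2)

H1, H12, H123 = H(p1), H(p12), H(p)
H2_1 = H_cond(S(p12), S(p1))                   # H_q(X2 | X1)
H3_21 = H_cond(S(p), S(p12))                   # H_q(X3 | X2, X1)

rhs = H1 + (1 + (q - 1) * H1) * H2_1 + (1 + (q - 1) * H12) * H3_21   # r.h.s. of Eq. (19)
print(H123, rhs)                               # agree to machine precision
```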
Let us next define the mutual information I_q(Y;X), which quantifies the amount of information that can be gained about one event Y from another event X,

    I_q(Y;X) \equiv H_q(Y) - H_q(Y|X) = \frac{H_q(X)+H_q(Y)-H_q(X,Y)+(q-1)H_q(X)H_q(Y)}{1+(q-1)H_q(X)}.    (20)

I_q(Y;X) therefore expresses the reduction in the uncertainty of Y due to the acquisition of knowledge of X. Here we postulate that the mutual information in the nonadditive case is non-negative. This non-negativity may be considered a requirement rather than something to be proved, in order to be consistent with the usual additive mutual information. I_q(Y;X) also converges to the usual mutual information I(Y;X) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y) in the additive case (q → 1). We note that the mutual information of a random variable with itself is the entropy itself, I_q(X;X) = H_q(X). When X and Y are independent variables, we have I_q(Y;X) = 0 [13]. Then we have the following theorem.

Theorem 3. Independence bound on the entropy H_q

    H_q(X_1,X_2,\ldots,X_n) \leq \sum_{i=1}^{n}\left[1+(q-1)H_q(X_{i-1},\ldots,X_1)\right]H_q(X_i)    (21)

with equality if and only if the X_i are independent.

Proof. From the assumption I_q(X;Y) ≥ 0 introduced above, we have H_q(X_i|X_{i-1},\ldots,X_1) ≤ H_q(X_i) for every i, and hence

    \sum_{i=1}^{n} H_q(X_i|X_{i-1},\ldots,X_1) \leq \sum_{i=1}^{n} H_q(X_i)    (22)

with equality if and only if each X_i is independent of X_{i-1},\ldots,X_1. Since each prefactor 1+(q-1)H_q(X_{i-1},\ldots,X_1) = [\sum p^q(x_1,\ldots,x_{i-1})]^{-1} is positive, the theorem follows from the previous theorem, Eq. (16). □

B. The generalized KL entropy

The KL entropy, or relative entropy, is a measure of the distance between two probability distributions p(x) and p'(x). Here, we define it as the normalized q-expectation value of the change of the nonadditive information content, \Delta I_q \equiv I_q(p'(x)) - I_q(p(x)) [14],

    D_q(p(x)\|p'(x)) \equiv \frac{\sum_{x\in\mathcal{H}} p^q(x)\,\Delta I_q}{\sum_{x\in\mathcal{H}} p^q(x)} = \frac{\sum_{x\in\mathcal{H}} p^q(x)\left(\ln_q p(x)-\ln_q p'(x)\right)}{\sum_{x\in\mathcal{H}} p^q(x)}.    (23)

We note that the above generalized KL entropy satisfies the form-invariant structure introduced in Refs. [7,15]. We review the positivity of the generalized KL entropy in the case q > 0, which can be considered a necessary property for developing the IT.

Theorem 4. Information inequality
For q > 0, we have

    D_q(p(x)\|p'(x)) \geq 0    (24)

with equality if and only if p(x) = p'(x) for all x ∈ \mathcal{H}.

Proof. The outline of the proof is the same as in Refs. [16,17] except for the factor \sum_{x\in\mathcal{H}} p^q(x). From the definition Eq. (23),

    D_q(p(x)\|p'(x)) = \frac{1}{1-q}\sum_{x\in\mathcal{H}} p(x)\left\{1-\left(\frac{p'(x)}{p(x)}\right)^{1-q}\right\}\Bigg/\sum_{x\in\mathcal{H}} p^q(x)    (25)

        \geq \frac{1}{1-q}\left\{1-\left(\sum_{x\in\mathcal{H}} p(x)\,\frac{p'(x)}{p(x)}\right)^{1-q}\right\}\Bigg/\sum_{x\in\mathcal{H}} p^q(x) = 0,    (26)

where Jensen's inequality for the convex function f(x) = -\ln_q(x), f''(x) > 0, has been used: \sum_x p(x)f(x) \geq f(\sum_x p(x)x). Equality holds in the second line if and only if p'(x)/p(x) = 1 for all x, and accordingly D_q(p(x)\|p'(x)) = 0. □
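A direct numerical illustration of Eqs. (23) and (24) — our sketch, with randomly generated distributions — computes D_q(p‖p′) and confirms that it is non-negative for q > 0 and vanishes when p′ = p.

```python
import numpy as np

def ln_q(x, q):
    return np.log(x) if np.isclose(q, 1.0) else (x**(1.0 - q) - 1.0) / (1.0 - q)

def D_q(p, p_prime, q):
    """Generalized KL entropy, Eq. (23): normalized q-expectation of ln_q p - ln_q p'."""
    w = p**q
    return np.sum(w * (ln_q(p, q) - ln_q(p_prime, q))) / np.sum(w)

rng = np.random.default_rng(2)
for q in (0.5, 1.0, 2.0):
    p = rng.random(5); p /= p.sum()
    r = rng.random(5); r /= r.sum()
    print(f"q={q}:  D_q(p||r) = {D_q(p, r, q):.6f}   D_q(p||p) = {D_q(p, p, q):.2e}")
# D_q(p||r) >= 0 in every case (Theorem 4), and D_q(p||p) = 0.
```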
III. SOURCE CODING THEOREM

Having presented some properties of the nonadditive entropy and the generalized KL entropy as preliminaries, we are now in a position to address our main result: Shannon's source coding theorem can be extended to the nonadditive case. As an example, let us consider encoding the sequence of source letters generated by an information source into a sequence of binary codewords. If a code assigns the four source letters X_1, X_2, X_3, X_4 the codewords 0, 10, 110, 111, respectively, the source sequence X_2X_4X_3X_2 is coded into 1011111010. On the other hand, if another code assigns them 0, 1, 00, 11, the codeword becomes 111001. The difference between the two codes is striking in the decoding. In the former codeword, the first two letters 10 correspond to X_2 and not to the beginning of any other codeword, so we can read off X_2. Next, no codeword corresponds to 1 or 11, but 111 does, and it is decoded into X_4. Then the next 110 is decoded into X_3, leaving 10, which is decoded into X_2. Therefore we can uniquely decode the codeword. In the latter case, however, the first three letters 111 can be interpreted as X_2X_2X_2, X_2X_4 or X_4X_2. Namely, this code cannot be uniquely decoded into the source letters that gave rise to it. Accordingly, we need to deal with a so-called prefix code (or instantaneous code), such as the former one. A prefix code is a code in which no codeword is a prefix of any other codeword (prefix condition code) [1,18]. We recall that any code over an alphabet of size D (D = 2 in the binary case) which satisfies the prefix condition must satisfy the Kraft inequality [1,18],

    \sum_{i=1}^{M} D^{-l_i} \leq 1,    (27)

where l_i is the code length of the i-th codeword (i = 1,...,M). Moreover, if a code is uniquely decodable, the Kraft inequality holds for it [1,18]. We usually hope to encode the sequence of source letters into a sequence of codewords that is as short as possible; that is, our problem is to find a prefix condition code with the minimum average length of the set of codewords {l_i}. The optimal code is given by minimizing the following functional constrained by the Kraft inequality,

    J = \frac{\sum_i p_i^q l_i}{\sum_i p_i^q} + \lambda\left(\sum_i D^{-l_i}\right),    (28)

where p_i is the probability of realization of the word of length l_i and λ is a Lagrange multiplier. We have assumed equality in the Kraft inequality and have neglected the integer constraint on l_i. Differentiating with respect to l_i and setting the derivative to 0 yields

    D^{-l_i} = \frac{p_i^q}{\left(\sum_i p_i^q\right)\lambda\log D}.    (29)

Here it is worth noting that, from the Kraft inequality, the Lagrange multiplier is constrained as λ ≥ (log D)^{-1}. Furthermore, when the equality holds, the fraction D^{-l_i^*} given by the optimal codeword length l_i^* is expressed as

    D^{-l_i^*} = \frac{p_i^q}{\sum_i p_i^q}.    (30)

Therefore l_i^* can be written as \log_D(\sum_i p_i^q) - q\log_D p_i, and in the additive limit we obtain l_i^* = -\log_D p_i. However, we cannot always determine the optimal codeword length in this way, since the l_i must be integers. We have the following theorem.

Theorem 5. The average codeword length ⟨L⟩_q of any prefix code for a random variable X satisfies

    \langle L\rangle_q \geq H_q(X)    (31)

with equality if and only if p_i = [1-(1-q)l_i]^{\frac{1}{1-q}}.

Proof. From Eq. (23), the generalized KL entropy between two distributions p and r is written as

    D_q(p\|r) = \frac{\sum_i p_i^q(\ln_q p_i-\ln_q r_i)}{\sum_i p_i^q} = \frac{1-\sum_i p_i^q r_i^{1-q}}{(1-q)\sum_i p_i^q} = \frac{1-\sum_i p_i^q}{(1-q)\sum_i p_i^q} - \frac{\sum_i p_i^q(r_i^{1-q}-1)}{(1-q)\sum_i p_i^q}.    (32)

If we take the information content associated with the probability r as the i-th codeword length l_i, i.e., -(r_i^{1-q}-1)/(1-q) = l_i, then, noting that the first term of Eq. (32) equals -H_q(X), the average codeword length can be written from Eq. (32) as

    \langle L\rangle_q = H_q(X) + D_q(p\|r).    (33)

Since D_q(p‖r) ≥ 0 for q > 0 by Theorem 4, the theorem follows. Equality holds if and only if p_i = r_i. □

We note that the relation -(r_i^{1-q}-1)/(1-q) = l_i means that the codeword length l_i equals the information content associated with a distribution r different from p, l_i = I_q(r). When the equality is realized in the above theorem, we can derive an interesting interpretation of the nonadditivity parameter q: the equality condition states that the probability is expressed as a Tsallis canonical-ensemble-like factor in an i-wise manner. Each l_i then has an upper limit in length corresponding to q, namely l_i^{max} = 1/(1-q).
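The optimization result (30) is easy to reproduce. The sketch below (ours, for a toy binary-coded source) computes the non-integer optimal lengths l_i^* = log_D(Σ_j p_j^q) − q log_D p_i, checks that they saturate the Kraft inequality and approach −log_D p_i as q → 1; in these runs the normalized q-average length also stays above H_q(X), consistent with Theorem 5.

```python
import numpy as np

D = 2                                          # binary code alphabet
p = np.array([0.5, 0.25, 0.15, 0.10])          # toy source distribution

def optimal_lengths(p, q):
    """Non-integer optimal lengths from Eq. (30): l_i* = log_D(sum_j p_j^q) - q log_D p_i."""
    S = np.sum(p**q)
    return np.log(S) / np.log(D) - q * np.log(p) / np.log(D)

def H_q(p, q):
    S = np.sum(p**q)
    return (1 - S) / ((q - 1) * S)

for q in (0.8, 0.999, 1.3):
    l_star = optimal_lengths(p, q)
    kraft = np.sum(D**(-l_star))               # = 1 by construction, Eq. (30)
    avg_L = np.sum(p**q * l_star) / np.sum(p**q)   # normalized q-average length
    print(f"q={q:5.3f}  Kraft sum={kraft:.6f}  <L>_q={avg_L:.4f}  H_q={H_q(p, q):.4f}")
# As q -> 1 the lengths approach -log2 p_i; in each of these runs <L>_q >= H_q(X).
```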
Since the length \log_D(\sum_i p_i^q) - q\log_D p_i obtained from the optimization problem is not always an integer, we impose the integer condition on the codewords {l_i} by rounding up, l_i = \lceil\log_D(\sum_i p_i^q) - q\log_D p_i\rceil, where ⌈x⌉ denotes the smallest integer ≥ x [1]. Moreover, the relation \lceil\log_D(\sum_i p_i^q) - q\log_D p_i\rceil \geq \log_D(\sum_i p_i^q) - q\log_D p_i leads to

    \sum_i D^{-\lceil\log_D(\sum_i p_i^q)-q\log_D p_i\rceil} \leq \sum_i D^{-(\log_D(\sum_i p_i^q)-q\log_D p_i)} = \sum_i \frac{p_i^q}{\sum_i p_i^q} = 1.    (34)

Hence {l_i} satisfies the Kraft inequality. Moreover, we have the following theorem.

Theorem 6. The average codeword length assigned by l_i = \lceil\log_D(\sum_i p_i^q) - q\log_D p_i\rceil satisfies

    H_q(p) + D_q(p\|r) \leq \langle L\rangle_q < H_q(p) + D_q(p\|r) + 1.    (35)

Proof. The integer codeword lengths satisfy

    \log_D\left(\sum_i p_i^q\right) - q\log_D p_i \leq l_i < \log_D\left(\sum_i p_i^q\right) - q\log_D p_i + 1.    (36)

Multiplying by p_i^q/\sum_i p_i^q and summing over i with Eq. (33) yields the theorem. □

This means that a distribution differing from the optimal one introduces a correction D_q(p‖r) to the average codeword length, just as in the additive case.

So far we have discussed the properties of the nonadditive entropy in the case of a single letter. Next, let us consider the situation in which we transmit a sequence of n letters from the source, each letter being generated independently as identically distributed random variables according to p(x). Then the average codeword length per letter, ⟨L_n⟩_q = ⟨l(X_1,...,X_n)⟩_q/n, is bounded, as derived in the preceding theorem, by

    H_q(X_1,\ldots,X_n) \leq \langle l(X_1,\ldots,X_n)\rangle_q < H_q(X_1,\ldots,X_n) + 1.    (37)

Since we are now considering independently, identically distributed random variables X_1,...,X_n, we obtain

    \frac{1}{n}\sum_{i=1}^{n}\left[1+(q-1)H_q(X_{i-1},\ldots,X_1)\right]H_q(X_i) \leq \langle L_n\rangle_q < \frac{1}{n}\sum_{i=1}^{n}\left[1+(q-1)H_q(X_{i-1},\ldots,X_1)\right]H_q(X_i) + \frac{1}{n},    (38)

where we have used Theorem 3, which holds with equality for independent letters. This relation can be considered the generalized source coding theorem for a finite number of letters. We note that in the additive limit we recover H(X) ≤ ⟨L_n⟩ < H(X) + 1/n, since H(X_1,...,X_n) = \sum_i H(X_i) = nH(X) holds.
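To see Theorem 6 at work, the sketch below (ours; it uses q > 1 so that the auxiliary distribution is well defined for every length) rounds the optimal lengths up to integers, verifies the Kraft inequality (34), reconstructs the distribution r_i = [1 − (1−q) l_i^*]^{1/(1−q)} associated with the non-integer optimum, and checks the sandwich bound (35).

```python
import numpy as np

D, q = 2, 1.3
p = np.array([0.5, 0.25, 0.15, 0.10])
S = np.sum(p**q)

def ln_q(x):
    return (x**(1.0 - q) - 1.0) / (1.0 - q)

# Non-integer optimum, Eq. (30), and its rounded-up integer version
l_star = np.log(S) / np.log(D) - q * np.log(p) / np.log(D)
l_int = np.ceil(l_star)

print("Kraft sum:", np.sum(D**(-l_int)))                      # <= 1, Eq. (34)

# Auxiliary distribution r with l_i* = I_q(r_i), i.e. r_i = [1 - (1-q) l_i*]^{1/(1-q)}
r = (1.0 - (1.0 - q) * l_star)**(1.0 / (1.0 - q))
H = (1.0 - S) / ((q - 1.0) * S)                               # H_q(p), Eq. (3)
Dq = np.sum(p**q * (ln_q(p) - ln_q(r))) / S                   # D_q(p||r), Eq. (23)
L_avg = np.sum(p**q * l_int) / S                              # <L>_q of the integer code

print(H + Dq, "<=", L_avg, "<", H + Dq + 1.0)                 # sandwich bound, Eq. (35)
```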
IV. THE GENERALIZED FANO'S INEQUALITY

Fano's inequality is an essential ingredient in proving the converse to the channel coding theorem, which states that the probability of error arising over a channel is bounded away from zero when the transmission rate exceeds the channel capacity [19]. In estimating an original message generated by the information source, the original variable X may be estimated as X′ on the side of the destination. We therefore introduce the probability of error P_e = Pr{X′ ≠ X} due to the noise of the channel through which the signal is transmitted. With an error random variable E defined as

    E = \begin{cases} 1 & \text{if } X' \neq X \\ 0 & \text{if } X' = X \end{cases}    (39)

we have the following theorem, which is considered to be the generalized (nonadditive) version of Fano's inequality.

Theorem 7. The generalized Fano's inequality

    H_q(X|Y) \leq H_q(P_e) + \frac{1+(q-1)H_q(E,Y)}{1+(q-1)H_q(Y)}\,\frac{P_e^q}{P_e^q+(1-P_e)^q}\,\frac{1-(|\mathcal{H}|-1)^{1-q}}{(q-1)(|\mathcal{H}|-1)^{1-q}},    (40)

where |\mathcal{H}| denotes the size of the alphabet of the information source.

Proof. The proof proceeds along the lines of the usual Shannon additive case (e.g. [1]). Using the corollary Eq. (13), we have two different expressions for H_q(E,X|Y),

    H_q(E,X|Y) = H_q(X|Y) + \frac{1+(q-1)H_q(X,Y)}{1+(q-1)H_q(Y)}\,H_q(E|X,Y)    (41)

and

    H_q(E,X|Y) = H_q(E|Y) + \frac{1+(q-1)H_q(E,Y)}{1+(q-1)H_q(Y)}\,H_q(X|E,Y),    (42)

where in the second expression, Eq. (42), we have used the corollary by regarding H_q(E,X|Y) as H_q(X,E|Y). We are now considering the following situation: we wish to infer the genuine random variable X from Y, which is related to X through the nonadditive conditional information content I_q(y|x); hence we calculate X′, an estimate of X, as a function g(Y) of Y [1]. Then H_q(E|X,Y) becomes 0, since E is a function of X and Y by the definition Eq. (39). Therefore the first expression for H_q(E,X|Y) reduces to H_q(X|Y). On the other hand, from the non-negativity of I_q(E;Y) that we postulated and from the relation H_q(E) = H_q(P_e), we can bound H_q(E|Y) as H_q(E|Y) ≤ H_q(E) = H_q(P_e). Moreover, H_q(X|E,Y) can be written as

    H_q(X|E,Y) = \frac{\sum_{E}(\Pr\{E\})^q H_q(X|Y,E)}{\sum_{E}(\Pr\{E\})^q} = \frac{(1-P_e)^q H_q(X|Y,0)+P_e^q H_q(X|Y,1)}{P_e^q+(1-P_e)^q}.    (43)

For E = 0, g(Y) gives X, resulting in H_q(X|Y,0) = 0, and for E = 1, H_q(X|Y,1) is bounded from above by the maximum entropy over the remaining |\mathcal{H}|-1 outcomes,

    H_q(X|Y,1) \leq \frac{1-(|\mathcal{H}|-1)^{1-q}}{(q-1)(|\mathcal{H}|-1)^{1-q}}.    (44)

It then follows from Eq. (43) that

    \left[P_e^q+(1-P_e)^q\right]H_q(X|E,Y) \leq P_e^q\,\frac{1-(|\mathcal{H}|-1)^{1-q}}{(q-1)(|\mathcal{H}|-1)^{1-q}}.    (45)

Combining the above results with Eq. (42), we have the theorem. □

Remark: In the additive limit, we obtain the usual Fano's inequality H(X|Y) ≤ H(P_e) + P_e ln(|\mathcal{H}|-1) in nat units.
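As a numerical illustration of Eq. (40) — our own sketch under the definitions above — consider a symmetric three-letter channel with the estimator g(Y) = Y; both sides of the generalized Fano inequality can then be evaluated directly, and the bound holds in this example.

```python
import numpy as np

q = 1.5
# Symmetric channel over a 3-letter alphabet: the correct symbol is received with prob. 0.8
px = np.full(3, 1/3)                               # uniform source
pygx = np.full((3, 3), 0.1) + 0.7 * np.eye(3)      # p(y|x): 0.8 on the diagonal
pxy = px[:, None] * pygx                           # joint p(x, y)

Sq = lambda a: np.sum(a**q)                        # sum of q-th powers
H = lambda a: (1 - Sq(a)) / ((q - 1) * Sq(a))      # normalized entropy, Eq. (3)

Pe = 1.0 - np.trace(pxy)                           # error probability of the guess g(Y) = Y
py = pxy.sum(axis=0)

# H_q(X|Y) from the normalized q-expectation definition: 1 + (q-1) H_q(X|Y) = S_Y / S_XY
lhs = (Sq(py) / Sq(pxy) - 1) / (q - 1)

# Joint distribution of the error variable E (rows: E=0 correct, E=1 error) and Y
peY = np.vstack([np.diag(pxy), py - np.diag(pxy)])

rhs = (H(np.array([Pe, 1 - Pe]))                                   # H_q(P_e)
       + (1 + (q - 1) * H(peY)) / (1 + (q - 1) * H(py))            # prefactor in Eq. (40)
       * Pe**q / (Pe**q + (1 - Pe)**q)
       * (1 - 2.0**(1 - q)) / ((q - 1) * 2.0**(1 - q)))            # |H| - 1 = 2 outcomes
print(lhs, "<=", rhs)                              # the generalized Fano bound, Eq. (40)
```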
V. CONCLUDING REMARKS

We have attempted to extend Shannon's additive IT to the nonadditive case by using the nonadditive information content Eq. (1). In developing the nonadditive IT, this choice of the nonadditive information content seems to be a plausible one in terms of unifying the meaning of the average throughout the entire theory. As a consequence, the information entropy becomes the modified type of Tsallis nonextensive entropy. We have presented the properties of the nonadditive information entropy, the conditional entropy and the joint entropy in the form of theorems, which are necessary elements for developing IT. These results recover the usual Shannon ones in the additive limit (q → 1). Moreover, the source coding theorem can be generalized to the nonadditive case. As we have seen in Theorem 5, the nonadditivity of the information content can be regarded as determining the longest codeword we can transmit to the channel. The philosophy of the present attempt can be positioned as the reverse of Jaynes's pioneering work [20,21]. Jaynes brought a concept of IT into statistical mechanics in the form of maximizing the entropy of a system (Jaynes's maximum entropy principle). The information-theoretical approach to statistical mechanics is now considered to be very robust in discussing some areas of physics. In turn, we have approached IT in a nonadditive way. We believe that the present consideration based on the nonadditive information content may trigger some practical future applications in the area of information processing.

ACKNOWLEDGMENTS

The author would like to thank Dr. S. Abe for useful comments at the Yukawa Institute for Theoretical Physics, Kyoto, Japan, and for suggesting the nonadditive conditional entropy form of Eq. (11).

[1] T.M. Cover and J.A. Thomas, Elements of Information Theory (Wiley, New York, 1991).
[2] R.V.L. Hartley, Bell Syst. Tech. J. 7, 535 (1928).
[3] C.E. Shannon, Bell Syst. Tech. J. 27, 379 (1948); ibid. 623 (1948).
[4] C.E. Shannon and W. Weaver, The Mathematical Theory of Communication (University of Illinois Press, Urbana, 1963).
[5] C. Tsallis, J. Stat. Phys. 52, 479 (1988); E.M.F. Curado and C. Tsallis, J. Phys. A 24, L69 (1991); Corrigenda: 24, 3187 (1991) and 25, 1019 (1992).
[6] http://tsallis.cat.cbpf.br/biblio.htm for an updated bibliography.
[7] A.K. Rajagopal and S. Abe, Phys. Rev. Lett. 83, 1711 (1999).
[8] C. Tsallis, R.S. Mendes and A.R. Plastino, Physica A 261, 534 (1998).
[9] J. Balatoni and A. Renyi, Pub. Math. Inst. Hungarian Acad. Sci. 1, 9 (1956); A. Renyi, Probability Theory (North-Holland, Amsterdam, 1970).
[10] The definition Eq. (11) corresponds to taking the nonadditive conditional entropy in the form H_q(Y|X) = \frac{1}{q-1}\left[\left(\left\langle[1+(q-1)H_q(Y|x)]^{-1}\right\rangle_q^{(X)}\right)^{-1}-1\right].
[11] S. Abe and A.K. Rajagopal, Physica A 289, 157 (2001).
[12] S. Abe, Phys. Lett. A 271, 74 (2000).
[13] From the definition, we can easily confirm H_q(Y|X) = H_q(Y) if and only if X and Y are independent variables. That is, H_q(Y|x) = H_q(Y), and since \langle[1+(q-1)H_q(Y)]^{-1}\rangle_q^{(X)} = \langle\sum_{y\in\mathcal{Y}} p^q(y)\rangle_q^{(X)} = \sum_{y\in\mathcal{Y}} p^q(y), we obtain H_q(Y|X) = \left[1/\sum_{y\in\mathcal{Y}} p^q(y)-1\right]/(q-1) = H_q(Y).
[14] S. Kullback and R.A. Leibler, Ann. Math. Stat. 22, 79 (1951); S. Kullback, Information Theory and Statistics (Wiley, New York, 1959).
[15] We also note that our definition of \ln_q x corresponds to -\ln_q x^{-1} in Ref. [7].
[16] C. Tsallis, Phys. Rev. E 58, 1442 (1998).
[17] L. Borland, A.R. Plastino and C. Tsallis, J. Math. Phys. 39, 6490 (1998); Errata: ibid. 40, 2196 (1999).
[18] R.G. Gallager, Information Theory and Reliable Communication (Wiley, New York, 1968).
[19] We will report the extension of the theorem elsewhere.
[20] E.T. Jaynes, Phys. Rev. 106, 620 (1957); 108, 171 (1957).
[21] E.T. Jaynes: Papers on Probability, Statistics and Statistical Physics, ed. R.D. Rosenkrantz (D. Reidel, Boston, 1983); E.T. Jaynes in Statistical Physics, ed. W.K. Ford (Benjamin, New York, 1963).
