
Information theory and generalized statistics

arXiv:cond-mat/0301343v1 [cond-mat.stat-mech] 20 Jan 2003

Petr Jizba
Institute of Theoretical Physics, University of Tsukuba, Ibaraki 305-8571, Japan

Abstract. In this lecture we present a discussion of generalized statistics based on Rényi's, Fisher's and Tsallis's measures of information. The unifying conceptual framework which we employ here is provided by information theory. Important applications of generalized statistics to systems with (multi-)fractal structure are examined.

1 Introduction

One of the important approaches to statistical physics is provided by information theory, erected by Claude Shannon in the late 1940s. The central tenet of this approach lies in the construction of a measure of the "amount of uncertainty" inherent in a probability distribution [1]. This measure of information (or Shannon's entropy) quantitatively equals the number of binary (yes/no) questions which bring us from our present state of knowledge about the system in question to the one of certainty. The higher the measure of information (the more questions to be asked), the higher the ignorance about the system, and thus the more information will be uncovered after an actual measurement. Usage of Shannon's entropy is particularly pertinent to Bayesian statistical inference, where one deals with the probability-distribution assignment subject to prior data one possesses about a system [2,3]. Here the prescription of maximal Shannon entropy (i.e., maximal ignorance), MaxEnt, subject to given constraints yields the least biased probability distribution, which naturally protects against conclusions that are not warranted by the prior data. In classical MaxEnt the maximum-entropy distributions are always of an exponential form, hence the name generalized canonical distributions. Note that the MaxEnt prescription, in a sense, resembles the principle of minimal action of classical physics, as in both cases extremization of a certain functional (the entropy or the action functional) yields physical predictions. In fact, the connection between information and the action functional was conjectured by E.T. Jaynes [3] and J.A. Wheeler [4], and most recently this line of reasoning has been formalized, e.g., by B.R. Frieden in his "principle of extreme physical information" (EPI) [5].

On a formal level the passage from information theory to statistical thermodynamics is remarkably simple. In this case a maximal-entropy probability distribution subject to constraints on average energy, or on constant average energy and number of particles, yields the usual canonical or grand-canonical distribution of Gibbs, respectively. The applicability in physics is, however, much wider. Aside from statistical thermodynamics, MaxEnt has now become a powerful tool in non-equilibrium statistical physics [6] and is equally useful in such areas as astronomy, geophysics, biology, medical diagnosis and economics. For the latest developments in classical MaxEnt the interested reader may consult Ref. [7] and citations therein.
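As a small numerical illustration of the preceding paragraph, the following Python sketch (an added example, not part of the original lecture; the energy levels and the target mean energy are arbitrary choices) maximizes Shannon's entropy subject to a fixed average energy and checks that the optimum has the exponential (Gibbs) form: any admissible perturbation that preserves normalization and the mean energy only lowers the entropy.

```python
import numpy as np

# Added illustration (not from the lecture): MaxEnt with a mean-energy constraint
# yields the exponential "generalized canonical" distribution of Gibbs.
# The energy levels E and the target mean energy U are hypothetical values.
rng = np.random.default_rng(0)
E = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # hypothetical energy levels
U = 1.2                                    # prescribed average energy <E>

def gibbs(beta):
    """Exponential distribution p_i proportional to exp(-beta*E_i)."""
    w = np.exp(-beta * E)
    return w / w.sum()

# <E> decreases monotonically with beta, so fix beta by bisection.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if gibbs(mid) @ E > U else (lo, mid)
p = gibbs(0.5 * (lo + hi))

def shannon(p):
    return -np.sum(p * np.log2(p))

# Any perturbation that keeps sum(p) = 1 and <E> = U can only lower the entropy,
# which is the MaxEnt characterization of the Gibbs form.
A = np.vstack([np.ones_like(E), E])                  # constraint rows
v = rng.normal(size=E.size)
v -= A.T @ np.linalg.lstsq(A.T, v, rcond=None)[0]    # project v onto the constraint null space
for t in (0.01, 0.05):
    trial = p + t * v * p.min() / np.abs(v).max()    # small step, probabilities stay positive
    print(round(trial @ E, 6), shannon(trial) <= shannon(p))   # mean stays at 1.2; True
```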
As successful as Shannon's information theory has been, it is clear by now that it is capable of dealing with only a limited class of the systems one might hope to address in statistical physics. In fact, only recently has it become apparent that there are many situations of practical interest requiring more "exotic" statistics which do not conform with the generalized canonical prescription of the classical MaxEnt (often referred to as Boltzmann-Gibbs statistics). Percolation, polymers, protein folding, critical phenomena or stock market returns provide examples. On the other hand, it cannot be denied that the MaxEnt approach deals with statistical systems in a way that is methodically appealing, physically plausible and intrinsically non-speculative (i.e., MaxEnt invokes no hypotheses beyond the sample space and the evidence that is in the available data). It might therefore be desirable to inspect the axiomatics of Shannon's information theory to find out whether some "plausible" generalization is possible. If so, such an extension could provide a new conceptual frame in which generalized measures of information (i.e., entropies) could find their theoretical justification. The additivity of independent mean information is the most natural axiom to attack. On this level three modes of reasoning can be formulated. One may either keep the additivity of independent information but utilize a more general definition of means, or keep the usual definition of linear means but generalize the additivity law, or combine both of these approaches together.

In the first case the crucial observation is that the most general means compatible with Kolmogorov's axioms of probability are the so-called quasi-linear means, which are implemented via Kolmogorov-Nagumo functions [8]. This approach was pioneered by A. Rényi [9] and by J. Aczél and Z. Daróczy [10] in the 60s and 70s. The corresponding measure of information is then called Rényi's entropy. Because the independent informations are still additive in this generalization, and because the quasi-linear means basically probe the dimensionality of the sample space, one may guess that this theory should play an important role in classical information-theoretical systems with a non-standard geometry, such as fractals, multifractals or systems with embedded self-similarity. These include phase transitions and critical phenomena, chaotic dynamical systems with strange attractors, fully developed turbulence, hadronic physics, cosmic strings, etc.

The second case amounts to a modification of the additivity law. Out of the infinity of possible generalizations the so-called q-additivity prescription has found widespread utility. The q-calculus was introduced by F. Jackson [11] in the 20s and more recently developed in the framework of Quantum Groups by V. Drinfeld [12] and M. Jimbo [13]. With the help of q-calculus one may formalize the entire approach in a unified manner by defining the q-derivative (Jackson derivative), q-integration (Jackson integral), q-logarithms, q-exponentials, etc. The corresponding measure of information is called Tsallis or non-extensive entropy. The q-additivity is in a sense a minimal generalization because the non-additive part is proportional to both respective informations and is linearly parametrized by only one "coupling" constant. The non-additivity prescription might be understood as a claim that despite given phenomena being statistically independent there still might be a non-vanishing correlation between them and hence the informations might get "entangled". One may thus expect that Tsallis entropy should be important in systems with long-range correlations or long-time memories. One may even guess that quantum non-locality might become a crucial playground for non-extensive statistics.
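To make the q-additivity rule concrete, here is a minimal sketch (an added illustration with arbitrary example distributions, not taken from the lecture) of Tsallis entropy written through the q-logarithm. For two statistically independent systems it verifies the pseudo-additivity S_q(A+B) = S_q(A) + S_q(B) + (1-q) S_q(A) S_q(B), whose non-additive part is proportional to both entropies and carries the single coupling (1-q).

```python
import numpy as np

# Added illustration (not from the lecture): the q-additivity rule behind Tsallis
# (non-extensive) entropy.  The two distributions below are arbitrary examples.

def q_log(x, q):
    """q-logarithm ln_q(x) = (x**(1-q) - 1)/(1-q); reduces to ln(x) as q -> 1."""
    return np.log(x) if np.isclose(q, 1.0) else (x**(1.0 - q) - 1.0) / (1.0 - q)

def tsallis(p, q):
    """Tsallis entropy S_q(P) = sum_i p_i ln_q(1/p_i) = (1 - sum_i p_i^q)/(q - 1)."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * q_log(1.0 / p, q)))

a = np.array([0.5, 0.3, 0.2])           # distribution of system A
b = np.array([0.6, 0.4])                # distribution of an independent system B
ab = np.outer(a, b).ravel()             # joint distribution of the composite system

for q in (0.5, 1.0, 2.0, 3.0):
    lhs = tsallis(ab, q)
    # pseudo-additivity: the non-additive part is proportional to both entropies
    # and is parametrized by the single "coupling" (1 - q)
    rhs = tsallis(a, q) + tsallis(b, q) + (1.0 - q) * tsallis(a, q) * tsallis(b, q)
    print(q, np.isclose(lhs, rhs))      # True for every q; q = 1 is ordinary additivity
```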
The third generalization is still not explored in the literature. It can be expected that it should become relevant, e.g., in critical phenomena in a strong quantum regime. The latter can be found, for instance, in early-universe cosmological phase transitions or in the currently much studied quantum phase transitions (frustrated spin systems or quantum liquids being examples).

The structure of this lecture is the following: In Section 2 we review the basic information-theoretic setup for Rényi's entropy. We show its relation to (multi-)fractal systems and illustrate how the Rényi parameter is related to the multifractal singularity spectrum. The connection of Rényi's entropy with Fisher information and the metric structure of statistical manifolds (i.e., the Fisher-Rao metric) is also discussed. In Section 3 the information-theoretic rationale of Tsallis entropy is presented.

2 Rényi's entropy

Rényi entropies (RE) were introduced into mathematics by A. Rényi [9] in the mid 60s. The original motivation was strictly formal. Rényi wanted to find the most general class of information measures which preserved the additivity of statistically independent systems and were compatible with Kolmogorov's probability axioms.

Let us assume that one observes the outcome of two independent events with respective probabilities p and q. Additivity of information then requires that the corresponding information obeys Cauchy's functional equation

   I(pq) = I(p) + I(q).   (1)

Therefore, aside from a multiplicative factor, the amount of information received by learning that an event with probability p took place must be

   I(p) = -\log_2 p.   (2)

Here the normalization was chosen so that the ignorant's probability (i.e., p = 1/2) sets the unit of information, the bit. Formula (2) is known as Hartley's information measure [9]. In general, if the outcomes of some experiment are A_1, ..., A_n with respective probabilities p_1, ..., p_n, and if the outcome A_k delivers I_k bits of information, then the mean received information reads

   I = g^{-1}\Big( \sum_{k=1}^{n} p_k \, g(I_k) \Big).   (3)

Here g is an arbitrary invertible function, the Kolmogorov-Nagumo function. The mean defined in Eq. (3) is the so-called quasi-linear mean and it constitutes the most general mean compatible with Kolmogorov's axiomatics [8,14]. Rényi then proved that when the postulate of additivity for independent events is applied to Eq. (3) it dramatically restricts the class of possible g's. In fact, only two classes are possible: g(x) = cx + d, which implies the Shannon information measure

   I(P) = -\sum_{k=1}^{n} p_k \log_2(p_k),   (4)

and g(x) = c\, 2^{(1-q)x} + d, which implies

   I_q(P) = \frac{1}{1-q} \log_2\Big( \sum_{k=1}^{n} p_k^q \Big),   (5)

with q > 0 (c and d are arbitrary constants). In both cases P = {p_1, ..., p_n}. Note that for linear g's the quasi-linear mean turns out to be the ordinary linear mean, and hence Shannon's information is the averaged information in the usual sense. The information measure defined by (5) is called Rényi's information measure (of order q) or Rényi's entropy. The term "entropy" is chosen in close analogy with Shannon's theory because Rényi's entropy also represents the disclosed information (or removed ignorance) after the performed experiment. On a deeper level it might be said that Rényi's entropy measures a diversity (or dissimilarity) within a given distribution [15]. In Section 2.5 we will see that in parametric statistics Fisher information plays a similar role. It will be shown that the latter measures a diversity between two statistical populations.
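A short numerical check of the construction above (an added sketch; the distribution is an arbitrary example): computing the quasi-linear mean (3) of the Hartley informations I_k = -log_2 p_k with the Kolmogorov-Nagumo function g(x) = 2^{(1-q)x} indeed reproduces the closed form (5), and the q -> 1 limit recovers Shannon's measure (4).

```python
import numpy as np

# Added illustration (not part of the original text): Renyi's entropy (5) as the
# quasi-linear mean (3) of Hartley informations with g(x) = 2**((1-q)*x).
p = np.array([0.4, 0.3, 0.2, 0.1])            # arbitrary example distribution

def renyi_closed_form(p, q):
    return np.log2(np.sum(p**q)) / (1.0 - q)

def renyi_quasilinear(p, q):
    hartley = -np.log2(p)                     # I_k, Eq. (2)
    g = lambda x: 2.0**((1.0 - q) * x)        # Kolmogorov-Nagumo function
    g_inv = lambda y: np.log2(y) / (1.0 - q)
    return g_inv(np.sum(p * g(hartley)))      # Eq. (3)

shannon = -np.sum(p * np.log2(p))             # Eq. (4)

for q in (0.5, 2.0, 5.0):
    print(q, np.isclose(renyi_quasilinear(p, q), renyi_closed_form(p, q)))  # True

# the q -> 1 limit reproduces Shannon's entropy
print(np.isclose(renyi_closed_form(p, 1.0 + 1e-8), shannon, atol=1e-5))     # True
```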
To find the most fundamental (and possibly irreducible) set of properties characterizing Rényi's information it is desirable to axiomatize it. Various axiomatizations can be proposed [9,16]. For our purpose the most convenient set of axioms is the following [16]:

1. For a given integer n and given P = {p_1, p_2, ..., p_n} (p_k ≥ 0, \sum_k p_k = 1), I(P) is continuous with respect to all its arguments.
2. For a given integer n, I(p_1, p_2, ..., p_n) takes its largest value for p_k = 1/n (k = 1, 2, ..., n), with the normalization I(1/2, 1/2) = 1.
3. For a given q ∈ IR: I(A ∩ B) = I(A) + I(B|A) with

      I(B|A) = g^{-1}\Big( \sum_k ρ_k(q) \, g(I(B|A = A_k)) \Big),

   and ρ_k(q) = p_k^q / \sum_k p_k^q (the distribution P corresponds to the experiment A).
4. g is invertible and positive in [0, ∞).
5. I(p_1, p_2, ..., p_n, 0) = I(p_1, p_2, ..., p_n), i.e., by adding an event of probability zero (an impossible event) we do not gain any new information.

Note particularly the appearance of the distribution ρ(q) in axiom 3. This so-called zooming (or escort) distribution will prove crucial in Sections 2.4 and 3.
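Axiom 3 can also be verified numerically. The sketch below (an added illustration; the joint distribution is an arbitrary, correlated example) checks that I_q(A ∩ B) = I_q(A) + I_q(B|A) when the conditional entropy is averaged with the escort weights ρ_k(q), using the same Kolmogorov-Nagumo function as in (5); the identity holds even when A and B are correlated.

```python
import numpy as np

# Added illustration (not from the text): a numerical check of axiom 3, with the
# conditional entropy averaged through the escort weights rho_k(q) = p_k^q / sum_j p_j^q.
q = 2.5
pAB = np.array([[0.20, 0.10, 0.05],
                [0.05, 0.30, 0.30]])          # arbitrary joint probabilities p(A_j, B_k)
pA = pAB.sum(axis=1)                          # marginal of experiment A
pB_given_A = pAB / pA[:, None]                # conditional distributions p(B | A_j)

def renyi(p, q):
    return np.log2(np.sum(p**q)) / (1.0 - q)

g = lambda x: 2.0**((1.0 - q) * x)            # the same Kolmogorov-Nagumo function as in (5)
g_inv = lambda y: np.log2(y) / (1.0 - q)

rho = pA**q / np.sum(pA**q)                   # zooming (escort) distribution of axiom 3
I_B_given_A = g_inv(np.sum(rho * g(np.array([renyi(row, q) for row in pB_given_A]))))

print(np.isclose(renyi(pAB.ravel(), q), renyi(pA, q) + I_B_given_A))   # True
```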
Expression (10) repre- n nd nd sents nothing but R´enyi’s generalization of the Szilard–Brillouin negentropy. (cid:8) (cid:9) 2.2 Fractals, multifractals and generalized dimension Aforementioned renormalization issue naturally extends beyond simple metric outcome spaces (like IRd). Our aim in this Subsection and the Subsection to follow is to discuss the renormalization of information in cases when the out- come space is fractal or when the statistical system in question is multifractal. Conclusions of such a reneormalizationwill be applied is Subsection 2.4. Fractalsare sets with a generally non–integerdimension exhibiting property of self–similarity. The key characteristic of fractals is fractal dimension which is defined as follows: Consider a set M embedded in a d–dimensional space. Let us cover the set with a mesh of d–dimensional cubes of size ld and let N (M) is l a number of the cubes needed for the covering. The fractal dimension of M is then defined as [23,24] lnN (M) l D =−lim . (11) l→0 lnl Inmostcasesofinterestthefractaldimension(11)coincideswiththeHausdorff– Besicovich dimension used by Mandelbrot [23]. Multifractals, on the other hand, are related to the study of a distribution of physical or other quantities on a generic support (be it or not fractal) and thusprovideamovefromthegeometryofsetsassuchtogeometricpropertiesof distributions. Let a support is coveredby a probability of some phenomenon. If wepavethesupportwithagridofspacinglanddenotetheintegratedprobability in the ith box as p , then the scaling exponent α is defined [23,24] i i p (l)∼lαi. (12) i The exponent α is called singularity or Lipshitz–Ho¨lder exponent. i CountingboxesN(α)wherep hasα ∈(α,α+dα),thesingularityspectrum i i f(α) is defined as [23,24] N(α)∼l−f(α). (13) Information theory and generalized statistics 7 Thus a multifractal is the ensemble of intertwined (uni)fractals each with its own fractal dimension f(α ). For further investigation it is convenient to define i a “partition function” [23] Z(q)= pq = dα′ρ(α′)l−f(α′)lqα′. (14) i i Z X In the small l limit the method of steepest descent yields the scaling [23] Z(q)∼lτ, (15) with ′ τ(q)=min(qα−f(α)), f (α(q))=q. (16) α This is precisely Legendre transform relation. So pairs f(α),α and τ(q),q, are conjugates with the same mathematical content. ConnectionofR´enyientropieswithmultifractalsisfrequentlyintroducedvia generalized dimensions 1 logZ(q) D = lim =−limI (l)/log l. (17) q l→0 (q−1) logl l→0 q 2 (cid:18) (cid:19) These have direct applications in chaotic attractors [19,20,21,22] and they also characterize,forinstance,intermittencyofturbulence[17,25]ordiffusion–limited aggregates (DLA) like patterns [26]. In chaotic dynamical systems all D are q necessarytodescribeuniquelye.g.,strangeattractors[22].Whiletheproofin[22] isbasedonarathercomplicatedself–similarityargumentation,byemployingthe information theory one can show that the assumption of a self–similarity is not really fundamental [16]. For instance, when the outcome space is discrete then allD withq ∈[1,∞)areneededtoreconstructtheunderlyingdistribution,and q when the outcome space is d–dimensional subset of IRd then all D , q ∈(0,∞), q are required to pinpoint uniquely the underlying PDF. The latter examples are nothing but the information theoretic variants of Hausforff’s moment problem of mathematical statistics. 
2.3 Fractals, multifractals and the renormalization issue

In close analogy with Section 2.1 it can be shown [16] that for a fractal outcome space the following asymptotic expansion of Rényi's entropy holds:

   I_q(p_{nk}) ≈ D \log_2 n + h + o(1),   (18)

where D corresponds to the Hausdorff dimension. The finite part h is, as before, chosen by the renormalization prescription - additivity of information for independent experiments. Then

   I_q(F) ≡ h = \lim_{n→∞} \big( I_q(P_n) - D \log_2 n \big) = \lim_{n→∞} \big( I_q(P_n) - I_q(E_n) \big) = \frac{1}{1-q} \log_2\Big( \int_M dμ\, F^q(x) \Big).   (19)

The measure μ in (19) is the Hausdorff measure

   μ(d; l) = \sum_{k\text{-th box}} l^d  →  ∞ if d < D,   0 if d > D   (as l → 0).   (20)

Technical issues connected with integration on fractal supports can be found, for instance, in [27,28]. Again, the renormalized entropy is defined as long as the integral on the RHS of (19) exists.

We may proceed analogously with multifractals. The corresponding asymptotic expansion now reads [16]

   I_q(p_{nk}) ≈ \frac{τ(q)}{q-1} \log_2 n + h + o(1).   (21)

This implies that

   h ≡ I_q(μ_P) = \lim_{l→0} \Big( I_q(P_n) - \frac{τ(q)}{q-1} \log_2 n \Big) = \lim_{l→0} \big( I_q(P_n) - I_q(E_n) \big) = \frac{1}{1-q} \log_2\Big( \int_a dμ_P^{(q)}(a) \Big).   (22)

Here the multifractal measure is defined as [24]

   μ_P^{(q)}(d; l) = \sum_{k\text{-th box}} \frac{p_{nk}^q}{l^d}  →  0 if d < τ(q),   ∞ if d > τ(q)   (as l → 0).   (23)

It should be stressed that integration on multifractals is a rather delicate technical issue which is not yet well developed in the literature [28].

2.4 Canonical formalism on multifractals

We shall now present an important connection of Rényi's entropy with multifractal systems. The connection will be constructed in close analogy with the canonical formalism of statistical mechanics. As this approach is thoroughly discussed in [16] we will, for shortness's sake, mention only the salient points here.

Let us first consider a multifractal with a density distribution p(x). If we use, as previously, the covering grid of spacing l, then the coarse-grained Shannon entropy of such a process will be

   I(P_n(l)) = -\sum_k p_k(l) \log_2 p_k(l).   (24)

An important observation of multifractal theory is that when q = 1 then

   a(1) = \frac{dτ(1)}{dq} = f(a(1)) = \lim_{l→0} \frac{\sum_k p_k(l) \log_2 p_k(l)}{\log_2 l} = -\lim_{l→0} \frac{I(P_n(l))}{\log_2 l},   (25)

which describes the Hausdorff dimension of the set on which the probability is concentrated - the measure-theoretic support. In fact, the relative probability of the complement set approaches zero when l → 0. This statement is known as Billingsley's theorem [29] or curdling [23].

For the following considerations it is useful to introduce a one-parametric family of normalized measures ρ(q) (zooming or escort distributions)

   ρ_i(q, l) = \frac{[p_i(l)]^q}{\sum_j [p_j(l)]^q} ∼ l^{f(a_i)}.   (26)

Because

   df(a) ≤ da if q ≤ 1,    df(a) ≥ da if q ≥ 1,   (27)

we obtain after integrating (27) from a(q=1) to a(q) that

   f(a) ≤ a if q ≤ 1,    f(a) ≥ a if q ≥ 1.   (28)

So for q > 1, ρ(q) puts emphasis on the more singular regions of P_n, while for q < 1 the accentuation is on the less singular regions. The parameter q thus provides a "zoom in" mechanism to probe various regions of a different singularity exponent.

As the distribution (26) alters the scaling of the original P_n, the measure-theoretic support also changes. The fractal dimension of the new measure-theoretic support M(q) of ρ(q) is

   d_h(M(q)) = \lim_{l→0} \frac{1}{\log_2 l} \sum_k ρ_k(q, l) \log_2 ρ_k(q, l).   (29)
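The zooming mechanism of (26) and the support-dimension formula (29) can be made tangible on the same toy cascade used above (an added sketch, not from the text): raising the box probabilities to the power q and renormalizing shifts the weight toward the more singular, high-density boxes as q grows, and the coarse-grained entropy of the escort distribution divided by -log_2 l reproduces f(α(q)).

```python
import numpy as np

# Added illustration (not from the text): the escort distribution (26) and the
# support dimension (29) on an assumed binomial multiplicative cascade.
m, k = 0.7, 12                                    # multiplier and cascade depth, l = 2**(-k)
p = np.array([1.0])
for _ in range(k):
    p = np.concatenate([m * p, (1.0 - m) * p])    # box probabilities p_i(l)

def binary_h(x):                                  # Shannon entropy of {x, 1-x} in bits
    return -(x * np.log2(x) + (1.0 - x) * np.log2(1.0 - x))

for q in (0.5, 1.0, 2.0, 4.0):
    rho = p**q / np.sum(p**q)                     # escort distribution, Eq. (26)
    alpha_mean = np.sum(rho * (-np.log2(p))) / k  # average singularity exponent probed
    d_support = -np.sum(rho * np.log2(rho)) / k   # Eq. (29) with log2(l) = -k
    # for this cascade the escort is again binomial, with multiplier m_q, so the
    # support dimension equals f(alpha(q)) = h(m_q) exactly
    m_q = m**q / (m**q + (1.0 - m)**q)
    print(q, round(alpha_mean, 3), np.isclose(d_support, binary_h(m_q)))
# alpha_mean decreases with q: larger q zooms in on the more singular (denser) boxes,
# and the dimension of the measure-theoretic support shrinks from D_0 = 1 accordingly.
```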
Note that the curdling (29) mimics the situation occurring in equilibrium statistical physics. There, in the canonical formalism, one works with a (usually infinite) ensemble of identical systems with all possible energy configurations. But only the configurations with E_i ≈ ⟨E(T)⟩ dominate at n → ∞. The choice of temperature then prescribes the contributing energy configurations. In fact, we may define the "microcanonical" partition function as

   Z_{mic} = \sum_{a_k ∈ (a_i, a_i + da_i)} 1 = dN(a_i).   (30)

Then the microcanonical (Boltzmann) entropy is

   H(E(a_i)) = \log_2 dN(a_i) = \log_2 Z_{mic},   (31)

and hence

   \frac{H(E(a_i))}{\log_2 ε} ≈ -⟨f(a)⟩_{mic}.   (32)

Interpreting E_i = -a_i \log_2 ε as "energy" we may define the "inverse temperature" 1/T = β/\ln 2 (note that here k_B = 1/\ln 2) as

   1/T = \frac{∂H}{∂E}\Big|_{E=E_i} = -\frac{1}{\ln ε}\, \frac{1}{Z_{mic}}\, \frac{∂Z_{mic}}{∂a_i} = f'(a_i) = q.   (33)

On the other hand, with the "canonical" partition function

   Z_{can} = \sum_i p_i(ε)^q = \sum_i e^{-βE_i},   (34)

and with β = q \ln 2 and E_i = -\log_2(p_i(ε)), the corresponding means read

   a(q) ≡ ⟨a⟩_{can} = \sum_i \frac{a_i}{Z_{can}} e^{-βE_i} ≈ \frac{\sum_i ρ_i(q,ε) \log_2 p_i(ε)}{\log_2 ε},   (35)

   f(q) ≡ ⟨f(a)⟩_{can} = \sum_i \frac{f(a_i)}{Z_{can}} e^{-βE_i} ≈ \frac{\sum_i ρ_i(q,ε) \log_2 ρ_i(q,ε)}{\log_2 ε}.   (36)

Let us note particularly that the fractal dimension of the measure-theoretic support d_h(M(q)) is simply f(q). Gathering the results together we have:

   micro-canonical ensemble - unifractals   |  canonical ensemble - multifractals
   Z_mic ;  H = S_mic = log_2 Z_mic         |  Z_can ;  S_can = log_2 Z_can - q⟨a⟩_can log_2 ε
   ⟨a⟩_mic = a_i = Σ_k a_k / Z_mic          |  ⟨a⟩_can = Σ_k a_k e^{-βE_k} / Z_can
   ⟨f(a)⟩_mic = -S_mic / log_2 ε            |  ⟨f(a)⟩_can = -S_can / log_2 ε
   q = ∂S_mic/∂E |_{E=E_i}                  |  q = ∂S_can/∂⟨E⟩_can
   β = ln 2 / T = q                         |  β = ln 2 / T = q
   E_i = -log_2 p_i = -a_i log_2 ε          |  ⟨E⟩_can = -⟨a⟩_can log_2 ε
   ⟨f(a)⟩_mic = q⟨a⟩_mic - τ                |  ⟨f(a)⟩_can = q⟨a⟩_can - τ

Looking at the fluctuations of a in the "canonical" ensemble we can establish an equivalence between unifractals and multifractals. Recalling Eq. (15) and realizing that

   ∂²(\log_2 Z_{can})/∂q² = ⟨E²⟩_{can} - ⟨E⟩²_{can} ≈ (\log_2 ε)²,   (37)

   ∂²(τ \log_2 ε)/∂q² = (∂a/∂q) \log_2 ε ≈ \log_2 ε,   (38)

we obtain for the relative standard deviation of the "energy"

   \frac{\sqrt{⟨E²⟩_{can} - ⟨E⟩²_{can}}}{\log_2 ε} = \sqrt{⟨a²⟩_{can} - ⟨a⟩²_{can}} ≈ \frac{1}{\sqrt{-\log_2 ε}} → 0.   (39)

So for small ε (i.e., an exact multifractal) the a-fluctuations become negligible and almost all a_i equal ⟨a⟩_can. If q is a solution of the equation a_i = τ'(q), then ...
