Generalized Statistics Framework for Rate Distortion Theory with Bregman Divergences 7 0 R. C. Venkatesan 0 Systems Research Corporation 2 Aundh, Pune 411007, India n [email protected] a J 5 1 Abstract—A variational principlefor the rate distortion (RD) theorywithBregmandivergencesisformulatedwithintheambit x1−q −1 ln (x)= , (4) ] ofthegeneralized(nonextensive)statisticsofTsallis.TheTsallis- q 1−q h BregmanRDlowerboundisestablished.Alternateminimization c and schemes for the generalized Bregman RD (GBRD) theory are e derived.AcomputationalstrategytoimplementtheGBRDmodel 1 m ispresented.TheefficacyoftheGBRDmodelisexemplifiedwith ex = [1+(1−q)x] /(1−q);1+(1−q)x≥0 (5) - the aid of numerical simulations. q t 0;otherwise, a st I. INTRODUCTION respectively. The Tsallis entropy (1), and, the generalized K- . Thegeneralized(nonextensive)statisticsofTsallis[1,2]has Ld (3) may be written as t a recentlybeenthefocusofmuchattentioninstatisticalphysics, m and allied disciplines 1. Nonextensive statistics generalizes S (p(x))=− p(x)qln p(x), (6) q q d- the extensiveBoltzmann-Gibbsstatistics, andhasfoundmuch Xx utility in complexsystemspossessing longrangecorrelations, n and, o fluctuations, ergodicity,chirality and fractal behavior. By def- r(x) I (p(x)kr(x))=− p(x)ln , (7) c inition, the Tsallis entropy is defined in terms of discrete q q p(x) [ variables as Xx respectively. Employing the relation [4] 3 v 1− pq(x) x 18 Sq(x)=− 1Px−q ,where, p(x)=1. (1) lnq(cid:18)y(cid:19)=yq−1(lnqx−lnqy), (8) 2 Xx the generalizedK-Ld (7)may be expressedas the generalized 1 The constant q is referred to as the nonextensivity param- mutual information 0 eter. Given two independent variables x and y, one of the 07 fundamental consequences of nonextensivity is demonstrated Iq X;X˜ =− p(x,x˜)lnq p(p(x˜x˜|x)) / by the pseudo-additivity relation (cid:16) (cid:17) xP,x˜ (cid:16) (cid:17) at = pq−11(x˜)Sq(X)− p(x˜)Sq X|X˜ =x˜ . (9) m Sq(xy)=Sq(x)+Sq(y)+(1−q)Sq(x)Sq(y). (2) Px˜ Xx˜ (cid:16) (cid:17) d- Hqe→re,1(.1T)aaknindg(2th)eimlipmlyittqha→te1xteinns(i1v)easntadtiesvtiocksiinsgrel’cHoovsepreitdala’ss | hSq(X|X{˜z=x˜)ip(x˜) } n Here,h•idenotestheexpectationvalue.Theseminalpaperon o rule,Sq(x)→S(x),theShannonentropy.Thejointlyconvex source coding within the frameworkof nonextensivestatistics c generalizedKullback-Leiblerdivergence(K-Ld)isoftheform : [3] by Landsberg and Vedral [5], has provided the impetus for v a number of investigations into the use of nonextensive in- i X q−1 formation theory within the context of coding problems. The p(x) −1 r I (p(x)kr(x))= p(x)(cid:16)r(x)(cid:17) . (3) works of Yamano [6,7] represent a sample of some of the a q q−1 prominenteffortsinthisregard.Thesourcecodingtheoremin Xx a nonextensive setting has been derived by Yamano [8]. In the limit q → 1, the extensive K-Ld is readily recovered. A statistical physics model for the variational problem Akin to the Tsallis entropy, the generalized K-Ld obeys the encountered in rate distortion (RD) theory [9, 10] and the pseudo-additivity relation [3]. informationbottleneckmethod[11],derivedwithina general- Defining the q-deformed logarithm and the q-deformed ized statistics framework, has been recently established [12]. exponential as [4] A nonextensive Blahut-Arimorto (BA) alternate minimization scheme [13] has been derived. The noteworthy result of the 1A continually updated bibliography of works related to nonextensive statistics maybefoundathttp://tsallis.cat.cbpf.br/biblio.htm. study in [12] is that the nonextensive RD curves possesses a lower threshold for the minimum compression information in A number of Bregman divergences have been tabulated in the distortion-compression plane, as compared to equivalent [15]. The generalized K-Ld in (3), also referred to as the RD curves derived on the basis of the Boltzmann-Gibbs- Csisza´r generalized K-Ld is not a Bregman divergence.Since Shannon framework. As was the case in [12], this paper the seminal work by Naudts [17], Bregman divergences have employsvaluesofqintherange0<q <1.Thispaperutilizes been the object of much research in nonextensive statistics. the statistical physics theory presented in [12] to formulate a Defining φ(p) = −plnq 1/p , the Bregman generalized K- generalized Bregman RD (GBRD) model. The GBRD model (cid:16) (cid:17) Ld is defined as extendspreviousworksbyRose [14]andBanerjee et. al. [15, d d 16] to achieve a principled and practical strategy to evaluate d (p,r)= pi pq−1−rq−1 − (p −r )rq−1 φ (q−1) i i i i i RD functions. iP=1 (cid:16) (cid:17) iP=1 =I (pkr) q,B II. RATIONALEFORTHE GBRD MODEL (10) A. Background information Setting q−1→κ, (10) is consistent with Eq. (35) in [17]. The RD problem in terms of discrete random variables is III. THE GBRD MODEL stated as follows: given a discrete random variable X ∈ Ξ called the source or the codebook, and, another discrete ran- ThisSectionprovidesastrategytojointlyobtaintheoptimal dom variable X˜ ∈Ξ˜ which is a compressed representation of supportX˜s2ofthequantizedcodebookwithcardinality X˜s = X (alsoreferredtoasthequantizedcodebookand/ortherepro- k , and, the conditional probability p(x˜|x) that charac(cid:12)(cid:12)teri(cid:12)(cid:12)zes ductionalphabet),theinformationratedistortionfunctionthat theRD problem.Thisis accomplishedbya jointoptim(cid:12)izat(cid:12)ion istobeobtainedistheminimizationofthemutualinformation of I (X,X˜) over all conditional probabilities p(x˜|x). q≥1 tioTnhoefctrhuex RofDthfeunRcDtionproubsilnegmtihsethBeAnuscmheermicea.lTdehteeramcitnuaa-l X˜sm,p(inx˜|x)LqGBRD = X˜sm,p(inx˜|x)nIq(cid:16)X;X˜(cid:17)+βhdφ(x,x˜)ip(x,x˜)o. |X˜s|=k implementation of the BA scheme is sometimes impractical (11) owing to a lack of knowledge of the optimal support of the The optimal Lagrange multiplier β, hereafter referred to as quantized codebook X˜. Exact analytical solutions exist only the inverse temperature, depends upon the optimal toler- for a few cases consisting of a combination well behaved ance level of the expectation of D =< d (x,x˜) > = φ p(x,x˜) sources and distortion measures. An initial attempt to achieve p(x,x˜)d (x,x˜). φ a practical and tractable solution to the RD problem was xP,x˜ performed by Rose [14], for the case of Euclidean square A. Constraint terms distortion functions. Therein, it was demonstrated that for sources whose support is a bounded set, the RD function Generalized statistics has utilized a number of constraints eitherequalstheShannonlowerbound,or,theoptimalsupport todefineexpectations.TheoriginalTsallis(OT)constraintsof for the quantized codebook is finite thus permitting the use the form hAi = p A [1], were convenient owing to their i i of a numerical procedure called the mapping method. The similarity to thePmi aximum entropy constraints. These were pioneering work of Rose was generalized by Banerjee et. al abandonedbecause of difficultiesencounteredin obtaining an [15, 16] to include a wider class of distortion functions using acceptable form for the partition function, with the exception Bregman divergences.One of the significant features in these of a few specialized cases. works involved the formulation of a Shannon-Bregmanlower The OT constraints were subsequently replaced by the bound. Curado-Tsallis (C-T) constraints hAiC−T = pqA . The Motivated by the recent results [12], this paper provides q i i C-T constraints were later replaced by thePi normalized two means to solve the GBRD problem for sources with Tsallis-Mendes-Plastino (T-M-P) constraints hAiT−M−P = boundedsupport.First,theanalyticalsolutionmaybeobtained pq from a Tsallis-Bregman lower bound (Section IV). Next, ipqAi. The dependence of the expectation value on the for a finite reproduction alphabet, the RD function may be Pi i numericallyobtainedbyacomputationalmethodologyderived norPmialization pdf pqi, renders the T-M-P constraints to from generalized statistics (Section III). be self-referential. PTihis paper, like [12], utilizes a recent methodology [18] to “rescue” the OT constraints, and, has B. Bregman Divergences linked the OT, C-T, and, T-M-P constraints. Definition1(Bregmandivergences):Letφbearealvalued strictly convex function defined on the convex set S ⊆ℜ, the B. The nonextensive variational principle domainofφsuchthatφisdifferentiableonint(S),theinterior Keeping X˜ fixed, taking the variational derivative of (11) of S. The Bregman divergence B : S ×int(S) 7→ ℜ+ is s φ with respect to p(x˜|x) while enforcing p(x˜|x) = 1 with definedas B (z ,z )=φ(z )−φ(z )−(z −z ,∇φ(z )), φ 1 2 1 2 1 2 2 Px˜ where B (z ,z ) = φ(z )−φ(z )−(z −z ,∇φ(z )) is φ 1 2 1 2 1 2 2 the gradient of φ evaluated at z2. 2Calligraphic symbolsindicate sets the normalization Lagrange multiplier λ(x), yields vanishes. The inverse temperature β and the effective inverse temperature β∗ relate as 1/(q−1) p(x˜|x)=p(x˜) (1−q) λ˜(x)+βd (x,x˜) , h q n φ oi β = qℵq(x)β∗ . (19) λ˜(x)= λp((xx)) −p(x)(1−q). h1−β∗(q−1)hdφ(x,x˜)ip(x˜|X=x)i (12) Multiplying the terms in the square brackets in (12) by the C. Support estimation step conditional probability p(x˜|x), and summing over x˜ yields Theprocedurein SectionIII.B.iscarriedoutfora givenβ, q p(x˜) p(x˜|x) q+β d (x,x˜)p(x˜|x)+ correspondingtoasinglepointontheRDcurve.Reproduction q−1 p(x˜) φ alphabetswithoptimalsupportareobtainedbykeepingp(x˜|x) Px˜ (cid:16) (cid:17) Xx˜ fixed, and solving (13) βhdφ(x,x˜)ip(x˜|X=x) +λ˜(x) p(x˜|x)=0. | {z } minLq =min I X;X˜ +βhd (x,x˜)i . Px˜ X˜s GBRD X˜s n q(cid:16) (cid:17) φ p(x,x˜)o (20) Evoking p(x˜|X =x)=1, yields The solution to (20) is the optimal estimate/predictor[15, 16] Px˜ q λ˜(x)= (1−q)ℵq(x)−βhdφ(x,x˜)ip(x˜|X=x). (14) x˜∗ =hxip(x|X˜=x˜) = p x|X˜ =x˜ x. (21) Xx (cid:16) (cid:17) q Here, ℵ (x) = p(x˜) p(x˜|x) . The conditional pdf q p(x˜) Algorithm1depictsthepseudo-codeforthegreedyjointnon- Px˜ (cid:16) (cid:17) p(x˜|x) acquires the form convexoptimization procedurethat constitutes the solution to the GBRD model. p(x˜|x)= p(x˜){1−(q−1)β∗dφ(x,x˜)}1/(q−1), ℑ1/(1−q) ℑRD =qℵq(x)+(q−R1D)βhdφ(x,x˜)ip(x˜|X=x), (15) IV. THETSALLIS-BREGMAN LOWERBOUND β∗ = β , ℑRD Theorem 1 For OT constraints, the nonextensive RD func- where β∗ is the effective inverse temperature. Transforming tion with source X ∼ p(x) and a Bregman divergence d is φ q →2−q∗ inthenumeratorandevoking(5),(15)isexpressed always lower bounded by the Tsallis-Bregman lower bound in the form of a q-deformed exponential defined by p(x˜)exp (−β∗d (x,x˜)) p(x˜|x)= Zq˜∗(x,β∗)φ . (16) RqL(D)=Px˜ pq−11(x˜)Sq(X)+ (22) +sup −βD+hln γ (x)i , Thepartitionfunctionevaluatedateachinstanceofthesource q∗ L,β∗ p(x) β≥0n o distribution is where γ is a unique function satisfying 1 Lβ∗ Z˜(x,β∗)=ℑ(1−q) = p(x˜)exp [−β∗d (x,x˜)] (17) RD q φ Xx˜ p(t)γ (t)exp (−β∗d (t,µ))dt=1;∀µ∈dom(φ). The term {1−(1−q∗)β∗d(x,x˜)} in the numerator of (16) Zdom(φ) L,β∗ q∗ φ (23) is a manifestation of the Tsallis cut-off condition [18]. This implies that solutions of (16) are valid when β∗d(x,x˜) < To highlight the critical dependence of the Tsallis-Bregman 1/(1−q∗). The effective nonextensive RD Helmholtz free en- lower bound upon the type of constraint employed to solve theGBRD variationalprobleminSectionIIIB, andenunciate ergy is certain aspects of q-deformed algebra and calculus, a slightly −1 1 ℑ −1 Fq (β∗)= ln Z˜(x,β∗) = RD weaker bound (see Appendix C, [16]) is stated and proved. RD β∗ D q Ep(x) β∗ (cid:28) q−1 (cid:29)p(x). Theorem 2 The generalized RD function is defined by (18) Solution of (16) may be viewed from two distinct per- R (D)= sup −βD+ p(x)ln γ∗(x)dx , spectives, i.e. the canonical perspective and the parametric q β≥0,γ∗∈Λβ∗(cid:26) Zx q∗ (cid:27) perspective. Owing to the self-referential nature of the ef- (24) fective inverse temperature β∗, the analysis and solution of where Λ is the set of all admissible partition functions. β∗ (16) within the context of the canonical perspective is a Further, for each β ≥0, a necessary and sufficient condition formidable undertaking. For practical applications, the para- for γ∗(x) to attain the supremum in (24)is a probability metric perspective is utilized by evaluating the conditional density p(x˜) related to γ∗(x) as pdf p(x˜|x), employing the nonextensive BA algorithm, for a-priori specified β∗ ∈ [0,∞]. Note that within the context −1 γ∗(x)= p(x˜)exp (−β∗d (x,x˜))dx˜ . (25) ofthe parametricperspective, the self-referentialnatureof β∗ (cid:18)Z q φ (cid:19) x˜ Proof that LqGBRD[p(x˜|x)]= D(q)L(GqB)RD = x p(x)expqN∗((−x,ββ∗∗d)(x,x˜))dx =ko∀x˜∈Bǫ ⊂X˜s, − p(x)p(x˜|x)ln p(x˜) dxdx˜+ ⇒ p(xR)p(x˜)expq∗(−β∗d(x,x˜))dxdx˜ =k p(x˜)dx˜, x,x˜ q(cid:16)p(x˜|x)(cid:17) x˜∈B∈ x N(x,β∗) o x˜∈B∈ +βR x,x˜p(x)p(x˜|x)dφ(x,x˜)dxdx˜ ⇒Rx˜∈B∈Rxp(x)p(x˜|x)dxdx˜=ko x˜∈B∈p(x˜R)dx˜⇒1=ko, (=a) R p(x)p(x˜|x)ln p(x˜|x) dxdx˜+ R R R (30) x,x˜ q∗(cid:16) p(x˜) (cid:17) holds true for all x˜ ∈ Bǫ. Defining γβ∗∗(x) = +(=b)βRRx,x˜pp(x(x)p)p((x˜x˜||xx))lndφ(xe,xx˜p)q∗d(x−dβx˜∗dφ(x,x˜)) dxdx˜+ (cid:0)Rx˜p(x˜)expq(−β∗dφ(x,x˜))dx˜(cid:1)−1, (30) yields x,x˜ q∗(cid:16) Z˜(x,β∗) (cid:17) p(x)γ∗ (x)exp (−β∗d(x,x˜))dx=1;∀x˜∈B . (31) +βR x,x˜p(x)p(x˜|x)dφ(x,x˜)dxdx˜ Zx β∗ q∗ ∈ (=c)−R x,x˜p(x)p(x˜|x)Z˜(q−1)(x,β∗)(β∗dφ(x,x˜)+ Using the relation −lnq(x)=lnq∗ 1/x , (26) becomes ln ZR˜(x,β∗) dxdx˜+β p(x)p(x˜|x)d (x,x˜)dxdx˜ (cid:16) (cid:17) q∗ x,x˜ φ (=d)− x,x˜p(x)(cid:17)p(x˜|x)lnqRZ˜(x,β∗)dxdx˜ LqGBRD[p(x˜)]=Zxp(x)lnq∗γβ∗∗(x)dx. (32) =− Rxp(x)lnqZ˜(x,β∗)dx Thus, γβ∗∗(x) satisfies (25) and attains the supremum in (24) (=e)−R p(x)ln p(x˜)exp (−β∗d (x,x˜))dx˜ dx for a given β and corresponding β∗ x q x˜ q∗ φ =LqR [p(x˜)](cid:0).R (cid:1) GBRD (26) Rq(Dβ)=−βDβ +Z p(x)lnq∗γβ∗∗(x)dx, (33) Here, (a) follows from lnq(x) = −lnq∗ 1/x ;q∗ = 2 − where D is the distortion vxalue at which the supremum in (cid:16) (cid:17) β q, (b) follows from (16) where the partition function in the (24)isattainedforagivenβ.Thus,theTsallis-Bregmanlower continuumisZ˜(x,β∗)= p(x˜)exp (−β∗d (x,x˜))dx˜,(c) x˜ q φ bound is followsfrom(8),(d)folloRwsfrom(15),and,(e)followsfrom (17). Analogous to the mapping method of [14], a mapping R (D )=−βD + p(x)ln γ∗ (x)dx=R (D ). qL β β Z q∗ β∗ q β from the unit interval to the reproduction alphabet is sought x (34) such that the Lebesgue measure over the unit interval results in an optimal p(x˜) over the reproduction alphabet. Thus, for V. NUMERICAL SIMULATIONS each instance of a probabilitymeasure correspondingto p(x˜) The efficacy of the GBRD model is computationally in- on X˜ there exists a unique map x˜ : [0,1] 7→ X˜ that maps vestigatedbydrawinga sampleof1000two-dimensionaldata the Lebesgue measure (µ) to p(x˜) such that for any function points,fromthreesphericalGaussiandistributionswithcenters f defined over x˜, f(x˜)p(x˜)dx˜ = f(x˜(u))dµ(u). (2,3.5),(0,0),(0,2) (the quantized codebook). The priors x˜ u∈[0,1] Thus, (26) yields R R and standard deviations are 0.3,0.4,0.3, and, 0.2,0.5,1.0, respectively. To test effectiveness of the support estimation Lq [x˜]= GBRD step, thequantizedcodebookis shifted from the true meansto =− p(x)ln p(x˜)exp (−β∗d (x,x˜))dx˜ dx, x q x˜ q∗ φ positionsat the edgesof the sphericalGaussian distributions. =−Rxp(x)lnq(cid:0)Ru∈[0,1]expq∗(−β∗dφ(x,x˜(u)))(cid:1)dµ(u) dx.The computational procedure described in Algorithm 1 is R (cid:16)R (2(cid:17)7) repeatedly solved for each value of β∗, till a reproduction To obtain the optimality condition, the q-deformed terms alphabet with optimal support is obtained. This consistently characteristic to generalized statistics are to be operated on coincideswiththetruemean,foranegligibleerror.Thiseffect bythedualJacksonderivativeinsteadoftheusualNewtonian is particularly pronounced, and rapidly achieved, for regions derivative [4]. To obtain expressions analogous to the case of of low and intermediate values of β∗, thus providing implicit Newton-Leibniz calculus, the dual Jackson derivative defined proof of the relation between soft clustering and RD with as Bregman divergences [15]. Fig.1depictstheRDcurvesforextensiveRDwithBregman D(q)f(x)= 1 df;1+(1−q)f(x)6=0 1+(1−q)f(x)dx divergences [15, 16] and the GBRD model, with the con- (28) ⇒D(q)ln Z˜(x,β∗)= 1 dZ˜(x,β∗) stituentdiscretepointsoverlaiduponthem.AEuclideansquare q Z˜(x,β∗) dx distortion (a Bregman divergence) is employed. Each curve is employedto enforcethe optimalitycondition.The optimal- has been generated for values of β ∈ [.1,2.5] (the extensive ity condition for (27) becomes case),andβ∗ ∈[.1,100](theGBRDcases),respectively.Note that for the GBRD cases, the slope of the RD curve is −β D(q)Lq =− p(x)ddx˜ expq∗(−β∗dφ(x,x˜(u))) dx. and not −β∗. Note that all GBRD curves inhabit the non- GBRD Z (cid:0)exp (−β∗d (x,x˜(u)))u(cid:1)dµ achievable(nocompression)regionoftheextensiveRDmodel x u∈[0,1] q∗ φ R (29) with Bregmandivergences.Further, GBRD modelspossessing Evoking Borel isomorphism, and, assuming that the optimal alowernonextensivityparameterq inhabitthenon-achievable supportX˜ contains a non-emptyopen ball B , ǫ>0 implies regions of GBRD models possessing a higher value of q. It s ǫ Algorithm 1 GBRD Model [5] P.T. Landsberg and V. Vedral. Distributions and Channel Capacities in Input GeneralizedStatisticalMechanics. Phys.Lett.A,247,pp.211-217,1998. 1. X ∼p(x) over {x }n ⊂dom(φ)⊆ℜm. [6] T. Yamano. Information Theory based on Nonadditive Information i i=1 Content. Phys.Rev.E,63,pp.046105-046111, 2001. 2. Bregman divergence dφ, |X˜s| = k, effective variational [7] T.Yamano. Generalized SymmetricMutualInformationAppliedforthe parameterβ∗ ∈[0,∞],eachβ∗ isasinglepointontheRD Channel Capacity. Phys. Rev. E., To Appear. Manuscript available at http://arxiv.org/abs/cond-mat/0102322. curve. [8] T.Yamano. SourceCodingTheorembasedonaNonadditiveInformation Output Content. PhysicaA,305,1,pp.190-195,2002. n 1.X˜∗ = {x˜}k ,P∗ = {x˜ |x }k that locally [9] T. Cover and J. Thomas. Elements of Information Theory. John Wiley s j=1 j i j=1 & Sons,NewYork,NY,1991. n oi=1 optimizes (11), (R ,D ) tradeoff at each β∗ [10] T.Berger. RateDistortionTheory. Prentice-Hall,EnglewoodCliffs,NJ, β β 2.Value of R (D) where its slope equals −β = 1971. −qℵq(xq)β∗ . [11]PNro.cTeiesdhibnyg,sFo.fCt.hPee3re7itrha,AWn.nBuaialleAkl.leTrthoenInCfoonrmferaetinocneBoonttCleonmecmkuMniceathtioodn,. [1−β∗(q−1)hdφ(x,x˜)ip(x˜|X=x)] ControlandComputing, pp368-377, 1999. Method [12] R.C.Venkatesan. GeneralizedStatisticsFrameworkforRateDistortion 1.Initialize with some {x˜}k ⊂dom(φ). Theory. Physica A, To Appear, 2007. Preliminary manuscript available j=1 2.Set up outer β∗ loop. athttp://arxiv.org/abs/cond-mat/0611567. [13] R. E. Blahut. Computation of Channel Capacity and Rate Distortion repeat Functions. IEEETrans.onInform.Theory,IT18,pp.460473,1972. Blahut-Arimoto loop (16) for single value of β∗ [14] K. Rose. A Mapping Approach to Rate Distortion Computation and repeat Analysis. IEEETrans.onInform.Theory,40,6pp.19391952,1994. [15] A. Banerjee, S. Merugu, I. Dhillon, and, J. Ghosh. Clustering with for i=1 to n do Bregman Divergences. Journal of Machine Learning Research (JMLR), for j=1 to k do 6,pp1705-1749, 2005. p(x˜ |x )← p(x˜j)expq∗(−β∗dφ(xi,x˜j)) [16] A. Banerjee. Scalable Clustering Algorithms Ph.D. dissertation, The j i Z˜(xi,β∗) University ofTexasatAustin,2005. end for [17] J. Naudts. Continuity of a Class of Entropies and Relative Entropies. end for Rev. Math. Phys., 16, pp. 809-824 , 2004. Manuscript available at http://arxiv.org/abs/cond-mat/0208038. for j=1 to k do [18] G. L. Ferri, S. Martinez, and A. Plastino. Equivalence of the Four p(x˜j)← ni=1p(x˜j|xi)p(xi) VersionsofTsallisStatistics. J.Stat.Phys.,04,pp.04009-04024,2004. end for P Manuscript available athttp://arxiv.org/abs/cond-mat/0503441. until convergence Support Estimation Step (X˜s using (21)) 1 β→∞ for j=1 to k do 0.9 β*→∞ x˜j ← ni=1p(xi|x˜j)xi 0.8 end foPr 0.7 Extensive RD with Bregman Divergences until convergence 0.6 Calculate Dβ and Rβ for the β∗. RD0.5 advance β∗ 0.4 GBRD Model:q=0.75 β∗ =β∗+δβ∗ 0.3 0.2 β→ 0 GBRD Model:q=0.50 β*→ 0 0.1 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 is observed that the GBRD model undergoes compression D and clustering more rapidly than the equivalent extensive RD model with Bregman divergences. A primary cause for such Fig. 1. Rate Distortion Curves for GBRD Model and Extensive RD with BregmanDivergences behavior is the rapid increase in β∗ for marginal increases in β, as depicted in Fig. 2 and obtained from (19). ACKNOWLEDGEMENT 100 90 This work was supportedby RAND-MSR contract CSM-DI 80 & S-QIT-101107-2005.Gratitude is expressed to A. Plastino, GBRD Model: q=0.5 70 S. Abe, and, K. Rose for helpful discussions. 60 *β 50 REFERENCES GBRD Model: q=0.75 40 [1] C. Tsallis. Possible Generalizations of Boltzmann-Gibbs Statistics. J. 30 Stat.Phys.,542,pp479-487,1988. 20 [2] M. Gell-Mann and C. Tsallis (Eds.). Nonextensive Entropy- 10 InterdisciplinaryApplications. OxfordUniversityPress,NewYork,2004. 0 [3] C. Tsallis. Generalized Entropy-Based Criterion for Consistent Testing. 0 0.5 1 1.5 2 2.5 3 3.5 4 β Phys.Rev.E,58,pp1442-1445,1998. [4] E. Borges. A Possible Deformed Algebra and Calculus Inspired in Nonextensive Thermostatistics. Physica A, 340, pp. 95-111, 2004. Fig.2. Curvesforβ v/sβ∗ forGBRDModel Manuscript available athttp://arxiv.org/abs/cond-mat/0304545.