Geometry of the Central Limit Theorem in the Nonextensive Case

C. Vignat¹ and A. Plastino²

¹I.G.M., Université de Marne la Vallée, Marne la Vallée, France
²Exact Sci. Fac., National University La Plata and IFLP-CCT-CONICET, C.C. 727, 1900 La Plata, Argentina

Preprint submitted to Elsevier, January 29, 2009

Abstract

We uncover geometric aspects that underlie the sum of two independent stochastic variables when both are governed by q-Gaussian probability distributions. The pertinent discussion is given in terms of random vectors uniformly distributed on a sphere.

1 Introduction

Nonextensive statistical physics provides a rich framework for the interpretation of complex systems' behavior whenever classical statistical physics fails [1]. The basic tool of this approach is the extension of the classical Boltzmann entropy to the wider class of Tsallis entropies. In this context, the usual Gaussian distributions are extended to the q-Gaussian distributions, defined below. The properties of these distributions are an interesting subject of study, addressed in a number of recent publications [1]. Of special interest is the extension of the usual stability result that holds in the Gaussian case, namely, that if $X_1 \in \mathbb{R}$ and $X_2 \in \mathbb{R}$ are independent Gaussian random variables with unit variance, then the linear combination $Z = a_1X_1 + a_2X_2$ is again Gaussian and

$$Z \sim \sqrt{a_1^2 + a_2^2}\,X, \qquad (1)$$

where X is Gaussian with unit variance and ∼ denotes equality in distribution. This stability property is at the core of the central limit theorem (CLT), which describes the behavior of systems that result from the additive superposition of many independent phenomena.
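A quick Monte Carlo check of the stability property (1) may help fix ideas. The Python sketch below is our own illustration, not part of the original argument; the coefficients $a_1 = 1$, $a_2 = 0.6$, the seed and the sample size are arbitrary assumptions. It compares the empirical variance of $a_1X_1 + a_2X_2$ with $a_1^2 + a_2^2$ and its kurtosis with the Gaussian value 3. Matching moments are only a necessary sign of (1), not a proof.

```python
import random

# Illustrative Monte Carlo check of (1); all parameter values are arbitrary choices.
# Z = a1*X1 + a2*X2 with X1, X2 independent N(0, 1) should behave like
# sqrt(a1^2 + a2^2) * X with X ~ N(0, 1).
rng = random.Random(0)          # fixed seed so the run is reproducible
a1, a2 = 1.0, 0.6
N = 200_000

z = [a1 * rng.gauss(0.0, 1.0) + a2 * rng.gauss(0.0, 1.0) for _ in range(N)]

mean = sum(z) / N
var = sum((x - mean) ** 2 for x in z) / N
# Normalized fourth moment (kurtosis); it equals 3 for any Gaussian.
kurt = sum((x - mean) ** 4 for x in z) / (N * var ** 2)

print(f"mean     {mean:+.4f}  (target 0)")
print(f"variance {var:.4f}  (target a1^2 + a2^2 = {a1**2 + a2**2:.4f})")
print(f"kurtosis {kurt:.4f}  (target 3)")
```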
The CLT can be ranked among the most important results in probability theory and statistics, and plays an essential role in several disciplines, notably in statistical mechanics. Pioneers like A. de Moivre, P.S. de Laplace, S.D. Poisson, and C.F. Gauss showed that the Gaussian distribution is the attractor of the superposition process of independent systems with a finite second moment. Distinguished authors like Chebyshev, Markov, Liapounov, Feller, Lindeberg, and Lévy also made essential contributions to the development of the CLT. As far as physics is concerned, one can state that, starting from any system with a finite-variance distribution function (for some measurable quantity x) and additively combining a sufficiently large number of such independent systems, the resulting distribution function of x (suitably centered and rescaled) approaches a Gaussian.

A natural question is thus the extension of the stability result (1) to the nonextensive case, that is, to q-Gaussian distributions. This problem is currently the subject of several publications (see for example [2]) in which possible extensions of the CLT to the nonextensive context are studied. The aim of this communication is to give some geometric insight into the behavior of q-Gaussian distributions for the case q < 1.

2 Definitions and notations

In nonextensive statistics, the usual Shannon entropy of a probability density $f_X$, namely

$$H_1(X) = -\int f_X \log f_X,$$

is replaced by its Tsallis version

$$H_q(X) = \frac{1}{q-1}\left(1 - \int f_X^q\right),$$

where the nonextensivity index q is a real parameter, usually taken to be positive. It can be checked by applying L'Hospital's rule that Shannon's entropy coincides with the limit case

$$\lim_{q \to 1} H_q(X) = H_1(X).$$

It is a well-known result that the distribution that maximizes the Shannon entropy under a covariance matrix constraint $E\,XX^T = K$ (where K is a symmetric positive definite matrix) is the Gaussian distribution

$$f_X(X) = \frac{1}{|2\pi K|^{1/2}} \exp\left(-\frac{1}{2} X^T K^{-1} X\right).$$

Its nonextensive counterpart, called a q-Gaussian, is defined as follows.

Definition 1 The n-variate distribution with zero mean and given covariance matrix $E\,XX^T = K$ having maximum Tsallis entropy is denoted as $G_q(K)$ and defined as follows for 0 < q < 1:

$$f_X(X) = A_q \left(1 - X^T \Sigma^{-1} X\right)_+^{\frac{1}{1-q}}, \qquad (2)$$

with matrix $\Sigma = pK$, parameter p defined as $p = 2\,\frac{2-q}{1-q} + n$, and notation $(x)_+ = \max(x, 0)$. Moreover, the partition function is

$$A_q = \frac{\Gamma\left(\frac{2-q}{1-q} + \frac{n}{2}\right)}{\Gamma\left(\frac{2-q}{1-q}\right) |\pi\Sigma|^{1/2}}.$$

We note that this distribution has bounded support; namely, $f_X(X) \neq 0$ only when X belongs to the ellipsoid $\mathcal{E}_\Sigma = \{Z \in \mathbb{R}^n;\ Z^T\Sigma^{-1}Z \le 1\}$.

We also need the notion of spherical vector, defined as follows:

Definition 2 A random vector $X \in \mathbb{R}^n$ is spherical if its density $f_X$ is a function of the norm $|X|$ of X only, namely

$$f_X(X) = g(|X|)$$

for some function $g: \mathbb{R}^+ \to \mathbb{R}^+$.

An alternative characterization of a spherical vector is as follows [3]:

Proposition 3 A random vector $X \in \mathbb{R}^n$ is spherical if

$$X \sim AX$$

for any orthogonal matrix A, where the sign ∼ denotes equality in distribution.

This property highlights the importance of spherical vectors in physics, since they describe systems that are invariant under orthogonal transformations.

A fundamental property of a spherical vector is the following:

Proposition 4 [3] If $X \in \mathbb{R}^n$ is a spherical random vector, then it has the stochastic representation

$$X \sim rU,$$

where U is a uniform vector on the sphere $\mathcal{S}_n = \{X \in \mathbb{R}^n;\ X^TX = 1\}$ and r is a positive scalar random variable independent of U. Moreover, r has the stochastic representation

$$r \sim |X|. \qquad (3)$$

3 A heuristic approach

We start with a heuristic approach to the stability problem, namely the behavior of the random variable $Z = a_1X_1 + a_2X_2$ when $X_1$ and $X_2$ are two independent, unit-variance, q-Gaussian random vectors in $\mathbb{R}^n$ with nonextensivity parameter q < 1. Let us assume that the following hypothesis, called hypothesis (H),

$$\frac{2}{1-q} + n \in \mathbb{N}, \qquad (4)$$

holds, so that $\frac{1}{1-q} = \frac{p-n}{2} - 1$, where p > n is an integer. A classical result is that, up to the scale factor $\sqrt{p}$ common to both vectors (which plays no role in what follows and is omitted below), $X_1$ (resp. $X_2$) can then be considered as the n-dimensional marginal vector of a random vector $U_1$ (resp. $U_2$) that is uniformly distributed on the unit sphere $\mathcal{S}_p$ of $\mathbb{R}^p$. Thus, there exist random vectors $\tilde X_1$ and $\tilde X_2$ in $\mathbb{R}^{p-n}$ such that

$$U_1 = \begin{pmatrix} X_1 \\ \tilde X_1 \end{pmatrix} \quad\text{and}\quad U_2 = \begin{pmatrix} X_2 \\ \tilde X_2 \end{pmatrix}$$

are two p-dimensional independent vectors uniformly distributed on $\mathcal{S}_p$. Then, the combination $a_1U_1 + a_2U_2$ is a spherical vector (for any orthogonal matrix A, $A(a_1U_1 + a_2U_2) = a_1AU_1 + a_2AU_2 \sim a_1U_1 + a_2U_2$, since $U_1$ and $U_2$ are independent and spherical) and has the stochastic representation

$$a_1U_1 + a_2U_2 \sim rU,$$

where U is uniform on $\mathcal{S}_p$. Now, by equation (3), the random variable r is distributed as

$$r \sim |a_1U_1 + a_2U_2| = \sqrt{a_1^2 + a_2^2 + 2\lambda a_1a_2},$$

where $\lambda = U_1^TU_2$; this can easily be deduced from

$$|a_1U_1 + a_2U_2| = \sqrt{a_1^2U_1^TU_1 + a_2^2U_2^TU_2 + 2a_1a_2U_1^TU_2},$$

remarking that $U_1^TU_1 = U_2^TU_2 = 1$. But λ is a random variable with a q-Gaussian distribution! We prove this result by noticing that, conditioned on $U_2 = u_2$, the random variable λ is the cosine of the angle between $U_1$ and the fixed direction $u_2$. Since $U_1$ is spherical, we may restrict our attention to the angle between $U_1$ and the first vector of the canonical basis of $\mathbb{R}^p$, so that we look for the distribution of the first component of $U_1$, which follows a q-Gaussian distribution with parameter $q_\lambda$ such that

$$\frac{1}{1-q_\lambda} = \frac{p-1}{2} - 1.$$

Since this distribution does not depend on our initial choice $U_2 = u_2$, the random variable λ follows the above distribution unconditionally (and is independent of $U_2$).
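The heuristic construction can be simulated directly. The following Python sketch assumes q = 1/2 and n = 1, for which hypothesis (H) holds with p = 7; the helper names `uniform_on_sphere` and `dot`, the coefficients and the seed are our own illustrative choices. It samples $U_1$, $U_2$ uniformly on $\mathcal{S}_p$ by normalizing Gaussian vectors (a standard technique), checks the identity $|a_1U_1 + a_2U_2| = \sqrt{a_1^2 + a_2^2 + 2\lambda a_1a_2}$ sample by sample, and compares the empirical variance of $\lambda = U_1^TU_2$ with 1/p, which is the variance of a density proportional to $(1-\lambda^2)^{(p-3)/2}$, i.e. of a $q_\lambda$-Gaussian with $\frac{1}{1-q_\lambda} = \frac{p-1}{2} - 1$. It also checks that $\sqrt{p}$ times the first coordinate of $U_1$ has unit variance. Only second moments are compared, so this is a sanity check rather than a proof.

```python
import math
import random

# Helper names below are illustrative, not taken from the paper.
def uniform_on_sphere(p, rng):
    """Point uniformly distributed on the unit sphere S_p of R^p
    (a standard Gaussian vector divided by its norm)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Example satisfying hypothesis (H): q = 1/2, n = 1 gives 2/(1-q) + n = 5,
# hence p = 2(2-q)/(1-q) + n = 7.
q, n = 0.5, 1
p = round(2.0 * (2.0 - q) / (1.0 - q) + n)   # integer dimension of the sphere
a1, a2 = 1.0, 0.6
rng = random.Random(1)
N = 40_000

sum_lam2 = 0.0       # accumulates lambda^2 (E[lambda] = 0 by symmetry)
sum_x2 = 0.0         # accumulates X^2 for the rescaled marginal
max_err = 0.0        # worst deviation in the norm identity
for _ in range(N):
    u1 = uniform_on_sphere(p, rng)
    u2 = uniform_on_sphere(p, rng)
    lam = dot(u1, u2)
    sum_lam2 += lam * lam
    # sqrt(p) times the first coordinate: a unit-variance q-Gaussian sample (n = 1)
    sum_x2 += p * u1[0] ** 2
    s = [a1 * x + a2 * y for x, y in zip(u1, u2)]
    lhs = math.sqrt(dot(s, s))
    rhs = math.sqrt(a1 * a1 + a2 * a2 + 2.0 * lam * a1 * a2)
    max_err = max(max_err, abs(lhs - rhs))

var_lam = sum_lam2 / N
var_x = sum_x2 / N
print(f"p = {p}, Var(lambda) = {var_lam:.4f} (1/p = {1 / p:.4f})")
print(f"Var(sqrt(p) U_1[0]) = {var_x:.4f} (target 1)")
print(f"max |norm - sqrt(a1^2 + a2^2 + 2 lambda a1 a2)| = {max_err:.1e}")
```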
We conclude that the n-dimensional marginal $Z = a_1X_1 + a_2X_2$ of the vector $a_1U_1 + a_2U_2$ is distributed as

$$a_1X_1 + a_2X_2 \sim rX,$$

where X is the n-dimensional marginal vector of U, so that X is again q-Gaussian with parameter q. Moreover, this result extends to the case where $X_1$ and $X_2$ both have¹ a covariance matrix $K \neq I$, by multiplying the vectors $X_1$ and $X_2$ by the matrix $K^{1/2}$. Consequently, we have deduced the following

¹ The case where $X_1$ and $X_2$ have distinct covariance matrices is more difficult and is left for further study.

Theorem 5 If $X_1$ and $X_2$ are two independent q-Gaussian random vectors in $\mathbb{R}^n$ with covariance matrix K and nonextensivity parameter q < 1, and if hypothesis (H) holds, then

$$a_1X_1 + a_2X_2 \sim (a_1 \circ a_2)\,X,$$

where X is again q-Gaussian with the same covariance matrix K and the same nonextensivity parameter q as $X_1$, and where

$$a_1 \circ a_2 = \sqrt{a_1^2 + a_2^2 + 2\lambda a_1a_2}, \qquad (5)$$

the random variable λ being independent of X and again q-Gaussian distributed, with nonextensivity parameter $q_\lambda$ defined by

$$q_\lambda = \frac{(n-1) - (n-3)q}{(n+1) - (n-1)q}. \qquad (6)$$

Two remarks are of interest at this point:

• the univariate framework n = 1 is the only case for which the random variable λ has the same nonextensivity parameter $q_\lambda = q$ as $X_1$ and $X_2$ (indeed, (6) gives $q_\lambda - q = \frac{(n-1)(1-q)^2}{(n+1) - (n-1)q}$);
• however, we note that $\lim_{n \to +\infty} q_\lambda = 1$.

This means that for large-dimensional systems, the random variable λ (whose variance is $E\lambda^2 = 1/p$) converges to the constant 0 and we recover the deterministic convolution; this is coherent with the fact that large-dimensional q-Gaussian vectors are "close" to Gaussian vectors by the de Finetti inequality.

[Figure 1: nonextensivity parameter $q_\lambda$ as a function of q for dimensions n = 1, 2, 3, 5, 10 and 100 (bottom to top).]

The curves in Figure 1 show the nonextensivity parameter $q_\lambda$ as a function of q for several values of the dimension n.
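Formula (6) is easy to tabulate. The Python sketch below evaluates (6) directly and, as a cross-check, through $\frac{1}{1-q_\lambda} = \frac{p-1}{2} - 1$ with $p = 2\frac{2-q}{1-q} + n$; the function names `q_lambda` and `q_lambda_from_p` are illustrative, not the authors'. The printed rows give a few points of the curves in Figure 1.

```python
# Function names are illustrative, not taken from the paper.
def q_lambda(q, n):
    """Nonextensivity parameter of lambda, equation (6)."""
    return ((n - 1) - (n - 3) * q) / ((n + 1) - (n - 1) * q)

def q_lambda_from_p(q, n):
    """Same parameter from 1/(1 - q_lambda) = (p - 1)/2 - 1, p = 2(2-q)/(1-q) + n."""
    p = 2.0 * (2.0 - q) / (1.0 - q) + n
    return 1.0 - 1.0 / ((p - 1.0) / 2.0 - 1.0)

# A few points of the curves in Figure 1 (rows: dimension n, columns: q).
qs = [0.1, 0.5, 0.9]
for n in [1, 2, 3, 5, 10, 100]:
    print(f"n = {n:3d}: " + "  ".join(f"{q_lambda(q, n):.4f}" for q in qs))
```

For n = 1 the row reproduces q itself, and for n = 3, q = 1/2 one gets $q_\lambda = 2/3$.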
More can be said about the algebra $a_1 \circ a_2$:

Theorem 6 The algebra $a_1 \circ a_2$ defined in (5) is associative and, for any number m ≥ 2 of terms,

$$a_1 \circ a_2 \circ \cdots \circ a_m = \sqrt{\sum_{i=1}^m a_i^2 + 2\sum_{i<j} \lambda_{ij} a_i a_j},$$

where the random variables $\lambda_{ij} = U_i^TU_j$ are $q_\lambda$-Gaussian.

As an example,

$$a_1 \circ a_2 \circ a_3 = \sqrt{a_1^2 + a_2^2 + a_3^2 + 2\lambda_{12}a_1a_2 + 2\lambda_{13}a_1a_3 + 2\lambda_{23}a_2a_3}.$$

PROOF. By definition,

$$a_1 \circ a_2 \circ \cdots \circ a_m = \left|\sum_{i=1}^m a_iU_i\right| = \sqrt{\sum_{i=1}^m a_i^2 U_i^TU_i + 2\sum_{i<j} a_ia_jU_i^TU_j}.$$

Since $|U_i| = 1$, we deduce, by denoting $U_i^TU_j = \lambda_{ij}$, that

$$a_1 \circ a_2 \circ \cdots \circ a_m = \sqrt{\sum_{i=1}^m a_i^2 + 2\sum_{i<j} \lambda_{ij}a_ia_j}.$$

In particular, $(a_1 \circ a_2) \circ a_3$ and $a_1 \circ (a_2 \circ a_3)$ both have the distribution of $|a_1U_1 + a_2U_2 + a_3U_3|$, which gives associativity. By the same argument as above, each $\lambda_{ij}$ is q-Gaussian distributed with parameter $q_\lambda$. We remark that the random variables $\lambda_{ij}$ are pairwise independent but are obviously not mutually independent.

4 Generalization

The preceding result was derived under hypothesis (H), as expressed by (4), that is, for specific values of q < 1 only; we show in this section that this result in fact holds without this hypothesis, for all values of q < 1, but the proof requires more elaborate analytic tools. Our main result is

Theorem 7 Theorem 5 holds for all values of q such that 0 < q < 1.

PROOF. The characteristic function associated with the q-Gaussian distribution (2) is

$$\varphi_X(u) = E\exp\left(iu^TX\right) = 2^{\frac{p}{2}-1}\,\Gamma\left(\frac{p}{2}\right)\frac{J_{\frac{p}{2}-1}\left(\sqrt{u^T\Sigma u}\right)}{\left(\sqrt{u^T\Sigma u}\right)^{\frac{p}{2}-1}},$$

where $\Sigma = pK$ as in Definition 1, $J_{\frac{p}{2}-1}$ is the Bessel function of the first kind with parameter $\frac{p}{2}-1$, and

$$p = 2\,\frac{2-q}{1-q} + n.$$

According to Gegenbauer [4, p. 367, eq. 16],

$$2^\nu\,\Gamma\left(\nu+\tfrac12\right)\Gamma\left(\tfrac12\right)\frac{J_\nu(Z)\,J_\nu(z)}{Z^\nu z^\nu} = \int_0^\pi \frac{J_\nu\left(\sqrt{Z^2+z^2-2Zz\cos\phi}\right)}{\left(Z^2+z^2-2Zz\cos\phi\right)^{\nu/2}}\sin^{2\nu}\phi\,d\phi.$$

Choosing $Z = a_1\sqrt{u^T\Sigma u}$, $z = a_2\sqrt{u^T\Sigma u}$, $\lambda = \cos\phi$ and $\nu = \frac{p}{2} - 1$, this equality can be rewritten as

$$\varphi_{a_1X_1}(u)\,\varphi_{a_2X_2}(u) = \varphi_{\sqrt{a_1^2+a_2^2+2\lambda a_1a_2}\,X}(u),$$

where λ is independent of X and distributed according to

$$f_\lambda(\lambda) = \frac{\Gamma(\nu+1)}{\Gamma\left(\nu+\frac12\right)\Gamma\left(\frac12\right)}\left(1-\lambda^2\right)^{\nu-\frac12};$$

since this density is symmetric, the sign in front of $2Zz\cos\phi$ is immaterial. Since $q_\lambda$ is defined by

$$\frac{1}{1-q_\lambda} = \nu - \frac12 = \frac{p-1}{2} - 1,$$

we deduce (6).

Let us recall the scaling behavior of Gaussian vectors,

$$a_1X_1 + a_2X_2 \sim \sqrt{a_1^2 + a_2^2}\,X,$$

which can be probabilistically interpreted in the context of α-stable distributions: a distribution $f_\alpha$ is α-stable if, for $X_1$ and $X_2$ independent with distribution $f_\alpha$, the linear combination satisfies

$$a_1X_1 + a_2X_2 \sim \left(|a_1|^\alpha + |a_2|^\alpha\right)^{\frac{1}{\alpha}} X,$$

where X again follows the distribution $f_\alpha$. Thus, a Gaussian distribution is α-stable with α = 2. The result of Theorem 7 can be viewed as follows: q-Gaussians are not α-stable (unless q = 1, which corresponds to the Gaussian case α = 2); however, their scaling behavior is close to the Gaussian α = 2 case, except for the fact that the scaling variable $a_1 \circ a_2$ includes an additional random term λ.

5 Geometric interpretation

Geometrically, the Gaussian scaling factor $\sqrt{a_1^2 + a_2^2}$ can be interpreted, according to Pythagoras' theorem, as the length of the hypotenuse of a right triangle with sides of lengths $|a_1|$ and $|a_2|$. The q-Gaussian case corresponds to a triangle for which the angle between the sides of lengths $|a_1|$ and $|a_2|$, let us call it φ, fluctuates around a right angle. The distribution of the angle φ, where λ = cos φ, is given by

$$f_\phi(\phi) = \frac{\Gamma(\nu+1)}{\Gamma\left(\nu+\frac12\right)\Gamma\left(\frac12\right)}\sin^{2\nu}\phi, \qquad 0 \le \phi \le \pi, \qquad \nu = \frac{1}{1-q_\lambda} + \frac12.$$

This distribution is shown in Figure 3 for values of the parameter $q_\lambda$ = 0.99, 0.9, 0.5 and 0.1 (top to bottom).

[Figure 2: the geometric interpretation of $a_1 \circ a_2$ in the Gaussian case (q = 1, left); in the q-Gaussian case (right), $a_1 \circ a_2$ is randomly chosen as one of the hypotenuses represented, the angle φ between sides $a_1$ and $a_2$ being distributed as shown in Figure 3.]

[Figure 3: the distribution of the angle φ for values of the parameter $q_\lambda$ = 0.99, 0.9, 0.5 and 0.1 (top to bottom).]
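The key step of the proof of Theorem 7 can be checked numerically, and the same code illustrates the angle density $f_\phi$. The Python sketch below is our own (helper names such as `char_fn` and `mixture_cf` are hypothetical). It evaluates $2^\nu\Gamma(\nu+1)J_\nu(s)/s^\nu$ from the power series of $J_\nu$, which is adequate only for moderate arguments, assumes q = 1/2 and n = 1 (so ν = 2.5), and compares $\varphi(a_1s)\varphi(a_2s)$ with the average of $\varphi\left(s\sqrt{a_1^2 + a_2^2 + 2a_1a_2\cos\phi}\right)$ over $\phi \sim f_\phi$, computed with the midpoint rule. It then checks that $f_\phi$ integrates to 1 and that $E|\phi - \pi/2|$ shrinks as $q_\lambda \to 1$, in line with Figure 3.

```python
import math

# Helper names are hypothetical; parameter values are illustrative choices.
def f_phi(phi, nu):
    """Angle density Gamma(nu+1) / (Gamma(nu+1/2) Gamma(1/2)) * sin(phi)^(2 nu), 0 <= phi <= pi."""
    c = math.gamma(nu + 1.0) / (math.gamma(nu + 0.5) * math.sqrt(math.pi))
    return c * math.sin(phi) ** (2.0 * nu)

def char_fn(s, nu):
    """2^nu Gamma(nu+1) J_nu(s) / s^nu, summed from the power series of J_nu:
    sum_k Gamma(nu+1) (-s^2/4)^k / (k! Gamma(k+nu+1)). Adequate for moderate s."""
    term, total, k = 1.0, 1.0, 0
    while k < s or abs(term) > 1e-18:
        # ratio of consecutive series terms
        term *= -(s * s / 4.0) / ((k + 1) * (k + nu + 1))
        total += term
        k += 1
    return total

def midpoint_nodes(m):
    """Midpoint-rule nodes on [0, pi] and the common step h."""
    h = math.pi / m
    return [(j + 0.5) * h for j in range(m)], h

def mixture_cf(a1, a2, s, nu, m=2000):
    """E[char_fn(s * sqrt(a1^2 + a2^2 + 2 cos(phi) a1 a2))] with phi ~ f_phi."""
    nodes, h = midpoint_nodes(m)
    total = 0.0
    for t in nodes:
        r = math.sqrt(a1 * a1 + a2 * a2 + 2.0 * math.cos(t) * a1 * a2)
        total += f_phi(t, nu) * char_fn(s * r, nu) * h
    return total

# q = 1/2 and n = 1 give p = 2(2-q)/(1-q) + n = 7 and nu = p/2 - 1 = 2.5.
q, n = 0.5, 1
p = 2.0 * (2.0 - q) / (1.0 - q) + n
nu = p / 2.0 - 1.0
a1, a2, s = 1.0, 0.6, 1.7        # s > 0: scalar argument of the characteristic function

lhs = char_fn(a1 * s, nu) * char_fn(a2 * s, nu)
rhs = mixture_cf(a1, a2, s, nu)
print(f"phi(a1 s) phi(a2 s) = {lhs:.12f}")
print(f"E phi(s (a1 o a2))  = {rhs:.12f}")

# f_phi integrates to 1 and concentrates around pi/2 as q_lambda -> 1.
nodes, h = midpoint_nodes(2000)
masses, spreads = {}, {}
for q_lam in (0.1, 0.5, 0.9, 0.99):
    nu_l = 1.0 / (1.0 - q_lam) + 0.5
    masses[q_lam] = sum(f_phi(t, nu_l) * h for t in nodes)
    spreads[q_lam] = sum(abs(t - math.pi / 2) * f_phi(t, nu_l) * h for t in nodes)
    print(f"q_lambda = {q_lam}: mass {masses[q_lam]:.8f}, E|phi - pi/2| = {spreads[q_lam]:.4f}")
```

Because $f_\phi$ is symmetric about π/2, the sign convention for the cosine term (+ here, − in Gegenbauer's formula) does not change the result.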
We remark that this distribution is symmetric around the angle φ = π/2 and that, as $q_\lambda \to 1$, the angle φ becomes deterministic and equal to π/2; the usual scaling law (1) for Gaussian distributions is then recovered.

5.1 An optical analogy

We remark that formula (5) closely resembles the interference formula for the amplitude of the superposition of two optical beams. Interferometric optical testing is based on these interference phenomena. Two-beam interference is the superposition of two waves, such as the disturbance of the surface of a pond by a small rock encountering a similar pattern from a second rock. When two wave crests reach the same point simultaneously, the wave height is the sum of the two individual waves. Conversely, a wave trough and a wave crest reaching a point simultaneously partially or totally cancel each other out. Water, sound, and light waves all exhibit interference. A light wave can be described by its frequency, amplitude, and phase, and the resulting interference pattern between two waves depends on these properties, among others. Our present interest lies in the two-beam interference equation. It gives the irradiance I [6] for monochromatic waves of irradiances $I_1$ and $I_2$ in terms of the phase difference $\Delta_{1,2} = \phi_1 - \phi_2$, through $\cos\phi = \cos(\phi_1 - \phi_2)$. We have

$$I = I_1 + I_2 + 2\sqrt{I_1I_2}\cos\phi,$$

and, in terms of the amplitudes A (with $I = A^2$),

$$A^2 = A_1^2 + A_2^2 + 2A_1A_2\cos\phi.$$

If the emission of the two beams could be arranged so that the phase difference becomes random, with the angle distribution $f_\phi$ above [7,8,9], this physical analogy would be exact.

5.2 Study of the composition law ∘

The composition law

$$a_1 \circ a_2 = \sqrt{a_1^2 + a_2^2 + 2\lambda a_1a_2}$$

has been studied in [5], in the more general case where $a_1$ and $a_2$ are independent, positive random variables.
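The optical analogy of Section 5.1 can be made concrete: with $\lambda = \cos(\phi_1 - \phi_2)$, the composition $a_1 \circ a_2$ is the modulus of the superposition of two phasors $A_1e^{i\phi_1} + A_2e^{i\phi_2}$. The Python sketch below (function names hypothetical, amplitudes and phases arbitrary) checks this, together with the special cases λ = 0 (Pythagoras), λ = 1 ($a_1 + a_2$) and λ = −1 ($|a_1 - a_2|$).

```python
import cmath
import math

# Function names are hypothetical; amplitudes and phases are arbitrary.
def compose(a1, a2, lam):
    """Composition law (5): a1 o a2 = sqrt(a1^2 + a2^2 + 2 lam a1 a2), lam in [-1, 1]."""
    return math.sqrt(a1 * a1 + a2 * a2 + 2.0 * lam * a1 * a2)

def superposed_amplitude(A1, A2, phi1, phi2):
    """Modulus of the superposition A1 exp(i phi1) + A2 exp(i phi2) of two phasors."""
    return abs(A1 * cmath.exp(1j * phi1) + A2 * cmath.exp(1j * phi2))

A1, A2 = 1.0, 0.6
phases = [(0.3, 1.9), (2.0, 2.0), (0.0, math.pi), (1.0, 1.0 + math.pi / 2)]
pairs = []
for phi1, phi2 in phases:
    lam = math.cos(phi1 - phi2)      # lambda plays the role of cos(phase difference)
    pairs.append((compose(A1, A2, lam), superposed_amplitude(A1, A2, phi1, phi2)))
    print(f"lambda = {lam:+.4f}: a1 o a2 = {pairs[-1][0]:.6f}, |phasor sum| = {pairs[-1][1]:.6f}")
```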
The associativity result is as follows:

Theorem 8 [5, p. 18, Thm. 1] The composition law ∘ is associative if and only if either

$$a_1 \circ a_2 = \sqrt{a_1^2 + a_2^2},$$

or

$$a_1 \circ a_2 = |a_1| + |a_2|,$$

or

$$a_1 \circ a_2 = \sqrt{a_1^2 + a_2^2 + 2\lambda a_1a_2}, \qquad (7)$$

where $\lambda \sim G_q(0,1)$ for some $q \ge 0$.
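Associativity can be probed numerically for the random law (7). The Python sketch below illustrates Theorem 6; it is not a proof of Theorem 8. It assumes that λ has the density $f_\lambda$ from the proof of Theorem 7 with ν = 2.5 (q = 1/2, n = 1, p = 7), which need not coincide with the parameterization $G_q(0,1)$ used in [5]. Each composition draws its own independent λ, and the first four moments of $(a_1 \circ a_2) \circ a_3$ and $a_1 \circ (a_2 \circ a_3)$ are computed by a tensor-product midpoint rule; equal moments are necessary, not sufficient, for equality in distribution. The closed-form fourth moment $(a_1^2+a_2^2+a_3^2)^2 + \frac{4}{p}(a_1^2a_2^2 + a_1^2a_3^2 + a_2^2a_3^2)$ in the last lines is our own short computation from $E\lambda = 0$ and $E\lambda^2 = 1/p$. The two deterministic laws are checked as well. Helper names are hypothetical.

```python
import math

# Helper names are hypothetical; a1, a2, a3 and nu are illustrative choices.
def compose(a1, a2, lam):
    """Composition law a1 o a2 = sqrt(a1^2 + a2^2 + 2 lam a1 a2)."""
    return math.sqrt(a1 * a1 + a2 * a2 + 2.0 * lam * a1 * a2)

def lambda_grid(nu, m):
    """Nodes lambda_j = cos(phi_j) and weights f_phi(phi_j) * h (midpoint rule on [0, pi])."""
    c = math.gamma(nu + 1.0) / (math.gamma(nu + 0.5) * math.sqrt(math.pi))
    h = math.pi / m
    phis = [(j + 0.5) * h for j in range(m)]
    return [math.cos(t) for t in phis], [c * math.sin(t) ** (2.0 * nu) * h for t in phis]

def grouping_moments(a1, a2, a3, nu, m=300, kmax=4):
    """Moments E[Y^k], k = 1..kmax, of Y = (a1 o a2) o a3 (left) and Y = a1 o (a2 o a3)
    (right), each composition using its own independent lambda."""
    lams, ws = lambda_grid(nu, m)
    left, right = [0.0] * kmax, [0.0] * kmax
    for lam_in, w_in in zip(lams, ws):
        b12 = compose(a1, a2, lam_in)        # inner composition of the left grouping
        b23 = compose(a2, a3, lam_in)        # inner composition of the right grouping
        for lam_out, w_out in zip(lams, ws):
            w = w_in * w_out
            y_left = compose(b12, a3, lam_out)
            y_right = compose(a1, b23, lam_out)
            for k in range(kmax):
                left[k] += w * y_left ** (k + 1)
                right[k] += w * y_right ** (k + 1)
    return left, right

a1, a2, a3 = 1.0, 0.5, 0.8

# The two deterministic laws are associative (up to rounding).
det_err = max(
    abs(math.hypot(math.hypot(a1, a2), a3) - math.hypot(a1, math.hypot(a2, a3))),
    abs((abs(a1) + abs(a2)) + abs(a3) - (abs(a1) + (abs(a2) + abs(a3)))),
)

# Random law: q = 1/2, n = 1 gives p = 7 and nu = p/2 - 1 = 2.5, so E[lambda^2] = 1/p.
p, nu = 7, 2.5
left, right = grouping_moments(a1, a2, a3, nu)
for k in range(4):
    print(f"E[Y^{k + 1}]: left {left[k]:.10f}   right {right[k]:.10f}")

# Fourth moment in closed form, from E[lambda] = 0 and E[lambda^2] = 1/p.
s2 = a1**2 + a2**2 + a3**2
m4 = s2**2 + (4.0 / p) * (a1**2 * a2**2 + a1**2 * a3**2 + a2**2 * a3**2)
print(f"closed-form E[Y^4] = {m4:.10f}")
```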