The tensor network representation of high order cumulant and algorithm for their calculation. 7 1 0 Krzysztof Domino∗, Piotr Gawron†, and Łukasz Pawela‡ 2 n Institute of Theoretical and Applied Informatics, a J Polish Academy of Sciences, 9 1 Bałtycka 5, 44-100 Gliwice, Poland ] A January 19, 2017 N . s c [ 1 v Abstract 0 2 4 In this paper we introduce a novel algorithm of calculating arbitrary order 5 th 0 cumulants of multidimensional data. Since the n order cumulant can be . presented in the form of an n-dimensional tensor, the algorithm is presented 1 0 using the tensor network notation. The presented algorithm exploits the 7 super–symmetry of cumulant and moment tensors. We show, that proposed 1 : algorithm highly decreases the computational complexity of cumulants cal- v i culation, compared to the naïve algorithm. X r a Keywords Highordercumulants, tensornetworks, non–normallydistributed data, numerical algorithms ∗[email protected] †[email protected] ‡[email protected] 1 1 Introduction – cumulant tensors. 1.1 The motivation Cumulantsofordern ≥ 3,highordercumulants, haverecentlystartedtoplay an important role in the analysis of non–normally distributed multivariate data. The major potential application of high order cumulants is financial data analysis [1, 2, 3]. The analysis of large sets of financial data, such as large time series of many shares’ prices, is a big challenge that requires an efficient method of high order cumulants calculation. Another application of high order cumulants is the analysis of hyper–spectral data, especially small target detection procedure [4]. High order cumulants have been also used for quantum noise investigation [5], or signal auto–correlation analysis [6]. To illustrate other potential applications of high order cumulants, let us mention thattherearealsoothertypesofdatathatarenot–normallydistributed, such as: weather data [7, 8, 9], medical data [10], cosmological data [11] or data generated for machine learning purpose [12]. In most cases of no–normally distributed data analysis only cumulants of ordern ≤ 4areused, sincethecomputationcomplexityandtheusageofcom- putationalresources increases strongly withthecumulants’ order. Hence, the application of cumulants of order n > 4 is an almost unexplored field because of the lack of efficient methods of their calculation and storage. However, the use of cumulants of order n > 4 is required to analyse data where ex- treme events are crucial, such as financial data [1, 3, 13, 14]. In this paper we introduce an efficient method of high order cumulant’s calculation that exploits the recursive relation between cumulants and the cumulants super– symmetric structure. To exploit the super–symmetry of cumulants tensors we use a block structure introduced in [15]. In general we structure a super– symmetric tensor into a tensor of blocks and store only one pyramid part of such tensor of blocks. This operation decreases significantly the computer memory usage necessary to store cumulants tensors. The paper is organised as follows. In Section 2 we introduce the recursive formula for the calculation of any cumulant of order n ≥ 3. In Subsec- tions 2.1 − 2.2 this formula is presented using tensor operations and some examples are presented graphically in tensor networks form. The general graphic representation is presented in Appendix B. In Section 3 we present the algorithm that can be implemented for the cumulant calculation. In Subsection 3.1 we discuss the structure of boxes that is used to store and calculate super–symmetric cumulant tensors. In Subsection 3.2 we present the algorithm implementation and compare it with the naïve algorithm. In Appendix A we present an alternative to [16] proof of the moment cumulant 2 recurrence relation. 1.2 Basic definitions Let us start with a random process generating discrete multi–dimensional values. A sequence of T samples of a M dimensional random variable is represented in form of a matrix X ∈ RT,M such that x ... x 1,1 1,M X = ... xt,m ... . (1) x ... x T,1 T,M This matrix can be represented by a sequence of M marginal variables X i X = [X ,...,X ,...,X ], (2) 1 i M where ⊺ X = [x ,...,x ,...,x ] . (3) i 1,i t,i T,i We recall the cumulant generating function [17, 18] T exp([x ···x ···x ]·τ⊺) K : RM → R K(τ) = log t=1 t,1 t,m t,M , (4) T ! P where τ = [τ ,...,τ ] and the dot represents scalar product. 1 M M ×...×M Let A ∈ R n = RM×n be an n mode tensor, with elements a . The n-tuple I = (i ,...,i ) is its multi–index. Hereafter we will i1,...,in | {z }1 n write a instead of a when appropriate. List of symbols used in the I i1,...,in paper is gathered in the Table 1. Definition 1.1. Let X ∈ RT×M be as in Eq. (1). We define the n’th moment as a tensor M (X) with elements n T 1 m = E(X ,...,X ) = x ·...·x . (5) I i1 in T t,i1 t,in t=1 X Definition 1.2. Let X ∈ RT×M be as in Eq (1). We define the centred variable X˜ ∈ RT×M as a matrix X˜ = [X˜ ,...,X˜ ,...,X˜ ], with X˜ = X −m (X). (6) 1 i m i i i 3 symbol description / explanation th I = (i ,··· ,i ) n element multi–index 1 n |I| = n size of multi–index (number of el- ements) X ∈ RT×M matrix of T realisations of M di- mensional random variable ⊺ th X = [x ,...,x ] vector of T realisations of i i 1,i T,i marginal random variable E(X ,...,X ) = 1 T x ·...·x expectational value operator i1 in T t=1 t,i1 t,in A matrix with elements a P i1,i2 A tensor with elements a I X˜ ∈ RT×M matrix of T realisations of M di- mensional centred random vari- able C (X) ∈ RM×n the nth cumulant tensor of X with n elements c I M (X˜) ∈ RM×n the nth central moment tensor of n X with elements m I Table 1: Symbols used in the paper. 1.3 Examples The first four cumulants can be represented using simple formulas [17, 18]. The first cumulant C 1 C (X) = [c (X),...,c (X)] (7) 1 1 M is the mean vector with elements c such as i c (X) = E(X ). (8) i i The second cumulant C is the symmetric covariance matrix 2 c (X) ... c (X) 1,1 1,M RM×2 ∋ C2(X) = ... ci,j(X) ... . (9) c (X) ... c (X) M,1 M,M Using normalized random variable X˜ its elements are easily computable c (X) = E X˜ X˜ (10) i,j i j (cid:18) (cid:19) 4 The third cumulant C ∈ RM×3 elements are: 3 c (X) = E X˜ X˜ X˜ , (11) i,j,k i j k (cid:16) (cid:17) The fourth cumulant C ∈ RM×4 elements are: 4 c (X) = E X˜ X˜ X˜ X˜ −E X˜ X˜ E X˜ X˜ i,j,k,l i j k l i j k l (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) (12) −E X˜ X˜ E X˜ X˜ −E X˜ X˜ E X˜ X˜ . i k j l i l k j (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19) Our goalis to represent equations such as Eq. (11) and Eq. (12) for higher or- derscumulants inthetensornetworks formandprovideanefficient numerical algorithm for their calculation. 1.4 Normally and non–normally distributed data Let us consider the M–dimensional normally distributed random variable X ∼ N(µ,Σ), with Σ is being a positive–definite covariance matrix and µ ˜ being a mean value vector. In this case the characteristic function φ(τ) and the cumulant generation function are 1 1 K(τ) = log(φ˜(τ)) = τ⊺µ+ τ⊺Στ, φ˜(τ) = exp τ⊺µ+ τ⊺Στ , (13) 2 2 (cid:18) (cid:19) It is easy to see that K(τ) is quadratic in τ and hence its third and higher derivatives over τ, that are elements of high order cumulants [17, 18], are zero, i.e. ∀ ,∀ c (X) = 0. (14) n≥3 I∈{1,...,M}n i1,...,in If the data has a frequency distribution other than the multivariate nor- mal distribution, the characteristic function may be expanded in more terms than quadratic and cumulants of order higher that two may have non–zero elements. 2 Cumulant calculation. In this section we are going to present a recursive formula that can be used to calculate the nth cumulant of X for, n ≥ 3. For this purpose we start with some definitions, mainly concerning combinatorics. Let us denote the set {1,2,...,n} by 1 : n, permutation of the multi–index I = (i ,...,i ) as 1 n π(I), and permutation of a tuple of multi–indices (I ,...,I ), where I = 1 σ k (i ,...,i ), as π′. k1 kr 5 Definition 2.1. Let A ∈ RM×n be a tensor indexed by the multi–index I. The tensor A is super–symmetric if it is invariant under any permutation π of the multi–index, i.e. ∀ ∀ A = A . (15) I∈{1,...,M}n π I π(I) Remark. For n ≥ 2, each cumulant tensor C as well as each moment tensor n M is super–symmetric [16]. n Definition 2.2. Partition of the multi–index P (I). Let I be a multi–index, σ I = (i ,...,i ), let σ ∈ 1 : n. Consider the division of I into a σ-tuple of 1 n sub multi–indices: P (I) = (I ,...,I ,...,I ) where I = (i ,...,i ), such σ 1 k σ k k1 kr that σ,Rk {k } = 1 : n and {k ,...,k }∩{k′,...,k′ } = ∅ for k 6= k′. k=1,r=1 r 1 r 1 r′ DefinSition 2.3. Consider following partitions of the multi–index I: P (I) = σ (I ,...,I ) and P′(I) = (I′,...,I′). Let us introduce the equivalence rela- 1 σ σ 1 σ tion: Pσ(I) ∼ Pσ′(I) ⇔ ∃π′ ∀k∈1:σ ∃πk(I1′,...,Iσ′) = π′(π1(I1),...,πσ(Iσ)) . (cid:16) (cid:17)(16) This relation defines an abstraction class. Henceforth we will take only one representative of each abstraction class and denote it simply by [P (I)]. σ Remark. The number ofpartitionsofset I ofsize ninto σ partsistheStirling Number of the second kind, [19] σ 1 σ |{P (I)}| = S(n,σ) = (−1)(σ−j) jn. (17) σ σ! j j=0 (cid:18) (cid:19) X Definition 2.4. Let X ∈ RT×M, following [16] let us define F : P (I) 7→ c (X). (18) σ I∗ I∗∈YPσ(I) Since cumulants aresuper–symmetric andmultiplication iscommutative, the following formulas are uniquely defined. Example 2.1. Let I = (i ,i ,i ) and σ = 2. 1 2 3 F(P (I)) = F(((i ),(i ,i )))+F(((i ),(i ,i )))+F(((i ),(i ,i ))) = σ 1 2 3 2 1 3 3 2 1 [PXσ(I)] = c (X)·c (X)+c (X)·c (X)+c (X)·c (X). i1 i2,i3 i2 i1,i3 i3 i2,i1 (19) 6 The following recursive relation can be used to relate moments and cu- mulants of the multidimensional random variable [16]: |I| m (X) = c (X) . (20) I I∗ Xσ=1[PXσ(I)] I∗∈YPσ(I) A proof of formula Eq. (20) is presented in Appendix A. Let us now study the case σ = 1 separately. Obviously [P (I)] = I and 1 F(I) = F(I) = c (X), then [P1(I)] I P |I| m (X) = c (X)+ c (X) . (21) I I I∗ Xσ=2[PXσ(I)] I∗∈YPσ(I) th th Then cumulant’s elementcanbecalculatedgiventhen moment’selement and elements of cumulants of order r ∈ 1 : n |I| c (X) = m (X)− c (X) . (22) I I I∗ Xσ=2[PXσ(I)] I∗∈YPσ(I) The cumulant of order 2 or higher of a non centred variable and a centred variable are equal. The cumulant of order 1 of a centred variable is equal to zero. Thus, hereafter we will only consider such partitions P (I) that σ ∀ |I | > 1. Following this argumentation we can derive the final formula: k k σmax cI(X) = cI(X˜) = mI(X˜)− cI∗(X˜) Xσ=2 [PXσ(I)] I∗∈YPσ(I) (23) σmax = mI(X˜)− cI∗(X) . Xσ=2 [PXσ(I)] I∗∈YPσ(I) If the size of I is even, it can be divided at most into σ = |I| parts of max 2 size 2; if size of I is odd, it can be divided at most into σ = |I|−1 parts: max 2 |I|−1 −1 parts of size 2 and one part of size 3. Hence we can conclude that 2 σ = ⌊|I|⌋. max 2 Number of the elements of the sum equals to the number of set [Pσ(I)] partitions of I into exactly σ parts, such that no part is of size 1: P |{Pσ(I) : ∀I∗∈I |I∗| ≥ 2}| = S′(n,σ). (24) 7 That is some alternation of the Stirling number of the second kind, and can be computed as follows: n−2σ+2 n−2σ+2k−Pki=1ni n! S′(n,σ) = ··· ··· , n !...n !...(n− (σ−1)n )!σ! nX1=2 nXk=2 1 k i=1 i σ−1 P (25) | {z } where we counts a number of ways to divide n elements into subsets of σ length, n ≥ 2,...,n ≥ 2,...,n ≥ 2 such that n = n, hence n = 1 k σ k=1 k σ n − σ−1n . The σ! in denominator is to ensure that any permutation of k=1 k P n ,...,n is count only once. 1 σ P Let us mention that S′(4,2) = 3, S′(5,2) = 10, S′(6,2) = 25 and S′(6,3) = 10. To compute each element of in Eq. (23), we need [Pσ(I)] σ − 1 multiplications, hence to compute each element of σmax we need P σ=2 S′(n,σ) · (σ − 1) multiplications. Finally the number of multiplications to P compute the second term of the RHS of Eq. (23) is ⌊n⌋ ⌊n⌋ 2 2 # mult(n) = S′(n,σ)·(1−σ) < S(n,σ)·(1−σ) (26) σ=2 σ=2 X X foreachI. However thecomputationofm requiresT·(n−1)multiplications, I what isdominant ifT islargeandnissmall. Forthecomputationalcomplex- ity purpose, let us see that: # mult(4) = 3, # mult(5) = 10, # mult(6) = 55, # mult(7) = 266, # mult(8) = 1414. 2.1 Cumulant in the form of the tensor network. In the following Subsection we are going to present the cumulant calculation algorithm using operations on tensors. This representation and the graphic representation introduced in Subsection 2.2 provides a good introduction for the algorithm presented in Section 3. Definition 2.5. The unit tensor ∈ RT×n is the tensor with elements (n) 1 a = a = ... = a = 1 and all other elements equal to zero. 1,...,1 2,...,2 T,...,T Definition 2.6. Let A ∈ RT1×···×Tk×···×Tn and let Y ∈ RM×Tk. The tensor- th matrix multiplication in the k mode is given by: T k B = (A× Y) = a y , (27) i1,...,ik−1,j1,ik+1,...in k i1,...,ik−1,j1,ik+1,...in t1,...,tn j1,tk tXk=1 8 where B ∈ RT1×...×Tk−1×M×Tk+1×...×Tn Definition 2.7. Let A ∈ RT×n and Y ∈ RM×T. The tensor-matrix product in modes 1 to n is given by T T B = (A× Y) = ··· a y ·...·y , (28) i1,...,in 1,...,n i1,...,in t1,...,tn i1,t1 in,tn tX1=1 tXn=1 n n | {z } where B ∈ RM×n. | {z } Let X˜ ∈ RT×M be as in Eq (6). The nth central moment’s of X˜ element m is I T T 1 1 m (X˜) = (x˜ ·...·x˜ ) = x˜ ·...·x˜ = I T t,i1 t,in T 1(n)t1,...,tn i1,t1 in,tn Xt=1 t1,.X..,tn=1 n 1 = ( × X˜⊺) . | {z } (n) 1,...,n I T 1 (29) The whole central moment tensor is 1 RM×n ∋ M (X˜) = ( × X˜⊺). (30) n (n) 1,...,n T 1 Example 2.2. The second cumulant matrix and the third cumulant tensor are central moments, and can be represented as 1 RM×2 ∋ C (X) = M (X˜) = X˜⊺X˜, (31) 2 2 T 1 RM×3 ∋ C (X) = M (X˜) = × X˜⊺. (32) 3 3 (3) 1,2,3 T1 Definition 2.8. ConsidertensorsB ∈ RT1×...×Tk×...×Tn1,C ∈ RK1×...×Kk×...×Kn2 indexed by I = (i ,...,i ) and J = (j ,...,j ) respectively, their outer 1 n1 1 n2 product B ◦C = A ∈ RT1×...×Tk×...×Tn1×K1×...×Kk×...×Kn2 defined as ∀ a = b ·c , (33) I,J (I,J) I J where (I,J) denotes index (i ,...,i ,j ,...,j ). 1 n1 1 n2 If B is a symmetric matrix, A = B◦B is partially symmetric, a = i1,i2,i3,i4 a = a = a , but in general a 6= a 6= i2,i1,i3,i4 i1,i2i4,i3 i3,i4,i1,i2 i1,i2,i3,i4 i1,i3,i2,i4 a . i1,i4,i3,i2 9 Definition 2.9. Let A ∈ RM×n be indexed by I = (i ,...,i ). Let (1,...,n) 1 n be a tuple of its modes. For given σ consider P (1 : n). Now we can define σ a sum of outer products of B(n1) ∈ RM×n1,...,B(nk) ∈ RM×nk,...,B(nσ) ∈ RM×nσ as a = b(nk), (34) I I∗ ζ∈X[Pσ(I)]IY∗∈ζ further we will use the following abbreviation: A = B(n1) ◦···◦B(nk) ◦···◦B(nσ) (35) [PXσ(I)] Example 2.3. Consider a matrix B ∈ RM×2, and a tensor A ∈ RM×4 such that: A = B◦B, (36) [P2(Xi1,...,i4)] then a = b b +b b +b b . i1,i2,i3,i4 i1,i2 i3,i4 i1,i3 i2,i4 i1,i4 i3,i2 Consider A = C ◦ ···◦ C ◦···◦ C , where C ∈ RM×nk are [Pσ(I)] n1 nk nσ nk supper–symmetricPcumulants, such that Cnk = Cnk′ iff nk = nk′. Due to the supper–symmetry, permutations of indices inside each C do not matter. n k Further, since cumulant’s tensors are determined by the size of their muti– index: c ·...·c = c ·...·c . (37) i1,...in1 j1,...jn2 j1,...jn2 i1,...in1 Only permutations of sub multi–index inside each C as well as the per- n k mutation of a tuple of sub multi–indices, such as in Eq. (37) are inside the class of abstraction of multi–index partitions introduced in Def. 2.3. Those permutations do not change the outcome of C ◦···◦C ◦···◦C . Since n1 nk nσ we sum over classes of abstraction, the outcome A is super-symmetric. th Example 2.4. Consider the 4 cumulant element: c (X) = m (X˜)− c (X)c (X), (38) I I I1 I2 [PX2(I)] where I ,I ∈ [P (I)] and |I | = |I | = 2. Noting I = (i ,i ,i ,i ) 1 2 2 1 2 1 2 3 4 c (X) = m (X˜)−c (X)c (X)−c (X)c (X)−c (X)c (X), I I i1,i2 i3,i4 i1,i3 i2,i4 i1,i4 i2,i3 (39) using tensor notation 1 RM×4 ∋ C (X) = × X˜⊺ − C (X)◦C (X). (40) 4 (4) 1,2,3,4 2 2 T1 [P2(Xi1,...,i4)] 10