ebook img

Weighted tensor decomposition for approximate decoupling of multivariate polynomials PDF

0.37 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Weighted tensor decomposition for approximate decoupling of multivariate polynomials

Weighted tensor decomposition for approximate decoupling of ∗ multivariate polynomials 6 Gabriel Hollander, Philippe Dreesen, Mariya Ishteva, Johan Schoukens† 1 0 January 29, 2016 2 n a J Abstract 8 2 Multivariatepolynomialsariseinmanydifferentdisciplines. Representingsuchapolynomialas avectorofunivariatepolynomialscanofferusefulinsight,aswellasmoreintuitiveunderstanding. ] For this, techniquesbased on tensor methods are known,but these haveonly been studied in the C exact case. In this paper, we generalize an existing method to the noisy case, by introducing a O weight factor in the tensor decomposition. Finally, we apply the proposed weighted decoupling . algorithm in thedomain of system identification, and observe smaller model errors. h t a m 1 Introduction and notations [ 1 The starting point in this paper is a multivariate vector function f Rm Rn, where fi (1 i n) is a ∶ → ≤ ≤ v polynomial in m variables of degree at most d. The variables of f will be denoted as u= u1,...,um 0 and the values as f u =y= y1,...,yn = f1 u ,...,fn u . This function may contain(cross term)s 80 of monomials, for e(xam) ple c1(u1u2 or c2)u23u(22, w(he)re c1,c2(∈)R), in which case it is called coupled. The principal goal of this article is to find a decoupled representation of f as illustrated in Fig. 1. 7 0 Given f, we wish to find transformation matrices V ∈ Rm×r and W ∈ Rn×r and a vector g x = ( ) . 1 0 6 u1 y1 u1 x1 g1(x1) z1 y1 1 f u VT ⋮ ⋮ W : ⋮ ( ) ⋮ → ⋮ ⋮ v u y u y Xi m n m xr gr(xr) zr n r a Figure 1: The decoupling process: given f, find the matrices V and W and the univariate func- tions g ,...,g . 1 r g x ,...,g x of univariate polynomials, such that, 1 1 r r ( ( ) ( )) f u ≈Wg VTu , ( ) ( ) ∗ThisworkwassupportedinpartbytheFundforScientificResearch(FWO-Vlaanderen), theFlemishGovernment (Methusalem), the Belgian Government through the Interuniversity Poles of Attraction (IAP VII) Program, the ERC advancedgrantSNLSIDundercontract320378,theERCstartinggrantSLRAundercontract258581,andFWOproject G028015N. †Department of VUB-ELEC, Vrije Universiteit Brussel (VUB), B-1050 Brussels ([email protected], [email protected],[email protected], [email protected]). 1 Weighted tensor decompositions 2 for all inputs u. The first internal variable is denoted as x = VTu, and the second as z = g VTu . ( ) Furthermore,thenumberr ofinternalbranches isassumedtobepredefined. Thisdecoupledrepresen- tationoffersawaytostudyf withoutcrossterms,whichcanbe anadvantageforcertainapplications, as it helps its physical or intuitive understanding. Toourknowledge,[8]and[19]offerasolutiontothisproblemunderthespecialassumptionthatan exact decomposition with r branches exists. Under the extra condition of homogeneous polynomials, this has also been studied in [21]. Section 1.2 of [8] also refers to the related Waring problem. In the case of state-space models, this problem is addressed in [22]. Because the solution of [8] seems to be computationally easier, we have chosen to use and generalize this algorithm, which is based on the first-order derivative information of f. At its core, tensor decompositions are used and the method is outlined in Algorithm 1. Fig. 2 shows a graphical representation. An overview of tensor decompositions can be found in [5], [6], [11] and [7]. V = W J h 1 h r = ... v 1 + + v w r 1 w r Figure 2: Graphical representation of the core of Algorithm 1. Algorithm 1. Decomposing a multivariate polynomial f having an exact decomposition. In this section, we shortly introduce the algorithm of [8]. 1. Evaluate the Jacobian matrix of f ∂f1 u ∂f1 u J u =⎡⎢⎢∂u1( ) ⋯ ∂um( )⎤⎥⎥ ( ) ⎢⎢∂fn⋮ u ⋱ ∂fn⋮ u ⎥⎥ ⎢⎣∂u1( ) ⋯ ∂um( )⎥⎦ in N randomly chosenpoints u(1),...,u(N). The number N of sampling points is chosenby the user. The equality f u =Wg VTu implies for the Jacobians that ( ) ( ) g′ vTu 0 ⎡ 1 1 ⎤ J u =W⎢⎢ ( ) ⎥⎥VT, ( ) ⎢⎢ 0 ⋱ g′ vTu ⎥⎥ ⎢ r( r )⎥ ⎣ ⎦ wherethe matrixofderivativesis zerooutsideofthediagonalandv ,...,v denotethe columns 1 r of V, and g′ denote the derivative of the i-th component of g. i 2. Stack the Jacobians into a three-way tensor of dimensions n m N. Here, the k-th frontal slice of consists of the first-order informatJion of f evaluated ×in th×e sampling point u(k), i.e., , ,kJ=J u(k) (where 1≤k≤N). Here, the matlab-notation , ,k is used for the k-th J(∶ ∶ ) ( ) J(∶ ∶ ) frontal slide of . J 3. Compute the Canonical Polyadic Decomposition (CPD) of r = wi vi hi. (1) J ∑ ○ ○ i=1 Weighted tensor decompositions 3 Here,thevectorsw (respectivelyv )definethecolumnsofthematrixW(respectivelyV)ofthe i i decoupledrepresentation. Furthermore,thevectorsh ,...,h containthefirst-orderinformation 1 r of the internal univariate functions g x ,...,g x evaluated in the N sampling points, after 1 1 r r transformation by VT, i.e., x(k) =VT(u(k)). We th(us)have hij =gj′ vjTu(i) . ( ) 4. Startingfromthevectorsh ,...,h ,reconstructtheinternalunivariatefunctionsg x ,...,g x . 1 r 1 1 r r Thisworksbyfittingthederivativesg′ usingthevectorsh ,...,h ,andthentorec(ove)rthefun(c- ) j 1 r tions g withan integrationstep. This method is described inSection II.C of[9]. Since we focus j our attentionto noisy coefficients off, itseems reasonableto use allthe informationavailablein the vectors h ,...,h , instead of the method proposed in Section 2.4 of [8]. 1 r In [8], it is shown that Algorithm 1 works well in case that the function f admits an exact de- composition. In this paper, we generalize the method to the noisy case: here, it is not assumed that an exact decoupling of f exists, and instead, we will search for an approximated decoupling. In this regard, the coefficients of f are thus considered “noisy”: f =f0 vf, where f0 has an exact decoupling + and vf is zero-mean noise. In order to decouple this noisy coupled function f, the covariance matrix Σf =cov v of f will be ( ) assumed to be known throughout this paper. Because this matrix is easy to approximate when doing numericalexperimentsormeasurements,thisseemsareasonableassumption. ThematrixΣf contains the variances of and covariances between the different coefficients. This will lead to the creation of a weight matrix to be used during the decoupling process. This weight matrix will be defined, as common in a weighted least squares approximation problem, as the (pseudo) inverse of a covariance matrix. In conclusion, returning to the outline of the decoupling method, the attention in this paper will be focused on step 3 of Algorithm 1: the CPD will be generalized to a weighted CPD. This paper is organized as follows: in Section 2, the “covariance” matrix Σ of the Jacobian ele- J mentswillbeconstructedasalineartransformationofthe matrixΣf. InSection3,thegeneralization of the CPD with weights will be discussed. Finally, Section 4 summarizes the numerical experiments of the weighted CPD and shows an application of the results in the domain of system identification, while Section 5 contains the conclusion and ideas for future work. 2 Constructing the covariance matrix Σ J In order to create a weighted CPD, the covariance matrix Σ of the Jacobian elements will be con- J structed as a linear transformation of Σf, the covariance matrix of the coefficients of f, except the constant terms. Because contains mnN elements, Σ will have dimensions nmN nmN . J J ( )×( ) In practice, three different matrices (and hence, three different weight matrices) will be discussed in Section 3: (1) only the variances of each Jacobian element will be taken into account while the other covariances will be set to 0, (2) the slice-wise covariances will be taken into account as well, keepingthe restas0,and,(3)allthecovariancesof willbeused,formingadensecovariancematrix. J This way, even though the element-wise and slice-wise defined matrices are approximationsof the full covariance matrix, and are not by themselves well-defined covariance matrices, we will still use this term to denote them. Furthermore, we will denote them respectively Σe , Σs and Σd, and will use J J J Σ if we wish to denote any one of them. J For ease of reading, we will often illustrate dimensions and values for the special case where m = n=d=2. The number ℓ of monomials of degree at most d, given by m+d , is in this case 6, and the ( m ) coupled function f can be written as f u ,u c c u c u c u2 c u u c u2 f(u1,u2)=[f21(u11,u22)]=[d11+d22u11+d33u22+d44u121+d55u11u22+ d66u222], ( ) + + + + + where we use ci and di (1≤i≤6) for the coefficients of the first and second output of f, respectively. The covariance matrix Σf of these coefficients is assumed to be known, and this is a 10 10 matrix. × Weighted tensor decompositions 4 It is defined as follows: c2 ⋯ c6 d2 ⋯ d6 c2 ⎡ var(c2) ⋯ cov(c2,c6) cov(c2,d2) ⋯ cov(c2,d6) ⎤ Σf =c⋮6 ⎢⎢⎢⎢⎢ cov(c⋮6,c2) ⋯⋱ var(⋮c6) cov(c⋮6,d2) ⋯⋱ cov(c⋮6,d6) ⎥⎥⎥⎥⎥. d2 ⎢ cov d2,c2 cov d2,c6 var d2 cov d2,d6 ⎥ ⎢ ( ) ⋯ ( ) ( ) ⋯ ( ) ⎥ ⋮ ⎢ ⎥ ⎢ ⋮ ⋱ ⋮ ⋮ ⋱ ⋮ ⎥ d6 ⎢⎢ cov d6,c2 cov d6,c6 cov d6,d2 var d6 ⎥⎥ ⎣ ( ) ⋯ ( ) ( ) ⋯ ( ) ⎦ Because the first-order information of f will be used, the information about the constant terms is not included in the matrix Σf. In general, the dimensions of Σf are ℓ 1 n ℓ 1 n . (( − ) )×(( − ) ) e 2.1 Constructing the element-wise covariance matrix Σ J Using the notations of the previoussection, it is possible to write the first Jacobianelement evaluated at a point u(k) (1≤k≤N) as follows: ⎡c ∂f ⎢ 2⎤ J11(u(k))= ∂u11(u(k))=[102u(1k)u(2k)0 ]⎢⎢⎢⎢c⋮6⎥⎥⎥⎥⎥. ⎣ ⎦ Derivatives of polynomials do not depend of the constant term c . It then follows for the variance of 1 the element J u(k) of that 11 ( ) J ⎡ 1 ⎢ ⎤ ⎢⎢ 0 ⎥⎥ var(J11(u(k)))=[102u(1k)u(2k)0]Σc⎢⎢⎢⎢⎢2uu((1kk))⎥⎥⎥⎥⎥, (2) ⎢ 2 ⎥ ⎢ 0 ⎥ ⎥ ⎣ ⎦ where var c cov c ,c ⎡ 2 2 6 ⎤ Σc=⎢⎢ ( ) ⋯ ( ) ⎥⎥ ⎢ ⋮ ⋱ ⋮ ⎥ ⎢⎢cov(c6,c2) ⋯ var(c6) ⎥⎥ ⎣ ⎦ is the portion of Σf consisting of the variances of and covariances between the coefficients c2,...,c6. When repeating this in a similar way for the other Jacobian elements, one finds all the variances of the elements of , which, once collected into a matrix, give the element-wise matrix Σe : J J varJ(1) ⎡ 11 ⎤ ⎢⎢ varJ(1) ⎥⎥ ⎢ 21 ⎥ ⎢⎢⎢ varJ(112) ⎥⎥⎥ ⎢ varJ(1) ⎥ ⎢ 22 ⎥ ΣeJ =⎢⎢⎢ ⋱ ⎥⎥⎥. (3) ⎢ varJ(N) ⎥ ⎢ 11 ⎥ ⎢⎢ varJ(N) ⎥⎥ ⎢ 21 ⎥ ⎢⎢⎢ varJ1(N2) ⎥⎥⎥ ⎢ varJ(N)⎥ ⎢ 22 ⎥ ⎣ ⎦ Here,J(k) isusedasshorthandnotationforJ u(k) ,andallemptyspacesinthematrixdenotezero. 11 11( ) Hence, the covariance matrix obtained when considering solely at the variances of Jacobian elements is diagonal. Implementing the weighted CPD decomposition using an element-wise weight has been described in [1] or in [3] (in the case of incomplete data). In the following two sections, this will be generalized to covariance matrices with more cross-covariancestaken into account. Weighted tensor decompositions 5 s 2.2 Constructing the slice-wise covariance matrix Σ J As a next step, the covariances between Jacobian elements evaluated at a single sampling point u(k) aretakenintoaccountforthe constructionofthe matrixΣ ,butthe covariancesoverseveralu(k) are J neglected. This is done bynoting that the Jacobianelements ofsampling pointu(k) canbe writtenas linear combinations of the monomials of degree at least one: c 2 ⎡⎢⎢⎢⎢⎢⎢⎢⎣JJJJ12121122((((uuuu((((kkkk))))))))⎤⎥⎥⎥⎥⎥⎥⎥⎦=⎡⎢⎢⎢⎢⎢⎢⎢⎢⎣ 0010 0001 2u0001(k) uu(2(100kk)) 2u000(2k) 0100 0001 2u000(1k) uu(2(100kk)) 2u000(2k) ⎤⎥⎥⎥⎥⎥⎥⎥⎥⎦⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢ddc⋮⋮626⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥. ⎣ ⎦ This can be denoted shortly as c 2 ⎡ ⎤ ⎢ ⎥ ⎢⎢c⋮ ⎥⎥ vec J u(k) =A u(k) ⎢⎢ 6⎥⎥, (4) ( ( )) ( )⎢d2⎥ ⎢ ⎥ ⎢ ⎥ ⎢⎢d⋮ ⎥⎥ ⎢ 6⎥ ⎣ ⎦ where, the matrix A∈R4×10 (in general, A∈R(mn)×((ℓ−1)n)) contains the linear relationships between coefficients of the monomials and Jacobian elements. In the case that d > 2, the matrix A may of coursecontainhigherpowersoftheu(k). ItfollowsthatthecovariancematrixoftheJacobianelements i of sampling point u(k) is givenby A u(k) ΣfA u(k) T ∈R(mn)×(mn). When we repeat this for all the sampling points, we obtain the slice-(wise c)ovari(ance )matrix Σs J A u(1) ΣfA u(1) T ΣsJ =⎡⎢⎢⎢⎢ ( ) ( ) A(u(2))ΣfA(u(2))T ⎤⎥⎥⎥⎥, (5) ⎢ ⋱ ⎥ ⎢⎢ A u(N) ΣfA u(N) T⎥⎥ ⎣ ( ) ( ) ⎦ which is a block-diagonalmatrix. Once again, the empty spaces in this matrix denote zero. d 2.3 Constructing the dense covariance matrix Σ J Finally, we generalize the slice-wise covariancematrix to the dense case,considering the (co)variances in all the elements of . With the same definition for the matrix A as in Equation (4), the dense covariance matrix Σd Jis defined as J A u(1) ⎡ ⎤ ΣdJ =⎢⎢⎢⎢A(u⋮(N))⎥⎥⎥⎥Σf[A(u(1))T⋯A(u(N))T]. (6) ⎢ ( )⎥ ⎣ ⎦ Itfollowsimmediately fromthis definitionthat,since ΣdJ ∈R(mnN)×(mnN) andΣf ∈R((ℓ−1)n)×((ℓ−1)n), Σd is rank-deficient whenever N > ℓ−1: its rank is then bounded by (ℓ 1)n. This will lead to J m − two different weighted CPD decompositions. On the one hand, decompositions using element-wise or slice-wiseweightswillbediscussedinSection3.1becausethe covariancematrix(andhencethe weight matrix)hasfullrank;ontheotherhand,thedense-weightdecompositionswitharank-deficientweight matrix will be discussed in Section 3.2. Weighted tensor decompositions 6 3 Computing the weighted CPD Giventhe n m N-tensor ofJacobianelements,it is possible to (approximately)write it asa sum × × J of r rank-one tensors r ≈ wi vi hi. (7) J ∑ ○ ○ i=1 Here, the notation is used to denote the so-called outer product and it is defined as follows. Given ○ vectors a = (a1,...,ana), b = (b1,...,bnb) and c = (c1,...,cnc), the outer product a○b○c is the na nb nc tensor whose element in position (i,j,k) (1≤i≤na, 1≤j ≤nb, 1≤k≤nc) is given by × × (a b c)i,j,k =aibjck. ○ ○ The outer product a b c is said to have rank one. In expression (7), the tensor is said to be ○ ○ J approximated by a sum of rank one tensors. We will use the same notation as in [11] and will denote this sum as JW,V,HK. Finding the matrices V, W and H leads to optimizing the following non-linear cost function min ∥ JW,V,HK∥2 V,W,H J − F T = min (vec( ) vec(JW,V,HK)) (vec( ) vec(JW,V,HK)), V,W,H J − J − where vec(X) denotes the vectorization of the matrix or tensor X to a column vector. While this optimization problem is used for computing the unweighted CPD decomposition, the weighted CPD can be computed using a weighted norm ∥∥Ω in the cost function, which includes a weight matrix Ω as follows, ⋅ 2 min ∥vec( ) vec(JW,V,HK)∥ V,W,H J Ω − (8) T = min (vec( ) vec(JW,V,HK)) Ω(vec( ) vec(JW,V,HK)). V,W,H J J − − Optimizing this expression is described in the next sections, once for the element-wise and slice-wise weight, and once for the dense weight matrix. Naively, the weight matrix Ω can be intuitively seen as the (pseudo-) inverse of the covariance matrix Σ . This will be detailed in the following sections. J 3.1 Using element-wise and slice-wise weights In order to optimize the nonlinear cost function (8), we will use the workhorsemethod for computing the (unweighted) CPD: the Alternating Least Squares method described in [4], [10] and [11]. This works by optimizing only one of the factors V, W and H at a time, while keeping the two others fixed. Then, by alternating the optimized factor iteratively, a solution of the optimization problem can be found. Optimizing one factor (in the unweighted case) leads to three different least squares optimizations: min∥ T (H V)WT∥2 , (9) W J(1) F − ⊙ min∥ T (H W)VT∥2 , V J(2) F − ⊙ and min∥ T (V W)HT∥2 . H J(3) F − ⊙ Here, denotes the Khatri-Rao product (see, among others, [12]), and , , denote the (1) (2) (3) J J J matric⊙izations of the tensor to the first, second and third mode, respectively. For this, we use the J matricization order described in [11]. In order to take the weight matrix Ω into account, these (unweighted) least squares problems will be generalized to a set of weighted least squares optimizations, leading to a Weighted Alternating Weighted tensor decompositions 7 Least Squares method for finding the weighted CPD. Because this is done in a similar way for the three factors, more attention will be given to the update of W and only the differences with the two other factor updates will be emphasized. 3.1.1 Updating the matrix W The optimization problem in (9) is a least-squares problem with multiple right-hand sides. It can be rewritten as follows as a problem with a single right-hand side: min∥ T (H V)WT∥2 W J(1) F − ⊙ =min∥vec( T ) vec((H V)WT)∥2 W J(1) − ⊙ =mWin∥vec(J(T1)) (In (H V))vec(WT)∥2 (10) − ⊗ ⊙ Here, In ∈ Rn×n is the identity matrix and we denote the Kronecker product as . If we denote B1=In (H V), so ⊗ ⊗ ⊙ ⎡⎢H V B1=⎢⎢ ⊙ ⎤⎥ ⎢ ⎥ ⎢ ⋱ H V⎥ ⎥ ⊙ ⎥ ⎣ ⎦ nrepetitions ´¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¸¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¹¶ then the minimization problem in (10) can be rewritten as min vec T B vec WT 2. (11) W J(1) 1 ∥ ( )− ( )∥ On this level, the permuted weightΩ is included as an extrafactor in the costfunction, namely (1) min vec T B vec WT 2 (12) W J(1) 1 Ω(1) ∥ ( )− ( )∥ =mWin vec J(T1) B1vec WT TΩ(1) vec J(T1) B1vec WT . ( ( )− ( )) ( ( )− ( )) The weighted least squares solution is then given by vec WT = BT1Ω(1)B1 −1BT1Ω(1)vec J(T1) . (13) ( ) ( ) ( ) Here, care should be taken as the weight matrix Ω should be permuted to Ω using the same (1) permutationthat permutes vec to vec T . If we denote the permutationmatrix as P , then we J J(1) 1 have ( ) ( ) P1vec J =vec J(T1) and ( ) ( ) Ω(1)=P1ΩPT1. Using matlab, the permutation matrix P is found by the following instructions: 1 T = reshape(1:m*n*N, [n,m,N]); I = eye(m*n*N); P_1 = I(vec(reshape(T, [n, m*N])’), :); Finally, reshaping the solution vector of Equation (13) to the dimensions of WT gives after transpo- sition the updated matrix W. 3.1.2 Updating the matrices V and H When updating the matrix V, the permutations from vec to vec T and vec T should also be updated. This is done similarly as in Section 3.1.1. Fig(J. 3)shows(aJg(r2a))phical rep(rJes(e3)n)tation of the permuted matrices Ω , Ω and Ω . (1) (2) (3) Weighted tensor decompositions 8 1 2 3 4 5 6 7 8 2 9 10 11 12 13 14 15 3 10 16 17 18 19 20 21 4 11 17 22 23 24 25 26 5 12 18 23 27 28 29 30 6 13 19 24 28 31 32 33 7 14 20 25 29 32 34 35 8 15 21 26 30 33 35 36 Ω 1 3 5 7 2 4 6 8 1 2 5 6 3 4 7 8 1 2 3 4 5 6 7 8 3 16 18 20 10 17 19 21 2 9 12 13 10 11 14 15 2 9 10 11 12 13 14 15 5 18 27 29 12 23 28 30 5 12 27 28 18 23 29 30 3 10 16 17 18 19 20 21 7 20 29 34 14 25 32 35 6 13 28 31 19 24 32 33 4 11 17 22 23 24 25 26 2 10 12 14 9 11 13 15 3 10 18 19 16 17 20 21 5 12 18 23 27 28 29 30 4 17 23 25 11 22 24 26 4 11 23 24 17 22 25 26 6 13 19 24 28 31 32 33 6 19 28 32 13 24 31 33 7 14 29 32 20 25 34 35 7 14 20 25 29 32 34 35 8 21 30 35 15 26 33 36 8 15 30 33 21 26 35 36 8 15 21 26 30 33 35 36 Ω(1) Ω(2) Ω(3) Figure 3: A graphical representation of how the permutations of Ω work, in the case where m =n= N =2. These permutations are used due to the vectorizations of several tensors in Section 3.1.1 and beyond. Under the hood, these are defined in the same fashion as matlab’s reshape function. We note that the permutation matrix P is, in fact, the identity matrix. 3 3.2 Using dense weights As mentioned earlier, the dense covariance matrix Σd is rank-deficient and does not have an inverse, J which could be used as weight matrix. Although the pseudo-inverse of Σ could be computed, it J does not yield enough equations in order to solve the problem (8) uniquely: the system (8) is in fact undetermined. That is why the following technique will be used in order to incorporate the weight matrix into the CPD decomposition. It consists of two parts: Section 3.2.1 describes the first set of equations, and Section 3.2.2 contains the second set of equations. 3.2.1 Using the singular values of Σ J Let r =rank ΣJ denote the rank of the covariance matrix, then we can decompose the matrix ΣJ using the sin(gular)value decomposition (SVD), and obtain r×r mnN×r ΣJ =UΣDΣUTΣ = ¬U(Σ1) U(Σ2) ¬D0(Σ1) 00 U(Σ1) U(Σ2) T =U(Σ1)D(Σ1) U(Σ1) T. [ ][ ][ ] ( ) mnN±×(mnN−r) Here, U is an orthogonal matrix and D is a diagonal matrix containing the singular values of Σ . Σ Σ J We call U(Σ1) the submatrix of UΣ containing the first r = rank ΣJ columns of UΣ, and D(Σ1) the submatrix of DΣ containing the non-zero singular values. ( ) As in Section 3.1.1, every factor during the Alternating Least Squares algorithm uses a permuted version of Σ . For example, at the update of the factor W, we define J Σ(1)=P1ΣJPT1, (14) Weighted tensor decompositions 9 whereP isthe permutationmatrix,asdefinedinSection3.1. ItfollowsthattheSVDofΣ isgiven 1 (1) by Σ(1)=VΣ(1)DΣ(1)VΣT(1) = P1U(Σ1) D(Σ1) U(Σ1) TPT1 ( ) (( ) ) Let Q denote the r mnN-matrix 1 × Q1=√ D1Σ −1 U(Σ1) TPT1, ( ) ( ) such that QT1Q1=Ω(1). Then the solution of the minimization problem (12) is given by Q B † Q vec T , (15) 1 1 1 (1) J ( )( ( ) ) where B is the block-matrix of Khatri-Rao products and X† denotes the pseudo-inverse of X. In 1 Appendix A, a derivation for solution (15) is shown. Similar expressions are found for the other factors being updated. The following table shows how the matrix B changes, depending on the i updated factor: i Updating Matrix B Dimensions Dimensions i factor of B of Q B i i i H V 1 W ⎡⎢ ⊙ ⎤⎥ mnN rn r rn ⎢ ⎥ ⎢ ⋱ H V⎥ × × ⎢ ⎥ ⎢ ⊙ ⎥ ⎣ ⎦ H W 2 V ⎡⎢ ⊙ ⎤⎥ mnN rm r rm ⎢ ⎥ ⎢ ⋱ H W⎥ × × ⎢ ⎥ ⎢ ⊙ ⎥ ⎣ ⎦ V W 3 H ⎡⎢ ⊙ ⎤⎥ mnN rN r rN ⎢ ⎥ ⎢ ⋱ V W⎥ × × ⎢ ⎥ ⎢ ⊙ ⎥ As we can see from this table⎣, the dimensions of th⎦e systems being solved in (15) depend on the factor being updated. We note that if the number N of operating points is too large, then the system with coefficient matrix Q B is underdetermined, whereas the two other systems are not, as long 3 3 as r > max rn,rm . To remedy this, the U(Σ2)-part of the UΣ-matrix will be used to add extra constraints t{o the cu}rrent systems of equations. These extra conditions will also be used for updating the matrices W and V. 3.2.2 Adding more equations to the existing set If the covariance matrix Σ of the noise v is not of full rank, there exist linear relations L between J the noise disturbances v, such that Lv = 0. In Appendix B, it is shown how these can be retrieved from Σ . J In order to find the extra equations for the update of the current factor, we reconsider the mini- mization problem (12) for the updating factor W: min vec T B vec WT 2 vec(WT) J(1) 1 Ω(1) ∥ ( )− ( )∥ Rewriting this expression as vec J(T1) = B1vec WT v, where v is correlated noise with Σ(1) as in Equation (14), Appendix B ca(n be )used in ord(er to)a+dd extra equations to the existing set in (15). Weighted tensor decompositions 10 We remark that the noise v is correlated, due to the linear relations between the covariance matrices Σ and Σ . f J Using the notations in Equation (14), the orthogonalfactor of the singular value decomposition of Σ is given by mnN×r (1) P1UΣ= ³P¹¹¹¹¹1¹¹¹¹¹¹·U(Σ¹¹¹¹¹1¹¹¹¹¹µ) P1U(Σ2) . [ ] mnN´¹¹×¹¹¹¹¹¹¹¹(¹¸mn¹¹¹¹¹¹¹N¹¹¹¶−r) In Appendix B, it is proven that the extra conditions are then given by U(2) TPTB † U(2) TPTvec T . (16) Σ 1 1 Σ 1 J(1) (( ) ) (( ) ( )) Similar expression can be found for the other two updating factors. In general, we conclude that the following system of equations must be solved for every updated factor 1≤i≤3: † T [ U2JQTiBPiTi Bi] ⎡⎢⎢ U2JQiTvePcTi(Jve(ic))J(Ti) ⎤⎥⎥. (17) ( ) ⎢( ) ( )⎥ ⎣ ⎦ 3.3 Stopping criteria for the cpd algorithm By iterating the presented algorithm, a sequence of updated factors is obtained W(1),V(1),H(1),W(2),V(2),H(2),W(3),V(3),H(3),... We have the following two stopping criteria for this iteration algorithm: 1. Whentherelativestepsizebetweentwoiterationsisbelowagiventolerance,thenthe algorithm is stopped. The relative step size at iteration step j≥2 is given as 2 2 2 √ W(j) W(j−1) V(j) V(j−1) H(j) H(j−1) F F F, or ∥ − ∥ +∥2 − 2 ∥ +∥ 2 − ∥ √ W(j) V(j) H(j) F F F ∥ ∥ +∥ ∥ +∥ ∥ 2. when an upper bound on the number of iterations is reached, then the algorithm is stopped. This takes care of possible divergence. 3.4 Summary of the proposed algorithm In this section, we summarize the proposedweighted cpd algorithmand incorporateit into the larger decoupling problem of the noisy multivariate polynomial function f. We assume the covariance ma- trix Σf of the coefficients of f to be known and the same notations as in Section 1 will be used. Algorithm 2. Decomposition of the noisy multivariate polynomial f, given Σf. 1. Evaluate the Jacobian matrix J u of f in N randomly chosen sampling points u(1),...,u(N). ( ) 2. Stack the Jacobians into a three-way tensor . J 3. Transform the covariance matrix Σf of f into the matrix ΣJ. Here, three choices are possible: the element-wise as discussed in (3), slice-wise as discussed in (5) or dense covariance matrix discussed in (6). 4. Compute the Weighted Canonical Polyadic Decomposition (WCPD) of J ≈∑ri=1wi vi hi. In the element- and slice-wise case, use (13), in the dense case, use (17). Iterate until○one○of the stopping criteria in Section 3.3 is satisfied. 5. Startingfromthevectorsh ,...,h ,reconstructtheinternalunivariatefunctionsg x ,...,g x 1 r 1 1 r r with an integration step, as in Algorithm 1. ( ) ( )

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.