arXiv:0801.2748v1 [stat.CO] 17 Jan 2008

A greedy approach to sparse canonical correlation analysis

Ami Wiesel, Mark Kliger and Alfred O. Hero III
Dept. of Electrical Engineering and Computer Science
University of Michigan, Ann Arbor, MI 48109-2122, USA

The first two authors contributed equally to this manuscript. This work was supported in part by an AFOSR MURI under Grant FA9550-06-1-0324.

Abstract—We consider the problem of sparse canonical correlation analysis (CCA), i.e., the search for two linear combinations, one for each multivariate, that yield maximum correlation using a specified number of variables. We propose an efficient numerical approximation based on a direct greedy approach which bounds the correlation at each stage. The method is specifically designed to cope with large data sets and its computational complexity depends only on the sparsity levels. We analyze the algorithm's performance through the tradeoff between correlation and parsimony. The results of numerical simulation suggest that a significant portion of the correlation may be captured using a relatively small number of variables. In addition, we examine the use of sparse CCA as a regularization method when the number of available samples is small compared to the dimensions of the multivariates.

I. INTRODUCTION

Canonical correlation analysis (CCA), introduced by Harold Hotelling [1], is a standard technique in multivariate data analysis for extracting common features from a pair of data sources [2], [3]. Each of these data sources generates a random vector that we call a multivariate. Unlike classical dimensionality reduction methods which address one multivariate, CCA takes into account the statistical relations between samples from two spaces of possibly different dimensions and structure. In particular, it searches for two linear combinations, one for each multivariate, in order to maximize their correlation. It is used in different disciplines as a stand-alone tool or as a preprocessing step for other statistical methods.
Furthermore, CCA is a generalized framework which includes numerous classical methods in statistics, e.g., Principal Component Analysis (PCA), Partial Least Squares (PLS) and Multiple Linear Regression (MLR) [4]. CCA has recently regained attention with the advent of kernel CCA and its application to independent component analysis [5], [6].

The last decade has witnessed a growing interest in the search for sparse representations of signals and sparse numerical methods. Thus, we consider the problem of sparse CCA, i.e., the search for linear combinations with maximal correlation using a small number of variables. The quest for sparsity can be motivated through various reasonings. First is the ability to interpret and visualize the results. A small number of variables allows us to get the "big picture", while sacrificing some of the small details. Moreover, sparse representations enable the use of computationally efficient numerical methods, compression techniques, as well as noise reduction algorithms. The second motivation for sparsity is regularization and stability. One of the main vulnerabilities of CCA is its sensitivity to a small number of observations. Thus, regularized methods such as ridge CCA [7] must be used. In this context, sparse CCA is a subset selection scheme which allows us to reduce the dimensions of the vectors and obtain a stable solution.

To the best of our knowledge, the first reference to sparse CCA appeared in [2], where backward and stepwise subset selection were proposed. This discussion was of a qualitative nature and no specific numerical algorithm was proposed. Recently, increasing demands for multidimensional data processing and decreasing computational cost have caused the topic to rise to prominence once again [8]–[13]. The main disadvantage of these current solutions is that there is no direct control over the sparsity and it is difficult (and non-intuitive) to select their optimal hyperparameters. In addition, the computational complexity of most of these methods is too high for practical applications with high dimensional data sets. Sparse CCA has also been implicitly addressed in [9], [14] and is intimately related to the recent results on sparse PCA [9], [15]–[17]. Indeed, our proposed solution is an extension of the results in [17] to CCA.

The main contribution of this work is twofold. First, we derive CCA algorithms with direct control over the sparsity in each of the multivariates and examine their performance. Our computationally efficient methods are specifically aimed at understanding the relations between two data sets of large dimensions. We adopt a forward (or backward) greedy approach which is based on sequentially picking (or dropping) variables. At each stage, we bound the optimal CCA solution and bypass the need to resolve the full problem. Moreover, the computational complexity of the forward greedy method does not depend on the dimensions of the data but only on the sparsity parameters. Numerical simulation results show that a significant portion of the correlation can be efficiently captured using a relatively low number of non-zero coefficients. Our second contribution is an investigation of sparse CCA as a regularization method. Using empirical simulations we examine the use of the different algorithms when the dimensions of the multivariates are larger than (or of the same order as) the number of samples and demonstrate the advantage of sparse CCA. In this context, one of the advantages of the greedy approach is that it generates the full sparsity path in a single run and allows for efficient parameter tuning using cross validation.

The paper is organized as follows. We begin by describing the standard CCA formulation and solution in Section II. Sparse CCA is addressed in Section III, where we review the existing approaches and derive the proposed greedy method. In Section IV, we provide performance analysis using numerical simulations and assess the tradeoff between correlation and parsimony, as well as its use in regularization. Finally, a discussion is provided in Section V.

The following notation is used. Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and standard lower case letters denote scalars. The superscripts (·)^T and (·)^{-1} denote the transpose and inverse operators, respectively. By I we denote the identity matrix. The operator ||·||_2 denotes the L2 norm, and ||·||_0 denotes the cardinality operator. For two sets of indices I and J, the matrix X_{I,J} denotes the submatrix of X with the rows indexed by I and the columns indexed by J. Finally, X ≻ 0 or X ⪰ 0 means that the matrix X is positive definite or positive semidefinite, respectively.

II. REVIEW ON CCA

In this section, we provide a review of classical CCA. Let x and y be two zero mean random vectors of lengths n and m, respectively, with joint covariance matrix

Σ = [ Σ_x  Σ_xy ; Σ_xy^T  Σ_y ] ⪰ 0,   (1)

where Σ_x, Σ_y and Σ_xy are the covariance of x, the covariance of y, and their cross covariance, respectively. CCA considers the problem of finding two linear combinations X = a^T x and Y = b^T y with maximal correlation, defined as

cov{X, Y} / sqrt( var{X} var{Y} ),   (2)

where var{·} and cov{·} are the variance and covariance operators, respectively, and we define 0/0 = 1. In terms of a and b, the correlation can be easily expressed as

a^T Σ_xy b / ( sqrt(a^T Σ_x a) sqrt(b^T Σ_y b) ).   (3)

Thus, CCA considers the following optimization problem:

ρ(Σ_x, Σ_y, Σ_xy) = max_{a≠0, b≠0}  a^T Σ_xy b / ( sqrt(a^T Σ_x a) sqrt(b^T Σ_y b) ).   (4)

Problem (4) is a multidimensional non-concave maximization and therefore appears difficult at first sight. However, it has a simple closed form solution via the generalized eigenvalue decomposition (GEVD). Indeed, if Σ ≻ 0, it is easy to show that the optimal a and b must satisfy

[ 0  Σ_xy ; Σ_xy^T  0 ] [ a ; b ] = λ [ Σ_x  0 ; 0  Σ_y ] [ a ; b ],   (5)

for some 0 ≤ λ ≤ 1. Thus, the optimal value of (4) is just the principal generalized eigenvalue λ_max of the pencil (5), and the optimal solution a and b can be obtained by appropriately partitioning the associated eigenvector. These solutions are invariant to scaling of a and b, and it is customary to normalize them such that a^T Σ_x a = 1 and b^T Σ_y b = 1. On the other hand, if Σ is rank deficient, then choosing a and b as the upper and lower partitions of any vector in its null space will lead to full correlation, i.e., ρ = 1.

In practice, the covariance matrices Σ_x and Σ_y and the cross covariance matrix Σ_xy are usually unavailable. Instead, multiple independent observations x_i and y_i for i = 1...N are measured and used to construct sample estimates of the (cross) covariance matrices:

Σ̂_x = (1/N) X^T X,   Σ̂_y = (1/N) Y^T Y,   Σ̂_xy = (1/N) X^T Y,   (6)

where X = [x_1 ... x_N]^T and Y = [y_1 ... y_N]^T. Then, these empirical matrices are used in the CCA formulation:

ρ(Σ̂_x, Σ̂_y, Σ̂_xy) = max_{a≠0, b≠0}  a^T Σ̂_xy b / ( sqrt(a^T Σ̂_x a) sqrt(b^T Σ̂_y b) ).   (7)

Clearly, if N is sufficiently large then this sample approach performs well. However, in many applications the number of samples N is not sufficient. In fact, in the extreme case in which N < n + m the sample covariance is rank deficient and ρ = 1 independently of the data. The standard approach in such cases is to regularize the covariance matrices and solve ρ(Σ̂_x + ε_x I, Σ̂_y + ε_y I, Σ̂_xy), where ε_x > 0 and ε_y > 0 are small tuning ridge parameters [7].

CCA can be viewed as a unified framework for dimensionality reduction in multivariate data analysis and generalizes other existing methods. It is a generalization of PCA, which seeks the directions that maximize the variance of x, and addresses the directions corresponding to the correlation between x and y. A special case of CCA is PLS, which maximizes the covariance of x and y (which is equivalent to choosing Σ_x = Σ_y = I). Similarly, MLR normalizes only one of the multivariates. In fact, the regularized CCA mentioned above can be interpreted as a combination of PLS and CCA [4].
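For concreteness, the classical computation described in this section can be sketched in a few lines of NumPy/SciPy. This is only an illustration of (5)-(7), not code from the paper; the function names and the optional ridge argument (the ridge CCA of [7]) are our own.

```python
import numpy as np
from scipy.linalg import eigh

def cca_from_cov(Sx, Sy, Sxy):
    """Solve (4) for given covariance blocks via the GEVD of (5).

    Returns the canonical correlation rho and weights a, b, normalized so that
    a^T Sx a = b^T Sy b = 1.  Sx and Sy are assumed positive definite.
    """
    n, m = Sxy.shape
    A = np.block([[np.zeros((n, n)), Sxy], [Sxy.T, np.zeros((m, m))]])
    B = np.block([[Sx, np.zeros((n, m))], [np.zeros((m, n)), Sy]])
    lam, V = eigh(A, B)                      # generalized symmetric eigenvalue problem
    v = V[:, np.argmax(lam)]                 # eigenvector of the principal eigenvalue
    a, b = v[:n].copy(), v[n:].copy()
    a /= np.sqrt(a @ Sx @ a)
    b /= np.sqrt(b @ Sy @ b)
    return a @ Sxy @ b, a, b

def cca(X, Y, ridge=0.0):
    """Sample CCA of (7) from data matrices X (N x n) and Y (N x m), rows = observations."""
    N = X.shape[0]
    Sx = X.T @ X / N + ridge * np.eye(X.shape[1])    # sample covariances, eq. (6)
    Sy = Y.T @ Y / N + ridge * np.eye(Y.shape[1])
    Sxy = X.T @ Y / N
    return cca_from_cov(Sx, Sy, Sxy)
```

The helper cca_from_cov is reused by the sketches accompanying Sections III and IV below.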
III. SPARSE CCA

We consider the problem of sparse CCA, i.e., finding a pair of linear combinations of x and y with prescribed cardinality which maximize the correlation. Mathematically, we define sparse CCA as the solution to

max_{a≠0, b≠0}  a^T Σ_xy b / ( sqrt(a^T Σ_x a) sqrt(b^T Σ_y b) )
s.t.  ||a||_0 ≤ k_a,  ||b||_0 ≤ k_b.   (8)

Similarly, sparse PLS and sparse MLR are defined as special cases of (8) by choosing Σ_x = I and/or Σ_y = I. In general, all of these problems are difficult combinatorial problems. In small dimensions, they can be solved using a brute force search over all possible sparsity patterns and solving the associated subproblem via GEVD. Unfortunately, this approach is impractical for even moderate sizes of data sets due to its exponentially increasing computational complexity. In fact, it is a generalization of sparse PCA, which has been proven NP-hard [17]. Thus, suboptimal but efficient approaches are in order and will be discussed in the rest of this section.
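The brute force search mentioned above follows directly from definition (8). The sketch below is our own illustration (reusing cca_from_cov from the Section II sketch); it enumerates all sparsity patterns and is practical only for small n and m, which is exactly the baseline the greedy methods of Section III-B avoid.

```python
import numpy as np
from itertools import combinations

def sparse_cca_bruteforce(Sx, Sy, Sxy, ka, kb):
    """Solve (8) exactly by enumerating all index sets I, J with |I| = ka, |J| = kb."""
    n, m = Sxy.shape
    best_rho, best_I, best_J = -np.inf, None, None
    for I in combinations(range(n), ka):
        for J in combinations(range(m), kb):
            I, J = list(I), list(J)
            # Restricted CCA subproblem on the selected covariance blocks (via GEVD).
            rho, _, _ = cca_from_cov(Sx[np.ix_(I, I)], Sy[np.ix_(J, J)],
                                     Sxy[np.ix_(I, J)])
            if rho > best_rho:
                best_rho, best_I, best_J = rho, I, J
    return best_rho, best_I, best_J
```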
A. Existing solutions

We now briefly review the different approaches to sparse CCA that have appeared in the last few years. Most of the methods are based on the well known LASSO trick, in which the difficult combinatorial cardinality constraints are approximated through the convex L1 norm. This approach has shown promising performance in the context of sparse linear regression [18]. Unfortunately, it is not sufficient in the CCA formulation since the objective itself is not concave. Thus, additional approximations are required to transform the problems into tractable form.

Sparse dimensionality reduction of rectangular matrices was considered in [9] by combining the LASSO trick with semidefinite relaxation. In our context, this is exactly sparse PLS, which is a special case of sparse CCA. Alternatively, CCA can be formulated as two constrained simultaneous regressions (x on y, and y on x). Thus, an appealing approach to sparse CCA is to use LASSO penalized regressions. Based on this idea, [8] proposed to approximate the non-convex constraints using the infinity norm. Similarly, [10], [11] proposed to use two nested iterative LASSO type regressions.

There are two main disadvantages to the LASSO based techniques. First, there is no mathematical justification for their approximations of the correlation objective. Second, there is no direct control over sparsity. Their parameter tuning is difficult as the relation between the L1 norm and the sparsity parameters is highly nonlinear. The algorithms need to be run for each possible value of the parameters and it is tedious to obtain the full sparsity path.

An alternative approach for sparse CCA is sparse Bayes learning [12], [13]. These methods are based on the probabilistic interpretation of CCA, i.e., its formulation as an estimation problem. It was shown that, using different prior probabilistic models, sparse solutions can be obtained. The main disadvantage of this approach is again the lack of direct control over sparsity, and the difficulty in obtaining its complete sparsity path.

Altogether, these works demonstrate the growing interest in deriving efficient sparse CCA algorithms aimed at large data sets with simple and intuitive parameter tuning.
B. Greedy approach

A standard approach to combinatorial problems is the forward (backward) greedy solution which sequentially picks (or drops) the variables at each stage one by one. The backward greedy approach to CCA was proposed in [2], but no specific algorithm was derived or analyzed. In modern applications, the number of dimensions may be much larger than the number of samples, and therefore we provide the details of the more natural forward strategy. Nonetheless, the backward approach can be derived in a straightforward manner. In addition, we derive an efficient approximation to the subproblems at each stage which significantly reduces the computational complexity. A similar approach in the context of PCA can be found in [17].

Our goal is to find the two sets of indices I and J corresponding to the sparsity patterns, i.e., the chosen variables in x and y, respectively. The greedy algorithm chooses the first elements in both sets as the solution to

max_{i,j}  Σ_xy^{i,j} / sqrt( Σ_x^{i,i} Σ_y^{j,j} ).   (9)

Thus, I = {i} and J = {j}. Next, the algorithm sequentially examines all the remaining indices and computes

max_{i∉I}  ρ( Σ_x^{I∪i,I∪i}, Σ_y^{J,J}, Σ_xy^{I∪i,J} )   (10)

and

max_{j∉J}  ρ( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ).   (11)

Depending on whether (10) is greater or less than (11), we add index i or j to I or J, respectively. We emphasize that at each stage, only one index is added either to I or to J. Once one of the sets reaches its required size k_a or k_b, the algorithm continues to add indices only to the other set and terminates when this set reaches its required size as well. It outputs the full sparsity path, and returns k_a + k_b − 1 pairs of vectors associated with the sparsity patterns in each of the stages.

The computational complexity of the algorithm is polynomial in the dimensions of the problem. At each of its i = 1, ..., k_a + k_b − 1 stages, the algorithm computes n + m − i CCA solutions as expressed in (10) and (11) in order to select the patterns for the next stage. It is therefore reasonable for small problems, but is still impractical for many applications. Instead, we now propose an alternative approach that computes only one CCA per stage and reduces the complexity significantly.

An approximate greedy solution can be easily obtained by approximating (10) and (11) instead of solving them exactly. Consider for example (10), where index i is added to the set I. Let a^{I,J} and b^{I,J} denote the optimal solution to ρ(Σ_x^{I,I}, Σ_y^{J,J}, Σ_xy^{I,J}). In order to evaluate (10) we need to recalculate both a^{I∪i,J} and b^{I∪i,J} for each i ∉ I. However, the previous b^{I,J} is of the same dimension and still feasible. Thus, we can optimize only with respect to a^{I∪i,J} (whose dimension has increased). This approach provides the following bounds:

Lemma 1: Let a^{I,J} and b^{I,J} be the optimal solution to ρ(Σ_x^{I,I}, Σ_y^{J,J}, Σ_xy^{I,J}) in (4). Then,

ρ²( Σ_x^{I∪i,I∪i}, Σ_y^{J,J}, Σ_xy^{I∪i,J} ) ≥ ρ²( Σ_x^{I,I}, Σ_y^{J,J}, Σ_xy^{I,J} ) + δ_i^{I,J}
ρ²( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ) ≥ ρ²( Σ_x^{I,I}, Σ_y^{J,J}, Σ_xy^{I,J} ) + γ_j^{I,J},   (12)

where

δ_i^{I,J} = ( [b^{I,J}]^T [Σ_xy^{I,J}]^T [Σ_x^{I,I}]^{-1} Σ_x^{I,i} − [b^{I,J}]^T [Σ_xy^{i,J}]^T )² / ( Σ_x^{i,i} − [Σ_x^{I,i}]^T [Σ_x^{I,I}]^{-1} Σ_x^{I,i} ),

γ_j^{I,J} = ( [a^{I,J}]^T Σ_xy^{I,J} [Σ_y^{J,J}]^{-1} Σ_y^{J,j} − [a^{I,J}]^T Σ_xy^{I,j} )² / ( Σ_y^{j,j} − [Σ_y^{J,j}]^T [Σ_y^{J,J}]^{-1} Σ_y^{J,j} ),   (13)

and we assume that all the involved matrix inversions are nonsingular.

Before proving the lemma, we note that it provides lower bounds on the increase in cross correlation due to including an additional element in x or y, without the need of solving a full GEVD. Thus, we propose the following approximate greedy approach. For each sparsity pattern {I, J}, one CCA is computed via GEVD in order to obtain a^{I,J} and b^{I,J}. Then, the next sparsity pattern is obtained by adding the element that maximizes δ_i^{I,J} or γ_j^{I,J} among i ∉ I and j ∉ J.

Proof: We begin by rewriting (4) as a quadratic maximization with constraints. Thus, we define a^{I,J} and b^{I,J} as the solution to

ρ( Σ_x^{I,I}, Σ_y^{J,J}, Σ_xy^{I,J} ) = max_{a,b}  a^T Σ_xy^{I,J} b   s.t.  a^T Σ_x^{I,I} a = 1,  b^T Σ_y^{J,J} b = 1.   (14)

Now, consider the problem when we add variable j to the set J:

ρ( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ) = max_{a,b}  a^T Σ_xy^{I,J∪j} b   s.t.  a^T Σ_x^{I,I} a = 1,  b^T Σ_y^{J∪j,J∪j} b = 1.   (15)

Clearly, the vector a = a^{I,J} is still feasible (though not necessarily optimal) and yields a lower bound:

ρ( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ) ≥ max_b  [a^{I,J}]^T Σ_xy^{I,J∪j} b   s.t.  b^T Σ_y^{J∪j,J∪j} b = 1.   (16)

Changing variables b̄ = [Σ_y^{J∪j,J∪j}]^{1/2} b results in

ρ( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ) ≥ max_{b̄}  c^T b̄   s.t.  ||b̄||_2 = 1,   (17)

where

c = [Σ_y^{J∪j,J∪j}]^{-1/2} [Σ_xy^{I,J∪j}]^T a^{I,J}.   (18)

Using the Cauchy–Schwarz inequality

c^T b̄ ≤ ||c||_2 ||b̄||_2,   (19)

we obtain

ρ( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ) ≥ || [Σ_y^{J∪j,J∪j}]^{-1/2} [Σ_xy^{I,J∪j}]^T a^{I,J} ||_2.   (20)

Therefore,

ρ²( Σ_x^{I,I}, Σ_y^{J∪j,J∪j}, Σ_xy^{I,J∪j} ) ≥ [a^{I,J}]^T [ Σ_xy^{I,J}  Σ_xy^{I,j} ] [ Σ_y^{J,J}  Σ_y^{J,j} ; [Σ_y^{J,j}]^T  Σ_y^{j,j} ]^{-1} [ Σ_xy^{I,J}  Σ_xy^{I,j} ]^T a^{I,J}.   (21)

Finally, (12) is obtained by using the inversion formula for partitioned matrices and simplifying the terms.
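The following sketch implements the approximate forward greedy scheme just described, ranking candidate variables by the increments δ and γ of (13) and solving one restricted CCA (cca_from_cov from the Section II sketch) per stage. It is our reading of the procedure, not the authors' reference implementation, and it assumes the relevant covariance blocks are nonsingular, as in Lemma 1.

```python
import numpy as np

def greedy_sparse_cca(Sx, Sy, Sxy, ka, kb):
    """Approximate forward greedy sparse CCA based on the bounds of Lemma 1.

    One restricted CCA is solved per stage; candidates are then ranked by the
    closed-form increments delta and gamma of (13), so the GEVDs of (10)-(11)
    are never recomputed for every candidate.  Returns the sparsity path as a
    list of (I, J, rho) triplets.
    """
    n, m = Sxy.shape
    # Stage 1: the single most correlated pair of variables (in absolute value), eq. (9).
    scores = np.abs(Sxy) / np.sqrt(np.outer(np.diag(Sx), np.diag(Sy)))
    i0, j0 = np.unravel_index(np.argmax(scores), scores.shape)
    I, J, path = [i0], [j0], []
    while True:
        rho, a, b = cca_from_cov(Sx[np.ix_(I, I)], Sy[np.ix_(J, J)], Sxy[np.ix_(I, J)])
        path.append((list(I), list(J), rho))
        if len(I) >= ka and len(J) >= kb:
            return path
        best_delta, best_i = -np.inf, None
        if len(I) < ka:                       # candidate additions to I, first bound of (12)
            Sxi_inv = np.linalg.inv(Sx[np.ix_(I, I)])
            for i in set(range(n)) - set(I):
                sxi = Sx[np.ix_(I, [i])].ravel()
                num = (b @ Sxy[np.ix_(I, J)].T @ Sxi_inv @ sxi - b @ Sxy[i, J]) ** 2
                den = Sx[i, i] - sxi @ Sxi_inv @ sxi
                if num / den > best_delta:
                    best_delta, best_i = num / den, i
        best_gamma, best_j = -np.inf, None
        if len(J) < kb:                       # candidate additions to J, second bound of (12)
            Syj_inv = np.linalg.inv(Sy[np.ix_(J, J)])
            for j in set(range(m)) - set(J):
                syj = Sy[np.ix_(J, [j])].ravel()
                num = (a @ Sxy[np.ix_(I, J)] @ Syj_inv @ syj - a @ Sxy[I, j]) ** 2
                den = Sy[j, j] - syj @ Syj_inv @ syj
                if num / den > best_gamma:
                    best_gamma, best_j = num / den, j
        # Add the single index whose bound promises the larger gain in rho^2.
        if best_delta >= best_gamma:
            I.append(best_i)
        else:
            J.append(best_j)
```

As stated in the text, each stage adds a single index to either I or J, so the returned path contains k_a + k_b − 1 sparsity patterns.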
IV. NUMERICAL RESULTS

We now provide a few numerical examples illustrating the behavior of the greedy sparse CCA methods. In all of the simulations below, we implement the greedy methods using the bounds in Lemma 1.

In the first experiment we evaluate the validity of the approximate greedy approach. In particular, we choose n = m = 7, and generate 200 independent random realizations of the joint covariance matrix using the Wishart distribution with 7 + 7 degrees of freedom. For each realization, we run the approximate greedy forward and backward algorithms and calculate the full sparsity path. For comparison, we also compute the optimal sparse solutions using an exhaustive search. The results are presented in Fig. 1, where the average correlation is plotted as a function of the number of variables (or non-zero coefficients). The greedy methods capture a significant portion of the possible correlation. As expected, the forward greedy approach outperforms the backward method when high sparsity is critical. On the other hand, the backward method is preferable if large values of correlation are required.

Fig. 1. Correlation vs. sparsity (forward, backward and full-search CCA), n = m = 7.

In the second experiment we demonstrate the performance of the approximate forward greedy approach in a large scale problem. We present results for a representative (randomly generated) covariance matrix of sizes n = m = 1000. Fig. 2 shows the full sparsity path of the greedy method. It is easy to see that about 90 percent of the CCA correlation value can be captured using only half of the variables. Furthermore, if we choose to capture only 80 percent of the full correlation, then about a quarter of the variables are sufficient.

Fig. 2. Correlation vs. sparsity (full CCA and forward CCA), n = m = 1000.
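The setup of the first experiment could be reproduced along the following lines; the dimensions and Wishart degrees of freedom are taken from the text, while the use of scipy.stats.wishart and the averaging details are our own assumptions.

```python
import numpy as np
from scipy.stats import wishart

n = m = 7
trials = 200
avg_path = np.zeros(n + m - 1)
for _ in range(trials):
    # Random (n+m) x (n+m) joint covariance drawn with n+m degrees of freedom.
    S = wishart.rvs(df=n + m, scale=np.eye(n + m))
    Sx, Sy, Sxy = S[:n, :n], S[n:, n:], S[:n, n:]
    path = greedy_sparse_cca(Sx, Sy, Sxy, ka=n, kb=m)   # full forward sparsity path
    avg_path += np.array([rho for _, _, rho in path]) / trials
# avg_path traces the average correlation along the forward sparsity path (cf. Fig. 1).
```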
In the third set of simulations, we examine the use of sparse CCA algorithms as regularization methods when the number of samples is not sufficient to estimate the covariance matrix efficiently. For simplicity, we restrict our attention to CCA and PLS (which can be interpreted as an extreme case of ridge CCA). In addition, we show results for an alternative method in which the sample covariances Σ̂_x and Σ̂_y are approximated as diagonal matrices with the sample variances (which are easier to estimate). We refer to this method as Diagonal CCA (DCCA). In order to assess the regularization properties of CCA we use the following procedure. We randomly generate a single "true" covariance matrix Σ and use it throughout all the simulations. Then, we generate N random Gaussian samples of x and y and estimate Σ̂. We apply the three approximate greedy sparse algorithms, CCA, PLS and DCCA, using the sample covariance and obtain the estimates â and b̂. Finally, our performance measure is the "true" correlation value associated with the estimated weights, which is defined as

â^T Σ_xy b̂ / ( sqrt(â^T Σ_x â) sqrt(b̂^T Σ_y b̂) ).   (22)

We then repeat the above procedure (using the same "true" covariance matrix) 500 times and present the average value of (22) over these Monte Carlo trials. Fig. 3 provides these averages as a function of parsimony for two representative realizations of the "true" covariance matrix. Examining the curves reveals that variable selection is indeed a promising regularization strategy. The average correlation increases with the number of variables until it reaches a peak. After this peak, the number of samples is not sufficient to estimate the full covariance and it is better to reduce the number of variables through sparsity. DCCA can also be slightly improved by using fewer variables, and it seems that PLS performs best with no subset selection.

Fig. 3. Sparsity as a regularized CCA method (sparse CCA, sparse DCCA and sparse PLS), n = 10, m = 10 and N = 20.
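Computing the performance measure (22) is immediate once the estimated sparse weights are embedded into full-length vectors (zeros outside the selected index sets). A minimal helper, with our own naming:

```python
import numpy as np

def true_correlation(a_hat, b_hat, Sx, Sy, Sxy):
    """Population correlation (22) attained by weights estimated from a finite sample.

    a_hat and b_hat are full-length weight vectors (zero outside the selected
    indices); Sx, Sy, Sxy are the "true" covariance blocks that generated the data.
    """
    num = a_hat @ Sxy @ b_hat
    den = np.sqrt((a_hat @ Sx @ a_hat) * (b_hat @ Sy @ b_hat))
    return num / den
```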
V. DISCUSSION

We considered the problem of sparse CCA and discussed its implementation aspects and statistical properties. In particular, we derived direct greedy methods which are specifically designed to cope with large data sets. Similar to state of the art sparse regression methods, e.g., Least Angle Regression (LARS) [19], the algorithms allow for direct control over the sparsity and provide the full sparsity path in a single run. We have demonstrated their performance advantage through numerical simulations.

There are a few interesting directions for future research in sparse CCA. First, we have only addressed the first order sparse canonical components. In many applications, analysis of higher order canonical components is preferable. Numerically, this extension can be implemented by subtracting the first components and rerunning the algorithms. However, there remain interesting theoretical questions regarding the relations between the sparsity patterns of the different components. Second, while here we considered the case of a pair of multivariates, it is possible to generalize the setting and address multivariate correlations between more than two data sets.

REFERENCES

[1] H. Hotelling, "Relations between two sets of variates," Biometrika, vol. 28, pp. 321–377, 1936.
[2] B. Thompson, Canonical Correlation Analysis: Uses and Interpretation. SAGE Publications, 1984.
[3] T. Anderson, An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley-Interscience, 2003.
[4] M. Borga, T. Landelius, and H. Knutsson, "A unified approach to PCA, PLS, MLR and CCA," Report LiTH-ISY-R-1992, ISY, November 1997.
[5] F. R. Bach and M. I. Jordan, "Kernel independent component analysis," J. Mach. Learn. Res., vol. 3, pp. 1–48, 2003.
[6] A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf, "Kernel methods for measuring independence," J. Mach. Learn. Res., vol. 6, pp. 2075–2129, 2005.
[7] H. D. Vinod, "Canonical ridge and econometrics of joint production," Journal of Econometrics, vol. 4, no. 2, pp. 147–166, May 1976.
[8] D. R. Hardoon and J. Shawe-Taylor, "Sparse canonical correlation analysis," University College London, Technical Report, 2007.
[9] A. d'Aspremont, L. E. Ghaoui, M. I. Jordan, and G. R. G. Lanckriet, "A direct formulation for sparse PCA using semidefinite programming," in Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press, 2005.
[10] S. Waaijenborg and A. H. Zwinderman, "Penalized canonical correlation analysis to quantify the association between gene expression and DNA markers," BMC Proceedings 2007, 1(Suppl 1):S122, Dec. 2007.
[11] E. Parkhomenko, D. Tritchler, and J. Beyene, "Genome-wide sparse canonical correlation of gene expression with genotypes," BMC Proceedings 2007, 1(Suppl 1):S119, Dec. 2007.
[12] C. Fyfe and G. Leen, "Two methods for sparsifying probabilistic canonical correlation analysis," in ICONIP (1), 2006, pp. 361–370.
[13] L. Tan and C. Fyfe, "Sparse kernel canonical correlation analysis," in ESANN, 2001, pp. 335–340.
[14] B. K. Sriperumbudur, D. A. Torres, and G. R. G. Lanckriet, "Sparse eigen methods by D.C. programming," in ICML '07: Proceedings of the 24th International Conference on Machine Learning. New York, NY, USA: ACM, 2007, pp. 831–838.
[15] H. Zou, T. Hastie, and R. Tibshirani, "Sparse principal component analysis," Journal of Computational and Graphical Statistics, vol. 15, pp. 265–286, June 2006.
[16] B. Moghaddam, Y. Weiss, and S. Avidan, "Spectral bounds for sparse PCA: Exact and greedy algorithms," in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, Eds. Cambridge, MA: MIT Press, 2006, pp. 915–922.
[17] A. d'Aspremont, F. Bach, and L. E. Ghaoui, "Full regularization path for sparse principal component analysis," in ICML '07: Proceedings of the 24th International Conference on Machine Learning. New York, NY, USA: ACM, 2007, pp. 177–184.
[18] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Roy. Statist. Soc. Ser. B, vol. 58, no. 1, pp. 267–288, 1996.
[19] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, "Least angle regression," Annals of Statistics (with discussion), vol. 32, no. 1, pp. 407–499, 2004.
