arXiv:1401.6497v2 [cs.LG] 9 Oct 2014

Bayesian CP Factorization of Incomplete Tensors with Automatic Rank Determination

Qibin Zhao, Member, IEEE, Liqing Zhang, Member, IEEE, and Andrzej Cichocki, Fellow, IEEE

Abstract—CANDECOMP/PARAFAC (CP) tensor factorization of incomplete data is a powerful technique for tensor completion through explicitly capturing the multilinear latent factors. The existing CP algorithms require the tensor rank to be manually specified; however, the determination of tensor rank remains a challenging problem, especially for CP rank. In addition, existing approaches do not take into account uncertainty information of latent factors, as well as missing entries. To address these issues, we formulate CP factorization using a hierarchical probabilistic model and employ a fully Bayesian treatment by incorporating a sparsity-inducing prior over multiple latent factors and the appropriate hyperpriors over all hyperparameters, resulting in automatic rank determination. To learn the model, we develop an efficient deterministic Bayesian inference algorithm, which scales linearly with data size. Our method is characterized as a tuning parameter-free approach, which can effectively infer underlying multilinear factors with a low-rank constraint, while also providing predictive distributions over missing entries. Extensive simulations on synthetic data illustrate the intrinsic capability of our method to recover the ground-truth of CP rank and prevent the overfitting problem, even when a large amount of entries are missing. Moreover, the results from real-world applications, including image inpainting and facial image synthesis, demonstrate that our method outperforms state-of-the-art approaches for both tensor factorization and tensor completion in terms of predictive performance.

Index Terms—Tensor factorization, tensor completion, tensor rank determination, Bayesian inference, image synthesis

• Q. Zhao is at the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Japan, and the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China.
• L. Zhang is at the MOE-Microsoft Laboratory for Intelligent Computing and Intelligent Systems and the Department of Computer Science and Engineering, Shanghai Jiao Tong University, China.
• A. Cichocki is at the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Japan, and the Systems Research Institute at the Polish Academy of Science, Warsaw, Poland.

1 INTRODUCTION

Tensors (i.e., multiway arrays) provide an effective and faithful representation of the structural properties of data, in particular when multidimensional data or a data ensemble affected by multiple factors are involved. For instance, a video sequence can be represented by a third-order tensor with dimensionality of height × width × time; an image ensemble measured under multiple conditions can be represented by a higher-order tensor with dimensionality of pixel × person × pose × illumination. Tensor factorization enables us to explicitly take into account the structure information by effectively capturing the multilinear interactions among multiple latent factors. Therefore, its theory and algorithms have been an active area of study during the past decade (see, e.g., [1], [2]), and have been successfully applied to various application fields, such as face recognition, social network analysis, image and video completion, and brain signal processing. The two most popular tensor factorization frameworks are Tucker [3] and CANDECOMP/PARAFAC (CP), also known as canonical polyadic decomposition (CPD) [4], [5], [6].

Most existing tensor factorization methods assume that the tensor is complete, whereas the problem of missing data can arise in a variety of real-world applications. This issue has attracted a great deal of research interest in tensor completion in recent years. The objective of tensor factorization of incomplete data is to capture the underlying multilinear factors from only partially observed entries, which can in turn predict the missing entries. In [7], CP factorization with missing data was formulated as a weighted least squares problem, termed CP weighted optimization (CPWOPT). A structured CPD using nonlinear least squares (CPNLS) was proposed in [8]. In [9], geometric nonlinear conjugate gradient (geomCG) based on Riemannian optimization on the manifold of tensors was presented. However, as the number of missing entries increases, tensor factorization schemes tend to overfit the model because of an incorrectly specified tensor rank, resulting in severe deterioration of their predictive performance.
In contrast, another popular technique is to exploit a low-rank assumption for recovering the missing entries; the rank minimization can be formulated as a convex optimization problem on a nuclear norm. This technique has been extended to higher-order tensors by defining the nuclear norm of a tensor, yielding the tensor completion [10]. Some variants were also proposed, such as a framework based on convex optimization and spectral regularization, which uses an inexact splitting method [11], and fast composite splitting algorithms (FCSA) [12]. To improve the efficiency, the Douglas-Rachford splitting technique [13] and the nonlinear Gauss-Seidel method [14] were also investigated. Recently, nuclear norm based optimization was also applied to a supervised tensor dimensionality reduction method [15]. An alternative method for tensor completion is to employ adaptive sampling schemes [16]. However, the nuclear norm of a tensor is defined straightforwardly by a weighted sum of the nuclear norms of mode-n matricizations, which is related to multilinear rank rather than CP rank. In addition, these completion-based methods cannot explicitly capture the underlying factors. Hence, a simultaneous tensor decomposition and completion (STDC) method was introduced in which a rank minimization technique was combined with Tucker decomposition [17]. To improve completion accuracy, auxiliary information was also exploited in [17], [18], which strongly depends on the specific application. It is also noteworthy that rank minimization based on convex optimization of the nuclear norm is affected by tuning parameters, which may tend to over- or under-estimate the true tensor rank.

It is important to emphasize that our knowledge about the properties of CP rank, defined by the minimum number of rank-one terms in CP decomposition, is surprisingly limited. There is no straightforward algorithm to compute the rank even for a given specific tensor, and the problem has been shown to be NP-complete [19]. The lower and upper bounds of tensor rank were studied in [20], [21]. The ill-posedness of the best low-rank approximation of a tensor was investigated in [22]. In fact, determining or even bounding the rank of an arbitrary tensor is quite difficult in contrast to the matrix rank [23], and this difficulty would be significantly exacerbated in the presence of missing data.
Probabilistic models for matrix/tensor factorization have attracted much interest in collaborative filtering and matrix/tensor completion. Probabilistic matrix factorization was proposed in [24], and its fully Bayesian treatment using Markov chain Monte Carlo (MCMC) inference was shown in [25] and using variational Bayesian inference in [26], [27]. Further extensions of nonparametric and robust variants were presented in [28], [29]. Probabilistic frameworks of tensor factorization were presented in [30], [31], [32]. Other variants include extensions of the exponential family model [33] and the nonparametric Bayesian model [34]. However, the tensor rank or model complexity is often given by a tuning parameter selected by either maximum likelihood or cross-validation, which is computationally expensive and inaccurate. Another important issue is that the inference of factor matrices is performed by either point estimation, which is prone to overfitting, or MCMC inference, which tends to converge very slowly.

To address these issues, we propose a fully Bayesian probabilistic tensor factorization model according to the CP factorization framework. Our objective is to infer the underlying multilinear factors from a noisy incomplete tensor and the predictive distribution of missing entries, while the rank of the true latent tensor can be determined automatically and implicitly. To achieve this, we specify a sparsity-inducing hierarchical prior over multiple factor matrices with individual hyperparameters associated to each latent dimension, such that the number of components in the factor matrices is constrained to be minimum. All the model parameters, including the noise precision, are considered to be latent variables over which the corresponding priors are placed. Due to the complex interactions among multiple factors and the fully Bayesian treatment, learning the model is analytically intractable. Thus, we resort to variational Bayesian inference and derive a deterministic solution to approximate the posteriors of all the model parameters and hyperparameters. Our method is characterized as a tuning parameter-free approach that can effectively avoid parameter selections. The extensive experiments and comparisons on synthetic data illustrate the advantages of our approach in terms of rank determination, predictive capability, and robustness to overfitting. Moreover, several real-world applications, including image completion, restoration, and synthesis, demonstrate that our method outperforms state-of-the-art approaches, including both tensor factorization and tensor completion, in terms of the predictive performance.

The rest of this paper is organized as follows. In Section 2, preliminary multilinear operations and notations are presented. In Section 3, we introduce the probabilistic CP model specification and the model learning via Bayesian inference. A variant of our method using mixture priors is proposed in Section 4. In Section 5, we present the comprehensive experimental results for both synthetic data and real-world applications, followed by our conclusion in Section 6.
2 PRELIMINARIES AND NOTATIONS

The order of a tensor is the number of dimensions, also known as ways or modes. Vectors (first-order tensors) are denoted by boldface lowercase letters, e.g., a. Matrices (second-order tensors) are denoted by boldface capital letters, e.g., A. Higher-order tensors (order ≥ 3) are denoted by boldface calligraphic letters, e.g., X. Given an Nth-order tensor X ∈ R^{I_1×I_2×···×I_N}, its (i_1, i_2, ..., i_N)th entry is denoted by X_{i_1 i_2 ... i_N}, where the indices typically range from 1 to their capital version, e.g., i_n = 1, 2, ..., I_n, ∀n ∈ [1, N].

The inner product of two tensors is defined by ⟨A, B⟩ = Σ_{i_1,...,i_N} A_{i_1...i_N} B_{i_1...i_N}, and the squared Frobenius norm by ||A||_F^2 = ⟨A, A⟩. As an extension to N variables, the generalized inner product of a set of vectors, matrices, or tensors is defined as a sum of element-wise products. For example, given {A^(n) | n = 1, ..., N}, we define

$$\left\langle \mathbf{A}^{(1)}, \cdots, \mathbf{A}^{(N)} \right\rangle = \sum_{i,j} \prod_n A^{(n)}_{ij}. \qquad (1)$$

The Hadamard product is an entrywise product of two vectors, matrices, or tensors of the same sizes. For instance, given two matrices A ∈ R^{I×J} and B ∈ R^{I×J}, their Hadamard product is a matrix of size I × J and is denoted by A ⊛ B. Without loss of generality, the Hadamard product of a set of matrices can be simply denoted by

$$\circledast_n \mathbf{A}^{(n)} = \mathbf{A}^{(1)} \circledast \mathbf{A}^{(2)} \circledast \cdots \circledast \mathbf{A}^{(N)}. \qquad (2)$$

The Kronecker product [1] of matrices A ∈ R^{I×J} and B ∈ R^{K×L} is a matrix of size IK × JL, denoted by A ⊗ B. The Khatri-Rao product of matrices A ∈ R^{I×K} and B ∈ R^{J×K} is a matrix of size IJ × K, defined by a columnwise Kronecker product and denoted by A ⊙ B. In particular, the Khatri-Rao product of a set of matrices in reverse order is defined by

$$\bigodot_n \mathbf{A}^{(n)} = \mathbf{A}^{(N)} \odot \mathbf{A}^{(N-1)} \odot \cdots \odot \mathbf{A}^{(1)}, \qquad (3)$$

while the Khatri-Rao product of a set of matrices, except the nth matrix, denoted by A^(\n), is

$$\bigodot_{k \neq n} \mathbf{A}^{(k)} = \mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}. \qquad (4)$$
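To make the notation concrete, the following is a minimal NumPy sketch of the multilinear operations in (1)-(4). The function names are ours (the authors distribute a MATLAB implementation, not this code), and the row ordering assumes the mode-1 index varies fastest, matching a column-major vec(·).

```python
import numpy as np

def khatri_rao_reverse(matrices):
    """Khatri-Rao product in reverse order, A^(N) (KR) ... (KR) A^(1), as in Eq. (3).

    Each A^(n) has shape (I_n, R); the result has shape (prod_n I_n, R),
    with the mode-1 index varying fastest along the rows.
    """
    result = matrices[0]                      # start from A^(1); later modes are
    for A in matrices[1:]:                    # stacked on the slower-varying axis
        R = result.shape[1]
        result = (A[:, None, :] * result[None, :, :]).reshape(-1, R)
    return result

def hadamard_all(matrices):
    """Hadamard (elementwise) product of a set of equally sized matrices, Eq. (2)."""
    out = np.ones_like(matrices[0], dtype=float)
    for A in matrices:
        out = out * A
    return out

def generalized_inner_product(matrices):
    """Generalized inner product of Eq. (1): sum of elementwise products."""
    return hadamard_all(matrices).sum()

# quick check of the CP identity vec(X) = (KR_n A^(n)) 1_R for a rank-R tensor
A = [np.random.randn(I, 3) for I in (4, 5, 6)]
X = np.einsum('ir,jr,kr->ijk', *A)            # [[A^(1), A^(2), A^(3)]]
assert np.allclose(khatri_rao_reverse(A) @ np.ones(3),
                   X.reshape(-1, order='F'))  # Fortran order: mode-1 fastest
```

The final assertion checks the standard identity relating the reverse-order Khatri-Rao product to the vectorization of a CP tensor, which is the form in which these products are used throughout Section 3.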
Gaussian distribution, To achieve this, a set of continuous hyperparameters i.e., ε (cid:81) (0,τ−1), and the latent tensor can are employed to control the variance related to each be exa∼ctlyir1e,.p..r,ieNseNnted by a CP model, given by X dimensionalityofthelatentspace,respectively.Sincethe minimum R is desired in the sense of low rank approxi- R (cid:88) mation, a sparsity-inducing prior is specified over these = a(1) a(N) =[[A(1),...,A(N)]], (5) X r ◦···◦ r hyperparameters,resultinginitbeingpossibletoachieve r=1 automatic rank determination as a part of the Baybesian where denotes the outer product of vectors and [[ ]] inference process. This technique is related to automatic ◦ ··· is a shorthand notation, also termed as the Kruskal relevance determination (ARD) [35] or sparse Bayesian operator. CP factorization can be interpreted as a sum learning [36]. However, unlike the traditional methods of R rank-one tensors, while the smallest integer R is that place the ARD prior over either latent variables defined as CP rank [1]. A(n) N are a set of factor or weight parameters, such as Bayesian principle com- matrices where mode-n fa{ctor m}na=t1rix A(n) RIn×R can ponent analysis [37], our method considers all model ∈ be denoted by row-wise or column-wise vectors parameters as latent variables over which a sparsity- inducing prior is placed with shared hyperparameters. (cid:104) (cid:105)T (cid:104) (cid:105) A(n) = a(n),...,a(n),...,a(n) = a(n),...,a(n),...,a(n) . 1 in In ·1 ·r ·R More specifically, we place a prior distribution over the latent factors, governed by hyperparameters λ = TheCPgenerativemodel,togetherwithnoiseassump- [λ ,...,λ ] where each λ controls rth component in tion, directly give rise to the observation model, which 1 R r A(n), which is is factorized over observed tensor elements p(cid:16)Y(cid:16)Ω(cid:12)(cid:12)(cid:12){A(n)}Nn(cid:12)(cid:12)=(cid:68)1a,(τ1)(cid:17),a=(2i)(cid:89)1,I=11··,·ai(N(cid:89)INN=)1(cid:69),τ−1(cid:17)Oi1...iN , (6) p(cid:0)A(n)(cid:12)(cid:12)λ(cid:1)=i(cid:89)nI=n1N(cid:0)a(inn)(cid:12)(cid:12)0,Λ−1(cid:1), ∀n∈[1,N], (7) N Yi1i2...iN(cid:12) i1 i2 ··· iN where Λ = diag(λ) denotes the inverse covariance where the parameter τ denotes the noise precision, and (cid:68) (cid:69) matrix,alsoknownastheprecisionmatrix,andisshared a(i11),a(i22),··· ,a(iNN) = (cid:80)r(cid:81)na(innr) denotes a general- by latent factor matrices in all modes. We can further ized inner-product of N vectors. The likelihood model define a hyperprior over λ, which is factorized over 4 latent dimensions posterior distribution of all variables in Θ given the observed data, that is, R (cid:89) p(λ)=r=1Ga(λr|cr0,dr0), (8) p(Θ|YΩ)= (cid:82) pp((ΘΘ,,YΩ))dΘ. (11) Ω Y where Ga(xa,b) = baxa−1e−bx denotes a Gamma distri- | Γ(a) Based on the posterior distribution of Θ, the predictive bution and Γ(a) is the Gamma function. distribution over missing entries, denoted by , can \Ω Sincethesparsityisenforcedinthelatentdimensions, be inferred by Y the initialization point of the dimensionality of latent (cid:90) space (i.e., R) is usually set to its maximum possible p( )= p( Θ)p(Θ )dΘ. (12) \Ω Ω \Ω Ω value, while the effective dimensionality can be inferred Y |Y Y | |Y automatically under a Bayesian inference framework. It should be noted that since the priors are shared across 3.2 ModelLearningviaBayesianInference N latent matrices, our framework can learn the same sparsitypatternforthem,yieldingtheminimumnumber An exact Bayesian inference in (11) and (12) would of rank-one terms. 
To complete the model with a fully Bayesian treatment, we also place a hyperprior over the noise precision τ, that is,

$$p(\tau) = \mathrm{Ga}(\tau \mid a_0, b_0). \qquad (9)$$

For simplicity of notation, all unknowns, including latent variables and hyperparameters, are collected and denoted together by Θ = {A^(1), ..., A^(N), λ, τ}. The probabilistic graphical model is illustrated in Fig. 1, from which we can easily write the joint distribution of the model as

$$p(\mathcal{Y}_\Omega, \Theta) = p\left(\mathcal{Y}_\Omega \,\middle|\, \{\mathbf{A}^{(n)}\}_{n=1}^{N}, \tau\right) \prod_{n=1}^{N} p\left(\mathbf{A}^{(n)} \,\middle|\, \boldsymbol{\lambda}\right) p(\boldsymbol{\lambda})\, p(\tau).$$

By combining the likelihood in (6), the priors of the model parameters in (7), and the hyperpriors in (8) and (9), the logarithm of the joint distribution is given by (see Sec. 1 of Appendix for details)

$$\ell(\Theta) = -\frac{\tau}{2} \left\| \mathcal{O} \circledast \left( \mathcal{Y} - [[\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]] \right) \right\|_F^2 - \frac{1}{2} \sum_n \mathrm{Tr}\left( \boldsymbol{\Lambda} \mathbf{A}^{(n)T} \mathbf{A}^{(n)} \right) + \left( \frac{M}{2} + a_0 - 1 \right) \ln \tau + \sum_r \left[ \left( \frac{\sum_n I_n}{2} + c_0^r - 1 \right) \ln \lambda_r \right] - \sum_r d_0^r \lambda_r - b_0 \tau + \text{const}, \qquad (10)$$

where M = Σ_{i_1,...,i_N} O_{i_1...i_N} denotes the total number of observations. Without loss of generality, we can perform maximum a posteriori (MAP) estimation of Θ by maximizing (10), which is, to some extent, equivalent to optimizing a squared error function with regularizations imposed on the factor matrices and additional constraints imposed on the regularization parameters. However, our objective is to develop a method that, in contrast to the point estimation, computes the full posterior distribution of all variables in Θ given the observed data, that is,

$$p(\Theta \mid \mathcal{Y}_\Omega) = \frac{p(\Theta, \mathcal{Y}_\Omega)}{\int p(\Theta, \mathcal{Y}_\Omega)\, d\Theta}. \qquad (11)$$

Based on the posterior distribution of Θ, the predictive distribution over the missing entries, denoted by Y_{\Ω}, can be inferred by

$$p(\mathcal{Y}_{\backslash\Omega} \mid \mathcal{Y}_\Omega) = \int p(\mathcal{Y}_{\backslash\Omega} \mid \Theta)\, p(\Theta \mid \mathcal{Y}_\Omega)\, d\Theta. \qquad (12)$$

3.2 Model Learning via Bayesian Inference

An exact Bayesian inference in (11) and (12) would integrate over all latent variables as well as hyperparameters, which is obviously analytically intractable. In this section, we describe the development of a deterministic approximate inference under the variational Bayesian (VB) framework [38], [39] to learn the probabilistic CP factorization model.

We therefore seek a distribution q(Θ) to approximate the true posterior distribution p(Θ | Y_Ω) by minimizing the KL divergence, that is,

$$\mathrm{KL}\left(q(\Theta) \,\middle\|\, p(\Theta \mid \mathcal{Y}_\Omega)\right) = \int q(\Theta) \ln \left\{ \frac{q(\Theta)}{p(\Theta \mid \mathcal{Y}_\Omega)} \right\} d\Theta = \ln p(\mathcal{Y}_\Omega) - \int q(\Theta) \ln \left\{ \frac{p(\mathcal{Y}_\Omega, \Theta)}{q(\Theta)} \right\} d\Theta, \qquad (13)$$

where ln p(Y_Ω) represents the model evidence, and its lower bound is defined by L(q) = ∫ q(Θ) ln{ p(Y_Ω, Θ)/q(Θ) } dΘ. Since the model evidence is a constant, the maximum of the lower bound occurs when the KL divergence vanishes, which implies that q(Θ) = p(Θ | Y_Ω).

For the initial derivation, it will be assumed that the variational distribution is factorized w.r.t. each variable Θ_j and therefore can be written as

$$q(\Theta) = q_\lambda(\boldsymbol{\lambda})\, q_\tau(\tau) \prod_{n=1}^{N} q_n\left(\mathbf{A}^{(n)}\right). \qquad (14)$$

It should be noted that this is the only assumption about the distribution, while the particular functional forms of the individual factors q_j(Θ_j) can be explicitly derived in turn. The optimized form of the jth factor based on the maximization of L(q) is given by

$$\ln q_j(\Theta_j) = \mathbb{E}_{q(\Theta \backslash \Theta_j)}\left[\ln p(\mathcal{Y}_\Omega, \Theta)\right] + \text{const}, \qquad (15)$$

where E_{q(Θ\Θ_j)}[·] denotes an expectation w.r.t. the q distributions over all variables except Θ_j. Since the distributions of all parameters are drawn from the exponential family and are conjugate w.r.t. the distributions of their parents (see Fig. 1), we can derive closed-form posterior update rules for each parameter in Θ by using (15).
3.2.1 Posterior distribution of factor matrices

As can be seen from the graphical model shown in Fig. 1, the inference of the mode-n factor matrix A^(n) can be performed by receiving the messages from the observed data and its co-parents, including the other factors A^(k), k ≠ n, and the hyperparameter τ, which are expressed by the likelihood term (6), and incorporating the messages from its parents, which are expressed by the prior term (7). By applying (15), it can be shown that their posteriors factorize as independent distributions over their rows, which are also Gaussian (see Sec. 2 of Appendix for details), given by

$$q_n(\mathbf{A}^{(n)}) = \prod_{i_n=1}^{I_n} \mathcal{N}\left(\mathbf{a}^{(n)}_{i_n} \,\middle|\, \tilde{\mathbf{a}}^{(n)}_{i_n}, \mathbf{V}^{(n)}_{i_n}\right), \quad \forall n \in [1, N], \qquad (16)$$

where the posterior parameters can be updated by

$$\tilde{\mathbf{a}}^{(n)}_{i_n} = \mathbb{E}_q[\tau]\, \mathbf{V}^{(n)}_{i_n} \mathbb{E}_q\left[\mathbf{A}^{(\backslash n)T}_{i_n}\right] \mathrm{vec}\left(\mathcal{Y}_{\mathbb{I}(\mathcal{O}_{i_n}=1)}\right), \qquad \mathbf{V}^{(n)}_{i_n} = \left( \mathbb{E}_q[\tau]\, \mathbb{E}_q\left[\mathbf{A}^{(\backslash n)T}_{i_n} \mathbf{A}^{(\backslash n)}_{i_n}\right] + \mathbb{E}_q[\boldsymbol{\Lambda}] \right)^{-1}, \qquad (17)$$

where Y_{I(O_{i_n}=1)} is a sample function denoting the subset of the observed entries Y_Ω whose mode-n index is i_n, i.e., the observed entries associated to the latent factor a^(n)_{i_n}. The most complex term is related to

$$\mathbf{A}^{(\backslash n)T}_{i_n} = \left( \bigodot_{k \neq n} \mathbf{A}^{(k)} \right)^T_{\mathbb{I}(\mathcal{O}_{i_n}=1)}, \qquad (18)$$

where (⊙_{k≠n} A^(k))^T is of size R × Π_{k≠n} I_k, and each column is computed by ⊛_{k≠n} a^(k)_{i_k} with varying mode-k index i_k. The symbol (·)_{I(O_{i_n}=1)} denotes the sampled columns according to the subtensor O_{···i_n···}, i.e., the columns corresponding to the observed entries whose mode-n index is i_n. Hence, E_q[A^(\n)T_{i_n} A^(\n)_{i_n}] denotes the posterior covariance matrix of the Khatri-Rao product of latent factors in all modes except the nth mode, and is computed only over the columns corresponding to the observed entries whose mode-n index is i_n. In order to evaluate this posterior covariance matrix, first we need to introduce the following results.
Theorem 3.1. Given a set of independent random matrices {A^(n) ∈ R^{I_n×R} | n = 1, ..., N}, we assume that ∀n, ∀i_n, the row vectors {a^(n)_{i_n}} are independent; then

$$\mathbb{E}\left[\left(\bigodot_n \mathbf{A}^{(n)}\right)^T \left(\bigodot_n \mathbf{A}^{(n)}\right)\right] = \sum_{i_1, \ldots, i_N} \circledast_n \left( \mathbb{E}\left[\mathbf{a}^{(n)}_{i_n} \mathbf{a}^{(n)T}_{i_n}\right] \right), \qquad (19)$$

where E[a^(n)_{i_n} a^(n)T_{i_n}] = E[a^(n)_{i_n}] E[a^(n)T_{i_n}] + Var(a^(n)_{i_n}).

Proof: See Sec. 3 of Appendix for details.

For simplicity, we attempt to compute (19) by multilinear operations. Let, ∀n, B^(n) of size I_n × R^2 denote the expectation of a quadratic form related to A^(n) by defining the i_n th-row vector as

$$\mathbf{b}^{(n)}_{i_n} = \mathrm{vec}\left(\mathbb{E}_q\left[\mathbf{a}^{(n)}_{i_n} \mathbf{a}^{(n)T}_{i_n}\right]\right) = \mathrm{vec}\left(\tilde{\mathbf{a}}^{(n)}_{i_n} \tilde{\mathbf{a}}^{(n)T}_{i_n} + \mathbf{V}^{(n)}_{i_n}\right); \qquad (20)$$

then we have

$$\mathrm{vec}\left( \sum_{i_1, \ldots, i_N} \circledast_n \left( \mathbb{E}\left[\mathbf{a}^{(n)}_{i_n} \mathbf{a}^{(n)T}_{i_n}\right] \right) \right) = \left( \bigodot_n \mathbf{B}^{(n)} \right)^T \mathbf{1}_{\prod_n I_n}, \qquad (21)$$

where 1_{Π_n I_n} denotes a vector of length Π_n I_n with all elements equal to one.

According to Theorem 3.1 and the computation form in (21), the term E_q[A^(\n)T_{i_n} A^(\n)_{i_n}] in (17) can be evaluated efficiently by

$$\mathrm{vec}\left( \mathbb{E}_q\left[\mathbf{A}^{(\backslash n)T}_{i_n} \mathbf{A}^{(\backslash n)}_{i_n}\right] \right) = \left( \bigodot_{k \neq n} \mathbf{B}^{(k)} \right)^T \mathrm{vec}\left( \mathcal{O}_{\cdots i_n \cdots} \right), \qquad (22)$$

where O_{···i_n···} denotes the subtensor obtained by fixing the mode-n index to i_n. It should be noted that the Khatri-Rao product is computed from all mode factors except the nth mode, while the sum is performed according to the indices of observations, implying that only factors that interact with a^(n)_{i_n} are taken into account. The other complicated part in (17) can also be simplified by multilinear operations, i.e.,

$$\mathbb{E}_q\left[\mathbf{A}^{(\backslash n)T}_{i_n}\right] \mathrm{vec}\left(\mathcal{Y}_{\mathbb{I}(\mathcal{O}_{i_n}=1)}\right) = \left( \bigodot_{k \neq n} \mathbb{E}_q\left[\mathbf{A}^{(k)}\right] \right)^T \mathrm{vec}\left\{ (\mathcal{O} \circledast \mathcal{Y})_{\cdots i_n \cdots} \right\}. \qquad (23)$$

Finally, the variational posterior approximation of the factor matrices can be updated by (17). On the basis of the approximated posterior, the posterior moments, including E_q[a^(n)_{i_n}], E_q[a^(n)_{i_n} a^(n)T_{i_n}], E_q[a^(n)T_{i_n} a^(n)_{i_n}], Var(a^(n)_{i_n}), and E_q[A^(n)], ∀n, ∀i_n, can be easily evaluated; these are required by the inference of the other hyperparameters in Θ.

An intuitive interpretation of (17) is given as follows. The posterior covariance V^(n)_{i_n} is updated by combining the prior information E_q[Λ] and the posterior information from the other factor matrices computed by (22), while the tradeoff between these two terms is controlled by E_q[τ], which is related to the quality of model fitting. In other words, a better fit of the current model leads to more information coming from the other factors than from the prior information. The posterior mean ã^(n)_{i_n} is first updated by (23) as a linear combination of all observed values, where the coefficients are expressed by the other factors; this implies that a larger observation leads to more similarity of its corresponding latent factors. Then, ã^(n)_{i_n} is rotated by V^(n)_{i_n} to obtain the property of sparsity and is scaled according to the model fitness E_q[τ].
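The following is a readable reference sketch, under our own naming, of one coordinate update of q(A^(n)) in (17). For clarity it loops over observed entries and accumulates the Hadamard products entrywise rather than using the batched Khatri-Rao forms (21)-(23); it is meant to convey the structure of the update, not an efficient implementation.

```python
import numpy as np

def update_factor_mode_n(n, Y, O, A_mean, A_cov, E_tau, E_lambda):
    """One VB update of q(A^(n)) following Eq. (17), using the quantities of Eqs. (22)-(23).

    A_mean[k]: current posterior mean of A^(k), shape (I_k, R).
    A_cov[k][i]: posterior covariance V^(k)_i, shape (R, R).
    E_tau: E_q[tau];  E_lambda: E_q[lambda], shape (R,).
    """
    I_n, R = A_mean[n].shape
    Lam = np.diag(E_lambda)                            # E_q[Lambda]
    obs_idx = np.argwhere(O > 0)                       # multi-indices of observed entries
    new_mean = np.zeros((I_n, R))
    new_cov = np.zeros((I_n, R, R))
    for i in range(I_n):
        S = np.zeros((R, R))                           # E[A^{(\n)T} A^{(\n)}], cf. Eq. (22)
        m = np.zeros(R)                                # E[A^{(\n)T}] vec(Y_obs), cf. Eq. (23)
        for idx in obs_idx[obs_idx[:, n] == i]:
            w = np.ones(R)                             # Hadamard of E[a^(k)_{i_k}], k != n
            B = np.ones((R, R))                        # Hadamard of E[a^(k) a^(k)T], k != n
            for k, ik in enumerate(idx):
                if k == n:
                    continue
                a = A_mean[k][ik]
                w *= a
                B *= np.outer(a, a) + A_cov[k][ik]
            S += B
            m += Y[tuple(idx)] * w
        V = np.linalg.inv(E_tau * S + Lam)             # posterior covariance, Eq. (17)
        new_cov[i] = V
        new_mean[i] = E_tau * V @ m                    # posterior mean, Eq. (17)
    return new_mean, new_cov
```

Rows with no associated observations simply fall back to the prior covariance inv(E_q[Λ]), which is consistent with the message-passing picture described above.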
3.2.2 Posterior distribution of hyperparameters λ

It should be noted that, instead of point estimation via optimization, learning the posterior of λ is crucial for automatic rank determination. As seen in Fig. 1, the inference of λ can be performed by receiving messages from the N factor matrices and incorporating the messages from its hyperprior. By applying (15), we can identify the posteriors of λ_r, ∀r ∈ [1, R], as independent Gamma distributions (see Sec. 4 of Appendix for details),

$$q_\lambda(\boldsymbol{\lambda}) = \prod_{r=1}^{R} \mathrm{Ga}(\lambda_r \mid c_M^r, d_M^r), \qquad (24)$$

where c_M^r, d_M^r denote the posterior parameters learned from the M observations and can be updated by

$$c_M^r = c_0^r + \frac{1}{2} \sum_{n=1}^{N} I_n, \qquad d_M^r = d_0^r + \frac{1}{2} \sum_{n=1}^{N} \mathbb{E}_q\left[\mathbf{a}^{(n)T}_{\cdot r} \mathbf{a}^{(n)}_{\cdot r}\right]. \qquad (25)$$

The expectation of the inner product of the rth component in the mode-n matrix w.r.t. the q distribution can be evaluated using the posterior parameters in (16), i.e.,

$$\mathbb{E}_q\left[\mathbf{a}^{(n)T}_{\cdot r} \mathbf{a}^{(n)}_{\cdot r}\right] = \tilde{\mathbf{a}}^{(n)T}_{\cdot r} \tilde{\mathbf{a}}^{(n)}_{\cdot r} + \sum_{i_n} \left( \mathbf{V}^{(n)}_{i_n} \right)_{rr}. \qquad (26)$$

By combining (25) and (26), we can further simplify the computation of d_M = [d_M^1, ..., d_M^R]^T as

$$\mathbf{d}_M = \mathbf{d}_0 + \frac{1}{2} \sum_{n=1}^{N} \mathrm{diag}\left( \tilde{\mathbf{A}}^{(n)T} \tilde{\mathbf{A}}^{(n)} + \sum_{i_n} \mathbf{V}^{(n)}_{i_n} \right), \qquad (27)$$

where Ã^(n) = E_q[A^(n)]. Hence, the posterior expectation can be obtained by E_q[λ] = [c_M^1/d_M^1, ..., c_M^R/d_M^R]^T, and thus E_q[Λ] = diag(E_q[λ]).

An intuitive interpretation of (25) is that λ_r is updated by the sum of the squared L2-norms of the rth component, expressed by (26), from the N factor matrices. Therefore, a smaller ||a_{·r}||_2^2 leads to a larger E_q[λ_r] and updated priors of the factor matrices, which in turn enforces the rth component more strongly toward zero.

3.2.3 Posterior distribution of hyperparameter τ

The inference of the noise precision τ can be performed by receiving the messages from the observed data and its co-parents, including the N factor matrices, and incorporating the messages from its hyperprior. By applying (15), the variational posterior is a Gamma distribution (see Sec. 5 of Appendix for details), given by

$$q_\tau(\tau) = \mathrm{Ga}(\tau \mid a_M, b_M), \qquad (28)$$

where the posterior parameters can be updated by

$$a_M = a_0 + \frac{1}{2} \sum_{i_1, \ldots, i_N} \mathcal{O}_{i_1 \ldots i_N}, \qquad b_M = b_0 + \frac{1}{2} \mathbb{E}_q\left[ \left\| \mathcal{O} \circledast \left( \mathcal{Y} - [[\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]] \right) \right\|_F^2 \right]. \qquad (29)$$

However, the posterior expectation of the model error in the above expression cannot be computed straightforwardly, and therefore we need to introduce the following results.

Theorem 3.2. Assume a set of independent R-dimensional random vectors {x^(n) | n = 1, ..., N}; then

$$\mathbb{E}\left[ \left\langle \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)} \right\rangle^2 \right] = \left\langle \mathbb{E}\left[\mathbf{x}^{(1)} \mathbf{x}^{(1)T}\right], \ldots, \mathbb{E}\left[\mathbf{x}^{(N)} \mathbf{x}^{(N)T}\right] \right\rangle, \qquad (30)$$

where the left term denotes the expectation of the squared inner product of N vectors, and the right term denotes the inner product of N matrices, where each matrix of size R × R denotes the expectation of the outer product of the nth vector, respectively.

Proof: See Sec. 6 of Appendix for details.

Theorem 3.3. Given a set of independent random matrices {A^(n) | n = 1, ..., N}, we assume that ∀n, ∀i_n, the row vectors {a^(n)_{i_n}} are independent; then

$$\mathbb{E}\left[ \left\| [[\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]] \right\|_F^2 \right] = \sum_{i_1, \ldots, i_N} \left\langle \mathbb{E}\left[\mathbf{a}^{(1)}_{i_1} \mathbf{a}^{(1)T}_{i_1}\right], \ldots, \mathbb{E}\left[\mathbf{a}^{(N)}_{i_N} \mathbf{a}^{(N)T}_{i_N}\right] \right\rangle. \qquad (31)$$

Let B^(n) denote the expectation of a quadratic form related to A^(n) with i_n th-row vector b^(n)_{i_n} = vec(E[a^(n)_{i_n} a^(n)T_{i_n}]); thus, (31) can be computed by

$$\mathbb{E}\left[ \left\| [[\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]] \right\|_F^2 \right] = \mathbf{1}^T_{\prod_n I_n} \left( \bigodot_n \mathbf{B}^{(n)} \right) \mathbf{1}_{R^2}.$$

Proof: See Sec. 7 of Appendix for details.

From Theorems 3.2 and 3.3, the posterior expectation term in (29) can be evaluated explicitly. Due to the missing entries in Y, the evaluation form is finally written as (see Sec. 8 of Appendix for details)

$$\mathbb{E}_q\left[ \left\| \mathcal{O} \circledast \left( \mathcal{Y} - [[\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]] \right) \right\|_F^2 \right] = \|\mathcal{Y}_\Omega\|_F^2 - 2\, \mathrm{vec}^T(\mathcal{Y}_\Omega)\, \mathrm{vec}\left( [[\tilde{\mathbf{A}}^{(1)}, \ldots, \tilde{\mathbf{A}}^{(N)}]]_\Omega \right) + \mathrm{vec}^T(\mathcal{O}) \left( \bigodot_n \mathbf{B}^{(n)} \right) \mathbf{1}_{R^2}, \qquad (32)$$

where Ã^(n) = E_q[A^(n)] and B^(n) is computed by (20). Hence, the posterior approximation of τ can be obtained by (29) together with the posterior expectation E_q[τ] = a_M / b_M.

An intuitive interpretation of (29) is straightforward: a_M is related to the number of observations, and b_M is related to the residual of model fitting measured by the squared Frobenius norm on the observed entries.
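A companion sketch of the hyperparameter updates (24)-(29), again with our own function names; the expected model error in b_M is accumulated entry-by-entry using Theorem 3.2 instead of the vectorized form (32), which keeps the code short at the cost of efficiency.

```python
import numpy as np

def update_lambda(A_mean, A_cov, c0, d0):
    """VB update of q(lambda), Eqs. (24)-(27); E_q[lambda_r] = c_M[r] / d_M[r]."""
    R = A_mean[0].shape[1]
    c_M = (c0 + 0.5 * sum(A.shape[0] for A in A_mean)) * np.ones(R)
    d_M = d0 * np.ones(R)
    for A, covs in zip(A_mean, A_cov):
        # E[a_{.r}^T a_{.r}] = ||posterior-mean column r||^2 + sum_i (V_i)_{rr}, Eq. (26)
        d_M += 0.5 * (np.sum(A ** 2, axis=0) +
                      np.sum([np.diag(V) for V in covs], axis=0))
    return c_M, d_M

def update_tau(Y, O, A_mean, A_cov, a0, b0):
    """VB update of q(tau), Eq. (29); E_q[tau] = a_M / b_M."""
    R = A_mean[0].shape[1]
    obs_idx = np.argwhere(O > 0)
    err = 0.0
    for idx in obs_idx:
        w = np.ones(R)                     # Hadamard of E[a^(n)_{i_n}]  -> E[<a^(1),...,a^(N)>]
        B = np.ones((R, R))                # Hadamard of E[a^(n) a^(n)T] -> E[<...>^2] via Thm 3.2
        for k, ik in enumerate(idx):
            a = A_mean[k][ik]
            w *= a
            B *= np.outer(a, a) + A_cov[k][ik]
        y = Y[tuple(idx)]
        err += y ** 2 - 2.0 * y * w.sum() + B.sum()   # E[(y - <a^(1),...,a^(N)>)^2]
    a_M = a0 + 0.5 * len(obs_idx)
    b_M = b0 + 0.5 * err
    return a_M, b_M
```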
3.2.4 Lower bound of model evidence

The inference framework presented in the previous section essentially maximizes the lower bound of model evidence defined in (13). Since the lower bound should not decrease at each iteration, it can be used to test for convergence. The lower bound of the log-marginal likelihood is computed by

$$\mathcal{L}(q) = \mathbb{E}_{q(\Theta)}\left[\ln p(\mathcal{Y}_\Omega, \Theta)\right] + H(q(\Theta)), \qquad (33)$$

where the first term denotes the posterior expectation of the joint distribution, and the second term denotes the entropy of the posterior distributions. The various terms in the lower bound are evaluated and derived by taking the parametric forms of the q distribution, giving the following result (see Sec. 9 of Appendix for details):

$$\mathcal{L}(q) = -\frac{a_M}{2 b_M} \mathbb{E}_q\left[ \left\| \mathcal{O} \circledast \left( \mathcal{Y} - [[\mathbf{A}^{(1)}, \ldots, \mathbf{A}^{(N)}]] \right) \right\|_F^2 \right] - \frac{1}{2} \mathrm{Tr}\left\{ \tilde{\boldsymbol{\Lambda}} \sum_n \left( \tilde{\mathbf{A}}^{(n)T} \tilde{\mathbf{A}}^{(n)} + \sum_{i_n} \mathbf{V}^{(n)}_{i_n} \right) \right\} + \frac{1}{2} \sum_n \sum_{i_n} \ln \left| \mathbf{V}^{(n)}_{i_n} \right| + \sum_r \ln \Gamma(c_M^r) + \sum_r c_M^r \left( 1 - \ln d_M^r - \frac{d_0^r}{d_M^r} \right) + \ln \Gamma(a_M) + a_M \left( 1 - \ln b_M - \frac{b_0}{b_M} \right) + \text{const}. \qquad (34)$$

The posterior expectation of the model error, denoted by E_q[·], can be computed using (32).

An intuitive interpretation of (34) is as follows. The first term is related to the model residual; the second term is related to the weighted sum of squared L2-norms of each component in the factor matrices, while the uncertainty information is also considered; the remaining terms are related to the negative KL divergence between the posterior and prior distributions of the hyperparameters.

Algorithm 1 Fully Bayesian CP Factorization (FBCP)
Input: an Nth-order incomplete tensor Y_Ω and an indicator tensor O.
Initialization: Ã^(n), V^(n)_{i_n}, ∀i_n ∈ [1, I_n], ∀n ∈ [1, N]; a_0, b_0, c_0, d_0; τ = a_0/b_0, λ_r = c_0^r/d_0^r, ∀r ∈ [1, R].
repeat
  for n = 1 to N do
    Update the posterior q(A^(n)) using (17);
  end for
  Update the posterior q(λ) using (25);
  Update the posterior q(τ) using (29);
  Evaluate the lower bound using (34);
  Reduce rank R by eliminating zero-components of {A^(n)} (an optional procedure);
until convergence.
Computation of predictive distributions using (35).

3.2.5 Initialization of model parameters

The variational Bayesian inference is guaranteed to converge only to a local minimum. To avoid getting stuck in poor local solutions, it is important to choose a good initialization point. In our model, the top-level hyperparameters, including c_0, d_0, a_0, b_0, are set to 10^{-6}, resulting in a noninformative prior. Thus, we have E[Λ] = I and E[τ] = 1. For the factor matrices, {E[A^(n)]}_{n=1}^{N} can be initialized by two different strategies: one is randomly drawn from N(0, I) for a^(n)_{i_n}, ∀i_n ∈ [1, I_n], ∀n ∈ [1, N]; the other is set to A^(n) = U^(n) Σ^(n)^{1/2}, where U^(n) denotes the left singular vectors and Σ^(n) denotes the diagonal singular value matrix, obtained by SVD of the mode-n matricization of the tensor Y. The covariance matrix V^(n)_{i_n} is simply set to I. The tensor rank R is usually initialized by the weak upper bound on its maximum rank, i.e., R ≤ min_n P_n, where P_n = Π_{i≠n} I_i. In practice, we can also manually define the initialization value of R for computational efficiency.

3.2.6 Interpretation of automatic rank determination

The entire procedure of model inference is summarized in Algorithm 1. It should be noted that the tensor rank is determined automatically and implicitly. More specifically, updating λ in each iteration results in a new prior over {A^(n)}, and then {A^(n)} can be updated using this new prior in the subsequent iteration, which in turn affects λ. Hence, if the posterior mean of λ_r becomes very large, the rth components in {A^(n)}, ∀n ∈ [1, N], are forced to be zero because of their prior information, and the tensor rank can be obtained by simply counting the number of non-zero components in the factor matrices. For implementation of the algorithm, we can keep the size of {A^(n)} unchanged during iterations; an alternative method is to eliminate the zero-components of {A^(n)} after each iteration.
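Putting the pieces together, a hypothetical skeleton of Algorithm 1 might look as follows. It reuses `update_factor_mode_n`, `update_lambda`, and `update_tau` from the sketches given after (17) and (29), replaces the lower-bound test (34) with a simpler change-in-reconstruction check, and prunes components by a hard threshold on their expected power; it illustrates the control flow, not the authors' implementation.

```python
import numpy as np

def fbcp(Y, O, R_init, n_iter=100, tol=1e-5, a0=1e-6, b0=1e-6, c0=1e-6, d0=1e-6):
    """Sketch of Algorithm 1 (FBCP) built on the update routines sketched above."""
    dims, N = Y.shape, Y.ndim
    rng = np.random.default_rng(0)
    A_mean = [rng.standard_normal((I, R_init)) for I in dims]   # random init (Sec. 3.2.5)
    A_cov = [np.array([np.eye(R_init)] * I) for I in dims]
    E_tau = a0 / b0                                             # = 1 for the default hyperpriors
    E_lambda = np.full(R_init, c0 / d0)
    prev = np.zeros_like(Y)
    for _ in range(n_iter):
        for n in range(N):
            A_mean[n], A_cov[n] = update_factor_mode_n(n, Y, O, A_mean, A_cov,
                                                       E_tau, E_lambda)
        c_M, d_M = update_lambda(A_mean, A_cov, c0, d0)
        E_lambda = c_M / d_M
        a_M, b_M = update_tau(Y, O, A_mean, A_cov, a0, b0)
        E_tau = a_M / b_M
        # optional rank reduction: drop components with negligible expected power
        power = sum(np.sum(A ** 2, axis=0) for A in A_mean)
        keep = power > 1e-8
        if not keep.all():
            A_mean = [A[:, keep] for A in A_mean]
            A_cov = [covs[:, keep][:, :, keep] for covs in A_cov]
            E_lambda = E_lambda[keep]
        subs = 'abcdefgh'[:N]
        X_hat = np.einsum(','.join(c + 'r' for c in subs) + '->' + subs, *A_mean)
        if np.linalg.norm(X_hat - prev) < tol * (np.linalg.norm(prev) + 1e-12):
            break
        prev = X_hat
    return A_mean, A_cov, X_hat, E_lambda, E_tau
```

A call such as `fbcp(Y, O, R_init=10)` mirrors the toy setting of Section 5.1.1, where excessive components are pruned within the first few iterations.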
3.3 Predictive Distribution

The predictive distributions over missing entries, given the observed entries, can be approximated by using the variational posterior distribution, that is,

$$p(\mathcal{Y}_{i_1 \ldots i_N} \mid \mathcal{Y}_\Omega) = \int p(\mathcal{Y}_{i_1 \ldots i_N} \mid \Theta)\, p(\Theta \mid \mathcal{Y}_\Omega)\, d\Theta \simeq \iint p\left(\mathcal{Y}_{i_1 \ldots i_N} \,\middle|\, \{\mathbf{a}^{(n)}_{i_n}\}, \tau^{-1}\right) q\left(\{\mathbf{a}^{(n)}_{i_n}\}\right) q(\tau)\, d\{\mathbf{a}^{(n)}_{i_n}\}\, d\tau. \qquad (35)$$

We can now approximate these integrations, yielding a Student's t-distribution Y_{i_1...i_N} | Y_Ω ~ T(Ỹ_{i_1...i_N}, S_{i_1...i_N}, ν_y) (see Sec. 10 of Appendix for details) with its parameters given by

$$\tilde{\mathcal{Y}}_{i_1 \ldots i_N} = \left\langle \tilde{\mathbf{a}}^{(1)}_{i_1}, \cdots, \tilde{\mathbf{a}}^{(N)}_{i_N} \right\rangle, \qquad \nu_y = 2 a_M,$$
$$\mathcal{S}_{i_1 \ldots i_N} = \left\{ \frac{b_M}{a_M} + \sum_n \left( \circledast_{k \neq n}\, \tilde{\mathbf{a}}^{(k)}_{i_k} \right)^T \mathbf{V}^{(n)}_{i_n} \left( \circledast_{k \neq n}\, \tilde{\mathbf{a}}^{(k)}_{i_k} \right) \right\}^{-1}.$$

Thus, the predictive variance can be obtained by Var(Y_{i_1...i_N}) = ν_y / (ν_y − 2) · S^{-1}_{i_1...i_N}.
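For a single missing entry, the Student's t parameters above can be evaluated directly from the posterior moments. The sketch below (our naming, building on the quantities returned by the earlier update sketches) returns the predictive mean, the scale S, and the variance ν/(ν − 2) · S^{-1}.

```python
import numpy as np

def predict_entry(idx, A_mean, A_cov, a_M, b_M):
    """Student's t predictive distribution of one entry, Eq. (35) and Sec. 3.3."""
    N = len(idx)
    R = A_mean[0].shape[1]
    means = [A_mean[n][idx[n]] for n in range(N)]          # posterior means a~^(n)_{i_n}
    mean_pred = np.prod(np.stack(means), axis=0).sum()     # <a~^(1), ..., a~^(N)>
    quad = 0.0
    for n in range(N):
        w = np.ones(R)
        for k in range(N):
            if k != n:
                w *= means[k]                              # Hadamard of a~^(k)_{i_k}, k != n
        quad += w @ A_cov[n][idx[n]] @ w                   # uncertainty contributed by mode n
    S = 1.0 / (b_M / a_M + quad)                           # scale parameter (inverse form)
    nu = 2.0 * a_M
    var = nu / (nu - 2.0) / S if nu > 2.0 else np.inf      # predictive variance
    return mean_pred, S, var
```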
• The uncertainty information over both latent factors andpredictionsofmissingentriescanbeinferredby 5.1 ValidationonSyntheticData ourmethod,whilemostexistingtensorfactorization The synthetic tensor data is generated by the follow- and completion methods provide only the point ing procedure. N factor matrices A(n) N are drawn estimations. { }n=1 (n) from a standard normal distribution, i.e., n, i ,a • An efficient and deterministic Bayesian inference is ∀ ∀ n in ∼ (0,I ). Then, the true tensor is constructed by = developed for model learning, which empirically N R X [[A1,...,A(N)]], and an observed tensor by = +ε, shows a fast convergence. whereε (cid:81) (0,σ2)denotesi.i.d.addYitiveXnoise. ∼ i1,...,iN N ε Themissingentries,chosenuniformly,aremarkedbyan 4 MIXTURE FACTOR PRIORS indicator tensor . O The low-rank assumption is powerful in general cases, however if the tensor data does not satisfy an intrinsic 5.1.1 Atoyexample low-rank structure and a large amount of entries are missing, it may yield an oversimplified model. In this To illustrate our model, we provide two demo videos in section,wepresentavariantofBayesianCPfactorization the supplemental materials. A true latent tensor is of X model which can take into account the local similarity size10 10 10withCPrankR=5,thenoiseparameter × × in addition to the low-rank assumption. was σε2 =0.001, and 40% of entries were missing. Then, WespecifyaGaussianmixtureprioroverfactormatri- we applied our method with the initial rank being set cessuchthatthepriordistributionin(7)canberewritten to 10. As shown in Fig. 2, three factor matrices are as i [1,I ], n [1,N], inferredinwhichfivecomponentsareeffectivelypruned n n ∀ ∈ ∀ ∈ p(cid:0)a(inn)(cid:12)(cid:12)λ,{a(kn)}(cid:1)=win,inN(cid:0)0,Λ−1(cid:1)+(cid:88) win,kN(cid:0)a(kn),βk−1I(cid:1), olouwt,errebsouultnindgoifnmcoodrerelcetveidsteinmcaetiinocnreoafsetesnmsoornoratonnki.caTlhlye, k(cid:54)=in which indicates the effectiveness and convergence of (cid:80) where kwin,k = 1. This indicates that the inth row our algorithm. Finally, the posterior of noise precision vector a(n) is similar to kth row vectors with the proba- τ 1000 implies the method’s capability of denoising bilityofiwnin,k.Basedonourassumptionthattheadjacent an≈d the estimation of σε2 ≈0.001, SNR=10logτσ−X21. rows are highly correlated, we can define the mixture coefficients by w = z exp( i j 2) where z = 1/(cid:80)jexp(−|i−j|2i,)jisuseditoen−s|ure−the| sumofmixiture ~q1ib.iMn/ahtloambecpoadgees/BaaryeesaTveanislaobrFleacatotrihztattpio:/n/.hwtmwlw.bsp.brain.riken.jp/ 9 Mode−1 Mode−2 Mode−3 10 10 s SVD Init. Random Init. SVD Init. Random Init. ce 2 2 2 8 0% 10% 72% 78% 62% 8 60% 80% 100% 44% 0% Factor matri 10468 10468 10468 Inferred Rank 246 2% 20% 70% 100% 100% Inferred Rank 246 100% 92% 98% 44% 0% 2 4 6 8 10 2 4 6 8 10 2 4 6 8 10 Latent components 0 0 −20 −10 0 10 20 0 0.5 0.7 0.9 0.95 Posterior mean of λ Lower bound Posterior distribution SNR (dB) Missing ratio 0 0.01 (a) (b) 100 −1000 0.005 10 20x20x20 50x50x50 Rank=10 Rank=15 50 −2000 8 54% 92% 96% 90% 52% 20 22% 34% 0% Fig. L02a.1te2n3At 4d5itmo6ey7n8se9i1ox0nas−m30p0le00illuItesrt2ar0taiotning40FBC50PN00oaispep1 p0lri0ee0cdisio1o5nn0 τ0an Inferred Rank 246 80% 90% 62% 0% 0% Inferred Rank11055 96% 84% 0% incomplete tensor. The top row shows Hinton diagram 0 0 0 0.5 0.7 0.9 0.95 0 0.5 0.9 of factor matrices, where the color and size of each Missing ratio Missing ratio (c) (d) square represent the sign and magnitude of the value, respectively. 
5.1.1 A toy example

To illustrate our model, we provide two demo videos in the supplemental materials. A true latent tensor X of size 10 × 10 × 10 with CP rank R = 5 was generated, the noise parameter was σ_ε^2 = 0.001, and 40% of entries were missing. Then, we applied our method with the initial rank set to 10. As shown in Fig. 2, three factor matrices are inferred in which five components are effectively pruned out, resulting in a correct estimation of the tensor rank. The lower bound of model evidence increases monotonically, which indicates the effectiveness and convergence of our algorithm. Finally, the posterior of the noise precision, τ ≈ 1000, implies the method's capability of denoising and the estimation of σ_ε^2 ≈ 0.001 (the SNR is measured as 10 log10(σ_X^2 τ)).

[Fig. 2. A toy example illustrating FBCP applied to an incomplete tensor. The top row shows Hinton diagrams of the three factor matrices (latent components vs. modes 1-3), where the color and size of each square represent the sign and magnitude of the value, respectively. The bottom row shows the posterior mean of λ, the lower bound of model evidence over iterations, and the posterior distribution of the noise precision τ, from left to right.]

5.1.2 Automatic determination of tensor rank

To evaluate the automatic determination of tensor rank (i.e., CP rank), extensive simulations were performed under varying experimental conditions related to tensor size, tensor rank, noise level, missing ratio, and the initialization method of the factor matrices (e.g., SVD or random sample). Each result is evaluated by 50 repetitions corresponding to 50 different tensors generated under the same criterion. There are four groups of experiments. (A) Given complete tensors of size 20 × 20 × 20 with R = 5, the evaluations were performed under varying noise levels and with two different initializations (see Fig. 3(a)). (B) Given incomplete tensors of size 20 × 20 × 20 with R = 5 and SNR = 20 dB, the evaluations were performed under five different missing ratios and with different initializations (see Fig. 3(b)). (C) Given incomplete tensors with R = 5 and SNR = 0 dB, the evaluations were performed under varying missing ratios and two different tensor sizes (see Fig. 3(c)). (D) Given incomplete tensors of size 20 × 20 × 20 with SNR = 20 dB, the evaluations were performed under varying missing ratios and two different true ranks (see Fig. 3(d)).

[Fig. 3. Determination of tensor rank under varying experimental conditions: (a) inferred rank vs. SNR (dB) for SVD and random initialization; (b) inferred rank vs. missing ratio for SVD and random initialization; (c) inferred rank vs. missing ratio for sizes 20x20x20 and 50x50x50; (d) inferred rank vs. missing ratio for true ranks 10 and 15. Each vertical bar shows the mean and standard deviation of estimations from 50 repetitions, while the accuracy of detection is shown on top of the corresponding bar. The red and blue horizontal dash-dotted lines indicate the true tensor rank.]

From Fig. 3, we observe that SVD initialization is slightly better than random initialization in terms of the determination of tensor rank. If the tensor is complete, our model can detect the true tensor rank with 100% accuracy when SNR ≥ 10 dB. Although the accuracy decreased to 70% under a high noise level of 0 dB, the error deviation is only ±1. On the other hand, if the tensor is incomplete and almost free of noise, the detection rate is 100% when the missing ratio is 0.7, and is 44% with an error deviation of only ±1 even under a high missing ratio of 0.9. When both missing data and a high noise level are present, our model can achieve 90% accuracy under the condition of SNR = 0 dB and a 0.5 missing ratio. It should be noted that, when the data size is larger, such as 50 × 50 × 50, our model can achieve 90% accuracy even when SNR = 0 dB and the missing ratio is 0.9. If the true rank is larger, such as R = 15, the model can correctly recover the rank from a complete tensor, but fails to do so when the missing ratio is larger than 0.5.

We can conclude from these results that the determination of the tensor rank depends primarily on the number of observed entries and the true tensor rank. In general, more observations are necessary if the tensor rank is larger; however, when high-level noise occurs, an excessive number of observations may not be helpful for rank determination.

5.1.3 Predictive performance

In this experiment, we considered incomplete tensors of size 20 × 20 × 20 generated with the true rank R = 5 and SNR = 30 dB under varying missing ratios. The initial rank was set to 10. The relative standard error, RSE = ||X̂ − X||_F / ||X||_F, where X̂ denotes the estimation of the true tensor X, was used to evaluate the performance. To ensure statistically consistent results, the performance is evaluated by 50 repetitions for each condition. As shown in Fig. 4, our method significantly outperforms the other algorithms under all missing ratios.
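For reference, the RSE criterion used throughout Section 5 is a one-liner; `np.linalg.norm` with no axis argument returns the Frobenius norm of the flattened tensor.

```python
import numpy as np

def rse(X_hat, X_true):
    """Relative standard error RSE = ||X_hat - X_true||_F / ||X_true||_F (Sec. 5.1.3)."""
    return np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```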
[Fig. 4. Predictive performance (RSE) of FBCP, CPWOPT, CPNLS, HaLRTC, FaLRTC, FCSA, HardC., geomCG, and STDC under varying missing ratios (0, 0.3, 0.5, 0.7, 0.9) when SNR = 30 dB.]

Factorization-based methods, including CPWOPT and CPNLS, show a better performance than completion-based methods when the missing ratio is relatively small, while they perform worse than completion methods when the missing ratio is large, e.g., 0.9. FaLRTC, FCSA, and HardC. achieve similar performances, because they are all based on nuclear norm optimization. geomCG achieves a performance comparable with that of CPWOPT and CPNLS when the data is complete, while it fails as the missing ratio becomes high. This is because geomCG requires a large number of observations and a precisely defined rank. It should be noted that STDC outperforms all algorithms except FBCP as the missing ratio becomes extremely high. These results demonstrate that FBCP, as a tensor factorization method, can also be effective for tensor completion, even when an extremely sparse tensor is presented.

We also conducted two additional experiments: one is the reconstruction from a complete tensor, the other is tensor completion when the noise level is high, i.e., SNR = 0 dB. The results of these two experiments are presented in the Appendix (see Secs. 11, 12).

5.2 Image Inpainting

In this section, applications of image inpainting based on several benchmark images, shown in Fig. 5, are used to evaluate and compare the performance of the different methods. A color image can be represented by a third-order tensor of size 200 × 200 × 3. We conducted various experiments under four groups of conditions.

[Fig. 5. Ground-truth of eight benchmark images.]

(A) Structural image with uniformly random missing pixels. A building facade image with 95% missing pixels under two noise conditions, i.e., noise free and SNR = 5 dB, was considered as observations. (B) Natural image with uniformly random missing pixels. The Lena image of size 300 × 300 with 90% missing pixels under two noise conditions, i.e., noise free and SNR = 10 dB, was considered. (C) Non-random missing pixels. We conducted two experiments for image restoration from a corrupted image: 1) the Lenna image corrupted by superimposed text was used as an observed image;^2 in practice, the locations of text pixels are difficult to detect exactly, so we can simply indicate missing entries by a value larger than 200 to ensure that the text pixels are completely missing. 2) The scrabbled Lenna image was used as an observed image, and pixels with values larger than 200 can be marked as missing. (D) Object removal. Given an image and a mask covering the object area, our goal was to complete the image without that object.

The algorithm settings of the compared methods are described as follows. For factorization-based methods, the initial rank was set to 50 in cases (A) and (B) due to the high missing ratios, and 100 in cases (C) and (D). For completion-based methods, the tuning parameters were chosen by multiple runs with the performance evaluated on the ground-truth of missing pixels.

2. A demo video is available in the supplemental materials.

The visual effects of image inpainting are shown in Fig. 6, and the predictive performances are shown in Table 1, where case (D) is not available due to the lack of ground-truth.

TABLE 1
Performance (RSE) evaluated on missing pixels. "NF", "N" indicate the noise-free or noisy image; "T", "S" indicate the text-corrupted or scrabbled image.

Method   | Facade NF | Facade N | Lenna NF | Lenna N | Non-random T | Non-random S
FBCP     | 0.13      | 0.17     | 0.17     | 0.20    | 0.13         | 0.14
CPWOPT   | 0.33      | 0.41     | 0.25     | 0.41    | 0.18         | 0.18
CPNLS    | 0.84      | 0.84     | 0.62     | 0.73    | 0.22         | 0.30
HaLRTC   | 0.15      | 0.21     | 0.19     | 0.21    | 0.29         | 0.28
FaLRTC   | 0.16      | 0.21     | 0.19     | 0.22    | 0.29         | 0.29
FCSA     | 0.19      | 0.21     | 0.19     | 0.21    | 0.28         | 0.28
HardC.   | 0.19      | 0.25     | 0.20     | 0.23    | 0.31         | 0.30
STDC     | 0.14      | 0.22     | 0.11     | 0.20    | 0.15         | 0.16
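As a practical note on case (C) described above, the observation indicator can be built directly from the corrupted image by thresholding. The helper below is our own sketch; the per-channel handling (dropping a pixel in all three channels if any channel exceeds the threshold) is an assumption the paper does not spell out.

```python
import numpy as np

def mask_from_corruption(image, threshold=200):
    """Build the observation indicator for case (C) of Sec. 5.2: pixels whose value
    exceeds the threshold (e.g., superimposed text or scribbles) are marked missing.

    `image` is an (H, W, 3) uint8 array; returns O with 1 = observed, 0 = missing.
    """
    corrupted = (image > threshold).any(axis=-1)          # (H, W) map of corrupted pixels
    O = np.repeat((~corrupted)[:, :, None], 3, axis=2)    # drop all channels of such pixels
    return O.astype(float)
```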
In case (A), we observe that FBCP outperforms the other methods for a structural image under an extremely high missing ratio, and the superiority is more significant when additive noise is involved. In case (B), we observe that STDC obtains the best performance, followed by FBCP, which is better than the other methods. However, STDC is severely degraded when noise is involved and obtains the same performance as FBCP, while its visual quality is still much better than the others. These results indicate that the additional smoothness constraints in STDC are suitable for natural images. In case (C), FBCP is superior to all other methods, followed by STDC.
