ebook img

Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models PDF

0.38 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models

1 Bi-Objective Nonnegative Matrix Factorization: Linear Versus Kernel-Based Models Fei Zhu, Paul Honeine, Member, IEEE Abstract—Nonnegative matrix factorization (NMF) is a pow- problem consists of extracting the endmembers (recorded erful class of feature extraction techniques that has been suc- in the first low-rank matrix), and estimating the abundance cessfully applied in many fields, namely in signal and image 5 of each endmember at every pixel (recorded in the second processing. Current NMF techniques have been limited to a 1 one).Obviously,theabovephysicalinterpretationrequiresthe single-objective problem in either its linear or nonlinear kernel- 0 based formulation. In this paper, we propose to revisit the nonnegativityon bothabundancesand endmemberspectrums. 2 NMF as a multi-objective problem, in particular a bi-objective The NMF is a linear model, since it can be viewed in n one, where the objective functions defined in both input and a way that each input spectral is approximated by a linear a featurespacesaretakenintoaccount.Bytakingtheadvantageof combinationofsomebasisspectrums.Toestimatethedecom- J the sum-weighted method from the literature of multi-objective position, the objective function for minimization is defined in 2 optimization, the proposed bi-objective NMF determines a set 2 of nondominated, Pareto optimal, solutions instead of a single an Euclidean space — the so-called input space X —, where optimaldecomposition.Moreover,thecorrespondingParetofront thedifferencebetweentheinputmatrixandtheproductofthe ] is studied and approximated. Experimental results on unmixing estimated ones is usually measured either by the Frobenius L real hyperspectral images confirm theefficiency of the proposed norm or by generalized Kullback-Leibler divergence [10]. M bi-objective NMF compared with the state-of-the-art methods. These objective functions are often augmented by including . Index Terms—Kernel machines,nonnegative matrix factoriza- different regularization terms, such as the Fisher constraint t tion, Pareto optimal, hyperspectral image, unmixing problem. a for learning local features [11], the sparseness constraint for t s intuitive and easily interpretable decompositions [12], the [ I. INTRODUCTION temporal smoothness and spatial decorrelation regularization 1 [13],andtheminimumdispersionregularizationforunmixing v accuracy [14]. Other objective functions are also raised from 4 NONNEGATIVE MATRIX FACTORIZATION (NMF) practical standpoints, e.g., the ℓ -norm for the robustness 1 8 providesaparts-basedrepresentationforthenonnegative against outliers and missing data [15] and the Bregman di- 6 data entries, and has becoming a versatile technique with vergence with fast computational performance [16]. 5 plenty of applications [1]. As opposed to other dimension- Many studies have shown the limits of a linear decompo- 0 . alityreductionapproaches,e.g.,principalcomponentanalysis, sition, as opposed to a nonlinear one [17], [18], [19]. While 1 vectorquantizationandlinear discriminantanalysis, the NMF most research activities have been concentrated on the linear 0 5 is based on the additivity of the contributions of the bases NMF, a few works have considered the nonlinear case. In 1 to approximate the original data. Such decomposition model an attempt to extent the linear NMF models to the nonlinear : often yields a tractable physical interpretation thanks to the scope, several kernel-based NMF have been proposed within v i sparse and nonnegative obtained representation of the input theframeworkofferedbythekernelmachines[20].Employing X data. Many real world applicationsbenefit from these virtues, anonlinearfunction,thekernel-basedmethodsmainlymapthe r including hyperspectral unmixing [2], [3], face and facial dataintoahigherdimensionalspace,wheretheexistinglinear a expressionrecognition[4],[5],geneexpressiondata[6],blind techniquesare performedon the transformeddata. The kernel sourceseparation[7], andspectralclustering[8], [9], to name trick enables the estimation of the inner product between any a few. pair of mapped data in a reproducing kernel Hilbert space The NMF approximatesa high-rank nonnegativeinput ma- — the so-called feature space H —, without the need of trix by two nonnegative low-rank ones. As a consequence, it knowing explicitly neither the nonlinear map function nor providesa decompositionsuitable for many signal processing the resulting space. For example, in [20], the linear NMF anddataanalysisproblems,andinparticularthehyperspectral technique is performed on the kernel matrix, whose entries unmixing problem. Indeed, a hyperspectral image is a cube consist of the inner products between input data calculated that consists of a set of images of the scene under scrutiny, with some kernel function. Other kernel-based NMF tech- each correspondingto a groundscene fromwhich the lightof niques presented in [21], [22], [23] follow a similar scheme certain wavelength is reflected. Namely, a reflectance spectral but share an additive assumption originated from the convex over a wavelength range is available for each pixel. It is NMF approach proposed in [21], that is, the basis matrix is assumed that each spectral is a mixture of a few “pure” represented as the convex combination of the mapped input materials, called endmembers. The hyperspectral unmixing datainthefeaturespaceH.Itisworthnotingthattheobjective function is the Frobenius norm of the residual between the F. Zhu and P. Honeine are with the Institut Charles Delaunay (CNRS), kernelmatrixanditsfactorization,foralltheabove-mentioned Universite´ deTechnologie deTroyes,France. Email:[email protected] [email protected] kernel-basedNMFmethods.However,althoughtheinputdata 2 matrix is nonnegative, the nonnegativity of the mapped data The remainder of this paper is organized as follows. We is not guaranteed. A more severe disadvantage is that the first revisit the conventional and kernel-based NMF. The dif- obtained bases lie in the feature space (often of infinite ferencesbetween the input and the featurespace optimization dimension), where a reverse mapping to the input space is are discussed in Section III. In Section IV, we present the difficult.Indeed,oneneedstosolvethepre-imageproblem,an proposed bi-objective NMF framework. Section V demon- obstacleinheritedfromthekernelmachines[24]. In[3], these strates the efficiency of the proposed method for unmixing difficultiesarecircumventedbydefiningamodelinthefeature two real hyperspectral images. Conclusions and future works spacethatcanbeoptimizeddirectlyinthe inputspace.Inthis are reported in Section VI. paper, we revisit this framework to discover the nonlinearity of the input matrix. See Section II-B for more details. II. A PRIMER ONTHE LINEARAND NONLINEARNMF In either its linear conventionalformulationor its nonlinear In this section, we present the two NMF variants, with the kernel-basedformulation,aswellasalloftheirvariations(and linear and the nonlinear models, as well as the corresponding regularizations),theNMFhasbeentacklingasingle-objective optimization problems. optimization problem. In essence, the underlying assumption isthatitisknownin priorthatthelinearmodeldominatesthe nonlinear one, or vice versa, for the input data under study. A. Conventional NMF To obtain such prior information about the given input data Given a nonnegative data matrix X ∈ ℜL×T, the conven- is not practical in most real-world applications. Moreover, it tional NMF aims to approximate it by the product of two is possible that a fusion of the linear and nonlinear models low-rank nonnegative matrices E ∈ ℜL×N and A ∈ ℜN×T, revealsthelatentvariablesclosertothegroundtruththaneach namely single model considered alone. Independently from the NMF X ≈EA, (1) framework,such combinationof the linear modelwith a non- linear fluctuation was recently studied by Chen, Richard and under the constraints E ≥ 0 and A ≥ 0, where the non- Honeinein[18]and[25]where,intheformer,thenonlinearity negativity is element-wise. An equivalent vector-wise model depends only on the spectral content, while it is defined by a is given by considering separately each column of the matrix post-nonlinear model in the latter. A multiple-kernel learning X, namely x for t=1,...,T, with t approach was studied in [26] and a Bayesian approach was N investigated in [27] with the so-called residual component x ≈ a e , analysis. All these methods share one major drawback: they t X nt n n=1 only consist in estimating the abundances, with a nonlinear where each x is represented as a linear combination of the model, while the endmembers need to be extracted in a pre- t columnsof E, denotede for n=1,...,N, with the scalars processing stage using any conventional linear technique (N- n a for n = 1,...,N and t = 1,...,T being the entries of Findr,vertexcomponentanalysis,...[28]).Asopposedtosuch nt thematrixA.Thesubspacespannedbythevectorsx ,aswell separationintheoptimizationproblems,theNMFprovidesan t as the vectors e is denoted the input space X. elegant framework for solving jointly the unmixing problem, n To estimate both matrices E and A, one concentrates on namely estimating the endmembers and the abundances. To theminimizationoftheFrobeniussquarederrornorm 1kX− thebestofourknowledge,therehavebeennopreviousstudies 2 EAk2, subject to E ≥ 0 and A ≥ 0. In its vector-wise thatcombinethe linearandnonlinearmodelswithintheNMF F formulation, the objective function to minimize is framework. In this paper, we study the bi-objective optimization prob- T N 1 lem that performs simultaneously the NMF in both input JX(E,A)= 2Xkxt−Xantenk2, (2) and feature spaces, by combining the linear and kernel-based t=1 n=1 models.Thefirstobjectivefunctiontooptimizestemsfromthe wheretheresidualerrorismeasuredbetweeneachinputvector conventionallinearNMF,whilethesecondobjectivefunction, x and its approximation N a e in the input space defined in the feature space, is derived from a kernel-based Xt. The optimization is operPatend=1witnhtantwo-block coordinate NMF model. In case of two conflicting objective functions, descent scheme, by alternating between the elements of E or thereexistsratherasetofnondominated,noninferiororPareto of A, while keeping the elements in the other matrix fixed. optimal solutions, as opposed to the unique decomposition whendealingexclusivelywithoneobjectivefunction.Inorder B. Nonlinear – kernel-based – NMF to acquire the Pareto optimal solutions, we investigate the sum-weighed method from the literature of multi-objective A straightforward generalization to the nonlinear form is optimization, due to its ease for being integrated to the proposed within the framework offered by the kernel ma- proposed framework. Moreover, propose to approximate the chines.Inthefollowing,wepresentthekernel-basedNMFthat correspondingParetofront.Themultiplicativeupdaterulesare we have recentlyproposedin [3]. Itis worthnotingthatother derivedfor the resulting sub-optimizationproblemin the case variants can also be investigated, including the ones studied where the feature space is induced by the Gaussian kernel. in [20], [21], [22], [23]. However, these variants suffer from The convergence of the algorithm is discussed, as well as the pre-image problem, making the derivations and the study initialization and stopping criteria. more difficult; see [29] for more details. 3 ConsideranonlinearfunctionΦ(·)thatmapsthecolumnsof the matrix X, as well as the columns of the matrix E, from Φ(·) the input space X to some feature space H. Its associated x 1 norm is denoted k · kH, and the correspondinginner product Φ(xt) cinanthbeefeevaatluuraetesdpaucseinigsthoef sthoe-cfaollremdkheΦr(nxelt)f,uΦnc(txiot′n)iκH(x, tw,hxitc′h) e1xt xbtx=2 PeNnn=1anten Φ(e1) ΨbΦt(e=Φn()Pe2Nn)=Φ1(xa1n)tΦ(en) in kernel machines. Examples of kernel functions are the X e2 H Gaussian and the polynomial kernels. Φ(x2) Φ(·) Applying the model (1) in the feature space, we get the following matrix factorization model Fig.1:Inthe linearNMF, eachsample x isapproximatedby t [Φ(x ) Φ(x ) ··· Φ(x )]≈[Φ(e ) Φ(e ) ··· Φ(e )]A, xt in the input space X, while in the kernel-based NMF, the 1 2 T 1 2 N mapped sample Φ(x ) is approximated by Ψ in the feature b t t or equivalently in the vector-wise form, for all t=1,...,T, space H. The proposed bi-objective NMF sbolves simultane- N ously the two optimization problems. Φ(x )≈ a Φ(e ). t X nt n n=1 Under the nonnegativity of all e and entries of A, the B. On augmenting the linear model with a nonlinearity N optimization problem consists in minimizing the sum of the Recently, several works have been investigating the com- residual errors in the feature space H, between each Φ(xt) bination of a linear model, often advocated by a physical and its approximation PNn=1antΦ(en), namely model,withanadditivenonlinearfluctuation,determinedwith a kernel-based term. The model takes the form T N 1 2 JH(E,A)= 2Xt=1(cid:13)(cid:13)(cid:13)Φ(xt)−nX=1antΦ(en)(cid:13)(cid:13)(cid:13)H. (3) xt = XN anten+Ψt, By analogyto the linear case, a two-block coordinatedescent n=1 schemecanbeinvestigatedtosolvethisoptimizationproblem. where Ψ belongs to some nonlinear feature space. Several t models have been proposed to define this nonlinearity, as III. INPUTVERSUS FEATURESPACE OPTIMIZATION outlined here. In [18], the nonlinearity depends exclusively The difference between the linear and the nonlinear cases onthe endmemberse . In [26], theaboveadditivefluctuation n is illustrated in Fig. 1. With the linear NMF, each sample x is relaxed by considering a convex combination with the so- t is approximatedwith a linear combination of the N elements calledmultiplekernellearning.Morerecently,theabundances e , namelybyminimizingthe Euclideandistance in theinput areincorporatedinthenonlinearmodel,with apost-nonlinear n space between each x and x = N a e . With the model as studied in [25] and a Bayesian approach is used in t t Pn=1 nt n nonlinear case, using the kernel-based formalism, the opti- [27] with the so-called residual component analysis. Another b mizationisconsideredinthefeaturespace,byminimizingthe modelisproposedin[19]inthecontextofsupervisedlearning. distanceinHbetweenΦ(x )andΨ = N a Φ(e ).The All these approaches consider that the endmembers e are t t Pn=1 nt n n twomodels,andthecorrespondingboptimizationproblems,are alreadyknown,orestimatedusingsomelineartechniquessuch distinct (except for the trivial linear kernel). asthevertexcomponentanalysis(VCA)[28].Thenonlinearity is only investigated within the abundances a . As opposed nt A. The pre-image problem to these approaches, the method considered in this paper investigatesalso the estimationof the endmemberse , with a An attemptto bridgethis gapis to providea representation n nonlinear relation with respect to it. of Ψ in the input space, namely estimating the element of t X wbhose image with the mapping function Φ(·) is as close as possible to Ψt. This is the pre-imageproblem, which is an IV. BI-OBJECTIVE OPTIMIZATIONIN BOTH INPUT AND ill-posed nonlibnear optimization problem; see [24] for more FEATURESPACES details. As shown in the literature investigating the pre-image In this section, we propose to solve simultaneouslythe two problem, and demonstrated recently in [30, Theorem 1], the optimizationproblems,intheinputandthefeaturespaces.See pre-image takes the form N a′ e , for some unknown coefficientsa′ . These coefPficnie=n1tsndtepenndon the e , making Fig. 1 for an illustration. nt n the model implicitly nonlinear. A. Problem formulation Itisworthnotingthatthisdifference,betweenthelinearand the nonlinear case, is inherited from the framework of kernel Optimizing simultaneously the objective functions machines; see [31]. This drawback spans also the multiple JX(E,A) and JH(E,A), namely in both the input kernel learning models, of the form N a κ(e ,·) where and the feature space, is in a sense an ill-defined problem. Pn=1 nt n thekernelκisa(convex)combinationofseveralkernels[32]. Indeed,itisnotpossibleingeneraltofindacommonsolution While we focus in this paper on the NMF, our work extends thatisoptimalforbothobjectivefunctions.As opposedtothe tothewideclassofkernelmethods,byprovidingaframework single-objective optimization problems where the main focus to optimize in both input and feature spaces, as shown next. would be on the decision solution space, namely the space of 4 all entries (E,A) (of dimension LN+NT), the bi-objective optimization.To this end, we transform the bi-objectiveprob- optimization problem brings the focus on the objective lem into an aggregated objective function which is a convex space, namely the space to which the objective vector combination of the two original objective functions. Let [oJpXti(mEiz,aAtio)n pJroHb(lEem,A, m)]ulbtei-loobnjgesc.tiBveeyoopntdimsiuzcahtiobni-hoabsjebcteievne J(E,A)=αJX(E,A)+(1−α)JH(E,A) widely studied in the literature. Before taking advantage of betheaggregatedobjectivefunction(i.e.,sum-weightedobjec- this literature study and solving our bi-objective optimization tive function, also called scalarization value) for some weight problem, we revisit the following definitions in our context: α ∈ [0,1] that represents the relative importance between • Pareto dominance: The solution (E1,A1) is said to objectives JX and JH. The optimization problem becomes dominate (E2,A2) if and only if JX(E1,A1) ≤ min αJX(E,A)+(1−α)JH(E,A) JX(E2,A2) and JH(E1,A1) ≤ JH(E2,A2), where at E,A (4) least one inequality is strict. subject to E ≥0 and A≥0 • Pareto optimal: A given solution (E∗,A∗) is a global For a fixed value of the weight α, the above problem is (respectivelylocal) Pareto optimal if and only if it is not calledthe suboptimizationproblem.Thesolutionto theabove dominated by any other solution in the decision space suboptimization problem is a Pareto optimal for the original (respectively in its neighborhood). That is, the objective bi-objective problem, as proven in [34] for the general case. vector [JX(E∗,A∗) JH(E∗,A∗)] corresponding to a Paretooptimal(E∗,A∗)cannotbeimprovedinanyspace By solving the above suboptimization problem with a spread of values of the weight α, we obtain an approximationof the (input or feature space) without any degradation in the Pareto front. It is obvious that the model breaks down to the other space. single-objective conventional NMF in (2) with α = 1, while • Paretofront:Thesetoftheobjectivevectorscorrespond- the extreme case with α=0 leads to the kernel NMF in (3). ingtotheParetooptimalsolutionsformstheParetofront The optimization problem (4) has no closed-form solu- in the objective space. tion, a drawback inherited from optimization problems with Various multi-objective optimization techniques have been nonnegativity constraints. Moreover, the objective function is proposedandsuccessfullyappliedintoengineeringfields,e.g., nonconvex and nonlinear, making the optimization problem the evolutionary algorithms [33], sum-weighted algorithms difficult to solve. In the following, we propose iterative [34], [35], ε-constraint method [36], [37], normal boundary techniques for this purpose. It is noteworthy to mention this intersection method [38], to name a few. See the survey [39], yieldsanapproximateoptimalsolution,whoseobjectivevector [40] and the references therein on the methods for multi- approximates a point on the Pareto front. Substituting the objectiveoptimization.Amongtheexistingmethods,thesum- expressionsgivenin(2)and(3)forJX andJH,theaggregated weighted or scalarization method has been always the most objective function becomes popular one, since it is straightforward and easily to imple- T N T N mpreonbtl.emA isnutmo-waesiignhgtleed-otbejcehcntiivqeueprcoobnlveemrtsbya mcoumltbi-ionbinjegcttihvee J = α2 Xt=1(cid:13)(cid:13)(cid:13)xt−nX=1anten(cid:13)(cid:13)(cid:13)2+ 1−2αXt=1(cid:13)(cid:13)(cid:13)Φ(xt)−nX=1antΦ(en)(cid:13)(cid:13)(cid:13)2H. (5) multiple objectives. Under some conditions, the objective vectorcorrespondingtolatter’soptimalsolutionbelongstothe This objective function takes the form convex part of multi-objective problem’s Pareto front. Thus, T N N N by changing the weights among the objectives appropriately, anmti,nenαX(cid:16)−Xante⊤nxt+ 21 X X antamte⊤nem(cid:17) the Pareto front of the original problem is approximated.The t=1 n=1 n=1m=1 drawbacks of the sum-weighted method reside in that the T N 1 N N nonconvex part of the Pareto front is unattainable, and even +(1−α)X(cid:16)−Xantκ(en,xt)+ 2 X X antamtκ(en,em)(cid:17), t=1 n=1 n=1m=1 on the convex part of the front, a uniform spread of weights (6) doesnotfrequentlyresultinauniformspreadofParetopoints after expanding the expressions of the distances in (5), and on the Pareto front, as pointed out in [34]. Nevertheless, the removing the constant terms x⊤x and κ(x ,x ), since they t t t t sum-weighted method is the most practical one, in view of are independent of a and e . nt n the complexityof the NMF problem,which is nonconvex,ill- Although the NMF is nonconvex, its subproblem with one posed and NP-hard. matrix fixed is convex. Similar to most NMF algorithms, we apply the two-block coordinate descent scheme, namely, alternatingoverthe elementsin E or in A, while keepingthe B. Bi-objective optimization with the sum-weighted method elements in the other matrix fixed. The derivative of (6) with Following the formulation introduced in the previous respect to ant is section, we propose to minimize the bi-objective function N [mJaXtr(iEce,sAE) andJAH.(ETh,eAd)e],cisuinodnersoltuhteionn,ononfesgizaetivLitNy +ofNtThe, ∇antJ =α(cid:16)−e⊤nxt+ X amte⊤nem(cid:17) m=1 (7) correspondsto the entriesin the unknownmatrices E and A. N In the following, we use the sum-weighted method, which +(1−α)(cid:16)−κ(en,xt)+ X amtκ(en,em)(cid:17), is the most widely-used approach to tackle multi-objective m=1 5 and the gradient of (6) with respect to e is TABLE I: Some common kernels and their gradients with n respect to e n T N ∇enJ =αXant(cid:16)−xt+ X amtem(cid:17) Kernel κ(en,z) ∇enκ(en,z) t=1 m=1 Gaussian exp(2−σ12ken−zk2) −σ12κ(en,z)(en−z) T N Polynomial (z⊤en+c)d d(z⊤en+c)(d−1)z +(1−α)Xant(cid:16)−∇enκ(en,xt)+X amt∇enκ(en,em)(cid:17). Exponential exp(2−σ12ken−zk) −2σ12κ(en,z)sgn(en−z) t=1 m=1 Sigmoid tanh(γz⊤en+c) γsech2(γz⊤en+c)z (8) Here, ∇enκ(en,·) represents the gradient of the kernel with respect to its argument e , and can be determined for most expression of the gradient (8) into the subtraction of two n valid kernels, as shown in TABLE I. nonnegative terms, i.e., ∇enJ = P − Q, where P and Q have nonnegative entries. To this end, we set the stepsize η Based on the gradient descent scheme, a simple additive n corresponding to e in (10) as (14) (given on top of next update rule can be written as n page), and obtain the following multiplicative update for e n ant =ant−ηnt∇antJ (9) in (15) (given on top of next page), where the division and multiplication are element-wise. for a , and nt en =en−ηn∇enJ (10) On the convergence, initialization and stopping criteria for e . The stepsize parameters η and η balance the rate n nt n Theproposedalgorithmtoleratestheuseofanystrictlypos- of convergencewith the accuracy of optimization, and can be itivematricesasinitialmatrices.Asimpleuniformdistribution set differently dependingon n and t. After each iteration, the on the positive quadrant is shown to be a good initialization rectificationa =max(a ,0)shouldfollowtoguaranteethe nt nt in our experiments. It is an advantage over some NMF nonnegativity of all a and the entries in all e . nt n algorithms where stricter initial conditions are required. For instance, proposed for the hyperspectral unmixing problem, C. Multiplicative update rules for the Gaussian kernel both constrained NMF [2] and minimum volume constrained The additive update rule is easy to implement but the NMF [42] initialize the columns of the endmember matrix E convergence can be slow and very sensitive to the stepsize with randomly chosen pixels from the image under study. value,aspointedoutbyLeeandSeungin[10].Followingthe For eachgivenweightα, the stoppingcriterionis two-fold, spiritofthelatterpaper,weprovidemultiplicativeupdaterules either a stationary point is attained, or the preset maximum fortheproposedbi-objectiveNMF.Withoutlossofgenerality, numberofiterationsisreached.Therefore,thealgorithmstops we restrict the presentationto the case of the Gaussian kernel at the n-th iteration if for the second objective function JH. For most valid kernels, J(n) ≤min{J(n−1),J(n+1)} the corresponding multiplicative update rules can be derived using a similar procedure. or The Gaussian kernel is defined by κ(zi,zj) = n=nmax, exp(−1kz −z k2) for any z ,z ∈X, where σ denotesthe 2σ2 i j i j whereJ(n) denotestheevaluationoftheaggregationobjective tunable bandwidth parameter. In this case, its gradient with respect to e is function J at the n-th iteration and nmax is a predefined n threshold. 1 ∇enκ(en,z)=−σ2κ(en,z)(en−z). (11) Ontheconvergenceoftheproposedalgorithm,itisnotewor- thy to mention that the quotients in the multiplicative update Toderivethemultiplicativeupdateruleforant,wechoosethe rules (12) and (15) are unity if and only if stepsize parameter in the additive update rule (9) as ant ∇antJ =0 and ∇enJ =0, η = , nt N N respectively. Therefore, the above multiplicative update rules α X amte⊤nem+(1−α) X amtκ(en,em) imply a part of the Karush-Kuhn-Tucker (KKT) conditions. m=1 m=1 However, the KKT conditions state only the necessary condi- which yields tionsforalocalminimum.Concerningthenonconvexproblem αe⊤x +(1−α)κ(e ,x ) as the studied one, or any problem with non-unique KKT- ant =ant× n t n t . points (stationary points), a local minimum is not guaranteed. N N α a e⊤e +(1−α) a κ(e ,e ) Similar to other multiplicative-type update rules proposed in X mt n m X mt n m NMF, the proposed algorithm lacks guaranteed optimality m=1 m=1 (12) property,since the convergenceto a stationary point does not Incorporatingthe above expression ∇enκ(en,z) of the Gaus- always correspond to a local minimum. See also the discus- sian kernel in (8), the gradient of the objective function with sionsaroundtheconvergenceoftheconventionalNMFin[43], respect to e becomes (13) (given on top of next page). The [44].Independentlyfromthesetheoreticallackofconvergence, n multiplicative update rule for e is elaborated using the so- we show next that the proposed algorithm provides relevant n called split gradient method [41]. This trick decomposes the results, and also outperforms all state-of-the-art methods. 6 T N T N 1−α ∇enJ =αXant(cid:16)−xt+ X amtem(cid:17)+ σ2 Xant(cid:16)κ(en,xt)(en−xt)− X amtκ(en,em)(en−em)(cid:17) (13) t=1 m=1 t=1 m=1 e n η = (14) n T N T N ασ2 a a e +(1−α) a κ(e ,x )e + a κ(e ,e )e nt mt m nt(cid:16) n t n mt n m m(cid:17) tP=1 mP=1 tP=1 mP=1 T T N ασ2 a x +(1−α) a κ(e ,x )x + a κ(e ,e )e nt t nt(cid:16) n t t mt n m n(cid:17) e =e ⊗ tP=1 tP=1 mP=1 (15) n n T N T N ασ2 a a e +(1−α) a κ(e ,x )e + a κ(e ,e )e nt mt m nt(cid:16) n t n mt n m m(cid:17) tP=1 mP=1 tP=1 mP=1 V. EXPERIMENTS In this section, the performance of the proposed algorithm forbi-objectiveNMFisdemonstratedontheunmixingoftwo well-known hyperspectral images. An approximation of the Pareto front is proposed and a comparison with state-of-the- art unmixing methods is conducted. A. Dataset and settings The first image, depicted in Fig. 2, is the Urban image1, acquired by the Hyperspectral Digital Imagery Collection Experiment(HYDICE) sensor. The top left part with 50×50 Fig. 2: The ground truth of the Urban image pixelsis takenfromthe original307×307pixels’image.The rawdataconsistsof210channelscoveringthebandwidthfrom 0.4µm to 2.5µm. As recommended in [45], only L = 162 B. Approximation of the Pareto front clean bands of high-SNR are of interest. According to the Since to determine the whole Pareto front is unrealistic for groundtruth providedin [45], [46], the studiedarea is mainly anonlinearmulti-objectiveoptimizationproblem,onetargetis composed by grass, tree, building, and road. to approximate the Pareto front by a set of discrete points on The second image is a sub-image with 50 × 50 pixels it[39].TheconceptoftheParetooptimalandtheParetofront selected from the well-known Cuprite image, which was ac- arenotstrictintheproposedalgorithm,duetothesolverofthe quiredby theAirborneVisible/InfraredImagingSpectrometer suboptimization problem not guaranteeing a local minimum, (AVIRIS) in 1997. The data is collected over 244 contiguous not to mention the global minimum. These obstacles are spectral bands, with the wavelength ranging from 0.4µm to inherited from the nonconvexity and the nonlinearity of the 2.5µm.Aftertheremovalofthenoisybands,L=189spectral kernel-based NMF problem. In this case, the Pareto optimal bandsremain.Asinvestigatedin[47],[48],thisareaisknown andtheParetofrontreferactuallytocandidateParetooptimal to be dominated by three materials: muscovite, alunite and and an approximation of Pareto front, respectively [39]. cuprite. ToapproximatetheParetofrontwithadiscretesetofpoints, Experiments are conducted employing the weight set α ∈ we operate as follows: For each value of the weight α, we {0,0.02,...,0.98,1}, which implies the model varying grad- obtain a solution (endmember and abundance matrices) from ually from the nonlinear Gaussian NMF (α = 0) to the the algorithm proposed in Section IV-C; by evaluating the conventionallinearNMF(α=1).Foreachαfromtheweight set, multiplicative update rules given in (12) and (15) are objective functions JX and JH at this solution, we get a single point in the objective space. The approximated Pareto applied, with the maximum iteration number n = 300. max The initial matrices of E and A are generated using a [0, front for the Urban and the Cuprite images are shown in 1] uniform distribution. To choose an appropriate bandwidth Fig.3(a)andFig.3(b).TheevolutionofobjectivesJX,JHand the aggregated objective function J, evaluated at the solution σ in the Gaussian kernel, we first apply the single objective obtained for each weight α, are shown in Fig. 4(a) for the Gaussian NMF on both images, using the same candidate set Urban image and in Fig. 4(b) for the Cuprite image. {0.2,0.3,...,9.9,10,15,20,...,50} for σ. Considering the We observe the following: reconstructionerrorinbothinputandfeaturespace(seebelow for definitions), we fix σ = 3.0 for the Urban image, and 1) For both images under study, solutions generated with σ =2.5 for the Cuprite image as investigated in [29]. α = 1 and α = 0 are dominated, since all the solutions on the Pareto front outperform them, with respect to both 1TheUrbanimageisavailable:http://www.erdc.usace.army.mil/Media/--FactSheets/FoabcjteSchteievtAesrt.icTlehViisewre/tvabeiadl/9s2t5h4a/Atrntiecileth/4e7r66th81e/hcyopnervceubnet.iaosnpxallinear 7 2) Regarding the sum-weighted approach, the minimizer of 350 the suboptimization problem is proven to be a Pareto α=1 optimal for the original multi-objective problem, i.e., the 300 α=0.94 corresponding objective vector belongs to the Pareto front in the objective space [34]. In practice, we obtain 9 and α=0.9 250 23 (out of 51) dominated solutions for the Urban and the Cuprite images, respectively. Such phenomenon, however, 200 α=0.8 is not surprising, since there exist multiple Pareto optimal H J solutions in a problem only if the objectives are conflict- 150 α=0.7 ing to each other, as claimed in [49]2. Other possible explanation could be the applied numerical optimization 100 α=0.5 scheme, due to the weak convergence of the method or α=0.4 to the failure of the solver in finding a global minimum 50 α=0.3 α=0.16 α=0.18 α=0.2 α=0 [40]. For the Urban image as shown in Fig. 3(a) and 0 Fig. 4(a), all the obtained solutions are Pareto optimal 0 50 100 150 200 250 J within the objectives-conflicting interval α ∈ [0.18,0.94]. X Regarding the Cuprite image, as observed in Fig. 3(b) (a) Urbanimage and Fig. 4(b), the objectives-conflicting interval is α ∈ [0.14,0.72], while the Pareto optimal solutions are found using α ∈ {0.04} ∪ {0.26,0.28,...,0.72}. In fact, the 140 obtained solutions with α∈{0.14,0.16,...,0.24}are only α=1 local Pareto optimal, and they are dominated by a global 120 Pareto optimal with α = 0.04. It is pointed out in [40] that, in the nonconvex problem, the global (local) solver 100 generates global (local) Pareto optimal, and local Pareto optimal is not of interest in frontof globalPareto optimal. 80 H 3) As illustrated in both Fig. 3(a) and Fig. 3(b), an even J 60 distribution of weight α between [0, 1] do not lead to an even spread of the solutions on the approximated Pareto 40 front. Moreover, the nonconvex part of the Pareto front cannot be attained using any weight. It is exactly the case 20 α=0.72 αα==00.. 87 α=0 . 9 α=0. 5 α=0.26 α = 0 α.0=40.02 α= 0 . 1α4=0.1 iαn =Fig0..33(ba)n;dinαFi=g.03.(5a)o,na ttrhieviaalppnroonxciomnavteexdpPaarrtebtoetwfreoennt α=0 0 is probably resulted from the nonoptimal solution of the 12 14 16 18 20 22 24 26 28 J suboptimization problem. These are two main drawbacks X of the sum-weighted method. (b) Cuprite image Nevertheless, the obtained approximationof Pareto front is ofhighvalue.Ononehand,itprovidesasetofParetooptimal Fig. 3: Illustration of the approximated Pareto front in the solutions for the DM, instead of a single decomposition. On objective space for Urban and Cuprite images. The objective the other hand, an insight of the trade-off between objectives vectors of the non-dominated solutions (42 for the Urban JX andJH revealstheunderlyinglinearity/nonlinearityofthe image,28fortheCupriteimage),markedinred,approximatea data under study, as illustrated in the following section. partoftheParetofront;theobjectivevectorsofthedominated solutions (9 for the Urban image, 23 for the Cuprite image) are marked in blue. C. Performance In this section, we study the performanceof the method on theunmixingprobleminhyperspectralimagery.Theunmixing NMFnorthenonlinearGaussianNMFbestfitsthestudied performance is evaluated by two metrics introduced in [29]. images. On the contrary, the Pareto optimal solutions, Thefirstoneisthereconstructionerrorintheinputspace(RE) whichresultthepointsontheParetofront,provideasetof defined by feasibleandnondominateddecompositionsforthedecision maker(DM),i.e.,theuser.Itisworthnotingthatwe apply T N v 1 the sum-weighted method as a posteriori method, where RE=uuTLXkxt−Xantetk2. different Pareto optimal solutions are generated, and the t t=1 n=1 DM makes the final comprise among optimal solutions. Alternatively, in a priori method, the DM specifies the 2For example, the Pareto optimal solutions for the well-known Schaffer’s function, defined by J(x) = [x2,(x−2)2]⊤, are found only within the weight α in advance to generate a solution. See [40] for interval [0,2], where a tradeoff between two objectives exists. See [33] for more details. moredetails. 8 TABLE II: Unmixing performance for the Urban image Thesecondoneisthereconstructionerrorinthefeaturespace (REΦ), which is similarly defined as RE×10−2 REΦ×10−2 FCLS 1.44 3.89 T N REΦ =vu 1 Φ(x )− a Φ(e ) 2 , GBM-sNMF 6.50 4.11 utTLXt=1(cid:13)(cid:13)(cid:13) t nX=1 nt t (cid:13)(cid:13)(cid:13)H K-Hype 5.99 4.67 MinDisCo 3.12 4.60 where ConvexNMF 2.96 5.84 N N N KconvexNMF - 43.94 2 (cid:13)(cid:13)Φ(xt)−XantΦ(et)(cid:13)(cid:13)H =X X antamtκ(en,em) KsNMF - 4.33 (cid:13) n=1 (cid:13) n=1m=1 MercerNMF - 2.96 N LinearNMF α=1 1.48 3.96 −2Xantκ(en,xt)+κ(xt,xt), per GaussianNMF α=0 3.49 1.39 n=1 pa α=0.18 2.70 1.27 s and κ(·,·) denotes the Gaussian kernel. It is worth to note hi ParetoOptimal α=0.50 2.38 2.04 t that REΦ can always be evaluated for any given matrices E α=0.94 1.40 3.78 andA,regardlessoftheoptimizationproblemandthesolving TABLE III: Unmixing performance for the Cuprite image procedure that led to these matrices. RE×10−2 REΦ×10−2 State-of-the-art unmixing methods FCLS 0.95 0.59 GBM-sNMF 1.06 0.62 Anunmixingproblemcomprisestheestimationofendmem- K-Hype 2.12 0.93 bers and the corresponding abundance maps. Some existing MinDisCo 1.62 4.54 techniques either extract the endmembers (such as VCA) or ConvexNMF 1.61 2.51 estimate the abundances (such as FCLS)3; other methods en- KconvexNMF - 19.53 able the simultaneousestimations, e.g., NMF and its variants. KsNMF - 1.38 We briefly present all the unmixing algorithms that are used MercerNMF - 2.74 in comparison. LinearNMF α=1 0.89 2.28 The most-known endmember extraction technique is the er GaussianNMF α=0 1.05 0.50 vertex component analysis (VCA) [28]. It is based on the ap p α=0.04 0.92 0.42 linear mixture model and presumes the existence of end- s hi ParetoOptimal α=0.50 0.84 0.58 members within the image under analysis. It seeks to inflate t α=0.72 0.77 0.73 the simplex enclosing all the spectra. The endmembers are the vertices of the largest simplex. This technique is applied for endmember extraction, jointly with three abundance es- a convex combination of certain data points. The kernel timation techniques: FCLS, K-Hype and GBM-sNMF. The convex-NMF(KconvexNMF)andthekernelsemi-NMFbased fully constrained least squares algorithm (FCLS) [51] is a on the nonnegative least squares (KsNMF) are essentially least square approach using the linear mixture model, where the kernelized variants of the ConvexNMF in [21] and the the abundances are estimated considering the nonnegativity alternating nonnegativity constrainted least squares and the and sum-to-one constraints. A nonlinear unmixing model active set method in [53], respectively, as discussed in [22]. for abundance estimation is considered in [18], where the Experiments are also conducted with these two kernel meth- nonlinear term is described as a kernel-based model, with ods, adopting the Gaussian kernel. Nonlinear NMF based on the so-called linear-mixture/nonlinear-fluctuation model (K- constructingMercerkernels(MercerNMF),introducedin[54], Hype). In [52], a generalized bilinear model is formulated, addressesthenonlinearNMFproblemusingaself-constructed with parameters optimized using the semi-nonnegativematrix Gaussian kernel, where the nonnegativity of the embedded factorization (GBM-sNMF). bases and coefficients is preserved. The embedded data are We further consider five NMF-based techniques that are finally factorized with conventional NMF. Of particular note capable to estimate the endmembers and abundances jointly. isthatonlythereconstructionerrorinthefeaturespacecanbe The minimum dispersion constrained NMF (MinDisCo) [14] calculatedfortheaforementionedkernel-basedmethods,since includes the dispersion regularization to the conventional thepre-imagesofthemappedendmember,whicharerequired NMF,byintegratingthesum-to-oneconstraintforeachpixel’s in the computing of reconstruction error in the input space, abundance fractions and the minimization of variance within cannot be exploited. each endmember. The problem is solved by exploiting an alternate projected gradient scheme. In the convex nonnega- Unmixing performance tive matrix factorization (ConvexNMF) [21], the basic matrix The unmixing performance, with respect to the reconstruc- (endmember matrix in our context) is restricted to the span tionerrorsintheinputandthefeaturespaces,iscomparedwith of the input data, that is, each sample can be viewed as the aforementioned unmixing approaches, as demonstrated in TABLE II and TABLE III. As can be observed, the proposed 3See [50] for connections between the endmember extraction techniques andtheabundances estimation techniques. methodwith Paretooptimalsolutionoutperformsnotonlythe 9 End. 1 End. 2 End. 3 End. 4 350 2.5 0.5 0.5 0.5 300 J JX JH Reflectance1.25 Reflectance000...234 Reflectance000...234 Reflectance000...234 0.1 0.1 0.1 250 Conflicting Interval 10.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 α ∈ [0.18, 0.94] 200 Ab.1 Ab.2 Ab.3 Ab.4 10 0.2 10 0.6 10 0.8 10 0.8 150 2300 00..115 2300 0.4 2300 00..46 2300 00..46 40 0.05 40 0.2 40 0.2 40 0.2 50 10 20 30 40 50 0 50 10 20 30 40 50 0 50 10 20 30 40 50 0 50 10 20 30 40 50 0 100 (a) α=1 50 End. 1 End. 2 End. 3 End. 4 0.6 0.5 0.5 0.5 0.5 0.4 0.4 0.4 GMaoudsesl0ia 0n 0.1 0.2 0.3 0.4 0α.5 0.6 0.7 0.8 LM0.i9noedaerl 1 Reflectance0000....1234 Reflectance000...123 Reflectance000...123 Reflectance000...123 (a) Urbanimage 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 Ab.1 Ab.2 Ab.3 Ab.4 1 0.8 120 J JX JH 1200 00..68 1200 0.6 1200 00..68 1200 00..68 30 0.4 30 0.4 30 0.4 30 0.4 40 0.2 40 0.2 40 0.2 40 0.2 100 50 10 20 30 40 50 0 50 10 20 30 40 50 0 50 10 20 30 40 50 0 50 10 20 30 40 50 0 Conflicting Interval (b) α=0.3 80 α ∈ [0.14, 0.72] End. 1 End. 2 End. 3 End. 4 0.5 0.5 0.5 0.5 60 0.4 0.4 0.4 0.4 40 Reflectance00..23 Reflectance00..23 Reflectance00..23 Reflectance00..23 0.1 0.1 0.1 0.1 20 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 Ab.1 Ab.2 Ab.3 Ab.4 0.8 0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 10 0.8 10 0.8 10 0.6 10 0.8 Gaussian α Linear 20 0.6 20 0.6 20 20 0.6 Model Model 30 0.4 30 0.4 30 0.4 30 0.4 40 0.2 40 0.2 40 0.2 40 0.2 (b) Cuprite image 50 10 20 30 40 50 0 50 10 20 30 40 50 0 50 10 20 30 40 50 50 10 20 30 40 50 (c) α=0 Fig. 4: Visualization of the trade-off between the two objec- Fig. 5: Urban image: Endmembers and corresponding abun- tives JX and JH, and the change of the aggregated objective dance maps, estimated using α = 1 (conventional linear function J, along with the increment of weight α, for the NMF); α = 0.3 (a Pareto optimal of the bi-objective NMF); Urban image and the Cuprite image. α=0 (nonlinear Gaussian NMF). VI. CONCLUSION existing linear NMF (α=1) and Gaussian NMF (α=0), but This paper presented a novel bi-objective nonnegative ma- also all the state-of-the-art methods. trixfactorizationbyexploitingthekernelmachines,wherethe The estimated endmembers and the corresponding abun- decompositionwasperformedsimultaneouslyintheinputand dance maps with the proposed method are shown in Fig. 5 thefeaturespace.Themultiplicativeupdateruleswerederived. and Fig. 6. For the Urban image, different areas (the road in The performance of the method was demonstrated for un- particular)arebetterrecognizedwiththeParetooptimalwhen mixingwell-knownhyperspectralimages.TheresultingPareto compared with solutions of the linear and of the Gaussian frontswereanalyzed.Asforfuturework,weareextendingthis NMF. Regarding the Cuprite image, the linear NMF (i.e., approach to include other NMF objective functions, defined α = 1) recognizes two out of three regions; whereas both in the input or the feature space. Considering simultaneously the Pareto optimal (i.e., α = 0.72) and the Gaussian NMF several kernels, and as a consequence several feature spaces, (i.e., α = 0) are able to distinguish three regions. However, is also under investigation. the abundance maps of Gaussian NMF appear to be overly sparse, compared with its counterpart of the Pareto optimal ACKNOWLEDGMENT solution.Itisalsonoticedthattheendmembersextractedwith thelinearNMFarespiky,andevenwithsomezero-parts,thus This work was supported by the French ANR, grant meeting poorly the real situation. HYPANEMA: ANR-12BS03-0033. 10 REFERENCES [1] D. D. Lee and H. S. Seung, “Learning the parts of objects by non- negativematrixfactorization.” Nature,vol.401,no.6755,pp.788–791, Oct.1999. [2] S. Jia and Y. Qian, “Constrained nonnegative matrix factorization for hyperspectralunmixing,”IEEETransactionsonGeoscienceandRemote End. 1 End. 2 End. 3 Sensing,vol.47,no.1,pp.161–173,Jan.2009. 0.8 0.8 0.8 [3] F.Zhu,P.Honeine,andM.Kallas,“Kernelnon-negative matrixfactor- 0.7 0.7 0.7 0.6 0.6 0.6 izationwithoutthepre-imageproblem,”inIEEEworkshoponMachine Reflectance0000....2345 Reflectance0000....2345 Reflectance0000....2345 [4] iLIn.eBapruoncliyinung,omNfo.iraNlSiifkgeonalataulirdePisrs,opcaaencsdes,i”nI.gIP,EiREtaEesi,mT“rsNa,noFsnraancnetcgioea,ntisvSeeopnm.a2Nt0er1iux4ra.falcNtoertiwzaotrikosn, 0.1 0.1 0.1 vol.19,no.6,pp.1090–1100, Jun.2008. 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 [5] R. Zhi, M. Flierl, Q. Ruan, and W. Kleijn, “Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition,”IEEETransactionsonSystems,Man,andCybernetics,Part Ab.1 Ab.2 Ab.3 B:Cybernetics, vol.41,no.1,pp.38–52,Feb.2011. 0.35 1200 000...567 1200 0.3 1200 00..45 [6] aPl.itMy.reKdiumctiaonndoBf.lTarigdeo-rs,c“aSleubgseynsetemexpirdeesnstiiofincadtiaotan.”thGroeungohmdeimreesneasirocnh-, 30 0.4 30 0.25 30 0.3 vol.13,no.7,pp.1706–1718, 2003. 4500 10 20 30 40 50 00..23 4500 10 20 30 40 50 00..125 4500 10 20 30 40 50 00..12 [7] nAe.gaCtiivcheomckait,rixR.faZctdournizeakt,ioanndinSa.p-pI.licAamtioanris,t“oNbelwindalsgoourricthemsespfaorartinoonn,”- in2006IEEEInternationalConferenceonAcoustics,SpeechandSignal (a) α=1 Processing(ICASSP),vol.5. IEEE,2006,pp.V–V. [8] T. Li and C. Ding, “The relationships among various nonnegative End. 1 End. 2 End. 3 0.6 matrixfactorizationmethodsforclustering,”inProceedingsoftheSixth 0.5 0.5 0.5 InternationalConferenceonDataMining,ser.ICDM’06. Washington, Reflectance000...234 Reflectance000...234 Reflectance000...234 [9] DmC.CatD,riUixnSgfa,AcX:to.IrEHizEaeEt,ioaCnnodamnHpd.usDtpe.ercSStiromacloiecntly,u,s“tO2e0rni0n6tgh,,”epipenq.uP3i6rvo2ac–le.3nS7cI1Ae.MofDnaotnanMegiantiinvge 0.1 0.1 0.1 Conf,2005,pp.606–610. 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 [10] Dfa.ctoDr.izaLtieoen,”anidnHA.dvSa.ncSeesuinng,ne“uArlaglorinitfhomrmsaftioornnporno-cneesgsaintigvesymsteamtrisx, 2001,pp.556–562. Ab.1 Ab.2 Ab.3 [11] Y. Wang and Y. Jia, “Fisher non-negative matrix factorization for 0.6 0.7 learning local features,” in Proceedings of the Asian Conference on 10 0.8 10 0.5 10 0.6 Computer Vision,2004,pp.27–30. 20 0.6 20 0.4 20 0.5 3400 00..24 3400 000...123 3400 0000....1234 [12] nP.esHsocyoenrsatrnadinPt.s,D”aJyoaunr,n“aNloonf-nMeagcahtiivneemLaetarrixnifnagctoRreizseaatirocnh,wvitohl.s5p,arpsep-. 50 10 20 30 40 50 50 10 20 30 40 50 50 10 20 30 40 50 1457–1469, 2004. [13] Z.ChenandA.Cichocki, “Nonnegative matrixfactorization withtem- poralsmoothnessand/orspatialdecorrelationconstraints,”inLaboratory (b) α=0.72 forAdvancedBrainSignalProcessing, RIKEN,Tech.Rep,2005. End. 1 End. 2 End. 3 0.8 0.8 0.8 [14] A. Huck, M. Guillaume, and J. Blanc-Talon, “Minimum dispersion 0.7 0.7 0.7 constrained nonnegative matrix factorization to unmix hyperspectral Reflectance0000....3456 Reflectance0000....3456 Reflectance0000....3456 [15] nQdoa.t.aK6,”e,IpaEnpEd.E2T5.T9Kr0a–ann2sa6ad0ce2t,i,o“JnRusnoo.bnu2s0Gt1le0o1.scnioernmcefaacntodriRzaetmioonteinSetnhseinpgre,sveonlc.e4o8f, 0.2 0.2 0.2 outliers andmissingdatabyalternative convexprogramming,”in2005 0.1 0.1 0.1 IEEE Computer Society Conference on Computer Vision and Pattern 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 00.5 W1avele1n.g5th (µm2) 2.5 Recognition (CVPR),vol.1. IEEE,2005,pp.739–746. [16] L. Li, G. Lebanon, and H. Park, “Fast bregman divergence nmf using Ab.1 Ab.2 Ab.3 taylor expansion and coordinate descent.” in Proceedings of the 18th 0.35 ACM SIGKDD international conference on Knowledge discovery and 10 0.8 10 0.8 10 0.3 20 0.6 20 0.6 20 0.25 datamining. ACM,2012,pp.307–315. 30 0.4 30 0.4 30 00..125 [17] J. Chen, C. Richard, and P. Honeine, “Nonlinear estimation of 40 0.2 40 0.2 40 00..015 material abundances of hyperspectral images with ℓ1-norm spatial 50 10 20 30 40 50 50 10 20 30 40 50 50 10 20 30 40 50 regularization,” IEEETransactionsonGeoscienceandRemoteSensing, vol. 52, no. 5, pp. 2654–2665, May 2014. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=06531654 (c) α=0 [18] ——, “Nonlinear unmixing of hyperspectral data based on a linear- Fig. 6: Cuprite image: Endmembers and corresponding abun- mixture/nonlinear-fluctuationmodel,”IEEETransactionsonSignalPro- cessing,vol.61,no.2,pp.480–492,Jan.2013. dance maps, estimated using α = 1 (conventional linear [19] N.Nguyen,J.Chen,C.Richard,P.Honeine,andC.Theys,“Supervised NMF); α=0.72 (a Pareto optimal of the bi-objective NMF); nonlinearunmixingofhyperspectralimagesusingapre-imagemethod,” α=0 (nonlinear Gaussian NMF). in New Concepts in Imaging: Optical and Statistical Models, In Eds. D. Mary, C. Theys, and C. Aime, ser. EAS Publications Series. EDP Sciences, 2013,vol.59,pp.417–437. [20] D.Zhang,Z.Zhou,andS.Chen,“Non-negativematrixfactorization on kernels,” in Lecture Notes in Computer Science, vol. 4099. Springer, 2006,pp.404–412. [21] C.Ding,T.Li,andM.I.Jordan,“ConvexandSemi-NonnegativeMatrix Factorizations,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.32,no.1,pp.45–55,Nov.2010.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.