Bi-Objective Nonnegative Matrix Factorization:
Linear Versus Kernel-Based Models
Fei Zhu, Paul Honeine, Member, IEEE
F. Zhu and P. Honeine are with the Institut Charles Delaunay (CNRS), Université de Technologie de Troyes, France. Email: fei.zhu@utt.fr and paul.honeine@utt.fr

Abstract—Nonnegative matrix factorization (NMF) is a powerful class of feature extraction techniques that has been successfully applied in many fields, namely in signal and image processing. Current NMF techniques have been limited to a single-objective problem, in either its linear or its nonlinear kernel-based formulation. In this paper, we propose to revisit the NMF as a multi-objective problem, in particular a bi-objective one, where the objective functions defined in both input and feature spaces are taken into account. By taking advantage of the sum-weighted method from the multi-objective optimization literature, the proposed bi-objective NMF determines a set of nondominated, Pareto optimal, solutions instead of a single optimal decomposition. Moreover, the corresponding Pareto front is studied and approximated. Experimental results on unmixing real hyperspectral images confirm the efficiency of the proposed bi-objective NMF compared with state-of-the-art methods.

Index Terms—Kernel machines, nonnegative matrix factorization, Pareto optimal, hyperspectral image, unmixing problem.

I. INTRODUCTION

NONNEGATIVE MATRIX FACTORIZATION (NMF) provides a parts-based representation of nonnegative data entries, and has become a versatile technique with plenty of applications [1]. As opposed to other dimensionality-reduction approaches, e.g., principal component analysis, vector quantization and linear discriminant analysis, the NMF is based on the additivity of the contributions of the bases that approximate the original data. Such a decomposition model often yields a tractable physical interpretation, thanks to the sparse and nonnegative representation of the input data. Many real-world applications benefit from these virtues, including hyperspectral unmixing [2], [3], face and facial expression recognition [4], [5], gene expression data [6], blind source separation [7], and spectral clustering [8], [9], to name a few.

The NMF approximates a high-rank nonnegative input matrix by two nonnegative low-rank ones. As a consequence, it provides a decomposition suitable for many signal processing and data analysis problems, and in particular the hyperspectral unmixing problem. Indeed, a hyperspectral image is a cube that consists of a set of images of the scene under scrutiny, each corresponding to the ground scene from which the light of a certain wavelength is reflected. Namely, a reflectance spectrum over a wavelength range is available for each pixel. It is assumed that each spectrum is a mixture of a few "pure" materials, called endmembers. The hyperspectral unmixing problem consists of extracting the endmembers (recorded in the first low-rank matrix), and estimating the abundance of each endmember at every pixel (recorded in the second one). Obviously, the above physical interpretation requires the nonnegativity of both the abundances and the endmember spectra.

The NMF is a linear model, since each input spectrum is approximated by a linear combination of some basis spectra. To estimate the decomposition, the objective function for minimization is defined in a Euclidean space (the so-called input space X), where the difference between the input matrix and the product of the estimated ones is usually measured either by the Frobenius norm or by the generalized Kullback-Leibler divergence [10]. These objective functions are often augmented with different regularization terms, such as the Fisher constraint for learning local features [11], the sparseness constraint for intuitive and easily interpretable decompositions [12], the temporal smoothness and spatial decorrelation regularization [13], and the minimum dispersion regularization for unmixing accuracy [14]. Other objective functions also arise from practical standpoints, e.g., the ℓ1-norm for robustness against outliers and missing data [15], and the Bregman divergence for fast computation [16].

Many studies have shown the limits of a linear decomposition, as opposed to a nonlinear one [17], [18], [19]. While most research activities have concentrated on the linear NMF, a few works have considered the nonlinear case. In an attempt to extend the linear NMF models to the nonlinear scope, several kernel-based NMF have been proposed within the framework offered by kernel machines [20]. Employing a nonlinear function, the kernel-based methods map the data into a higher-dimensional space, where existing linear techniques are performed on the transformed data. The kernel trick enables the estimation of the inner product between any pair of mapped data in a reproducing kernel Hilbert space (the so-called feature space H), without the need to know explicitly either the nonlinear map or the resulting space. For example, in [20], the linear NMF technique is performed on the kernel matrix, whose entries are the inner products between input data calculated with some kernel function. Other kernel-based NMF techniques presented in [21], [22], [23] follow a similar scheme but share an additive assumption originating from the convex NMF approach proposed in [21], that is, the basis matrix is represented as a convex combination of the mapped input data in the feature space H. It is worth noting that the objective function is the Frobenius norm of the residual between the kernel matrix and its factorization for all the above-mentioned kernel-based NMF methods.
However, although the input data matrix is nonnegative, the nonnegativity of the mapped data is not guaranteed. A more severe disadvantage is that the obtained bases lie in the feature space (often of infinite dimension), where a reverse mapping to the input space is difficult. Indeed, one needs to solve the pre-image problem, an obstacle inherited from kernel machines [24]. In [3], these difficulties are circumvented by defining a model in the feature space that can be optimized directly in the input space. In this paper, we revisit this framework to discover the nonlinearity of the input matrix. See Section II-B for more details.

In either its conventional linear formulation or its nonlinear kernel-based formulation, as well as in all of their variations (and regularizations), the NMF has been tackling a single-objective optimization problem. In essence, the underlying assumption is that it is known a priori that the linear model dominates the nonlinear one, or vice versa, for the input data under study. Obtaining such prior information about the given input data is not practical in most real-world applications. Moreover, it is possible that a fusion of the linear and nonlinear models reveals latent variables closer to the ground truth than each single model considered alone. Independently from the NMF framework, such a combination of the linear model with a nonlinear fluctuation was recently studied by Chen, Richard and Honeine in [18] and [25] where, in the former, the nonlinearity depends only on the spectral content, while it is defined by a post-nonlinear model in the latter. A multiple-kernel learning approach was studied in [26], and a Bayesian approach was investigated in [27] with the so-called residual component analysis. All these methods share one major drawback: they only estimate the abundances, with a nonlinear model, while the endmembers need to be extracted in a pre-processing stage using a conventional linear technique (N-Findr, vertex component analysis, ... [28]). As opposed to such a separation of the optimization problems, the NMF provides an elegant framework for solving the unmixing problem jointly, namely estimating the endmembers and the abundances. To the best of our knowledge, there have been no previous studies that combine the linear and nonlinear models within the NMF framework.

In this paper, we study the bi-objective optimization problem that performs the NMF simultaneously in both input and feature spaces, by combining the linear and kernel-based models. The first objective function stems from the conventional linear NMF, while the second objective function, defined in the feature space, is derived from a kernel-based NMF model. In case of two conflicting objective functions, there exists a set of nondominated, noninferior or Pareto optimal solutions, as opposed to the unique decomposition obtained when dealing exclusively with one objective function. In order to acquire the Pareto optimal solutions, we investigate the sum-weighted method from the multi-objective optimization literature, due to the ease of its integration into the proposed framework. Moreover, we propose to approximate the corresponding Pareto front. Multiplicative update rules are derived for the resulting suboptimization problem in the case where the feature space is induced by the Gaussian kernel. The convergence of the algorithm is discussed, as well as the initialization and stopping criteria.

The remainder of this paper is organized as follows. We first revisit the conventional and kernel-based NMF. The differences between the input and the feature space optimization are discussed in Section III. In Section IV, we present the proposed bi-objective NMF framework. Section V demonstrates the efficiency of the proposed method for unmixing two real hyperspectral images. Conclusions and future works are reported in Section VI.

II. A PRIMER ON THE LINEAR AND NONLINEAR NMF

In this section, we present the two NMF variants, with the linear and the nonlinear models, as well as the corresponding optimization problems.

A. Conventional NMF

Given a nonnegative data matrix $X \in \Re^{L\times T}$, the conventional NMF aims to approximate it by the product of two low-rank nonnegative matrices $E \in \Re^{L\times N}$ and $A \in \Re^{N\times T}$, namely
$$X \approx EA, \qquad (1)$$
under the constraints $E \geq 0$ and $A \geq 0$, where the nonnegativity is element-wise. An equivalent vector-wise model is given by considering separately each column of the matrix $X$, namely $x_t$ for $t=1,\ldots,T$, with
$$x_t \approx \sum_{n=1}^{N} a_{nt}\, e_n,$$
where each $x_t$ is represented as a linear combination of the columns of $E$, denoted $e_n$ for $n=1,\ldots,N$, with the scalars $a_{nt}$ for $n=1,\ldots,N$ and $t=1,\ldots,T$ being the entries of the matrix $A$. The subspace spanned by the vectors $x_t$, as well as by the vectors $e_n$, is denoted the input space X.

To estimate both matrices $E$ and $A$, one concentrates on the minimization of the squared Frobenius error norm $\frac{1}{2}\|X-EA\|_F^2$, subject to $E \geq 0$ and $A \geq 0$. In its vector-wise formulation, the objective function to minimize is
$$J_X(E,A) = \frac{1}{2}\sum_{t=1}^{T}\Big\|x_t - \sum_{n=1}^{N} a_{nt}\, e_n\Big\|^2, \qquad (2)$$
where the residual error is measured between each input vector $x_t$ and its approximation $\sum_{n=1}^{N} a_{nt} e_n$ in the input space X. The optimization is operated with a two-block coordinate descent scheme, by alternating between the elements of $E$ and those of $A$, while keeping the elements of the other matrix fixed, as sketched below.
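A minimal NumPy sketch of this two-block scheme is given below, using the classical multiplicative update rules of Lee and Seung [10] for the objective (2). It is an illustration under our own naming and initialization choices, not the exact implementation used later in the experiments.

```python
import numpy as np

def linear_nmf(X, N, n_iter=200, eps=1e-9, seed=0):
    """Minimize (1/2)||X - EA||_F^2 subject to E, A >= 0 with the
    multiplicative updates of Lee and Seung [10]."""
    rng = np.random.default_rng(seed)
    L, T = X.shape
    E = rng.uniform(size=(L, N))      # basis (endmember) matrix, L x N
    A = rng.uniform(size=(N, T))      # coefficient (abundance) matrix, N x T
    for _ in range(n_iter):
        # two-block coordinate descent: update A with E fixed, then E with A fixed
        A *= (E.T @ X) / (E.T @ E @ A + eps)
        E *= (X @ A.T) / (E @ A @ A.T + eps)
    return E, A
```

Each update multiplies the current entries by a nonnegative ratio, so the nonnegativity is preserved as long as the initial matrices are nonnegative.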
B. Nonlinear – kernel-based – NMF

A straightforward generalization to the nonlinear case is proposed within the framework offered by kernel machines. In the following, we present the kernel-based NMF that we have recently proposed in [3]. It is worth noting that other variants can also be investigated, including the ones studied in [20], [21], [22], [23]. However, these variants suffer from the pre-image problem, making the derivations and the study more difficult; see [29] for more details.
Consider a nonlinear function $\Phi(\cdot)$ that maps the columns of the matrix $X$, as well as the columns of the matrix $E$, from the input space X to some feature space H. The associated norm is denoted $\|\cdot\|_H$, and the corresponding inner product in the feature space is of the form $\langle\Phi(x_t),\Phi(x_{t'})\rangle_H$, which can be evaluated using the so-called kernel function $\kappa(x_t,x_{t'})$ in kernel machines. Examples of kernel functions are the Gaussian and the polynomial kernels.

Applying the model (1) in the feature space, we get the following matrix factorization model
$$[\Phi(x_1)\ \Phi(x_2)\ \cdots\ \Phi(x_T)] \approx [\Phi(e_1)\ \Phi(e_2)\ \cdots\ \Phi(e_N)]\,A,$$
or equivalently, in the vector-wise form, for all $t=1,\ldots,T$,
$$\Phi(x_t) \approx \sum_{n=1}^{N} a_{nt}\,\Phi(e_n).$$
Under the nonnegativity of all $e_n$ and of the entries of $A$, the optimization problem consists in minimizing the sum of the residual errors in the feature space H, between each $\Phi(x_t)$ and its approximation $\sum_{n=1}^{N} a_{nt}\Phi(e_n)$, namely
$$J_H(E,A) = \frac{1}{2}\sum_{t=1}^{T}\Big\|\Phi(x_t) - \sum_{n=1}^{N} a_{nt}\,\Phi(e_n)\Big\|_H^2. \qquad (3)$$
By analogy with the linear case, a two-block coordinate descent scheme can be investigated to solve this optimization problem.

Fig. 1: In the linear NMF, each sample $x_t$ is approximated by $\hat{x}_t = \sum_{n=1}^{N} a_{nt}\, e_n$ in the input space X, while in the kernel-based NMF, the mapped sample $\Phi(x_t)$ is approximated by $\hat{\Psi}_t = \sum_{n=1}^{N} a_{nt}\,\Phi(e_n)$ in the feature space H. The proposed bi-objective NMF solves the two optimization problems simultaneously.
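Although $\Phi(\cdot)$ is never computed explicitly, the squared feature-space residual in (3) expands, by the kernel trick, into kernel evaluations only: $\|\Phi(x_t)-\sum_n a_{nt}\Phi(e_n)\|_H^2 = \kappa(x_t,x_t) - 2\sum_n a_{nt}\kappa(e_n,x_t) + \sum_{n,m} a_{nt}a_{mt}\kappa(e_n,e_m)$. The sketch below evaluates $J_H$ this way for the Gaussian kernel; the function names are ours and the kernel choice is only an example.

```python
import numpy as np

def gaussian_kernel(U, V, sigma):
    """Gram matrix kappa(u_i, v_j) = exp(-||u_i - v_j||^2 / (2 sigma^2)),
    where u_i and v_j are the columns of U and V."""
    d2 = (np.sum(U**2, axis=0)[:, None] + np.sum(V**2, axis=0)[None, :]
          - 2.0 * U.T @ V)
    return np.exp(-d2 / (2.0 * sigma**2))

def J_H(X, E, A, sigma):
    """Feature-space objective (3), evaluated with the kernel trick only."""
    Kxx = gaussian_kernel(X, X, sigma)   # kappa(x_t, x_t) on the diagonal
    Kex = gaussian_kernel(E, X, sigma)   # kappa(e_n, x_t), N x T
    Kee = gaussian_kernel(E, E, sigma)   # kappa(e_n, e_m), N x N
    per_pixel = (np.diag(Kxx)
                 - 2.0 * np.sum(A * Kex, axis=0)
                 + np.sum(A * (Kee @ A), axis=0))
    return 0.5 * np.sum(per_pixel)
```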
III. INPUT VERSUS FEATURE SPACE OPTIMIZATION

The difference between the linear and the nonlinear cases is illustrated in Fig. 1. With the linear NMF, each sample $x_t$ is approximated by a linear combination of the $N$ elements $e_n$, namely by minimizing the Euclidean distance in the input space between each $x_t$ and $\hat{x}_t = \sum_{n=1}^{N} a_{nt} e_n$. In the nonlinear case, using the kernel-based formalism, the optimization is considered in the feature space, by minimizing the distance in H between $\Phi(x_t)$ and $\hat{\Psi}_t = \sum_{n=1}^{N} a_{nt}\Phi(e_n)$. The two models, and the corresponding optimization problems, are distinct (except for the trivial linear kernel).

A. The pre-image problem

An attempt to bridge this gap is to provide a representation of $\hat{\Psi}_t$ in the input space, namely to estimate the element of X whose image under the mapping $\Phi(\cdot)$ is as close as possible to $\hat{\Psi}_t$. This is the pre-image problem, which is an ill-posed nonlinear optimization problem; see [24] for more details. As shown in the literature investigating the pre-image problem, and demonstrated recently in [30, Theorem 1], the pre-image takes the form $\sum_{n=1}^{N} a'_{nt} e_n$, for some unknown coefficients $a'_{nt}$. These coefficients depend on the $e_n$, making the model implicitly nonlinear.

It is worth noting that this difference between the linear and the nonlinear cases is inherited from the framework of kernel machines; see [31]. This drawback also extends to the multiple kernel learning models, of the form $\sum_{n=1}^{N} a_{nt}\kappa(e_n,\cdot)$, where the kernel $\kappa$ is a (convex) combination of several kernels [32]. While we focus in this paper on the NMF, our work extends to the wide class of kernel methods, by providing a framework to optimize in both input and feature spaces, as shown next.

B. On augmenting the linear model with a nonlinearity

Recently, several works have investigated the combination of a linear model, often advocated by a physical model, with an additive nonlinear fluctuation determined by a kernel-based term. The model takes the form
$$x_t = \sum_{n=1}^{N} a_{nt}\, e_n + \Psi_t,$$
where $\Psi_t$ belongs to some nonlinear feature space. Several models have been proposed to define this nonlinearity, as outlined here. In [18], the nonlinearity depends exclusively on the endmembers $e_n$. In [26], the above additive fluctuation is relaxed by considering a convex combination with the so-called multiple kernel learning. More recently, the abundances were incorporated in the nonlinear model, with a post-nonlinear model as studied in [25], and a Bayesian approach is used in [27] with the so-called residual component analysis. Another model is proposed in [19] in the context of supervised learning. All these approaches consider that the endmembers $e_n$ are already known, or estimated using some linear technique such as the vertex component analysis (VCA) [28]. The nonlinearity is only investigated within the abundances $a_{nt}$. As opposed to these approaches, the method considered in this paper also investigates the estimation of the endmembers $e_n$, which enter the model nonlinearly.

IV. BI-OBJECTIVE OPTIMIZATION IN BOTH INPUT AND FEATURE SPACES

In this section, we propose to solve the two optimization problems, in the input and the feature spaces, simultaneously. See Fig. 1 for an illustration.

A. Problem formulation

Optimizing simultaneously the objective functions $J_X(E,A)$ and $J_H(E,A)$, namely in both the input and the feature space, is in a sense an ill-defined problem. Indeed, it is not possible in general to find a common solution that is optimal for both objective functions. As opposed to single-objective optimization problems, where the main focus would be on the decision (solution) space, namely the space of
all entries $(E,A)$ (of dimension $LN+NT$), the bi-objective optimization problem brings the focus to the objective space, namely the space to which the objective vector $[J_X(E,A)\ \ J_H(E,A)]$ belongs. Beyond such a bi-objective problem, multi-objective optimization has been widely studied in the literature. Before taking advantage of this literature and solving our bi-objective optimization problem, we revisit the following definitions in our context (a small sketch of the dominance test follows the list):

• Pareto dominance: The solution $(E_1,A_1)$ is said to dominate $(E_2,A_2)$ if and only if $J_X(E_1,A_1) \leq J_X(E_2,A_2)$ and $J_H(E_1,A_1) \leq J_H(E_2,A_2)$, where at least one inequality is strict.

• Pareto optimal: A given solution $(E^*,A^*)$ is a global (respectively local) Pareto optimal if and only if it is not dominated by any other solution in the decision space (respectively in its neighborhood). That is, the objective vector $[J_X(E^*,A^*)\ \ J_H(E^*,A^*)]$ corresponding to a Pareto optimal $(E^*,A^*)$ cannot be improved in one space (input or feature space) without some degradation in the other space.

• Pareto front: The set of the objective vectors corresponding to the Pareto optimal solutions forms the Pareto front in the objective space.
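The dominance test on a pair of objective vectors is elementary; the following sketch (our own helper functions, written for the two-objective case) filters a finite set of candidate objective vectors down to its nondominated members, which is how the discrete Pareto-front approximations of Section V are obtained.

```python
def dominates(j1, j2):
    """Pareto dominance on objective vectors j = (J_X, J_H): j1 dominates j2
    iff it is no worse in both objectives and strictly better in at least one."""
    return all(a <= b for a, b in zip(j1, j2)) and any(a < b for a, b in zip(j1, j2))

def pareto_front(points):
    """Keep only the nondominated objective vectors of a finite candidate set."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# toy example: (1.0, 3.0) and (2.0, 1.5) are nondominated, (2.5, 3.5) is dominated
print(pareto_front([(1.0, 3.0), (2.0, 1.5), (2.5, 3.5)]))
```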
Various multi-objective optimization techniques have been proposed and successfully applied in engineering fields, e.g., evolutionary algorithms [33], sum-weighted algorithms [34], [35], the ε-constraint method [36], [37], and the normal boundary intersection method [38], to name a few. See the surveys [39], [40] and the references therein on methods for multi-objective optimization. Among the existing methods, the sum-weighted or scalarization method has always been the most popular one, since it is straightforward and easy to implement. A sum-weighted technique converts a multi-objective problem into a single-objective problem by combining the multiple objectives. Under some conditions, the objective vector corresponding to the latter's optimal solution belongs to the convex part of the multi-objective problem's Pareto front. Thus, by changing the weights among the objectives appropriately, the Pareto front of the original problem is approximated. The drawbacks of the sum-weighted method are that the nonconvex part of the Pareto front is unattainable and that, even on the convex part of the front, a uniform spread of weights does not generally result in a uniform spread of points on the Pareto front, as pointed out in [34]. Nevertheless, the sum-weighted method is the most practical one, in view of the complexity of the NMF problem, which is nonconvex, ill-posed and NP-hard.

B. Bi-objective optimization with the sum-weighted method

Following the formulation introduced in the previous section, we propose to minimize the bi-objective function $[J_X(E,A)\ \ J_H(E,A)]$, under the nonnegativity of the matrices $E$ and $A$. The decision solution, of size $LN+NT$, corresponds to the entries of the unknown matrices $E$ and $A$. In the following, we use the sum-weighted method, which is the most widely used approach to tackle multi-objective optimization. To this end, we transform the bi-objective problem into an aggregated objective function, which is a convex combination of the two original objective functions. Let
$$J(E,A) = \alpha J_X(E,A) + (1-\alpha) J_H(E,A)$$
be the aggregated objective function (i.e., the sum-weighted objective function, also called the scalarization value) for some weight $\alpha \in [0,1]$ that represents the relative importance of the objectives $J_X$ and $J_H$. The optimization problem becomes
$$\min_{E,A}\ \alpha J_X(E,A) + (1-\alpha) J_H(E,A) \quad \text{subject to}\ E \geq 0\ \text{and}\ A \geq 0. \qquad (4)$$
For a fixed value of the weight $\alpha$, the above problem is called the suboptimization problem. The solution to this suboptimization problem is a Pareto optimal of the original bi-objective problem, as proven in [34] for the general case. By solving the suboptimization problem for a spread of values of the weight $\alpha$, we obtain an approximation of the Pareto front. Obviously, the model reduces to the single-objective conventional NMF in (2) with $\alpha = 1$, while the extreme case $\alpha = 0$ leads to the kernel NMF in (3).

The optimization problem (4) has no closed-form solution, a drawback inherited from optimization problems with nonnegativity constraints. Moreover, the objective function is nonconvex and nonlinear, making the optimization problem difficult to solve. In the following, we propose iterative techniques for this purpose. It is noteworthy that this yields an approximate optimal solution, whose objective vector approximates a point on the Pareto front. Substituting the expressions given in (2) and (3) for $J_X$ and $J_H$, the aggregated objective function becomes
$$J = \frac{\alpha}{2}\sum_{t=1}^{T}\Big\|x_t-\sum_{n=1}^{N}a_{nt}e_n\Big\|^2 + \frac{1-\alpha}{2}\sum_{t=1}^{T}\Big\|\Phi(x_t)-\sum_{n=1}^{N}a_{nt}\Phi(e_n)\Big\|_H^2. \qquad (5)$$
This objective function takes the form
$$\min_{a_{nt},\,e_n}\ \alpha\sum_{t=1}^{T}\Big(-\sum_{n=1}^{N}a_{nt}e_n^\top x_t + \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}a_{nt}a_{mt}e_n^\top e_m\Big) + (1-\alpha)\sum_{t=1}^{T}\Big(-\sum_{n=1}^{N}a_{nt}\kappa(e_n,x_t) + \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}a_{nt}a_{mt}\kappa(e_n,e_m)\Big), \qquad (6)$$
after expanding the distances in (5) and removing the constant terms $x_t^\top x_t$ and $\kappa(x_t,x_t)$, since they are independent of $a_{nt}$ and $e_n$.
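Numerically, the aggregated objective (5) can be evaluated without ever forming $\Phi(\cdot)$, since its second term is exactly $J_H$ computed with the kernel trick. A minimal sketch, reusing the gaussian_kernel and J_H helpers from the Section II-B sketch (our own names, Gaussian kernel assumed):

```python
import numpy as np

def J_X(X, E, A):
    """Input-space objective (2): (1/2)||X - EA||_F^2."""
    return 0.5 * np.linalg.norm(X - E @ A, 'fro')**2

def J_agg(X, E, A, alpha, sigma):
    """Aggregated (scalarized) objective (5): alpha*J_X + (1-alpha)*J_H."""
    return alpha * J_X(X, E, A) + (1.0 - alpha) * J_H(X, E, A, sigma)
```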
Although the NMF is nonconvex, its subproblem with one matrix fixed is convex. As in most NMF algorithms, we apply the two-block coordinate descent scheme, namely alternating over the elements of $E$ or of $A$, while keeping the elements of the other matrix fixed. The derivative of (6) with respect to $a_{nt}$ is
$$\nabla_{a_{nt}} J = \alpha\Big(-e_n^\top x_t + \sum_{m=1}^{N}a_{mt}e_n^\top e_m\Big) + (1-\alpha)\Big(-\kappa(e_n,x_t) + \sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)\Big), \qquad (7)$$
and the gradient of (6) with respect to $e_n$ is
$$\nabla_{e_n} J = \alpha\sum_{t=1}^{T}a_{nt}\Big(-x_t + \sum_{m=1}^{N}a_{mt}e_m\Big) + (1-\alpha)\sum_{t=1}^{T}a_{nt}\Big(-\nabla_{e_n}\kappa(e_n,x_t) + \sum_{m=1}^{N}a_{mt}\nabla_{e_n}\kappa(e_n,e_m)\Big). \qquad (8)$$
Here, $\nabla_{e_n}\kappa(e_n,\cdot)$ represents the gradient of the kernel with respect to its argument $e_n$, and can be determined for most valid kernels, as shown in TABLE I.

TABLE I: Some common kernels and their gradients with respect to $e_n$

Kernel      | $\kappa(e_n,z)$                                  | $\nabla_{e_n}\kappa(e_n,z)$
Gaussian    | $\exp\big(\frac{-1}{2\sigma^2}\|e_n-z\|^2\big)$  | $\frac{-1}{\sigma^2}\,\kappa(e_n,z)\,(e_n-z)$
Polynomial  | $(z^\top e_n + c)^d$                             | $d\,(z^\top e_n + c)^{d-1}\,z$
Exponential | $\exp\big(\frac{-1}{2\sigma^2}\|e_n-z\|\big)$    | $\frac{-1}{2\sigma^2}\,\kappa(e_n,z)\,\mathrm{sgn}(e_n-z)$
Sigmoid     | $\tanh(\gamma z^\top e_n + c)$                   | $\gamma\,\mathrm{sech}^2(\gamma z^\top e_n + c)\,z$
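For concreteness, the first two rows of TABLE I translate directly into code; the sketch below (our own helper functions) returns the kernel value and its gradient with respect to $e_n$ for a single pair of vectors.

```python
import numpy as np

def gaussian_and_grad(e, z, sigma):
    """Gaussian kernel kappa(e, z) and its gradient with respect to e (TABLE I, row 1)."""
    k = np.exp(-np.sum((e - z)**2) / (2.0 * sigma**2))
    return k, -(1.0 / sigma**2) * k * (e - z)

def polynomial_and_grad(e, z, c, d):
    """Polynomial kernel (z'e + c)^d and its gradient with respect to e (TABLE I, row 2)."""
    base = z @ e + c
    return base**d, d * base**(d - 1) * z
```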
Based on the gradient descent scheme, a simple additive update rule can be written as
$$a_{nt} = a_{nt} - \eta_{nt}\,\nabla_{a_{nt}} J \qquad (9)$$
for $a_{nt}$, and
$$e_n = e_n - \eta_n\,\nabla_{e_n} J \qquad (10)$$
for $e_n$. The stepsize parameters $\eta_{nt}$ and $\eta_n$ balance the rate of convergence against the accuracy of the optimization, and can be set differently depending on $n$ and $t$. After each iteration, the rectification $a_{nt} = \max(a_{nt},0)$ should follow, to guarantee the nonnegativity of all $a_{nt}$ and of the entries of all $e_n$.

C. Multiplicative update rules for the Gaussian kernel

The additive update rule is easy to implement, but its convergence can be slow and very sensitive to the stepsize value, as pointed out by Lee and Seung in [10]. Following the spirit of the latter paper, we provide multiplicative update rules for the proposed bi-objective NMF. Without loss of generality, we restrict the presentation to the case of the Gaussian kernel for the second objective function $J_H$. For most valid kernels, the corresponding multiplicative update rules can be derived with a similar procedure.

The Gaussian kernel is defined by $\kappa(z_i,z_j) = \exp(\frac{-1}{2\sigma^2}\|z_i-z_j\|^2)$ for any $z_i,z_j \in$ X, where $\sigma$ denotes the tunable bandwidth parameter. In this case, its gradient with respect to $e_n$ is
$$\nabla_{e_n}\kappa(e_n,z) = \frac{-1}{\sigma^2}\,\kappa(e_n,z)\,(e_n-z). \qquad (11)$$
To derive the multiplicative update rule for $a_{nt}$, we choose the stepsize parameter in the additive update rule (9) as
$$\eta_{nt} = \frac{a_{nt}}{\alpha\sum_{m=1}^{N}a_{mt}e_n^\top e_m + (1-\alpha)\sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)},$$
which yields
$$a_{nt} = a_{nt}\times\frac{\alpha\, e_n^\top x_t + (1-\alpha)\,\kappa(e_n,x_t)}{\alpha\sum_{m=1}^{N}a_{mt}e_n^\top e_m + (1-\alpha)\sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)}. \qquad (12)$$
Incorporating the expression (11) of $\nabla_{e_n}\kappa(e_n,z)$ for the Gaussian kernel into (8), the gradient of the objective function with respect to $e_n$ becomes
$$\nabla_{e_n} J = \alpha\sum_{t=1}^{T}a_{nt}\Big(-x_t + \sum_{m=1}^{N}a_{mt}e_m\Big) + \frac{1-\alpha}{\sigma^2}\sum_{t=1}^{T}a_{nt}\Big(\kappa(e_n,x_t)(e_n-x_t) - \sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)(e_n-e_m)\Big). \qquad (13)$$
The multiplicative update rule for $e_n$ is elaborated using the so-called split gradient method [41]. This trick decomposes the expression of the gradient (8) into the subtraction of two nonnegative terms, i.e., $\nabla_{e_n} J = P - Q$, where $P$ and $Q$ have nonnegative entries. To this end, we set the stepsize $\eta_n$ corresponding to $e_n$ in (10) as
$$\eta_n = \frac{e_n}{\alpha\sigma^2\sum_{t=1}^{T}a_{nt}\sum_{m=1}^{N}a_{mt}e_m + (1-\alpha)\sum_{t=1}^{T}a_{nt}\Big(\kappa(e_n,x_t)\,e_n + \sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)\,e_m\Big)}, \qquad (14)$$
and obtain the following multiplicative update for $e_n$:
$$e_n = e_n \otimes \frac{\alpha\sigma^2\sum_{t=1}^{T}a_{nt}\,x_t + (1-\alpha)\sum_{t=1}^{T}a_{nt}\Big(\kappa(e_n,x_t)\,x_t + \sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)\,e_n\Big)}{\alpha\sigma^2\sum_{t=1}^{T}a_{nt}\sum_{m=1}^{N}a_{mt}e_m + (1-\alpha)\sum_{t=1}^{T}a_{nt}\Big(\kappa(e_n,x_t)\,e_n + \sum_{m=1}^{N}a_{mt}\kappa(e_n,e_m)\,e_m\Big)}, \qquad (15)$$
where the division and the multiplication $\otimes$ are element-wise.
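Putting (12) and (15) together, one iteration of the resulting algorithm can be sketched as follows in NumPy. This is an illustrative implementation under our own vectorization and naming choices (with a small constant added to the denominators for numerical safety), not the authors' reference code; the stopping test of the next subsection is omitted and a fixed number of iterations is used instead.

```python
import numpy as np

def gaussian_kernel(U, V, sigma):
    """Gram matrix kappa(u_i, v_j) = exp(-||u_i - v_j||^2 / (2 sigma^2)) for columns of U, V."""
    d2 = (np.sum(U**2, axis=0)[:, None] + np.sum(V**2, axis=0)[None, :] - 2.0 * U.T @ V)
    return np.exp(-d2 / (2.0 * sigma**2))

def biobjective_nmf(X, N, alpha, sigma, n_max=300, eps=1e-9, seed=0):
    """Multiplicative updates (12) and (15) for the aggregated objective (5), Gaussian kernel."""
    rng = np.random.default_rng(seed)
    L, T = X.shape
    E = rng.uniform(size=(L, N))                   # strictly positive initialization
    A = rng.uniform(size=(N, T))
    for _ in range(n_max):
        Kex = gaussian_kernel(E, X, sigma)         # kappa(e_n, x_t), N x T
        Kee = gaussian_kernel(E, E, sigma)         # kappa(e_n, e_m), N x N
        # update (12) for the abundances A
        A *= (alpha * (E.T @ X) + (1 - alpha) * Kex) / \
             (alpha * (E.T @ E) @ A + (1 - alpha) * Kee @ A + eps)
        # update (15) for the endmembers E (element-wise ratio)
        AAt = A @ A.T
        r = np.sum(A * Kex, axis=1)                # sum_t a_nt kappa(e_n, x_t)
        s = np.sum(A * (Kee @ A), axis=1)          # sum_t a_nt sum_m a_mt kappa(e_n, e_m)
        num = alpha * sigma**2 * (X @ A.T) + (1 - alpha) * (X @ (A * Kex).T + E * s)
        den = alpha * sigma**2 * (E @ AAt) + (1 - alpha) * (E * r + E @ (Kee * AAt))
        E *= num / (den + eps)
    return E, A
```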
D. On the convergence, initialization and stopping criteria

The proposed algorithm tolerates the use of any strictly positive matrices as initial matrices. A simple uniform distribution on the positive quadrant is shown to be a good initialization in our experiments. This is an advantage over some NMF algorithms where stricter initial conditions are required. For instance, both the constrained NMF [2] and the minimum volume constrained NMF [42], proposed for the hyperspectral unmixing problem, initialize the columns of the endmember matrix $E$ with randomly chosen pixels from the image under study.

For each given weight $\alpha$, the stopping criterion is two-fold: either a stationary point is attained, or the preset maximum number of iterations is reached. Therefore, the algorithm stops at the $n$-th iteration if
$$J(n) \leq \min\{J(n-1),\, J(n+1)\}$$
or
$$n = n_{\max},$$
where $J(n)$ denotes the evaluation of the aggregated objective function $J$ at the $n$-th iteration and $n_{\max}$ is a predefined threshold.

On the convergence of the proposed algorithm, it is noteworthy that the quotients in the multiplicative update rules (12) and (15) are unity if and only if
$$\nabla_{a_{nt}} J = 0 \quad\text{and}\quad \nabla_{e_n} J = 0,$$
respectively. Therefore, the above multiplicative update rules imply a part of the Karush-Kuhn-Tucker (KKT) conditions. However, the KKT conditions state only necessary conditions for a local minimum. For a nonconvex problem such as the one studied here, or any problem with non-unique KKT points (stationary points), a local minimum is not guaranteed. As with other multiplicative-type update rules proposed for the NMF, the proposed algorithm lacks a guaranteed optimality property, since convergence to a stationary point does not always correspond to a local minimum. See also the discussions around the convergence of the conventional NMF in [43], [44]. Independently from this theoretical lack of convergence guarantees, we show next that the proposed algorithm provides relevant results, and also outperforms the state-of-the-art methods.
V. EXPERIMENTS

In this section, the performance of the proposed algorithm for bi-objective NMF is demonstrated on the unmixing of two well-known hyperspectral images. An approximation of the Pareto front is proposed, and a comparison with state-of-the-art unmixing methods is conducted.

A. Dataset and settings

The first image, depicted in Fig. 2, is the Urban image¹, acquired by the Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor. The top-left part with 50×50 pixels is taken from the original 307×307-pixel image. The raw data consist of 210 channels covering the bandwidth from 0.4µm to 2.5µm. As recommended in [45], only L = 162 clean bands of high SNR are of interest. According to the ground truth provided in [45], [46], the studied area is mainly composed of grass, tree, building, and road.

Fig. 2: The ground truth of the Urban image.

¹The Urban image is available at http://www.erdc.usace.army.mil/Media/FactSheets/FactSheetArticleView/tabid/9254/Article/476681/hypercube.aspx

The second image is a sub-image with 50×50 pixels selected from the well-known Cuprite image, which was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in 1997. The data are collected over 244 contiguous spectral bands, with the wavelength ranging from 0.4µm to 2.5µm. After the removal of the noisy bands, L = 189 spectral bands remain. As investigated in [47], [48], this area is known to be dominated by three materials: muscovite, alunite and cuprite.

Experiments are conducted with the weight set α ∈ {0, 0.02, ..., 0.98, 1}, which makes the model vary gradually from the nonlinear Gaussian NMF (α = 0) to the conventional linear NMF (α = 1). For each α from the weight set, the multiplicative update rules given in (12) and (15) are applied, with the maximum iteration number n_max = 300. The initial matrices E and A are generated using a [0, 1] uniform distribution. To choose an appropriate bandwidth σ for the Gaussian kernel, we first apply the single-objective Gaussian NMF on both images, using the same candidate set {0.2, 0.3, ..., 9.9, 10, 15, 20, ..., 50} for σ. Considering the reconstruction error in both input and feature space (see below for the definitions), we fix σ = 3.0 for the Urban image, and σ = 2.5 for the Cuprite image, as investigated in [29]. The whole experimental sweep over α is summarized in the sketch below.
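The following sketch summarizes this experimental protocol: sweep α over the grid above, run the multiplicative updates for each weight, record the objective vector (J_X, J_H), and keep the nondominated points (see Section V-B). It reuses the biobjective_nmf, J_X, J_H and pareto_front helpers from the earlier sketches, and assumes the data matrix X, the number of endmembers N and the bandwidth sigma are already defined.

```python
import numpy as np

alphas = np.arange(0.0, 1.0 + 1e-12, 0.02)       # {0, 0.02, ..., 0.98, 1}
objectives = []
for alpha in alphas:
    E, A = biobjective_nmf(X, N, alpha, sigma, n_max=300)
    objectives.append((J_X(X, E, A), J_H(X, E, A, sigma)))

front = pareto_front(objectives)                  # discrete approximation of the Pareto front
```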
B. Approximation of the Pareto front

Since determining the whole Pareto front is unrealistic for a nonlinear multi-objective optimization problem, one target is to approximate the Pareto front by a set of discrete points on it [39]. The concepts of Pareto optimal and Pareto front are not strict in the proposed algorithm, since the solver of the suboptimization problem does not guarantee a local minimum, not to mention the global minimum. These obstacles are inherited from the nonconvexity and the nonlinearity of the kernel-based NMF problem. In this case, the Pareto optimal and the Pareto front refer actually to candidate Pareto optimal solutions and to an approximation of the Pareto front, respectively [39].

To approximate the Pareto front with a discrete set of points, we operate as follows. For each value of the weight α, we obtain a solution (endmember and abundance matrices) from the algorithm proposed in Section IV-C; by evaluating the objective functions J_X and J_H at this solution, we get a single point in the objective space. The approximated Pareto fronts for the Urban and the Cuprite images are shown in Fig. 3(a) and Fig. 3(b). The evolution of the objectives J_X, J_H and of the aggregated objective function J, evaluated at the solution obtained for each weight α, is shown in Fig. 4(a) for the Urban image and in Fig. 4(b) for the Cuprite image.

We observe the following:

1) For both images under study, the solutions generated with α = 1 and α = 0 are dominated, since all the solutions on the Pareto front outperform them with respect to both objectives. This reveals that neither the conventional linear NMF nor the nonlinear Gaussian NMF best fits the studied images. On the contrary, the Pareto optimal solutions, which yield the points on the Pareto front, provide a set of feasible and nondominated decompositions for the decision maker (DM), i.e., the user. It is worth noting that we apply the sum-weighted method as an a posteriori method, where different Pareto optimal solutions are generated and the DM makes the final compromise among the optimal solutions. Alternatively, in an a priori method, the DM specifies the weight α in advance to generate a single solution. See [40] for more details.
2) Regarding the sum-weighted approach, the minimizer of the suboptimization problem is proven to be a Pareto optimal of the original multi-objective problem, i.e., the corresponding objective vector belongs to the Pareto front in the objective space [34]. In practice, we obtain 9 and 23 (out of 51) dominated solutions for the Urban and the Cuprite images, respectively. Such a phenomenon, however, is not surprising, since multiple Pareto optimal solutions exist in a problem only if the objectives are conflicting with each other, as claimed in [49]². Another possible explanation could be the applied numerical optimization scheme, due to the weak convergence of the method or to the failure of the solver in finding a global minimum [40]. For the Urban image, as shown in Fig. 3(a) and Fig. 4(a), all the obtained solutions are Pareto optimal within the objectives-conflicting interval α ∈ [0.18, 0.94]. Regarding the Cuprite image, as observed in Fig. 3(b) and Fig. 4(b), the objectives-conflicting interval is α ∈ [0.14, 0.72], while the Pareto optimal solutions are found using α ∈ {0.04} ∪ {0.26, 0.28, ..., 0.72}. In fact, the solutions obtained with α ∈ {0.14, 0.16, ..., 0.24} are only local Pareto optimal, and they are dominated by a global Pareto optimal with α = 0.04. It is pointed out in [40] that, in a nonconvex problem, a global (local) solver generates global (local) Pareto optimal solutions, and a local Pareto optimal is not of interest when a global Pareto optimal is available.

3) As illustrated in both Fig. 3(a) and Fig. 3(b), an even distribution of the weight α over [0, 1] does not lead to an even spread of the solutions on the approximated Pareto front. Moreover, the nonconvex part of the Pareto front cannot be attained using any weight. This is exactly the case in Fig. 3(b); in Fig. 3(a), a trivial nonconvex part on the approximated Pareto front probably results from a nonoptimal solution of the suboptimization problem. These are the two main drawbacks of the sum-weighted method.

²For example, the Pareto optimal solutions of the well-known Schaffer's function, defined by J(x) = [x², (x−2)²]^⊤, are found only within the interval [0, 2], where a tradeoff between the two objectives exists. See [33] for more details.

Fig. 3: Illustration of the approximated Pareto front in the objective space for the Urban and Cuprite images. The objective vectors of the nondominated solutions (42 for the Urban image, 28 for the Cuprite image), marked in red, approximate a part of the Pareto front; the objective vectors of the dominated solutions (9 for the Urban image, 23 for the Cuprite image) are marked in blue. (a) Urban image. (b) Cuprite image.

Nevertheless, the obtained approximation of the Pareto front is of high value. On one hand, it provides a set of Pareto optimal solutions for the DM, instead of a single decomposition. On the other hand, an insight into the trade-off between the objectives J_X and J_H reveals the underlying linearity/nonlinearity of the data under study, as illustrated in the following section.

C. Performance

In this section, we study the performance of the method on the unmixing problem in hyperspectral imagery. The unmixing performance is evaluated by two metrics introduced in [29]. The first one is the reconstruction error in the input space (RE), defined by
$$\mathrm{RE} = \sqrt{\frac{1}{TL}\sum_{t=1}^{T}\Big\|x_t - \sum_{n=1}^{N}a_{nt}\,e_n\Big\|^2}.$$
The second one is the reconstruction error in the feature space (REΦ), which is similarly defined as
$$\mathrm{RE}_\Phi = \sqrt{\frac{1}{TL}\sum_{t=1}^{T}\Big\|\Phi(x_t) - \sum_{n=1}^{N}a_{nt}\,\Phi(e_n)\Big\|_H^2},$$
where
$$\Big\|\Phi(x_t) - \sum_{n=1}^{N}a_{nt}\,\Phi(e_n)\Big\|_H^2 = \sum_{n=1}^{N}\sum_{m=1}^{N}a_{nt}a_{mt}\,\kappa(e_n,e_m) - 2\sum_{n=1}^{N}a_{nt}\,\kappa(e_n,x_t) + \kappa(x_t,x_t),$$
and κ(·,·) denotes the Gaussian kernel. It is worth noting that REΦ can always be evaluated for any given matrices E and A, regardless of the optimization problem and the solving procedure that led to these matrices.

State-of-the-art unmixing methods

An unmixing problem comprises the estimation of the endmembers and of the corresponding abundance maps. Some existing techniques either extract the endmembers (such as VCA) or estimate the abundances (such as FCLS)³; other methods enable the simultaneous estimation of both, e.g., the NMF and its variants. We briefly present all the unmixing algorithms used in the comparison.

The best-known endmember extraction technique is the vertex component analysis (VCA) [28]. It is based on the linear mixture model and presumes the existence of endmembers within the image under analysis. It seeks to inflate the simplex enclosing all the spectra; the endmembers are the vertices of the largest simplex. This technique is applied for endmember extraction, jointly with three abundance estimation techniques: FCLS, K-Hype and GBM-sNMF. The fully constrained least squares algorithm (FCLS) [51] is a least squares approach using the linear mixture model, where the abundances are estimated under the nonnegativity and sum-to-one constraints. A nonlinear unmixing model for abundance estimation is considered in [18], where the nonlinear term is described by a kernel-based model, with the so-called linear-mixture/nonlinear-fluctuation model (K-Hype). In [52], a generalized bilinear model is formulated, with parameters optimized using the semi-nonnegative matrix factorization (GBM-sNMF).

We further consider five NMF-based techniques that are capable of estimating the endmembers and abundances jointly. The minimum dispersion constrained NMF (MinDisCo) [14] adds a dispersion regularization to the conventional NMF, by integrating the sum-to-one constraint on each pixel's abundance fractions and the minimization of the variance within each endmember; the problem is solved with an alternate projected gradient scheme. In the convex nonnegative matrix factorization (ConvexNMF) [21], the basis matrix (the endmember matrix in our context) is restricted to the span of the input data, that is, each sample can be viewed as a convex combination of certain data points. The kernel convex-NMF (KconvexNMF) and the kernel semi-NMF based on nonnegative least squares (KsNMF) are essentially the kernelized variants of the ConvexNMF in [21] and of the alternating nonnegativity-constrained least squares with the active set method in [53], respectively, as discussed in [22]. Experiments are also conducted with these two kernel methods, adopting the Gaussian kernel. The nonlinear NMF based on constructing Mercer kernels (MercerNMF), introduced in [54], addresses the nonlinear NMF problem using a self-constructed Gaussian kernel, where the nonnegativity of the embedded bases and coefficients is preserved; the embedded data are finally factorized with the conventional NMF. Of particular note is that only the reconstruction error in the feature space can be calculated for the aforementioned kernel-based methods, since the pre-images of the mapped endmembers, which are required to compute the reconstruction error in the input space, cannot be exploited.

³See [50] for connections between the endmember extraction techniques and the abundance estimation techniques.

Unmixing performance

The unmixing performance, with respect to the reconstruction errors in the input and the feature spaces, is compared with the aforementioned unmixing approaches, as reported in TABLE II and TABLE III. As can be observed, the proposed method with a Pareto optimal solution outperforms not only the existing linear NMF (α = 1) and Gaussian NMF (α = 0), but also all the state-of-the-art methods.

TABLE II: Unmixing performance for the Urban image

Method                                 | RE ×10⁻² | REΦ ×10⁻²
FCLS                                   |  1.44    |  3.89
GBM-sNMF                               |  6.50    |  4.11
K-Hype                                 |  5.99    |  4.67
MinDisCo                               |  3.12    |  4.60
ConvexNMF                              |  2.96    |  5.84
KconvexNMF                             |   -      | 43.94
KsNMF                                  |   -      |  4.33
MercerNMF                              |   -      |  2.96
This paper: Linear NMF (α = 1)         |  1.48    |  3.96
This paper: Gaussian NMF (α = 0)       |  3.49    |  1.39
This paper: Pareto optimal (α = 0.18)  |  2.70    |  1.27
This paper: Pareto optimal (α = 0.50)  |  2.38    |  2.04
This paper: Pareto optimal (α = 0.94)  |  1.40    |  3.78

TABLE III: Unmixing performance for the Cuprite image

Method                                 | RE ×10⁻² | REΦ ×10⁻²
FCLS                                   |  0.95    |  0.59
GBM-sNMF                               |  1.06    |  0.62
K-Hype                                 |  2.12    |  0.93
MinDisCo                               |  1.62    |  4.54
ConvexNMF                              |  1.61    |  2.51
KconvexNMF                             |   -      | 19.53
KsNMF                                  |   -      |  1.38
MercerNMF                              |   -      |  2.74
This paper: Linear NMF (α = 1)         |  0.89    |  2.28
This paper: Gaussian NMF (α = 0)       |  1.05    |  0.50
This paper: Pareto optimal (α = 0.04)  |  0.92    |  0.42
This paper: Pareto optimal (α = 0.50)  |  0.84    |  0.58
This paper: Pareto optimal (α = 0.72)  |  0.77    |  0.73

The estimated endmembers and the corresponding abundance maps obtained with the proposed method are shown in Fig. 5 and Fig. 6. For the Urban image, the different areas (the road in particular) are better recognized with the Pareto optimal solution than with the solutions of the linear and of the Gaussian NMF. Regarding the Cuprite image, the linear NMF (i.e., α = 1) recognizes two out of three regions, whereas both the Pareto optimal (i.e., α = 0.72) and the Gaussian NMF (i.e., α = 0) are able to distinguish the three regions. However, the abundance maps of the Gaussian NMF appear to be overly sparse, compared with their counterparts for the Pareto optimal solution. It is also noticed that the endmembers extracted with the linear NMF are spiky, and even have some zero parts, thus poorly matching the real situation.
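For completeness, the two reconstruction errors reported in the tables above can be computed directly from any factorization (E, A), as emphasized in Section V-C. A small sketch, reusing the J_H helper (Gaussian kernel) from the earlier sketches:

```python
import numpy as np

def RE(X, E, A):
    """Reconstruction error in the input space (Section V-C)."""
    L, T = X.shape
    return np.linalg.norm(X - E @ A, 'fro') / np.sqrt(T * L)

def RE_phi(X, E, A, sigma):
    """Reconstruction error in the feature space, via the kernel trick;
    the sum of squared feature-space residuals equals 2 * J_H(X, E, A, sigma)."""
    L, T = X.shape
    return np.sqrt(2.0 * J_H(X, E, A, sigma) / (T * L))
```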
Fig. 4: Visualization of the trade-off between the two objectives J_X and J_H, and of the change of the aggregated objective function J, as the weight α increases, for (a) the Urban image (objectives-conflicting interval α ∈ [0.18, 0.94]) and (b) the Cuprite image (objectives-conflicting interval α ∈ [0.14, 0.72]).

Fig. 5: Urban image: endmembers and corresponding abundance maps, estimated using α = 1 (conventional linear NMF); α = 0.3 (a Pareto optimal of the bi-objective NMF); α = 0 (nonlinear Gaussian NMF).
VI. CONCLUSION

This paper presented a novel bi-objective nonnegative matrix factorization exploiting kernel machines, where the decomposition is performed simultaneously in the input and the feature space. The multiplicative update rules were derived. The performance of the method was demonstrated for unmixing well-known hyperspectral images, and the resulting Pareto fronts were analyzed. As future work, we are extending this approach to include other NMF objective functions, defined in the input or the feature space. Considering several kernels simultaneously, and as a consequence several feature spaces, is also under investigation.

ACKNOWLEDGMENT

This work was supported by the French ANR, grant HYPANEMA: ANR-12BS03-0033.
Fig. 6: Cuprite image: endmembers and corresponding abundance maps, estimated using α = 1 (conventional linear NMF); α = 0.72 (a Pareto optimal of the bi-objective NMF); α = 0 (nonlinear Gaussian NMF).

REFERENCES

[1] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, Oct. 1999.
[2] S. Jia and Y. Qian, "Constrained nonnegative matrix factorization for hyperspectral unmixing," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 1, pp. 161–173, Jan. 2009.
[3] F. Zhu, P. Honeine, and M. Kallas, "Kernel non-negative matrix factorization without the pre-image problem," in IEEE Workshop on Machine Learning for Signal Processing, Reims, France, Sep. 2014.
[4] I. Buciu, N. Nikolaidis, and I. Pitas, "Nonnegative matrix factorization in polynomial feature space," IEEE Transactions on Neural Networks, vol. 19, no. 6, pp. 1090–1100, Jun. 2008.
[5] R. Zhi, M. Flierl, Q. Ruan, and W. Kleijn, "Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 41, no. 1, pp. 38–52, Feb. 2011.
[6] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[7] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 5. IEEE, 2006.
[8] T. Li and C. Ding, "The relationships among various nonnegative matrix factorization methods for clustering," in Proceedings of the Sixth International Conference on Data Mining, ser. ICDM '06. Washington, DC, USA: IEEE Computer Society, 2006, pp. 362–371.
[9] C. Ding, X. He, and H. D. Simon, "On the equivalence of nonnegative matrix factorization and spectral clustering," in Proc. SIAM Data Mining Conf., 2005, pp. 606–610.
[10] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Advances in Neural Information Processing Systems, 2001, pp. 556–562.
[11] Y. Wang and Y. Jia, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, 2004, pp. 27–30.
[12] P. Hoyer and P. Dayan, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[13] Z. Chen and A. Cichocki, "Nonnegative matrix factorization with temporal smoothness and/or spatial decorrelation constraints," Laboratory for Advanced Brain Signal Processing, RIKEN, Tech. Rep., 2005.
[14] A. Huck, M. Guillaume, and J. Blanc-Talon, "Minimum dispersion constrained nonnegative matrix factorization to unmix hyperspectral data," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 6, pp. 2590–2602, Jun. 2010.
[15] Q. Ke and T. Kanade, "Robust ℓ1 norm factorization in the presence of outliers and missing data by alternative convex programming," in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1. IEEE, 2005, pp. 739–746.
[16] L. Li, G. Lebanon, and H. Park, "Fast Bregman divergence NMF using Taylor expansion and coordinate descent," in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012, pp. 307–315.
[17] J. Chen, C. Richard, and P. Honeine, "Nonlinear estimation of material abundances of hyperspectral images with ℓ1-norm spatial regularization," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 5, pp. 2654–2665, May 2014.
[18] ——, "Nonlinear unmixing of hyperspectral data based on a linear-mixture/nonlinear-fluctuation model," IEEE Transactions on Signal Processing, vol. 61, no. 2, pp. 480–492, Jan. 2013.
[19] N. Nguyen, J. Chen, C. Richard, P. Honeine, and C. Theys, "Supervised nonlinear unmixing of hyperspectral images using a pre-image method," in New Concepts in Imaging: Optical and Statistical Models, D. Mary, C. Theys, and C. Aime, Eds., ser. EAS Publications Series, vol. 59. EDP Sciences, 2013, pp. 417–437.
[20] D. Zhang, Z. Zhou, and S. Chen, "Non-negative matrix factorization on kernels," in Lecture Notes in Computer Science, vol. 4099. Springer, 2006, pp. 404–412.
[21] C. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 45–55, Nov. 2010.