MULTI-TASK IMAGE CLASSIFICATION VIA COLLABORATIVE, HIERARCHICAL SPIKE-AND-SLAB PRIORS

Hojjat S. Mousavi, U. Srinivas, V. Monga (Dept. of Electrical Engineering, The Pennsylvania State University)
Yuanming Suo, M. Dao, T. D. Tran (Dept. of Electrical and Computer Engineering, The Johns Hopkins University)
ABSTRACT

Promising results have been achieved in image classification problems by exploiting the discriminative power of sparse representation-based classification (SRC). Recently, it has been shown that the use of class-specific spike-and-slab priors in conjunction with the class-specific dictionaries from SRC is particularly effective in low training scenarios. As a logical extension, we build on this framework for multitask scenarios, wherein multiple representations of the same physical phenomena are available. We experimentally demonstrate the benefits of mining joint information from different camera views for multi-view face recognition.

Index Terms: Image analysis, sparse representation, structured priors, spike-and-slab, face recognition.

Copyright (c) 2010 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

1. INTRODUCTION

Image classification is an important problem that has been studied for over three decades. Practical applications span a wide variety of areas such as general texture or object categorization [1, 2], face recognition [3, 4] and automatic target recognition in hyperspectral or radar imagery [5, 6]. Various methods of feature extraction ([7, 8] for example) as well as classifiers [1, 9] have been investigated.

The advent of compressive sensing (CS) [10] has inspired research into applying the central analytical formulation of CS to classification problems. Sparse representation-based classification (SRC) [11] is arguably the most well-known such tool and has demonstrated robust performance even in the presence of high pixel distortion, occlusion or noise. Extensions of SRC have been proposed along two lines of thought: (i) adding regularizers and priors that prevent overfitting by introducing additional information into the problem, and (ii) exploiting joint information and complementary data in multitask settings. Our work moves along both directions, exploiting the advantages of priors as well as joint information.

Motivation and Contribution: Advances in sensing technology have facilitated the easy acquisition of multiple different measurements of the same underlying physical phenomena. Often there is complementary information embedded in these different measurements which can be exploited for improved performance. For example, in face recognition or action recognition we could have different views of a person's face captured under different illumination conditions or with different facial postures [2, 12-15]. In automatic target recognition, multiple SAR (synthetic aperture radar) views are acquired [16]. The use of complementary information from different color image channels in medical imaging has been demonstrated in [17]. In border security applications, multi-modal sensor data such as voice sensor measurements, infrared images and seismic measurements are fused [18] for activity classification tasks. The prevalence of such a rich variety of applications, where multi-sensor information manifests in different ways, is a key motivation for our contribution in this paper. Specifically, we extend recent work in class-specific sparse prior-based classification by Srinivas et al. [19] to a multitask framework.

We extend the Bayesian framework in [19] in a hierarchical manner in order to capture joint information across multiple tasks (or measurements). As observed in [19], an important challenge is to develop a framework based on spike-and-slab priors that can capture a general notion of signal sparsity while also leading to tractable optimization problems. Our contribution successfully addresses both these issues via a generalized collaborative Bayesian hierarchical method. Expectedly, it results in a hard non-convex optimization problem. We propose an efficient solution using a Markov Chain Monte Carlo (MCMC) method, which results in practical benefits as demonstrated in Section 4.

2. SPARSE REPRESENTATION-BASED CLASSIFICATION (SRC)
The aim of CS is essentially to recover a high-dimensional compressible (sparse) signal x in R^n from a set of lower-dimensional linear measurements y in R^m of the form y = Ax. It has been shown that x is recoverable by solving the following underdetermined (m << n) optimization problem [20]:

    min_x ||x||_0   subject to   y = Ax,                          (1)

where ||x||_0 is the number of non-zero elements of x. It is known that (1) is an NP-hard problem, but it can be handled by relaxing the non-convex term [21] and solving the following optimization problem in the presence of noise:

    min_x ||x||_1   subject to   ||y - Ax||_2 < eps.               (2)

Recently, Wright et al. proposed SRC [11] for face recognition by designing an over-complete class-specific dictionary A and employing the CS framework. For a multi-class problem, the dictionary matrix A is built using training images from all classes as A = [A_1 ... A_C], where each column of A_r is a vectorized training image (a dictionary atom) from class C_r.
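To make the SRC recipe concrete, the following is a minimal sketch (ours, not the implementation of [11]) that solves an unconstrained l1-relaxed form of (2) by iterative soft thresholding (ISTA) and then assigns the class by the smallest class-wise reconstruction residual, the rule used in [11]. The penalty lam, the iteration count and the dictionary layout are illustrative assumptions.

```python
import numpy as np

def ista_l1(A, y, lam=0.05, n_iter=200):
    """Solve min_x 0.5*||y - A x||_2^2 + lam*||x||_1, a common relaxation of (2), via ISTA."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                   # gradient of the quadratic term
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

def src_classify(A_per_class, y, lam=0.05):
    """SRC: sparse-code y over the stacked dictionary, pick the class with least residual."""
    A = np.hstack(A_per_class)                     # A = [A_1 ... A_C], columns = training images
    x = ista_l1(A, y, lam)
    residuals, start = [], 0
    for A_r in A_per_class:
        n_r = A_r.shape[1]
        x_r = x[start:start + n_r]                 # coefficients belonging to class r
        residuals.append(np.linalg.norm(y - A_r @ x_r))
        start += n_r
    return int(np.argmin(residuals))
```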
The minimization problem in (2) is equivalent to maximizing the probability of observing the sparse vector x given y, assuming in a Bayesian framework that x has an i.i.d. Laplacian distribution [22]. Therefore, sparsity can be interpreted as a prior on the coefficient vector which enhances signal comprehension by incorporating contextual information and signal structure. Structure and sparsity can both be enforced by introducing a probabilistic prior f(x) on the coefficient vector and solving the following optimization problem [23]:

    max_x f(x)   subject to   ||y - Ax||_2 < eps.                  (3)

Recently, Srinivas et al. proposed the use of spike-and-slab priors under a Bayesian framework for image classification purposes [19]. They proposed one class-conditional probability density function (pdf) per class, represented by f_{C_1} and f_{C_2}, with the same class-specific dictionary A as before. Given a test vector y, class assignment is done by solving the following constrained likelihood maximization problem per class, where the f_{C_r} are learned separately using spike-and-slab priors:

    x^_{C_r} = argmax_x f_{C_r}(x)   subject to   ||y - Ax||_2 < eps,   (4)

    Class(y) = argmax_{r in {1,2}} f_{C_r}(x^_{C_r}).               (5)

Inspired by [24], they developed a Bayesian framework for classification and considered a linear model y = Ax + n in each class, where y in R^m, x in R^n, A in R^{m x n} and n is the inherent Gaussian noise (we drop the class indices for notational simplicity). Using this set-up, the underlying Bayesian framework is as follows:

    y | A, x, gamma, sigma^2  ~  N(Ax, sigma^2 I),                              (6)
    x_i | sigma^2, gamma_i, lambda  ~  gamma_i N(0, sigma^2 lambda^{-1}) + (1 - gamma_i) delta_0,   i = 1, ..., n,   (7)
    gamma_i | kappa  ~  Bernoulli(kappa),   i = 1, ..., n,                      (8)

where N(.) denotes the normal distribution and (7) models each x_i with a spike-and-slab prior, a structured prior that is very well suited to capturing sparsity [25-27]. Here delta_0 is a point mass concentrated at zero (the "spike") and the other term is the distribution of the nonzero coefficients of the sparse vector (the "slab"). In this framework the gamma_i are binary: gamma_i = 1 if x_i != 0 and gamma_i = 0 if x_i = 0, so they act as indicator variables for each element of x. It is clear that gamma can control the sparsity level of the signal and at the same time enforce a specific structure. As a result, kappa, the probability of each coefficient being nonzero, plays a key role in identifying the structure. In the next section, we show how a smart choice of kappa leads to a framework that is more general and able to capture different sparse structures in the coefficient vector.
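The generative model (6)-(8) can be simulated directly, which makes the role of kappa easy to see: on average a fraction kappa of the coefficients come from the slab and the rest sit exactly at zero. A small sketch with illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 64, 256
sigma2, lam, kappa = 0.01, 1.0, 0.05          # illustrative values

A = rng.standard_normal((m, n)) / np.sqrt(m)
gamma = rng.random(n) < kappa                 # gamma_i ~ Bernoulli(kappa), eq. (8)
x = np.where(gamma,                           # eq. (7): slab N(0, sigma2/lam) or spike at 0
             rng.normal(0.0, np.sqrt(sigma2 / lam), n), 0.0)
y = A @ x + rng.normal(0.0, np.sqrt(sigma2), m)   # eq. (6)

print("nonzeros:", gamma.sum(), "(expected ~", kappa * n, ")")
```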
3. MULTI-TASK IMAGE CLASSIFICATION VIA COLLABORATIVE, HIERARCHICAL SPIKE-AND-SLAB PRIORS

3.1. Bayesian Framework

If there are multiple measurements y_1, ..., y_T of the same signal, we now have a sparse coefficient matrix X as follows:

    Y := [y_1 ... y_T] = A [x_1 ... x_T] = AX.                     (9)

Depending on the application and the type of measurement, different notions of matrix sparsity can occur with respect to the coefficient matrix X. As illustrated in Fig. 1, in some applications, such as hyperspectral classification, row sparsity (entire rows with all zero or all non-zero coefficients) emerges naturally in the matrix X, whereas in other applications block sparsity or joint dynamic sparsity is more appropriate. This sparsity pattern is an inherent feature of the application and of the specific way in which the multi-task dictionary has been designed. Our contribution is a framework that is applicable in a wide variety of applications because it captures a general notion of sparse structure in the coefficient matrix.

Fig. 1. Row, block and dynamic sparsity, from left to right.

Assume we have T observations (tasks) from the same class and put them together to form a matrix Y. We wish to find a sparse matrix X such that the error between the test matrix and its reconstructed version is guaranteed to be small while, at the same time, X has a sparse structure.

We modify (6)-(8) in order to generalize the Bayesian framework to the multitask case. This modified version should be able to model the behaviour of X and induce the desired structure and sparsity in X. The generalized Bayesian framework for the multitask case is as follows:

    Y | A, X, Gamma, sigma_n^2  ~  prod_{t=1}^T N(A x_t, sigma_n^2 I),                                        (10)
    X | sigma^2, Gamma, lambda  ~  prod_{t=1}^T prod_{i=1}^n gamma_ti N(0, sigma^2 lambda^{-1}) + (1 - gamma_ti) delta_0,   (11)
    Gamma | K  ~  prod_{t=1}^T prod_{i=1}^n Bernoulli(kappa_ti),                                              (12)

where x_t is the t-th column of the matrix X, K = {kappa_ti} and Gamma = {gamma_ti} for t = 1, ..., T and i = 1, ..., n.

Remark: Note that, in contrast to (8) as proposed in [19], kappa_ti is assumed to take different values for each task and each coefficient. With this assumption our framework can capture more general notions of structure and sparsity in the matrix X. This is one of our central analytical contributions in this paper, in contrast with methods that make relaxed and simplified assumptions [19, 28].
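To illustrate the remark, the sketch below contrasts two hypothetical choices of K drawn per (12): a constant kappa_ti, which gives unstructured element-wise sparsity, and a per-atom probability shared across tasks and pushed toward 0 or 1, which makes the columns x_t tend to share their support (the row sparsity of Fig. 1). The specific probability values are ours, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 3, 12

# Unstructured: one common activation probability for every (t, i).
K_flat = np.full((T, n), 0.2)

# Row-structured (hypothetical choice): a shared per-atom probability,
# pushed toward 0 or 1, so the per-task supports tend to coincide.
p_atom = np.where(rng.random(n) < 0.2, 0.95, 0.02)
K_row = np.tile(p_atom, (T, 1))

# Draw Gamma element-wise as in (12); rows index tasks here.
Gamma_flat = (rng.random((T, n)) < K_flat).astype(int)
Gamma_row = (rng.random((T, n)) < K_row).astype(int)

print("unstructured support:\n", Gamma_flat)
print("row-structured support:\n", Gamma_row)
```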
The benefit of the Bayesian approach is that it can alleviate the requirement of abundant training data. One of the central assumptions in SRC is the existence of sufficient training samples (an overcomplete dictionary A), whereas the proposed Bayesian approach can handle scenarios with scarce training. This is enabled by the use of class-specific priors that can add discriminability to the dictionary.

To perform Bayesian inference we follow a procedure similar to [19] to obtain the joint posterior density function for MAP estimation. Here we only present the resulting optimization problem; a detailed derivation is available online in a technical report [29]. It must be noted that we have a different framework for each class and, as a result, we get C different optimization problems to find X*_{C_r} and Gamma*_{C_r}:

    argmin_{X,Gamma}  (sigma^2 / sigma_n^2) ||Y - AX||_F^2 + lambda ||X||_F^2 + sum_{t=1}^T sum_{i=1}^n gamma_ti rho_ti,   (13)

where rho_ti = sigma^2 log( 2 pi sigma^2 (1 - kappa_ti)^2 / (lambda kappa_ti^2) ). The first term in (13) essentially minimizes the reconstruction error, whereas the second and third terms jointly keep the coefficient matrix smooth and sparse.

Remark: A solution to this optimization problem has not yet been addressed in the literature, but its relaxations reduce to well-known problems in compressive sensing and statistics such as LASSO and the Elastic Net [28, 30, 31].

Finally, after solving each optimization problem per class for a given test matrix Y, the class assignment is as follows:

    Class(Y) = argmin_{r in {1,...,C}} L_r(X*_{C_r}; Gamma*_{C_r}).   (14)

3.2. Solution to the Optimization Problem

In this section we provide an efficient solution to the resulting optimization problem in (13). First, we rewrite the problem as follows:

    argmin_{X,Gamma}  sum_{t=1}^T ( (sigma^2 / sigma_n^2) ||y_t - A x_t||_2^2 + lambda ||x_t||_2^2 + sum_{i=1}^n gamma_ti rho_ti ).   (15)

As can be seen, (15) consists of T independent optimization problems which can be solved individually rather than as one complex matrix optimization problem. Hence, for each class we follow this procedure:

- minimize (sigma^2 / sigma_n^2) ||y_t - A x_t||_2^2 + lambda ||x_t||_2^2 + sum_{i=1}^n gamma_ti rho_ti over (x_t, gamma_t), for every t;
- put the x_t and gamma_t together to form X and Gamma, respectively.

From now on we discuss only the optimization problem for class C_r and task t; it must, however, be solved separately for each C_r, r = 1, ..., C, and each t = 1, ..., T. For simplicity, we drop the class and task indices and write y, x and gamma instead of y_t, x_t and gamma_t.

Note that the above optimization problem is a hard non-convex problem. We propose an efficient solution using an MCMC method. One advantage of introducing the hierarchical model in (10)-(12) is that we can obtain the marginal distribution f(gamma|y) proportional to f(y|gamma) f(gamma). According to [26], f(gamma|y) provides a posterior probability that can be used to select the promising atoms that contribute most to reconstructing y. Moreover, finding the optimum gamma* is equivalent to finding the most prominent atoms in the dictionary A. We therefore propose the following scheme to solve the optimization problem: first, we find the dictionary atoms that contribute to reconstructing the test vector y; then we find the value of the contribution of each atom found.

We use an MCMC method to extract the promising atoms (see Algorithm 1). According to [26], a Markov chain of gamma^(j)'s as in (18), obtained by Gibbs sampling, converges in distribution to f(gamma|y). This sequence can be obtained by Algorithm 1, and the atoms with the highest appearance frequency correspond to the most prominent atoms in the dictionary A, which can be used for a better reconstruction. Details of sampling from the distributions in Algorithm 1 can be found in [29].

After Gibbs sampling has revealed the prominent dictionary atoms for reconstruction, we find the exact contribution of each atom by solving the following convex optimization problem on the promising subset of dictionary atoms:

    x*_gamma = argmin_{x_gamma} (sigma^2 / sigma_n^2) ||y - A_gamma x_gamma||_2^2 + lambda ||x_gamma||_2^2,   (16)

where A_gamma is the new dictionary consisting of only the contributing atoms and x_gamma is the corresponding coefficient vector of the reduced problem. Afterward, we place the exact contribution value of each dictionary element into x based on the indicator variable gamma.

By carrying out the same procedure for each task and putting the resulting x and gamma vectors together, we form the X*_{C_r} and Gamma*_{C_r} matrices for class C_r. Finally, we repeat the whole process for each class, obtain the corresponding (X*_{C_r}, Gamma*_{C_r}), and perform the classification based on the value of the cost function (residual) as presented in (14).
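Because (15) separates over tasks, the class-wise cost L_r used in the decision rule (14) can be accumulated one task at a time. A minimal bookkeeping sketch, assuming X, Gamma and K are stored as n x T arrays with one column per task (the function names and parameter values are illustrative):

```python
import numpy as np

def rho(sigma2, lam, kappa):
    """Per-coefficient support penalty rho_ti from (13)."""
    return sigma2 * np.log(2 * np.pi * sigma2 * (1 - kappa) ** 2 / (lam * kappa ** 2))

def task_cost(A, y_t, x_t, gamma_t, k_t, sigma2, sigma2_n, lam):
    """One summand of (15): data fit + smoothness + support penalty."""
    fit = (sigma2 / sigma2_n) * np.sum((y_t - A @ x_t) ** 2)
    smooth = lam * np.sum(x_t ** 2)
    support = np.sum(gamma_t * rho(sigma2, lam, k_t))
    return fit + smooth + support

def class_cost(A, Y, X, Gamma, K, sigma2, sigma2_n, lam):
    """L_r(X*; Gamma*); the rule (14) takes the argmin of this over classes."""
    return sum(task_cost(A, Y[:, t], X[:, t], Gamma[:, t], K[:, t],
                         sigma2, sigma2_n, lam)
               for t in range(Y.shape[1]))
```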
Algorithm 1  MICHS (per class, per task)

Input: A, kappa, y_t
(1) Find contributing atoms: find the sequence of gamma and x

    x^(1), gamma^(1), x^(2), gamma^(2), x^(3), gamma^(3), ...   (17)

Initialization: iteration counter j = 1, gamma^(0) = (1, ..., 1), and x^(0) the least-squares estimate.
while j < MaxIter do
    (1) Sample x^(j) from f(x^(j) | y, gamma^(j-1))
    (2) Sample the i-th element of gamma^(j), denoted gamma_i^(j), from f(gamma_i^(j) | y, x^(j), gamma_(i)^(j)) for i = 1, ..., n,
        where gamma_(i)^(j) = (gamma_1^(j), ..., gamma_{i-1}^(j), gamma_{i+1}^(j-1), ..., gamma_n^(j-1))
end while
Extract the sequence of gamma from (17):

    gamma^(1), gamma^(2), gamma^(3), ...   (18)

Take the most frequent gamma_i's in (18) and form gamma*
(2) Find the values of the contributions by (16)
(3) Insert the values into x* based on gamma*
Output: gamma*, x*
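The exact conditionals f(x | y, gamma) and f(gamma_i | y, x, gamma_(i)) used in Algorithm 1 are derived in [29] and are not reproduced here. As a stand-in, the sketch below uses the classical collapsed spike-and-slab Gibbs update of George and McCulloch [26], in which x is integrated out when each gamma_i is resampled, while keeping the two MICHS ingredients stated above: atoms are ranked by their appearance frequency along the chain, and the selected atoms are then refit via the ridge problem (16). It is an approximation of Algorithm 1 under these assumptions, not the authors' sampler; kappa here plays the role of one row of K for the current task.

```python
import numpy as np

def log_marginal(A_sel, y, sigma2_n, slab_var):
    """log N(y; 0, sigma2_n I + slab_var A_sel A_sel^T), with x integrated out."""
    m = y.size
    C = sigma2_n * np.eye(m)
    if A_sel.shape[1] > 0:
        C += slab_var * (A_sel @ A_sel.T)
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

def michs_support(A, y, kappa, sigma2, sigma2_n, lam, n_iter=200, burn=100, seed=0):
    """Gibbs sweep over gamma; returns the appearance frequency of each atom."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    slab_var = sigma2 / lam
    gamma = np.ones(n, dtype=bool)            # gamma^(0) = (1, ..., 1)
    freq = np.zeros(n)
    for j in range(n_iter):
        for i in range(n):                    # resample gamma_i | y, gamma_(i)
            gamma[i] = False
            ll0 = log_marginal(A[:, gamma], y, sigma2_n, slab_var)
            gamma[i] = True
            ll1 = log_marginal(A[:, gamma], y, sigma2_n, slab_var)
            logit = ll1 - ll0 + np.log(kappa[i] / (1 - kappa[i]))
            gamma[i] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
        if j >= burn:
            freq += gamma
    return freq / (n_iter - burn)

def refit(A, y, gamma, sigma2, sigma2_n, lam):
    """Ridge refit (16) restricted to the selected atoms; zeros elsewhere."""
    A_g = A[:, gamma]
    c = sigma2 / sigma2_n
    x_g = np.linalg.solve(c * A_g.T @ A_g + lam * np.eye(A_g.shape[1]),
                          c * A_g.T @ y)
    x = np.zeros(A.shape[1])
    x[gamma] = x_g
    return x

# Usage: gamma_star = michs_support(A, y, kappa, sigma2, sigma2_n, lam) > 0.5
#        x_star = refit(A, y, gamma_star, sigma2, sigma2_n, lam)
```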
Table 1. Recognition rate (%) for C = 129 classes.

    View (T)   MSM    Graph   SRC    JSRC   JDSRC   MICHS
    1          36.5   44.5    45.0   45.0   45.0    51.3
    3          48.9   63.4    59.5   53.6   72.0    73.0

Fig. 2. Image acquisition configuration and sample images.
4. EXPERIMENTAL RESULTS
Multiview face recognition is a multi-task image classification problem of significant importance, and Zhang et al. have investigated it thoroughly in [12]. In order to validate the performance of our proposed Multitask Image Classification via collaborative Hierarchical Spike-and-slab priors (MICHS), we present the experimental results of applying MICHS to this problem and compare them with state-of-the-art algorithms.

CMU Multi-PIE database: We conducted the experiments on the CMU Multi-PIE face database [13], which contains a large number of facial images under different illuminations, view points and expressions, acquired over up to four sessions. A collection of cameras at different view angles (theta = {0°, ±15°, ±30°, ±45°, ±60°, ±75°, ±90°}) captures the same scene. Among all the available individuals, we picked the C = 129 subjects that are present in all four acquisition sessions. An illustration of the multiple-camera configuration as well as sample images can be found in Fig. 2.

We follow the procedure described in [12] for conducting the experiments and build our training dictionary A by picking training images from session 1 using a subset of the available views, i.e. theta_train = {0°, ±30°, ±60°, ±90°}. Test images are obtained from all available view angles and from session 2. This is a more realistic scenario in which not all testing poses are available in the dictionary. To generate a test matrix with T views, we randomly pick one of the C subjects and then randomly pick T different views of that person from theta. We generated two thousand such test samples and compared the performance of MICHS^1 with the classical Mutual Subspace Method (MSM) [4], the graph-based method of [32], SRC [11] combined with majority voting, the JSRC method of [33] and, finally, JDSRC [12].
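This test-generation protocol is straightforward to script. A hypothetical sketch, where images[c][v] is assumed to hold the vectorized session-2 image of subject c at view angle v (both the container and its layout are our assumptions, not part of the dataset's API):

```python
import numpy as np

def make_test_matrix(images, views, T, rng):
    """Draw one test matrix Y: a random subject and T distinct random views of it."""
    c = rng.integers(len(images))                        # random subject index
    chosen = rng.choice(views, size=T, replace=False)    # T distinct view angles
    Y = np.column_stack([images[c][v] for v in chosen])
    return Y, c

rng = np.random.default_rng(42)
# tests = [make_test_matrix(images, views, T=3, rng=rng) for _ in range(2000)]
```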
To show the effectiveness of our algorithm, we first compare the results for T = 1 (the single-task scenario) and then show the results for T = 3 different views. These results are reported in Table 1; our method gives the best performance among all algorithms in both cases. Fig. 3 illustrates the results for a scenario with a reduced number of classes, where we consider only 30 of the 129 subjects, and compares our result with JDSRC, which was shown to be the second best after ours. We argued in Section 3 that by using class-specific priors we can expect less sensitivity (slower decay) to an insufficient number of training samples. This is verified in Fig. 3 for a reduced number of Training samples Per Class (TPC = 3, 5, 7).

Fig. 3. Recognition rates in the low-training scenario with C = 30 individuals. The numbers are higher because the number of classes is reduced.

^1 To benefit from the probabilistic nature of the framework, a simple and intuitive way of choosing the matrix K per class can be found in [29].

5. REFERENCES

[1] H. Zhang, A. Berg, M. Maire, and J. Malik, "SVM-KNN: Discriminative nearest neighbor classification for visual category recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2126-2136.
[2] X.-T. Yuan, X. Liu, and S. Yan, "Visual classification with multitask joint sparse representation," IEEE Trans. Image Processing, vol. 21, no. 10, pp. 4349-4360, Oct. 2012.

[3] J. Zou, Q. Ji, and G. Nagy, "A comparative study of local matching approach for face recognition," IEEE Trans. Image Processing, vol. 16, no. 10, pp. 2617-2628, 2007.

[4] K. Fukui and O. Yamaguchi, "Face recognition using multi-viewpoint patterns for robot vision," in Robotics Research, ser. Springer Tracts in Advanced Robotics, vol. 15, Springer Berlin/Heidelberg, 2005, pp. 192-201.

[5] H. Ren and C.-I. Chang, "Automatic spectral target recognition in hyperspectral imagery," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1232-1249, Oct. 2003.

[6] Q. Zhao and J. Principe, "Support vector machines for SAR automatic target recognition," IEEE Trans. Aerosp. Electron. Syst., vol. 37, no. 2, pp. 643-654, Apr. 2001.

[7] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural features for image classification," IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610-621, 1973.

[8] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2169-2178.

[9] T. Randen and J. Husoy, "Filtering for texture classification: a comparative study," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 4, pp. 291-310, Apr. 1999.

[10] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.

[11] J. Wright, A. Y. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 31, no. 2, pp. 210-227, Feb. 2009.

[12] H. Zhang, N. M. Nasrabadi, Y. Zhang, and T. S. Huang, "Joint dynamic sparse representation for multi-view face recognition," Pattern Recognition, vol. 45, no. 4, pp. 1290-1298, 2012.

[13] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, "Multi-PIE," Image Vision Comput., vol. 28, no. 5, pp. 807-813, May 2010.

[14] T. Guha and R. K. Ward, "Learning sparse representations for human action recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 34, no. 8, pp. 1576-1588, 2012.

[15] S. Bahrampour, A. Ray, N. M. Nasrabadi, and K. W. Jenkins, "Quality-based multimodal classification using tree-structured sparsity," arXiv preprint arXiv:1403.1902, 2014.

[16] H. Zhang, N. M. Nasrabadi, Y. Zhang, and T. S. Huang, "Multi-view automatic target recognition using joint sparse representation," IEEE Trans. Aerosp. Electron. Syst., vol. 48, no. 3, pp. 2481-2497, July 2012.

[17] U. Srinivas et al., "SHIRC: A simultaneous sparsity model for histopathological image representation and classification," in Proc. IEEE Int. Symp. Biomed. Imag., 2013, pp. 1118-1121.

[18] N. Nguyen, N. Nasrabadi, and T. Tran, "Robust multi-sensor classification via joint sparse representation," in Proc. 14th Int. Conf. Information Fusion (FUSION), July 2011, pp. 1-8.

[19] U. Srinivas, Y. Suo, M. Dao, V. Monga, and T. Tran, "Structured sparse priors for image classification," in Proc. IEEE Int. Conf. Image Processing, 2013.

[20] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489-509, Feb. 2006.

[21] D. L. Donoho, "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution," Comm. Pure Appl. Math., vol. 59, no. 6, pp. 797-829, 2006.

[22] S. Babacan, R. Molina, and A. Katsaggelos, "Bayesian compressive sensing using Laplace priors," IEEE Trans. Image Processing, vol. 19, no. 1, pp. 53-63, 2010.

[23] V. Cevher, "Learning with compressible priors," in NIPS, 2009, pp. 261-269.

[24] T.-J. Yen, "A majorization-minimization approach to variable selection using spike and slab priors," The Annals of Statistics, vol. 39, no. 3, pp. 1748-1775, 2011.

[25] H. Ishwaran and J. S. Rao, "Spike and slab variable selection: frequentist and Bayesian strategies," The Annals of Statistics, pp. 463-976, 2005.

[26] E. I. George and R. E. McCulloch, "Variable selection via Gibbs sampling," Journal of the American Statistical Association, vol. 88, no. 423, pp. 881-889, 1993.

[27] M. K. Titsias and M. Lázaro-Gredilla, "Spike and slab variational inference for multi-task and multiple kernel learning," in NIPS, 2011, pp. 2339-2347.

[28] Y. Suo, M. Dao, T. Tran, U. Srinivas, and V. Monga, "Hierarchical sparse modeling using spike and slab priors," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), May 2013, pp. 3103-3107.

[29] H. S. Mousavi et al., "Collaborative hierarchical sparse representation for classification," Pennsylvania State University, Tech. Rep. PSU-JHU-EE-TR-201301, Nov. 2013. [Online]. Available: www.personal.psu.edu/hus160/TechRep.pdf

[30] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, pp. 267-288, 1994.

[31] P. Sprechmann, I. Ramirez, G. Sapiro, and Y. Eldar, "C-HiLasso: A collaborative hierarchical sparse modeling framework," IEEE Trans. Signal Processing, vol. 59, no. 9, pp. 4183-4198, Sept. 2011.

[32] E. Kokiopoulou and P. Frossard, "Graph-based classification of multiple observation sets," Pattern Recognition, vol. 43, no. 12, pp. 3988-3997, 2010.

[33] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit," Signal Processing, vol. 86, pp. 572-588, Apr. 2006.