IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. X, NO. X, MONTH 20XX

Joint Sparse Representation for Robust Multimodal Biometrics Recognition

Sumit Shekhar, Student Member, IEEE, Vishal M. Patel, Member, IEEE, Nasser M. Nasrabadi, Fellow, IEEE, and Rama Chellappa, Fellow, IEEE

Abstract—Traditional biometric recognition systems rely on a single biometric signature for authentication. While the advantage of using multiple sources of information for establishing the identity has been widely recognized, computational models for multimodal biometrics recognition have only recently received attention. We propose a novel multimodal multivariate sparse representation method for multimodal biometrics recognition, which represents the test data by a sparse linear combination of training data, while constraining the observations from different modalities of the test subject to share their sparse representations. Thus, we simultaneously take into account correlations as well as coupling information between biometric modalities. We modify our model so that it is robust to noise and occlusion. A multimodal quality measure is also proposed to weigh each modality as it gets fused. Furthermore, we also kernelize the algorithm to handle non-linearity in data. The optimization problem is solved using an efficient alternating direction method. Various experiments show that our method compares favorably with competing fusion-based methods.

Index Terms—Multimodal biometrics, feature fusion, sparse representation.

I. INTRODUCTION

Unimodal biometric systems rely on a single source of information, such as a single iris, fingerprint or face, for authentication [1]. Unfortunately, these systems have to deal with some of the following inevitable problems [2]: (a) Noisy data: poor lighting of a user's face or an iris image captured through a contact lens are examples of noisy data. (b) Non-universality: a biometric system based on a single source of evidence may not be able to capture meaningful data from some users. For instance, an iris biometric system may extract incorrect texture patterns from the irises of certain users due to the presence of contact lenses. (c) Intra-class variations: in the case of fingerprint recognition, wrinkles due to wet fingers [3] can cause these variations. These types of variations often occur when a user incorrectly interacts with the sensor. (d) Spoof attack: hand signature forgery is an example of this type of attack.

It has been observed that some of the limitations of unimodal biometric systems can be addressed by deploying multimodal biometric systems, which essentially integrate the evidence presented by multiple sources of information such as iris, fingerprints and face. Such systems are less vulnerable to spoof attacks, as it would be difficult for an imposter to simultaneously spoof multiple biometric traits of a genuine user. Due to sufficient population coverage, these systems are also able to address the problem of non-universality.

Classification in multibiometric systems is done by fusing information from the different biometric modalities. The information fusion can be done at different levels, which can be broadly divided into feature level, score level and rank/decision level fusion. Because it preserves the raw information, feature level fusion can be more discriminative than score or decision level fusion [4]. Yet there has been very little effort to explore feature level fusion in the biometrics community. This is because the different sensors have different output formats, which result in features of different dimensions. Often the features have large dimensions, and fusion becomes difficult at the feature level. The prevalent method is feature concatenation, which has been used in different multibiometric settings [5]–[7]. However, in many scenarios each modality produces high-dimensional features, in which case concatenation is both impractical and non-robust. It also cannot exploit the constraint that the features of different modalities should share the same identity.

In recent years, theories of Sparse Representation (SR) and Compressed Sensing (CS) have emerged as powerful tools for efficient processing of data in non-traditional ways [8]. This has led to a resurgence of interest in the principles of SR and CS for biometrics recognition [9]. Wright et al. [10] proposed a robust sparse representation-based classification (SRC) algorithm for face recognition. It was shown that by exploiting the inherent sparsity of data, one can obtain improved recognition performance over traditional methods, especially when the data is contaminated by various artifacts such as illumination variations, disguise, occlusion and random pixel corruption. Pillai et al. extended this work to robust cancelable iris recognition in [11]. Nagesh and Li [12] presented an expression-invariant face recognition method using distributed compressed sensing and joint sparsity models. Patel et al. [13] proposed a dictionary-based method for face recognition under varying pose and illumination. A discriminative dictionary learning method for face recognition was also proposed by Zhang and Li [14]. For a survey of applications of SR and CS algorithms to biometric recognition, see [8], [9], [15], [16] and the references therein.

Sumit Shekhar, Vishal M. Patel and R. Chellappa are with the Department of Electrical and Computer Engineering and the Center for Automation Research, UMIACS, University of Maryland, College Park, MD 20742 USA (e-mail: {sshekha, pvishalm, rama}@umiacs.umd.edu). Nasser M. Nasrabadi is with the U.S. Army Research Lab, Adelphi, MD 20783 USA (e-mail: [email protected]).

Motivated by the success of SR in unimodal biometric recognition, we propose a joint sparsity-based algorithm for multimodal biometrics recognition¹. Figure 1 presents an overview of our framework.

Fig. 1: Overview of our algorithm.

Our framework is based on the well-known regularized regression method, the multi-task multivariate Lasso [18], [19]. Our method imposes common sparsities both within each biometric modality and across different modalities. Furthermore, we extend our model so that it can deal with both occlusion and noise. Note that our method is different
from some of the previously proposed classification algorithms based on joint sparse representation. Yuan and Yan [20] proposed a multitask sparse linear regression model for image classification, which uses group sparsity to combine different features of an object for classification. Zhang et al. [21] proposed a joint dynamic sparse representation model for object recognition; their essential goal was to recognize the same object viewed in multiple observations, i.e., different poses. Our method is more general in that it not only considers multivariate sparse representations but can also deal with multitask multivariate sparse representations, which are natural in multimodal biometrics. One of the key features of our model is that it can deal with both occlusion and noise. Furthermore, using kernel methods, we present non-linear extensions of our joint sparse representation method.

This paper makes the following contributions:
• We present a robust feature level fusion algorithm for multibiometric recognition. Through the proposed joint sparse framework, we can easily handle the different dimensions of different modalities by forcing the different features to interact through their sparse coefficients. Furthermore, the proposed algorithm can efficiently handle large-dimensional feature vectors.
• We make the classification robust to occlusion and noise by introducing an error term into the optimization framework.
• The algorithm is easily generalizable to handle multiple test inputs from a modality.
• We introduce a quality measure for multimodal fusion based on joint sparse representation.
• Lastly, we kernelize the algorithm to handle non-linearity in data samples.

A. Paper Organization

The paper is organized as follows. In section II, we describe the proposed sparsity-based multimodal recognition algorithm, which is extended to the kernel case in section IV. The quality measure is described in section III. Experimental evaluations on a comprehensive multimodal dataset and a face database are described in section V. In section VI, we describe the computational complexity. Finally, concluding remarks are presented in section VII.

II. JOINT SPARSITY-BASED MULTIMODAL BIOMETRICS RECOGNITION

Consider a multimodal C-class classification problem with D different biometric traits. Suppose there are p_i training samples for each biometric trait. For each biometric trait i = 1, ..., D, we denote

  X^i = [X^i_1, X^i_2, ..., X^i_C]

as an n_i × p dictionary of training samples consisting of C sub-dictionaries X^i_k corresponding to the C different classes. Each sub-dictionary

  X^i_j = [x^i_{j,1}, x^i_{j,2}, ..., x^i_{j,p_j}] ∈ R^{n_i × p_j}

represents the set of training data from the ith modality labeled with the jth class. Note that n_i is the feature dimension of each sample and that there are p_j training samples in class j. Hence, there are a total of p = Σ_{j=1}^C p_j samples in the dictionary X^i. Elements of the dictionary are often referred to as atoms. In the multimodal biometrics recognition problem, given a test sample (matrix) Y consisting of D different modalities {Y^1, Y^2, ..., Y^D}, where each sample Y^i consists of d_i observations Y^i = [y^i_1, y^i_2, ..., y^i_{d_i}] ∈ R^{n_i × d_i}, the objective is to identify the class to which the test sample Y belongs. In what follows, we present a multimodal multivariate sparse representation-based algorithm for this problem [18], [19], [22].

¹A preliminary version of this work appeared in [17].
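To make the notation concrete, the following NumPy sketch (ours, not the authors' code; all dimensions are hypothetical and chosen only for illustration) assembles the per-modality dictionaries X^i and a multimodal test sample {Y^i}:

```python
import numpy as np

rng = np.random.default_rng(0)

C = 4          # number of classes
D = 2          # number of modalities (e.g., iris and fingerprint)
p_j = 5        # training samples per class (equal class sizes here)
p = C * p_j    # total training samples per modality
n = [60, 80]   # feature dimension n_i of each modality (hypothetical)
d = [3, 3]     # number of test observations d_i per modality

# X[i] is the n_i x p dictionary [X^i_1, ..., X^i_C]; the columns
# X[i][:, j*p_j:(j+1)*p_j] form the sub-dictionary X^i_j for class j.
X = [rng.standard_normal((n[i], p)) for i in range(D)]

# Y[i] is the n_i x d_i matrix of test observations for modality i.
Y = [rng.standard_normal((n[i], d[i])) for i in range(D)]

def class_slice(j):
    """Column indices of the sub-dictionary X^i_j inside X^i."""
    return slice(j * p_j, (j + 1) * p_j)
```

Note that the dictionaries share the column layout across modalities, which is what lets a single row-sparsity pattern couple them.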
A. Multimodal multivariate sparse representation

We want to exploit the joint sparsity of the coefficients from the different biometric modalities to make a joint decision. To simplify the model, first consider a bimodal classification problem where the test sample Y = [Y^1, Y^2] consists of two different modalities, such as iris and face. Suppose that Y^1 belongs to the jth class. Then, it can be reconstructed by a linear combination of the atoms in the sub-dictionary X^1_j. That is, Y^1 = X^1 Γ^1 + N^1, where Γ^1 is a sparse matrix with only p_j nonzero rows associated with the jth class, and N^1 is the noise matrix. Similarly, since Y^2 represents the same subject, it belongs to the same class and can be represented by training samples in X^2 with a different set of coefficients Γ^2. Thus, we can write Y^2 = X^2 Γ^2 + N^2, where Γ^2 is a sparse matrix that has the same sparsity pattern as Γ^1. If we let Γ = [Γ^1, Γ^2], then Γ is a sparse matrix with only p_j nonzero rows.

In the more general case of D modalities, denote {Y^i}_{i=1}^D as the set of D observations, each consisting of d_i samples, and let Γ = [Γ^1, Γ^2, ..., Γ^D] ∈ R^{p×d} be the matrix formed by concatenating the coefficient matrices, with d = Σ_{i=1}^D d_i. We can then seek the row-sparse matrix Γ by solving the following ℓ_1/ℓ_q-regularized least squares problem

  Γ̂ = argmin_Γ (1/2) Σ_{i=1}^D ||Y^i − X^i Γ^i||_F^2 + λ ||Γ||_{1,q},   (1)

where λ is a positive parameter and q is set greater than 1 to make the optimization problem convex. Here, ||Γ||_{1,q} is the norm defined as ||Γ||_{1,q} = Σ_{k=1}^p ||γ^k||_q, where the γ^k's are the row vectors of Γ, and ||Y||_F is the Frobenius norm of the matrix Y, defined as ||Y||_F = (Σ_{i,j} Y_{i,j}^2)^{1/2}. Once Γ̂ is obtained, the class label associated with an observed vector is declared as the one that produces the smallest approximation error:

  ĵ = argmin_j Σ_{i=1}^D ||Y^i − X^i δ^i_j(Γ̂^i)||_F^2,   (2)

where δ^i_j is the matrix indicator function defined by keeping the rows corresponding to the jth class and setting all other rows equal to zero. Note that the optimization problem (1) reduces to the conventional Lasso [23] when D = 1 and d = 1. When D = 1, (1) is referred to as the multivariate Lasso [18].

B. Robust multimodal multivariate sparse representation

In this section, we consider the more general problem in which the data is contaminated by noise. In this case, the observation model becomes

  Y^i = X^i Γ^i + Z^i + N^i,  i = 1, ..., D,   (3)

where N^i is a small dense additive noise and Z^i ∈ R^{n_i×d_i} is a matrix of background noise (occlusion) with arbitrarily large magnitude. One can assume that each Z^i is sparsely represented in some basis B^i ∈ R^{n_i×m_i}; that is, Z^i = B^i Λ^i for some sparse matrix Λ^i ∈ R^{m_i×d_i}. Hence, (3) can be rewritten as

  Y^i = X^i Γ^i + B^i Λ^i + N^i,  i = 1, ..., D.   (4)

With this model, one can simultaneously recover the coefficients Γ^i and Λ^i by taking advantage of the fact that the Λ^i are sparse:

  Γ̂, Λ̂ = argmin_{Γ,Λ} (1/2) Σ_{i=1}^D ||Y^i − X^i Γ^i − B^i Λ^i||_F^2 + λ_1 ||Γ||_{1,q} + λ_2 ||Λ||_1,   (5)

where λ_1 and λ_2 are positive parameters and Λ = [Λ^1, Λ^2, ..., Λ^D] is the sparse coefficient matrix corresponding to occlusion. The ℓ_1-norm of the matrix Λ is defined as ||Λ||_1 = Σ_{i,j} |Λ_{i,j}|. Note that the idea of exploiting the sparsity of the occlusion term has been studied by Wright et al. [10] and Candes et al. [24].

Once Γ̂, Λ̂ are computed, the effect of occlusion can be removed by setting Ỹ^i = Y^i − B^i Λ̂^i. One can then declare the class label associated with an observed vector as

  ĵ = argmin_j Σ_{i=1}^D ||Y^i − X^i δ^i_j(Γ̂^i) − B^i Λ̂^i||_F^2.   (6)

C. Optimization algorithm

In this section, we present an algorithm to solve (5) based on the classical alternating direction method of multipliers (ADMM) [25], [26]. Note that the optimization problem (1) can be solved by setting λ_2 equal to zero. Let

  C(Γ,Λ) = (1/2) Σ_{i=1}^D ||Y^i − X^i Γ^i − B^i Λ^i||_F^2.

Then, our goal is to solve the following optimization problem

  min_{Γ,Λ} C(Γ,Λ) + λ_1 ||Γ||_{1,q} + λ_2 ||Λ||_1.   (7)

In ADMM the idea is to decouple C(Γ,Λ), ||Γ||_{1,q} and ||Λ||_1 by introducing auxiliary variables that reformulate the problem into a constrained optimization problem

  min_{Γ,Λ,U,V} C(Γ,Λ) + λ_1 ||V||_{1,q} + λ_2 ||U||_1  s.t.  Γ = V, Λ = U.   (8)

Since (8) is an equality-constrained problem, the Augmented Lagrangian Method (ALM) [25] can be used to solve it.
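For intuition about the two regularizers that (8) decouples, here is a minimal sketch (ours, not the authors' implementation) of the mixed ℓ_{1,q} norm from (1) and of the row-wise shrinkage that solves its proximal step when q = 2:

```python
import numpy as np

def l1q_norm(Gamma, q=2):
    # ||Gamma||_{1,q}: sum of the l_q norms of the rows of Gamma.
    return sum(np.linalg.norm(row, ord=q) for row in Gamma)

def row_shrink(Z, eta):
    # Row-wise shrinkage (1 - eta/||z||_2)_+ z, the proximal operator
    # of eta * ||.||_{1,2}; rows with l2 norm <= eta are zeroed out,
    # which is what produces row sparsity.
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1.0 - eta / np.maximum(norms, 1e-12), 0.0)
    return scale * Z

G = np.array([[3.0, 4.0],    # row norm 5.0
              [0.1, 0.0]])   # row norm 0.1
# l1q_norm(G) = 5.1; row_shrink(G, 1.0) keeps only the first row.
```

Rows whose energy falls below the threshold are suppressed jointly across all their columns, which is exactly the coupling between modalities that the ℓ_{1,q} penalty enforces.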
This is done by minimizing the augmented Lagrangian function f_{α_Γ,α_Λ}(Γ,Λ,V,U; A_Λ, A_Γ), defined as

  f_{α_Γ,α_Λ}(Γ,Λ,V,U; A_Λ, A_Γ) = C(Γ,Λ) + λ_2 ||U||_1 + ⟨A_Λ, Λ − U⟩ + (α_Λ/2) ||Λ − U||_F^2 + λ_1 ||V||_{1,q} + ⟨A_Γ, Γ − V⟩ + (α_Γ/2) ||Γ − V||_F^2,   (9)

where A_Λ and A_Γ are the multipliers of the two linear constraints, and α_Λ, α_Γ are positive penalty parameters. The ALM algorithm minimizes f_{α_Γ,α_Λ} with respect to Γ, Λ, U and V jointly, keeping A_Γ and A_Λ fixed, and then updates A_Γ and A_Λ keeping the remaining variables fixed. Due to the separable structure of the objective function f_{α_Γ,α_Λ}, one can further simplify the problem by minimizing f_{α_Γ,α_Λ} with respect to the variables Γ, Λ, U and V separately. The steps of the algorithm are given in Algorithm 1; in what follows, we describe each of the suboptimization problems in detail.

Algorithm 1: Alternating Direction Method of Multipliers (ADMM).
Initialize: Γ_0, U_0, V_0, A_{Λ,0}, A_{Γ,0}, α_Γ, α_Λ
While not converged do
 1. Γ_{t+1} = argmin_Γ f_{α_Γ,α_Λ}(Γ, Λ_t, U_t, V_t; A_{Γ,t}, A_{Λ,t})
 2. Λ_{t+1} = argmin_Λ f_{α_Γ,α_Λ}(Γ_{t+1}, Λ, U_t, V_t; A_{Γ,t}, A_{Λ,t})
 3. U_{t+1} = argmin_U f_{α_Γ,α_Λ}(Γ_{t+1}, Λ_{t+1}, U, V_t; A_{Γ,t}, A_{Λ,t})
 4. V_{t+1} = argmin_V f_{α_Γ,α_Λ}(Γ_{t+1}, Λ_{t+1}, U_{t+1}, V; A_{Γ,t}, A_{Λ,t})
 5. A_{Γ,t+1} = A_{Γ,t} + α_Γ (Γ_{t+1} − V_{t+1})
 6. A_{Λ,t+1} = A_{Λ,t} + α_Λ (Λ_{t+1} − U_{t+1})

1) Update step for Γ: The first suboptimization problem involves the minimization of f_{α_Γ,α_Λ} with respect to Γ. It has a quadratic structure, so it can be solved by setting the first-order derivative equal to zero. Furthermore, since the loss function C(Γ,Λ) is a sum of convex functions associated with the sub-matrices Γ^i, i = 1, ..., D, one can obtain Γ_{t+1} one modality at a time:

  Γ^i_{t+1} = (X^{iT} X^i + α_Γ I)^{-1} (X^{iT}(Y^i − B^i Λ^i_t) + α_Γ V^i_t − A^i_{Γ,t}),

where I is the p × p identity matrix and Λ^i_t, V^i_t and A^i_{Γ,t} are submatrices of Λ_t, V_t and A_{Γ,t}, respectively.

2) Update step for Λ: The second suboptimization problem is similar in nature; when B^i has orthonormal columns (e.g., B^i = I), its solution is

  Λ^i_{t+1} = (1 + α_Λ)^{-1} (B^{iT}(Y^i − X^i Γ^i_{t+1}) + α_Λ U^i_t − A^i_{Λ,t}),

where U^i_t and A^i_{Λ,t} are submatrices of U_t and A_{Λ,t}, respectively.

3) Update step for U: The third suboptimization problem, with respect to U, is a standard ℓ_1 minimization problem, which can be recast as

  min_U (1/2) ||Λ_{t+1} + α_Λ^{-1} A_{Λ,t} − U||_F^2 + (λ_2/α_Λ) ||U||_1.   (10)

Equation (10) is the well-known shrinkage problem, whose solution is given by

  U_{t+1} = S(Λ_{t+1} + α_Λ^{-1} A_{Λ,t}, λ_2/α_Λ),

where S(a,b) = sgn(a)(|a| − b) for |a| ≥ b and zero otherwise, applied elementwise.

4) Update step for V: The final suboptimization problem is with respect to V and can be reformulated as

  min_V (1/2) ||Γ_{t+1} + α_Γ^{-1} A_{Γ,t} − V||_F^2 + (λ_1/α_Γ) ||V||_{1,q}.   (11)

Due to the separable structure of (11), it can be solved with respect to each row of V separately. Let γ_{i,t+1}, a_{Γ,i,t} and v_{i,t+1} be the rows of the matrices Γ_{t+1}, A_{Γ,t} and V_{t+1}, respectively. Then for each i = 1, ..., p we solve the sub-problem

  v_{i,t+1} = argmin_v (1/2) ||z − v||_2^2 + η ||v||_q,   (12)

where z = γ_{i,t+1} + α_Γ^{-1} a_{Γ,i,t} and η = λ_1/α_Γ. One can derive the solution of (12) for any q; in this paper, we focus on the case q = 2, for which the solution has the form

  v_{i,t+1} = (1 − η/||z||_2)_+ z,

where (v)_+ denotes the vector with entries max(v_i, 0).

Our proposed algorithm, Sparse Multimodal Biometrics Recognition (SMBR), is summarized in Algorithm 2. We refer to the robust method taking the sparse error into account as SMBR-E (SMBR with error), and to the case where it is not taken into account as SMBR-WE (SMBR without error).

Algorithm 2: Sparse Multimodal Biometrics Recognition (SMBR).
Input: Training samples {X^i}_{i=1}^D, test sample {Y^i}_{i=1}^D, occlusion bases {B^i}_{i=1}^D
Procedure: Obtain Γ̂ and Λ̂ by solving

  Γ̂, Λ̂ = argmin_{Γ,Λ} (1/2) Σ_{i=1}^D ||Y^i − X^i Γ^i − B^i Λ^i||_F^2 + λ_1 ||Γ||_{1,q} + λ_2 ||Λ||_1

Output: identity(Y) = argmin_j Σ_{i=1}^D ||Y^i − X^i δ^i_j(Γ̂^i) − B^i Λ̂^i||_F^2

III. QUALITY BASED FUSION

Ideally, a fusion mechanism should give more weight to the more reliable modalities; hence, the concept of quality is important in multimodal fusion. A quality measure based on sparse representation was introduced for faces in [10]. To decide whether a given test sample has good quality, its Sparsity Concentration Index (SCI) is calculated. Given a coefficient vector γ ∈ R^p, the SCI is given as

  SCI(γ) = (C · max_{i∈{1,...,C}} ||δ_i(γ)||_1 / ||γ||_1 − 1) / (C − 1),

where δ_i is the indicator function keeping the coefficients corresponding to the ith class and setting the others to zero. SCI values close to 1 correspond to the case where the test sample can be represented well using the samples of a single class, and hence is of high quality. On the other hand, samples with SCI close to 0 are not similar to any of the classes, and hence are of poor quality. This can easily be extended to the multimodal case using the joint sparse representation matrix Γ̂. In this case, we define the quality q^i_j of sample y^i_j as

  q^i_j = SCI(Γ̂^i_j),

where Γ̂^i_j is the jth column of Γ̂^i. Given this quality measure, the classification rule (2) can be modified to include quality:

  ĵ = argmin_j Σ_{i=1}^D Σ_{k=1}^{d_i} q^i_k ||y^i_k − X^i δ_j(Γ̂^i_k)||_F^2,   (13)

where δ_j is the indicator function retaining the coefficients corresponding to the jth class and Γ̂^i_k is the kth column of Γ̂^i.
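The SCI computation above can be transcribed directly; the following sketch (with a hypothetical class layout for the p coefficients) is ours, not the authors' code:

```python
import numpy as np

def sci(gamma, class_index):
    """Sparsity Concentration Index of a coefficient vector gamma.

    class_index[k] gives the class of coefficient k. Returns
    (C * max_i ||delta_i(gamma)||_1 / ||gamma||_1 - 1) / (C - 1),
    which is 1 when all mass sits in one class and 0 when it is
    spread evenly over all C classes.
    """
    classes = np.unique(class_index)
    C = len(classes)
    total = np.abs(gamma).sum()
    best = max(np.abs(gamma[class_index == c]).sum() for c in classes)
    return (C * best / total - 1.0) / (C - 1.0)

labels = np.array([0, 0, 1, 1, 2, 2])          # 3 classes, 2 atoms each
concentrated = np.array([1.0, 2.0, 0, 0, 0, 0])  # mass in one class
diffuse = np.ones(6)                              # mass spread evenly
```

Applying sci to each column of Γ̂^i yields the per-sample quality weights q^i_j used in (13).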
IV. KERNEL SPACE MULTIMODAL BIOMETRICS RECOGNITION

The class identities in a multibiometric dataset may not be linearly separable; hence, we also extend the sparse multimodal fusion framework to kernel space. The kernel function κ : R^n × R^n → R is defined as the inner product

  κ(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩,

where φ is an implicit mapping that projects the vector x into a higher-dimensional space.

A. Multivariate kernel sparse representation

Considering the general case of D modalities with {Y^i}_{i=1}^D a set of d_i observations each, the feature space representation can be written as

  Φ(Y^i) = [φ(y^i_1), φ(y^i_2), ..., φ(y^i_{d_i})].

Similarly, the dictionary of training samples for modality i = 1, ..., D can be represented in feature space as

  Φ(X^i) = [φ(X^i_1), φ(X^i_2), ..., φ(X^i_C)].

As in the joint linear space representation, we have

  Φ(Y^i) = Φ(X^i) Γ^i,

where Γ^i is the coefficient matrix associated with modality i. Incorporating the information from all the sensors, we seek to solve the following optimization problem, similar to the linear case:

  Γ̂ = argmin_Γ (1/2) Σ_{i=1}^D ||Φ(Y^i) − Φ(X^i) Γ^i||_F^2 + λ ||Γ||_{1,q},   (14)

where Γ = [Γ^1, Γ^2, ..., Γ^D]. It is clear that the information from all modalities is integrated via the shared sparsity pattern of the matrices {Γ^i}_{i=1}^D. This can be reformulated in terms of kernel matrices as

  Γ̂ = argmin_Γ (1/2) Σ_{i=1}^D (trace(Γ^{iT} K_{X^i,X^i} Γ^i) − 2 trace(K_{X^i,Y^i} Γ^i)) + λ ||Γ||_{1,q},   (15)

where the kernel matrix K_{A,B} is defined as

  K_{A,B}(i,j) = ⟨φ(a_i), φ(b_j)⟩,   (16)

a_i and b_j being the ith and jth columns of A and B, respectively.

B. Composite kernel sparse representation

Another way to combine the information of different modalities is through a composite kernel, which efficiently combines the kernels of the individual modalities by capturing both within-modality and between-modality similarities. For two modalities with the same feature dimension, the kernel matrix can be constructed as

  κ(X_i, X_j) = α_1 κ(x^1_i, x^1_j) + α_2 κ(x^1_i, x^2_j) + α_3 κ(x^2_i, x^1_j) + α_4 κ(x^2_i, x^2_j),   (17)

where {α_i}_{i=1,...,4} are the weights associated with the kernels, X_i = [x^1_i; x^2_i], and x^1_i and x^2_i are the feature vectors for modality 1 and 2, respectively. This construction extends similarly to multiple modalities. However, the modalities may have different dimensions, in which case the cross-similarity measure is not possible. Hence, the modalities are divided according to whether they are homogeneous (e.g., right and left iris) or heterogeneous (e.g., fingerprint and iris), as homogeneous modalities share the same feature extraction process and hence the same dimension. This is also reasonable because homogeneous modalities are correlated at the feature level, whereas heterogeneous modalities may not be. A composite kernel is defined for each homogeneous group. For D modalities, with S_k ⊆ {1, 2, ..., D} being the set of indices of the kth homogeneous group, the composite kernel for each group is given as

  κ(X^k_i, X^k_j) = Σ_{s_1, s_2 ∈ S_k} α_{s_1 s_2} κ(x^{s_1}_i, x^{s_2}_j).   (18)

Here, X^k_i = [x^{s_1}_i; x^{s_2}_i; ...; x^{s_{|S_k|}}_i], S_k = [s_1, s_2, ..., s_{|S_k|}], and k = 1, ..., N_H, with N_H being the number of heterogeneous groups. The information from the different heterogeneous groups can then be combined as in the sparse kernel fusion case:

  Γ̂ = argmin_Γ (1/2) Σ_{i=1}^{N_H} (trace(Γ^{iT} K_{X^i,X^i} Γ^i) − 2 trace(K_{X^i,Y^i} Γ^i)) + λ ||Γ||_{1,q},   (19)

where K_{X^i,X^i} is defined for each S_i as in (16) and Γ = [Γ^1, Γ^2, ..., Γ^{N_H}].

C. Optimization Algorithm

Similar to the linear fusion method, we apply the alternating direction method to efficiently solve the kernel fusion problem. The method splits the variable Γ so that the new problem has two convex functions. This is done by introducing a new variable V and reformulating problems (15) and (19) as

  argmin_{Γ,V} (1/2) Σ_{i=1}^{N_K} (trace(Γ^{iT} K_{X^i,X^i} Γ^i) − 2 trace(K_{X^i,Y^i} Γ^i)) + λ ||V||_{1,q}  s.t.  Γ = V,   (20)

where N_K is the number of kernels in (15) or (19). Rewriting the problem using a Lagrangian multiplier, the optimization becomes

  argmin_{Γ,V} (1/2) Σ_{i=1}^{N_K} (trace(Γ^{iT} K_{X^i,X^i} Γ^i) − 2 trace(K_{X^i,Y^i} Γ^i)) + λ ||V||_{1,q} + ⟨B, Γ − V⟩ + (β_W/2) ||Γ − V||_F^2,   (21)

which upon re-arranging becomes

  argmin_{Γ,V} (1/2) Σ_{i=1}^{N_K} (trace(Γ^{iT} K_{X^i,X^i} Γ^i) − 2 trace(K_{X^i,Y^i} Γ^i)) + λ ||V||_{1,q} + (β_W/2) ||Γ − V + (1/β_W) B||_F^2.   (22)

The optimization is summarized in Algorithm 3; each of the steps has a simple closed-form expression.
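The kernel matrices entering these objectives never require φ explicitly. The sketch below (ours; the particular RBF kernel choice is an assumption, and the cross term is written in the conforming form trace(Γ^T K_{X,Y})) shows how the kernelized data-fit term can be evaluated:

```python
import numpy as np

def rbf_kernel_matrix(A, B, sigma=1.0):
    # K[i, j] = <phi(a_i), phi(b_j)> for COLUMNS a_i of A and b_j of B,
    # as in (16), here with an RBF feature map.
    sq = (np.sum(A**2, axis=0)[:, None]
          + np.sum(B**2, axis=0)[None, :] - 2.0 * A.T @ B)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kernel_data_fit(K_XX, K_XY, K_YY, Gamma):
    # ||Phi(Y) - Phi(X) Gamma||_F^2 expanded purely in kernel matrices:
    # trace(K_YY) - 2 trace(Gamma^T K_XY) + trace(Gamma^T K_XX Gamma).
    return (np.trace(K_YY) - 2.0 * np.trace(Gamma.T @ K_XY)
            + np.trace(Gamma.T @ K_XX @ Gamma))
```

As a sanity check, with a linear kernel K_{A,B} = A^T B this expansion reproduces the linear residual of (14) exactly.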
2trace(KXi,YiΓi) +λkΓk1,q (26) IWnihtiialelizneo:tΓco0n,vVer0g,eBd0d,oβW Output:identity(Y)=argm(cid:1)inj Ni=H1 trace(KYY)− 1.Γt+1=argminΓ 12 Ni=K1 trace(ΓiTKXi,XiΓi)− 2trace(ΓˆijTKXiYΓˆij)+trace(ΓˆijTKXPiXiΓˆij(cid:0))) 223..trVBactet++(K11X==iB,aYrtgi+Γmiβi)n(cid:1)WV+(λΓλPktkV+V1kt1−k,q1(cid:0)V,+qt++β1W2β)2W(cid:13)(cid:13)Γ(cid:13)(cid:13)tΓ+−1−VVt+t+β1Wβ1WBBt(cid:13)(cid:13)t2F(cid:13)(cid:13)2F j j j V. EXPERIMENTS 1) UpdatestepsforΓt: Γt+1 isobtainedbyupdatingeach We evaluated our algorithm for different multi-biometric submatrix Γit, i=1,··· ,NK as: settings. We tested on two publicly available datasets - the Γit =(KXi,Xi +βWI)−1(KXi,Yi +βWVti−Bit) (23) WThVeUWMVUultdimataosdeatlidsaotnaseeotf[2th7e]faenwdpthueblAicRlyfaavcaeiladbalteasdeatta[2se8t]s. where, I is an identity matrix and Vti, Bit are sub-matrices which allows fusion at image level. It is a challengingdataset of Vt and Bt respectively. having samples from differnent biometric modalities for each 2) Update steps for Vt: The update equation for Vt is subject. sameasinthelinearfusioncaseusing(11)and(12),replacing In the second experiment,we show the applicability of our AΓ,t and αΓ with Bt and βW respectively. method to fusing information from soft biometrics. Recently, combining information from soft biometrics such as facial D. Classification marks,haircolor,etchasbeenshowntoimprovefacerecogni- Once Γ is obtained using any of the two methods above, tion performance[29]. The challenge for fusion algorithms is classification can be done by assigning the class label as: combinethese weak modalitieswith strong modalitiesas face or fingerprint [30]. We demonstrate that our framework can NK ˆj =argmin kΦ(Yi)−Φ(Xi)Γˆik2 be extended to address this problem. Further, we also show j j F j the effect of noise and occlusion on performance of different Xi=1 algorithms. 
In all the experiments B was set to be identity or in terms of kernel matrices as: i for convenience, i.e., we assume noise to be sparse in image ˆj =argminNK trace(KYY)−2trace(ΓˆijTKXiYΓˆij) domain. j j Xi=1(cid:0) +trace(ΓˆijTKXiXiΓˆij)) (24) A. WVU Multimodal Dataset j j Here, Xi is the sub-dictionary associated with jth class and TheWVUmultimodaldatasetisacomprehensivecollection j Γˆi is the coefficient matrix associated with this class. of different biometric modalities such as fingerprint, iris, j The classification rule can be further extended to include palmprint,handgeometryandvoicefromsubjectsofdifferent the quality measure as in (13). But, we skip this step here, age, gender and ethnicity as described in Table I. It is a as we wish to study the effect of kernel representation and challenging dataset and many of these samples are corrupted quality separately. withblur,occlusionandsensornoiseasshowninFigure2.Out Multivariate Kernel Sparse Recognition (kerSMBR) and ofthese,wechoseirisandfingerprintmodalitiesfortestingthe Composite Kernel Sparse Recognition (compSMBR) algo- proposedalgorithms.Intotal,thereare2iris(rightandleftiris) rithms are summarized in Algorithms 4 and 5, respectively: and4fingerprintmodalities.Also,theevaluationwasdoneon a subset of 219 subjects having samples in both modalities. IEEETRANSACTIONSONPATTERNANALYSISANDMACHINEINTELLIGENCE,VOL.X,NO.X,MONTH20XX 7 correct class. We tested the proposed linear and kernel fusion techniques separately and compared them with the linear and kernel versionsof SLR and SVM respectively.We denote the score-level fusion of these methods as SLR-Sum and SVM- Sum, and the decision-level fusion as SLR-Major and SVM- Fig. 2: Examples of challenging images from the WVU Major. Multimodal dataset. The images above suffer from various a) Linear Fusion: The recognition performances of artifacts as sensor noise, blur, occlusion and poor acquisition. 
SMBR-WEandSMBR-EwascomparedwithlinearSVMand linear SLR classification methods. The parameter values λ 1 and λ were set to 0.01. Biometric Modality #ofsubjects #of samples 2 Iris 244 3099 • Comparsion of Methods: Figures 3 shows the perfor- Fingerprint 272 7219 mance on individual modalities. All the classifiers show Palm 263 683 similar trend. The performancefor all of them are lower Hand 217 3062 Voice 274 714 on iris images and fingers 1 and 3. However, the SVM fares poorer than other methods on all the modalities. TABLE I: WVU Biometric Data Figure 4 and Table II show performance for different fusion settings. The proposed SMBR approach outper- forms existing classification techniques. Both SMBR-E 1) Preprocessing: Robust pre-processing of images was and SMBR-WE have similar performance, though the done before feature extraction. Iris images were segmented latter seems to give a slightly better performance. This following the recent method proposed in [31]. Following the maybeduetothepenaltyonthesparseerror,thoughthe segmentation, 25 × 240 iris templates were formed by re- error may not be sparse in the image domain. Further, sampling using the publicly available code of Masek et al. sum-based fusion shows a superior performance over [32]. Fingerprint images were enhanced using the filtering voting-based methods. based methods described in [33], and then the core point was • Fusionwithquality:Clearlydifferentmodalitieshavedif- detected using the enhanced images [34]. Features were then ferent quality of performance. Hence, we studied the ef- extracted around the detected core point. fect of the proposedquality measure on the performance 2) Feature Extraction: Gabor features were extracted on of different methods. For a consistent comparison, the the processed images as they have been shown to give good qualityvaluesproducedbySMBR-Emethodwasusedfor performance on both fingerprints [34] and iris [35]. 
For allthealgorithms.TableIIIshowstheperformanceforthe fingerprintsamples,theprocessedimageswereconvolvedwith three fusion settings. The effect of including the quality Gabor filters at 8 different orientations. Circular tesselations measurewhileclassificationcanbestudiedbycomparing wereextractedaroundthecorepointforallthefilteredimages with Table II. Clearly, the recognition rate increases for similar to [34]. The tesselation consisted of 15 concentric all the methodsacross the fusion settings. Again SMBR- bands,eachofwidth5pixelsanddividedinto30sectors.The E and SMBR-WE give the best performances among all mean values for each sector were concatenated to form the the methods. feature vector of size 3600×1. Features for iris images were • Effectofjointsparsity:Wealsostudiedtheeffectofjoint formedby convolvingthe templates with log-Gaborfilter at a sparsity constraint on the recognition performance. For single scale, and vectorizing the template to give a 6000×1 this, SMBR-WE algorithm was run for different values dimensional feature. of λ . Figure 5 shows the rank one recognition variation 1 3) ExperimentalSet-up: Thedatasetwasrandomlydivided across λ values for different fusion settings. All the 1 into 4 training samples per class (1 sample here is 1 data curves show a sharp increase in performance around sample each from 6 modalities) and the rest 519 for testing. λ = 0. Further, the increase is more for iris fusion, 1 Therecognitionresultwasaveragedover5runs.Theproposed which shows around 5% improvement at λ = 0.005 1 methods were compared with state-of-the-art classification over λ = 0. This shows that imposing joint sparsity 1 methods such as sparse logistic regression (SLR) [36] and constraint is important for fusion. Moreover, it helps in SVM [37]. 
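As an illustration, the sector-pooling step of the fingerprint features described above (15 concentric bands of width 5 pixels, each divided into 30 sectors, pooled for each of the 8 Gabor orientations, giving 8 × 15 × 30 = 3600 values) can be sketched as follows. The random array standing in for the 8 Gabor-filtered images, the image size and the core-point location are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np

def sector_means(filtered, core, n_bands=15, band_width=5, n_sectors=30):
    """Mean filter response in each circular sector around the core point."""
    h, w = filtered.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - core[0], xx - core[1])
    theta = np.arctan2(yy - core[0], xx - core[1]) % (2 * np.pi)
    band = (r // band_width).astype(int)                     # concentric band index
    sector = (theta / (2 * np.pi) * n_sectors).astype(int)   # angular sector index
    feats = np.zeros(n_bands * n_sectors)
    for b in range(n_bands):
        for s in range(n_sectors):
            mask = (band == b) & (sector == s)
            if mask.any():
                feats[b * n_sectors + s] = filtered[mask].mean()
    return feats

rng = np.random.default_rng(0)
# a dummy array stands in for each of the 8 Gabor-filtered images
filtered_images = [rng.standard_normal((160, 160)) for _ in range(8)]
feat = np.concatenate([sector_means(f, core=(80, 80)) for f in filtered_images])
print(feat.shape)  # (3600,)
```

Concatenating the 450 sector means of all 8 orientation responses yields the 3600×1 fingerprint feature vector mentioned above.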
Although these methods have been shown to give superior performance, they cannot handle multimodal data. One possible way to handle multimodal data is to use feature concatenation. However, this resulted in feature vectors of size 26400×1 when all six modalities are used, and is not practical. Hence, we explored score-level and decision-level fusion methods for combining the results of the individual modalities. For score-level fusion, the probability outputs for the test sample of each modality, {y_i}, i = 1, ..., 6, were added together to give the final score vector. Classification was based upon the final score values. For decision-level fusion, the subject chosen by the maximum number of modalities was taken to be from the correct class.

• Variation with number of training samples: We varied the number of training samples and studied the effect on the top 4 algorithms. Figure 6 shows the variation for fusion of all the modalities. It can be seen that SMBR-WE and SMBR-E are stable across the number of training samples, whereas the performance of the SLR and SVM based methods falls sharply.
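The score-level (sum) and decision-level (majority vote) fusion rules described above amount to the following minimal sketch; the toy score vectors are made up for illustration.

```python
import numpy as np

def sum_fusion(scores):
    """Score-level fusion: add per-modality probability vectors, pick argmax.
    scores: list of (n_classes,) arrays, one per modality."""
    return int(np.argmax(np.sum(scores, axis=0)))

def majority_fusion(scores):
    """Decision-level fusion: each modality votes for its own argmax class;
    the class with the most votes wins."""
    votes = [int(np.argmax(s)) for s in scores]
    return max(set(votes), key=votes.count)

# three modalities, four classes (illustrative probability outputs)
scores = [np.array([0.1, 0.6, 0.2, 0.1]),
          np.array([0.3, 0.4, 0.2, 0.1]),
          np.array([0.5, 0.1, 0.3, 0.1])]
print(sum_fusion(scores))       # 1  (summed scores are [0.9, 1.1, 0.7, 0.3])
print(majority_fusion(scores))  # 1  (per-modality votes are [1, 1, 0])
```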
Fig. 3: CMCs (Cumulative Match Curve) for individual modalities using (a) SMBR-E, (b) SMBR-WE, (c) SLR and (d) SVM methods.

Fig. 4: CMCs for multimodal fusion using (a) four fingerprints, (b) two irises and (c) all modalities.

TABLE II: Rank one recognition performance for the WVU Multimodal dataset.

                 SMBR-WE  SMBR-E  SLR-Sum  SLR-Major  SVM-Sum  SVM-Major
4 Fingerprints   97.9     97.6    96.3     74.2       90.0     73.0
2 Irises         76.5     78.2    72.7     64.2       62.8     49.3
All modalities   98.7     98.6    97.6     84.4       94.9     81.3
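A CMC curve, as plotted in Figures 3 and 4, reports for each rank r the fraction of test samples whose true class appears among the top-r scored classes. A minimal sketch, with made-up similarity scores:

```python
import numpy as np

def cmc(score_matrix, true_labels, max_rank):
    """score_matrix: (n_test, n_classes) similarity scores (higher = better).
    Returns the cumulative recognition rate at ranks 1..max_rank."""
    # position of the true class when classes are sorted by decreasing score
    order = np.argsort(-score_matrix, axis=1)
    ranks = np.argmax(order == np.asarray(true_labels)[:, None], axis=1)  # 0-based
    return np.array([(ranks < r).mean() for r in range(1, max_rank + 1)])

scores = np.array([[0.9, 0.05, 0.05],   # true class 0 -> retrieved at rank 1
                   [0.2, 0.5,  0.3 ],   # true class 2 -> retrieved at rank 2
                   [0.1, 0.7,  0.2 ]])  # true class 1 -> retrieved at rank 1
curve = cmc(scores, [0, 2, 1], max_rank=3)
print(curve)  # approximately [0.667, 1.0, 1.0]
```

The rank-one recognition rates reported in Tables II and III correspond to the first point of such a curve.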
TABLE III: Rank one recognition performance using the proposed quality measure.

                 SMBR-WE  SMBR-E  SLR-Sum  SLR-Major  SVM-Sum  SVM-Major
4 Fingerprints   98.2     98.1    97.5     86.3       93.6     85.5
2 Irises         76.9     78.8    74.1     67.2       64.3     51.6
All modalities   98.8     98.6    98.2     93.8       95.5     93.3

Fig. 5: Variation of recognition performance with different values of the sparsity constraint, λ1.

The fall in performance of SLR and SVM can be attributed to the discriminative approaches of these methods, as well as to score-based fusion, as the fusion further reduces recognition when the individual classifiers are not good.

b) Kernel Fusion: We further compared the performances of the proposed kerSMBR and compSMBR with the kernel SVM and kernel SLR methods. In the experiments, we used the Radial Basis Function (RBF) as the kernel, given as:

$$\kappa(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|_2^2}{\sigma^2}\right),$$

σ being a parameter that controls the width of the RBF.

• Hyperparameter tuning: To fix the value of the hyperparameter σ, we iterated over different values of σ, {2⁻³, 2⁻², ..., 2³}, for one training and test split of the data. The value of σ giving the maximum performance was fixed for each modality, and the performance was averaged over a few iterations. The weights {α_ij} were set to 1 for the composite kernel. λ and β were set to 0.01 and 0.01, respectively.

• Comparison of methods: Figure 7 shows the performance of the different methods on individual modalities, and Figure 8 and Table V on different fusion settings. Comparison with linear fusion shows that the proposed kerSMBR significantly improves the performance on individual iris modalities as well as on iris fusion. The performance on fingerprint modalities is similar; however, the fusion of all 6 modalities shows an improvement of 0.4%.
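The RBF kernel just defined, together with the σ grid used in the tuning step, can be sketched as follows; the random matrix stands in for one modality's feature vectors. Note the paper's parameterization divides by σ² with no factor of 2.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """K[m, n] = exp(-||x_m - z_n||_2^2 / sigma^2), per the definition above."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)  # clamp tiny negative round-off

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))          # 5 samples, 10-dim features (illustrative)
for sigma in [2.0**p for p in range(-3, 4)]:   # the grid {2^-3, 2^-2, ..., 2^3}
    K = rbf_kernel(X, X, sigma)
    # any valid RBF Gram matrix is symmetric with unit diagonal
    assert np.allclose(K, K.T) and np.allclose(np.diag(K), 1.0)
```

In the tuning loop above, each candidate σ would be scored by recognition performance on a held-out split, and the best value kept per modality, as described in the hyperparameter-tuning step.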
Fig. 6: Variation of recognition performance with the number of training samples.

kerSMBR also achieves the best accuracy among all the methods for each fusion setting. kerSLR scores better than kerSVM in all the cases, and its accuracy is close to kerSMBR. The performance of kerSLR is better than its linear counterpart; however, kerSVM does not show much improvement. Composite kernels present an interesting case. Here, compSLR shows better performance than compSMBR on each homogeneous modality. The composite kernel, by combining homogeneous modalities into one, reduces the effective number of modalities; hence the size of the Γ matrix is reduced. This decreases the flexibility in exploiting different modality information via Γ. Hence, the performance of compSMBR is not optimal. It should be noted, however, that we are not comparing composite kernels with kerSMBR, as we have not optimized over the α values for composite kernels. This is also a major limitation of composite kernel based methods.

• Comparison with other score-based fusion methods: Although sum-based fusion is a popular technique for score fusion, some other techniques have also been proposed. We evaluated the performance of the likelihood-based fusion method proposed in [38]. The results are shown in Table IV. The method does not show good performance, as it models the score distribution as a Gaussian Mixture Model; however, it is difficult to model the score distribution due to large variations in the data samples. The method is also affected by the curse of dimensionality.

TABLE IV: Fusion performance with the likelihood-based method [38].

                 2 irises  4 fingerprints  All modalities
SLR-Likelihood   66.6      83.5            75.1
SVM-Likelihood   50.7      31.9            31.0

B. AR Face Dataset

The AR face dataset consists of faces with varying illumination, expression and occlusion conditions, captured in two sessions. We evaluated our algorithms on a set of 100 users. Images from the first session, 7 for each subject, were used
as training and the images from the second session, again 7 per subject, were used for testing. Simple intensity values