DTIC ADA444367: Illumination Invariant Face Recognition Using Thermal Infrared Imagery
Illumination Invariant Face Recognition Using Thermal Infrared Imagery*

Diego A. Socolinsky†, Lawrence B. Wolff‡, Joshua D. Neuheisel†, Christopher K. Eveland‡

‡Equinox Corporation, 9 West 57th Street, New York, NY 10019
†Equinox Corporation, 207 East Redwood Street, Baltimore, MD 21202
{diego,wolff,jneuheisel,eveland}@equinoxsensors.com

*This research was supported by the DARPA Human Identification at a Distance (HID) program, contract #DARPA/AFOSR F49620-01-C-0008. Approved for public release; distribution unlimited.

Abstract

A key problem for face recognition has been accurate identification under variable illumination conditions. Conventional video cameras sense reflected light, so that image grayvalues are a product of both intrinsic skin reflectivity and external incident illumination, thus obfuscating the intrinsic reflectivity of skin. Thermal emission from skin, on the other hand, is an intrinsic measurement that can be isolated from external illumination. We examine the invariance of Long-Wave InfraRed (LWIR) imagery with respect to different illumination conditions from the viewpoint of performance comparisons of two well-known face recognition algorithms applied to LWIR and visible imagery. We develop rigorous data collection protocols that formalize face recognition analysis for computer vision in the thermal IR.

1 Introduction

The potential for illumination invariant face recognition using thermal IR imagery has received little attention in the literature [1, 2]. The current paper quantifies such invariance by direct performance analysis and comparison of face recognition algorithms between visible and LWIR imagery.

It has often been noted in the literature [3, 2, 4] that variations in ambient illumination pose a significant challenge to existing face recognition algorithms. In fact, a variety of methods for compensating for such variations have been studied in order to boost recognition performance, including among others histogram equalization, laplacian transforms, gabor transforms and logarithmic transforms. All these techniques attempt to reduce the within-class variability introduced by changes in illumination, which severely degrades classification performance. Since thermal IR imagery is independent of ambient illumination, such problems do not exist.

To perform our experiments, we have developed a special Visible-IR sensor capable of taking simultaneous and co-registered images with both a visible CCD and a LWIR microbolometer. This is of particular significance for this test, since we are testing on exactly the same scenes for both the visible and IR recognition performance, not a boresighted pair of images.

In order to perform proper invariance analysis, it is necessary that thermal IR imagery be radiometrically calibrated. Radiometric calibration achieves a direct relationship between the grayvalue response at a pixel and the absolute amount of thermal emission from the corresponding scene element. This relationship is called responsivity. Thermal emission is measured as flux in units of power such as W/cm². The grayvalue response of thermal IR pixels for LWIR cameras is linear with respect to the amount of incident thermal radiation. The slope of this responsivity line is called the gain, and the y-intercept is the offset. The gain and offset for each pixel on a thermal IR focal plane array are significantly variable across the array. That is, the linear relationship can be, and usually is, significantly different from pixel to pixel. This is illustrated in Figure 1, where both calibrated and uncalibrated images of the same subject are shown.

While radiometric calibration provides non-uniformity correction, the relationship back to a physical property of the imaged object (its emissivity) provides the further advantage of data where environmental factors contribute to a much lesser degree to within-class variability.

An added bonus of radiometric calibration for thermal IR is that it simplifies the problem of skin detection in cluttered scenes. The range of human body temperature is quite small, varying from 96°F to 100°F. We have found skin temperature at 70°F ambient room temperature to also have a small variable range, from about 79°F to 83°F. Radiometric calibration makes it possible to perform an initial segmentation of skin pixels in the correct temperature range.

Figure 1: Calibrated (right) and uncalibrated LWIR images. There is significant pixel-wise difference in responsivity which is removed by the calibration process.

2 Data Collection Procedure for Multi-Modal Imagery

The data used in the experiments performed for this paper was collected by the authors at the National Institute of Standards and Technology (NIST) during a two-day period. Visible and LWIR imagery was recorded with a prototype sensor developed by the authors, capable of imaging both modalities simultaneously through a common aperture. The output data consists of 240x320 pixel image pairs, co-registered to within 1/3 pixel, where the visible image has 8 bits of grayscale resolution and the LWIR has 12 bits.

2.1 Calibration Procedures

All of the LWIR imagery was radiometrically calibrated. Since the responsivity of LWIR sensors is very linear, the pixelwise linear relation between grayvalues and flux can be computed by a process of two-point calibration. Images of a black-body radiator covering the entire field of view are taken at two known temperatures, and thus the gains and offsets are computed using the radiant flux for a black-body at a given temperature.

Note that this is only possible if the emissivity curve of a black-body as a function of temperature is known. This is given by Planck's Law, which states that the flux emitted at the wavelength λ by a black-body at a given temperature T, in W/(cm²µm), is given by

  W(λ, T) = 2πhc² / [λ⁵(e^{hc/(λkT)} − 1)],    (1)

where h is Planck's constant, k is Boltzmann's constant, and c is the speed of light in a vacuum. To relate this to the flux observed by the sensor, the responsivity R(λ) of the sensor must be taken into account. This allows the flux observed by a specific sensor from a black-body at a given temperature to be determined:

  W(T) = ∫ W(λ, T) R(λ) dλ.    (2)

For our sensor, the responsivity is very flat between 8 and 12 microns, so we can simply integrate Equation (1) for λ between 8 and 12. The Planck curve and the integration process are illustrated in Figure 2.

Figure 2: The Planck curve for a black-body at 303K (roughly skin temperature), with the area to be integrated for an 8-12µm sensor shaded.

One can achieve marginally higher precision by taking measurements at multiple temperatures and obtaining the gains and offsets by least squares regression. For the case of thermal images of human faces, we take each of the two fixed temperatures to be below and above skin temperature, to obtain the highest quality calibration for skin levels of IR emission.

It should be noted that a calibration has a limited life span. If a LWIR camera is radiometrically calibrated indoors, taking it outdoors where there is a significant ambient temperature difference will cause the gain and offset of linear responsivity of the focal plane array pixels to change. Therefore, radiometric calibration must be performed again. This effect is mostly due to the optics and FPA heating up, causing the sensor to "see" more energy as a result. Also, suppose two separate data collections are taken with two separate LWIR cameras with the exact same model number, identical camera settings and under the exact same environmental conditions. Nonetheless, no two thermal IR focal plane arrays are ever identical, and the gain and offset of corresponding pixels between these separate cameras will be different. Yet another example: suppose two data collections are taken one year apart, with the same thermal IR camera. It is very likely that gain and offset characteristics will have changed.

Figure 3: Camera and lighting setup for data collection.

Figure 4: Sample imagery from our data collection. Note that LWIR images are not radiometrically calibrated.
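The two-point calibration of Section 2.1 can be sketched numerically as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the SI constants, the unit conversion to W/(cm²µm), and the array shapes are assumptions, and the flat 8-12µm responsivity of the paper is taken as given.

```python
import numpy as np

H = 6.626e-34  # Planck's constant (J s)
K = 1.381e-23  # Boltzmann's constant (J/K)
C = 2.998e8    # speed of light in a vacuum (m/s)

def planck_flux(lam_um, temp_k):
    """Black-body spectral flux W(lambda, T) of Eq. (1), in W/(cm^2 um)."""
    lam = np.asarray(lam_um, dtype=float) * 1e-6          # micrometers -> meters
    spectral = 2 * np.pi * H * C**2 / (lam**5 * (np.exp(H * C / (lam * K * temp_k)) - 1))
    return spectral * 1e-4 * 1e-6                          # W/(m^2 m) -> W/(cm^2 um)

def band_flux(temp_k, lo_um=8.0, hi_um=12.0, samples=2000):
    """Eq. (2) with flat responsivity: integrate Eq. (1) over the 8-12 um band."""
    lam = np.linspace(lo_um, hi_um, samples)
    f = planck_flux(lam, temp_k)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(lam)))  # trapezoid rule

def two_point_calibration(img_cold, img_hot, t_cold, t_hot):
    """Per-pixel gain and offset from black-body images at two known temperatures."""
    w_cold, w_hot = band_flux(t_cold), band_flux(t_hot)
    gain = (img_hot - img_cold) / (w_hot - w_cold)         # grayvalue per unit flux
    offset = img_cold - gain * w_cold
    return gain, offset

def grayvalue_to_flux(img, gain, offset):
    """Invert the linear responsivity: flux = (grayvalue - offset) / gain."""
    return (img - offset) / gain
```

With the two black-body temperatures chosen below and above skin temperature, as the text suggests, `grayvalue_to_flux` removes the per-pixel non-uniformity of the focal plane array.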
Radiometric calibration standardizes all thermal IR data collections, whether they are taken under different environmental conditions, with different cameras, or at different times.

The grayvalue for any thermal IR image is directly physically related to thermal emission flux, which is a universal standard. This provides a standardized thermal IR biometric signature for humans. The images that face recognition algorithms can most benefit from in the thermal IR are not arrays of grayvalues, but rather arrays of corresponding thermal emission values. Without a way to relate grayvalues to thermal emission values, this is not possible.

2.2 The Collection Setup

For the collection of our images, we used the FBI mugshot standard light arrangement, shown in Figure 3. Image sequences were acquired with three illumination conditions: frontal, left lateral and right lateral. For each subject and illumination condition, a 40 frame, four second, image sequence was recorded while the subject pronounced the vowels looking towards the camera. After the initial 40 frames, three static shots were taken while the subject was asked to act out the expressions 'smile', 'frown', and 'surprise'. In addition, for those subjects who wore glasses, the entire process was done with and without glasses. Figure 4 shows a sampling of the data in both modalities.

A total of 115 subjects were imaged during a two-day period. After removing corrupted imagery from 24 subjects, our test database consists of over 25,000 frames from 91 distinct subjects. Much of the data is highly correlated, so only specific portions of the database can be used for training and testing purposes without creating unrealistically simple recognition scenarios. This is explained in Section 4.

3 Algorithms Tested

Since the purpose of this paper is to remark on the viability of visible versus thermal IR imagery for face recognition, we used two standard algorithms for testing. We review them briefly in this section. Let {x_i}, i = 1, ..., N, be a set of N vectors in R^k, for some fixed k > 0. Digital images are turned into vectors, converting a two-dimensional array into a one-dimensional one, by scanning in raster order.

Perhaps the most popular algorithm in the field is Eigenfaces [5]. Given a probe vector p ∈ R^k and a training set {t_i}, i = 1, ..., N, the Eigenfaces algorithm is simply a 1-nearest neighbor classifier with respect to the L2 norm, where distances are computed between projections of the probe and training sets onto a fixed m-dimensional subspace F ⊆ R^k, known as the face space. The face space is computed by taking a (usually separate) set of training observations, and finding the unique ordered orthonormal basis of R^k that diagonalizes the covariance matrix of those observations, ordered by the variances along the corresponding one-dimensional subspaces. These vectors are known as eigenfaces. It is well-known that, for a fixed choice of n, the subspace spanned by the first n basis vectors is the one with lowest L2 reconstruction error for any vector in the training set used to create the face space. Under the assumption that this training set is representative of all face images, the face space is taken to be a good low-dimensional approximation to the set of all possible face images under varying conditions.

We also performed tests using the ARENA algorithm [6]. ARENA is a simpler, appearance-based algorithm which can also be characterized as a 1-nearest neighbor method. The algorithm proceeds by first reducing the dimensionality of training observations and probes alike. This is done by pixelizing the images to a very coarse resolution, replacing each pixel by the average gray value over a square neighborhood. Once in the reduced-resolution space, of dimension n, 1-NN classification is performed with respect to the following semi-norm
  L⁰_ε(x, y) = Σ_{j=1}^{n} 1_{[0,ε]}(|x_j − y_j|),    (3)

where 1_U denotes the indicator function of the set U.

4 Testing Procedure

In order to create interesting classification scenarios from our Visible/LWIR database, we constructed multiple query sets for testing and training. Frames 0, 3 and 9 (out of 40) from a given image sequence are referred to as vowel frames. Frames corresponding to 'smile', 'frown' and 'surprise' are referred to as expression frames. Our query criteria are as follows:

1. Vowel frames from all subjects, all illuminations.
2. Expression frames from all subjects, all illuminations.
3. Vowel frames from all subjects, frontal illumination.
4. Expression frames from all subjects, frontal illumination.
5. Vowel frames from all subjects, lateral illumination.
6. Expression frames from all subjects, lateral illumination.
7. Vowel frames from subjects wearing glasses, all illuminations.
8. Expression frames from subjects wearing glasses, all illuminations.
9. 500 random frames, arbitrary illumination.

The same queries were used to construct sets for visible and LWIR imagery, and all LWIR images were radiometrically calibrated. Locations of the eyes and the frenulum were semi-automatically located in all visible images, which also provided the corresponding locations in the co-registered LWIR frames. Using these feature locations, all images were geometrically transformed to a common standard, and cropped to eliminate all but the inner face. For the visible imagery, in addition to images processed as described above, we created a duplicate set to which we applied a multiscale version of center-surround processing [7] (a cousin of the Retinex algorithm [8]), to compensate for illumination variation.

The relation between vowel frames and expression frames is comparable to that between the fa and fb sets in the FERET database, although our expression frames are often more different from vowel frames than in the FERET case. Frontal and lateral illumination frames are comparable to fa versus fc sets in FERET. Query set number 9 was used for face space computations for testing of the Eigenfaces algorithm. Lastly, we should note that queries 7, 8 and 9 were only used as testing sets and not as training sets (except 7 versus 8), since the maximum possible correct classification performance achievable for those combinations is lower than 100%, and therefore those combinations were ignored to simplify the analysis.

All performance results reported below are for the top match. That is, a given probe is considered correctly classified if the closest image in the training set belongs to the same subject as the probe. Note that when using nearest-neighbor classifiers, one runs the risk that multiple training observations will be at the same distance from a probe. In particular it is possible to have multiple training observations at the minimum distance. This is especially likely when high-dimensional data is projected onto a low-dimensional space. In that case, it is possible to have false alarms even when considering only the top match, as it may not be unique. Let T be a training set and P a set of probes. For p ∈ P, let m be the distance from p to the closest training observation, and H_p = {t ∈ T | dist(p, t) = m}. Define α_p to be 1 if any member of H_p belongs to the same class as p, and zero otherwise. Further define ‖H_p‖ to be the number of distinct class labels among elements of H_p, and #P the number of probes in P. With this notation, the correct classification rate and false alarm rate are given by

  CC = (1/#P) Σ_{p∈P} α_p,    (4)

  FA = (1/#P) Σ_{p∈P} (‖H_p‖ − α_p) / ‖H_p‖.    (5)

5 Experimental results

For the ARENA algorithm, 240x320 images (in each modality) were pixelized and reduced to 15x20 pixels. We experimented with multiple values for the parameter ε in the L⁰_ε norm, and found that best performance was obtained with ε = 5 for visible imagery and ε = 10 for LWIR. This is consistent with the empirically observed fact that the LWIR imagery from this data set has an average effective dynamic range of about 500 gray values, versus the 256 for the visible. Comparing Tables 1 and 2, we see that the ARENA algorithm on visible imagery benefits greatly from pre-processing with the center-surround method.
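The two classifiers of Section 3 can be sketched compactly as follows. This is an illustrative reconstruction, not the authors' implementation, and it makes two stated assumptions: Eq. (3) is read as a match count, so the nearest neighbor under it maximizes the score (if one instead counts mismatches, the argmax becomes an argmin), and the face space is computed via an SVD of the centered training matrix.

```python
import numpy as np

def pixelize(img, out_h=15, out_w=20):
    """ARENA's reduction: average gray value over square neighborhoods
    (240x320 -> 15x20 uses 16x16 blocks), then raster-scan to a vector."""
    h, w = img.shape
    return img.reshape(out_h, h // out_h, out_w, w // out_w).mean(axis=(1, 3)).ravel()

def l0_eps(x, y, eps):
    """Eq. (3): count coordinates whose absolute difference lies in [0, eps].
    Bounded above by the dimension n, which is what makes scores from
    disparate distributions comparable (used for the bi-modal fusion)."""
    return int(np.sum(np.abs(x - y) <= eps))

def arena_classify(probe, train_set, labels, eps):
    """1-NN under Eq. (3): best match agrees with the probe in most coordinates."""
    scores = [l0_eps(probe, t, eps) for t in train_set]
    return labels[int(np.argmax(scores))]

def face_space(observations, m):
    """Mean and first m eigenfaces (principal directions) of a training set."""
    mean = observations.mean(axis=0)
    _, _, vt = np.linalg.svd(observations - mean, full_matrices=False)
    return mean, vt[:m]          # rows ordered by decreasing variance

def eigenfaces_classify(probe, train_set, labels, mean, basis):
    """1-NN with the L2 norm between face-space projections."""
    p = basis @ (probe - mean)
    dists = [np.linalg.norm(p - basis @ (t - mean)) for t in train_set]
    return labels[int(np.argmin(dists))]
```

The 15x20 pixelization and the values ε = 5 (visible) and ε = 10 (LWIR) are the parameters reported in the text.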
The mean and minimum classification performance over our queries on unprocessed visible imagery are 72% and 13%, respectively, with the minimum occurring when training is done on lateral illuminated vowel frames and testing on frontal illuminated vowel frames. For center-surround processed visible imagery, the mean and minimum classification performance for ARENA are 93% and 76%, respectively, with the minimum occurring with training on frontal illuminated expression frames and testing on lateral illuminated vowel frames.

All performance results below are reported in tabular format, where each column of a Table corresponds to a training set and each row to a testing set. The numbering of the rows and columns matches that of the list at the start of Section 4. The diagonal entries are omitted, since in all cases the classifiers achieve perfect performance when trained and tested on the same set. Also, certain sets are not used for training, since they do not contain images of all subjects, and therefore the maximum possible classification performance is strictly lower than 100%. Such sets are useful when testing the ability of a classifier to determine whether a given probe has a match in the training set at all, but we do not consider that problem in the current article.

Table 3 shows classification performance for ARENA on LWIR imagery. The mean and minimum performance are 99% and 97%, respectively. The minimum in this case occurs when we train on expression frames with frontal illumination and test on vowel frames for subjects wearing glasses. It is not surprising that the lowest performance would occur for probe sets where subjects are wearing glasses, since glass is opaque in the LWIR (see Figure 4). However, it is surprising that the lowest performance is still quite high.

Despite the big performance boost that center-surround processing affords the ARENA algorithm on visible imagery, such processing is not suitable for use in combination with Eigenfaces. Our experiments show that it reduces performance in all situations. This is probably due to the fact that center-surround processing acts partially like a high-pass filter, removing the low-frequency components which are normally heavily represented among the first few principal eigenfaces. Since results were so poor on pre-processed visible imagery, we do not report specifics below. For our experiments we took the 240x320 images and subsampled them to obtain 768-dimensional feature vectors.

In Table 4, we see classification performance of Eigenfaces on visible imagery. The mean and minimum performance in this case are 78% and 32%, respectively. Minimum performance occurs for training on lateral illuminated vowel frames and testing on frontal illuminated expression frames, similar to the ARENA case. Performance of Eigenfaces on LWIR imagery is displayed in Table 5. Mean and minimum performance are 96% and 87%, respectively. Interestingly, the minimum occurs for the same training/testing combination as for the visible imagery. We can see by comparing Tables 4 and 5 that Eigenfaces on LWIR is uniformly superior to Eigenfaces on visible imagery. Classification performance is on average 17 percentage points higher for LWIR imagery, with the best scenario yielding an improvement of 54 percentage points over visible imagery, while never underperforming it.

Figures 5 and 6 show the first five eigenfaces for the visible and LWIR face spaces, respectively. The visible eigenfaces have a (by now) familiar look, containing mostly low-frequency information, and coding partly for variation in illumination. The corresponding LWIR eigenfaces do not have the 'usual' characteristics. In particular, we see that the LWIR eigenfaces have fewer low-frequency components and many more high-frequency ones. We verified this numerically by examining the modulus of the 2-dimensional Fourier transforms of the first eigenfaces in each modality, although it should be reasonably clear by simply looking at the images.

It is very interesting to look at the spectra of the eigenspace decompositions for the visible and LWIR face spaces. The corresponding normalized cumulative sums for the first 100 dimensions are shown in Figure 7. It is easy to see that the vast majority of the variance of the data distribution is contained in a lower dimensional subspace for the LWIR than for the visible imagery. For example, a 6-dimensional subspace is sufficient to capture over 95% of the variance of the LWIR data, whereas a 36-dimensional subspace is necessary to capture the same variance for the visible imagery. This fact alone may be responsible in large part for the higher performance of Eigenfaces on LWIR imagery than on visible imagery. Indeed, since Eigenfaces is a nearest-neighbor classifier, it suffers from the standard 'curse of dimensionality' problems in pattern recognition. It has been recently noted in the literature that the notion of nearest neighbor becomes unstable, and eventually may be meaningless, for high dimensional search spaces [9, 10]. In fact, for many classification problems, as the dimensionality of the feature space grows, the distances from a given probe to its nearest and farthest neighbors become indistinguishable, thus rendering the classifier unusable. It is not known whether face recognition falls in this (large) category of problems, but it can be safely assumed that data which has a lower intrinsic dimensionality is better suited for classification problems where the class conditional densities must be inferred from a limited training set. In this context, LWIR imagery of human faces may simply be 'better behaved' than visible imagery, and thus better suited for classification.

We further compared the performance of the two classifiers on both modalities, specifically for training/testing set pairs where each set had a different illumination condition. That is, we looked at pairs where, for example, the training set had lateral illumination and the testing set had frontal illumination.
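The variance comparison above (6 versus 36 dimensions to capture 95% of the variance) amounts to reading off the normalized cumulative eigenspectrum shown in Figure 7. A minimal sketch of that computation, with hypothetical data standing in for the face image vectors:

```python
import numpy as np

def dims_for_variance(data, frac=0.95):
    """Smallest number of leading principal dimensions capturing `frac`
    of the total variance (one point on Figure 7's cumulative curves)."""
    centered = data - data.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)   # singular values, descending
    cumulative = np.cumsum(s**2) / np.sum(s**2)     # normalized cumulative variance
    return int(np.searchsorted(cumulative, frac) + 1)
```

Under the paper's observation, this function would return about 6 on the LWIR training vectors and about 36 on the visible ones.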
Average and minimum classification performance are reported in Tables 6 and 7. Table 6 compares variation in illumination without major expression variation, while Table 7 corresponds to pairs where both illumination and expression are very different. We see that for both algorithms, LWIR imagery yields much higher classification performance than visible imagery. Indeed, for the variant illumination/expression experiment, LWIR Eigenfaces outperform visible Eigenfaces by more than 30 percentage points on average, and more than 50 for the worst-case scenario.

A combination of classifiers can often perform better than any one of its individual component classifiers. In fact, there is a rich literature on combination of classifiers for identity verification, mostly geared towards combining voice and fingerprint, or voice and face biometrics (e.g. [11]). The main problem is how to combine the outputs of disparate classifier systems to produce a single decision. The ARENA face classifier is very well-suited to ensemble processing, since the L⁰_ε norm is bounded above by the dimension of the space, therefore making interpoint distances from disparate distributions comparable to each other. To test this scenario, we constructed testing and training sets from the visible/LWIR image pairs. Now an observation is not a single image, but a bi-modal pair. The testing set is composed of images where the subject is wearing glasses, and the training set contains images of the same subjects but without glasses. This is a particularly challenging set, and the classification performance is 70.7% for visible ARENA, and 92.2% for LWIR ARENA. We can construct a distance between bi-modal observations simply as a weighted sum of the distances for the visible and LWIR components, and then perform 1-NN classification on the new distance. By choosing the weighting parameter to be twice as large for the LWIR component as for the visible component, we obtain a classification performance of 94.7%, an improvement of 2.5 percentage points over the LWIR classifier alone.

In Tables 1-5, columns correspond to training sets and rows to testing sets, numbered as in Section 4; omitted entries are marked '-'.

      1      2      3      4      5      6      7      8
 1    -    0.986  0.557  0.514  0.741  0.717   -      -
 2  0.970    -    0.528  0.542  0.694  0.720   -      -
 3  1.000  0.988    -    0.988  0.226  0.182   -      -
 4  0.953  1.000  0.953    -    0.136  0.173   -      -
 5  1.000  0.986  0.334  0.276    -    0.986   -      -
 6  0.979  1.000  0.312  0.308  0.979    -     -      -
 7  1.000  0.990  0.604  0.532  0.733  0.729   -    0.969
 8  0.992  1.000  0.552  0.564  0.692  0.716  0.985   -
 9  0.998  1.000  0.556  0.526  0.692  0.686   -      -

Table 1: ARENA results on unprocessed visible imagery.

      1      2      3      4      5      6      7      8
 1    -    0.954  0.934  0.824  0.985  0.919   -      -
 2  0.944    -    0.830  0.914  0.903  0.970   -      -
 3  1.000  0.953    -    0.942  0.956  0.858   -      -
 4  0.946  1.000  0.922    -    0.833  0.913   -      -
 5  1.000  0.954  0.900  0.765    -    0.949   -      -
 6  0.943  1.000  0.783  0.870  0.938    -     -      -
 7  1.000  0.976  0.969  0.884  0.990  0.939   -    0.969
 8  0.978  1.000  0.905  0.937  0.951  0.983  0.973   -
 9  0.995  0.961  0.904  0.858  0.971  0.935   -      -

Table 2: ARENA results on center-surround processed visible imagery.

      1      2      3      4      5      6      7      8
 1    -    0.999  0.999  0.986  0.997  0.996   -      -
 2  0.997    -    0.993  0.987  0.995  0.996   -      -
 3  1.000  1.000    -    0.993  0.993  0.993   -      -
 4  1.000  1.000  1.000    -    0.993  0.990   -      -
 5  1.000  0.998  0.998  0.983    -    0.998   -      -
 6  0.996  1.000  0.990  0.980  0.996    -     -      -
 7  1.000  0.997  0.997  0.976  1.000  0.997   -    0.976
 8  1.000  1.000  1.000  0.985  1.000  1.000  0.980   -
 9  1.000  0.998  0.996  0.978  1.000  0.998   -      -

Table 3: ARENA results on LWIR imagery.

      1      2      3      4      5      6      7      8
 1    -    0.782  0.927  0.675  0.908  0.689   -      -
 2  0.701    -    0.598  0.909  0.601  0.867   -      -
 3  1.000  0.689    -    0.685  0.726  0.432   -      -
 4  0.593  1.000  0.591    -    0.319  0.607   -      -
 5  1.000  0.828  0.890  0.671    -    0.819   -      -
 6  0.756  1.000  0.602  0.863  0.745    -     -      -
 7  1.000  0.854  0.951  0.750  0.916  0.743   -    0.865
 8  0.794  1.000  0.682  0.929  0.675  0.864  0.782   -
 9  0.920  0.856  0.846  0.760  0.826  0.768   -      -

Table 4: Eigenfaces results on visible imagery.

      1      2      3      4      5      6      7      8
 1    -    0.973  0.981  0.922  0.995  0.962   -      -
 2  0.941    -    0.895  0.957  0.916  0.984   -      -
 3  1.000  0.974    -    0.942  0.986  0.949   -      -
 4  0.922  1.000  0.896    -    0.868  0.953   -      -
 5  1.000  0.973  0.972  0.912    -    0.969   -      -
 6  0.951  1.000  0.894  0.935  0.941    -     -      -
 7  1.000  0.976  0.997  0.967  1.000  0.972   -    0.956
 8  0.956  1.000  0.929  0.983  0.936  0.987  0.966   -
 9  0.981  0.983  0.969  0.933  0.971  0.965   -      -

Table 5: Eigenfaces results on LWIR imagery.

Figure 5: First five visible eigenfaces.

Figure 6: First five LWIR eigenfaces.

Figure 7: Normalized cumulative sums of the visible and LWIR eigenspectra.

          ARENA         Eigenfaces
Visible   0.874/0.783   0.663/0.432
LWIR      0.993/0.990   0.950/0.894

Table 6: Mean and minimum performance on experiments where the training and testing sets have different illumination but similar expressions.

6 Conclusions

We presented a systematic performance analysis of two standard face recognition algorithms on visible and LWIR imagery. In support of our analysis, we performed a comprehensive data collection with a novel sensor system capable of acquiring co-registered visible/LWIR image pairs through a common aperture at video frame-rates. The data collection effort was designed to test the hypothesis that LWIR imagery would yield higher recognition performance under variable illumination conditions. Intra-personal variability was induced in the data by requiring that the subjects pronounce the vowels while being imaged, as well as having them act out severely variant facial expressions.

Dividing the data into multiple training and testing sets allowed us to gain some understanding of the shortcomings of each modality.
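The weighted bi-modal fusion just described can be sketched as follows. This is a hedged illustration, not the authors' code: the per-modality distance is a generic placeholder (the paper fuses the ARENA distances), while the 2:1 LWIR-to-visible weighting follows the text.

```python
import numpy as np

def fused_classify(probe_vis, probe_ir, train_vis, train_ir, labels,
                   w_vis=1.0, w_ir=2.0, dist=np.linalg.norm):
    """1-NN on a weighted sum of per-modality distances between bi-modal
    (visible, LWIR) observations; the LWIR component is weighted twice
    as heavily as the visible one, per the paper's choice."""
    fused = [w_vis * dist(probe_vis - tv) + w_ir * dist(probe_ir - ti)
             for tv, ti in zip(train_vis, train_ir)]
    return labels[int(np.argmin(fused))]
```

Because each fused score is a sum of comparable per-modality scores, a confident match in one modality can override a marginal mismatch in the other, which is the mechanism behind the reported 2.5-point gain over LWIR alone.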
As expected, variation in illumination conditions between training sets and probes resulted in markedly reduced performance for both classifiers on visible imagery. At the same time, such illumination variations have no significant effect on the performance on LWIR imagery. The presence or absence of glasses has more influence for LWIR than visible imagery, since glass is completely opaque in the LWIR. However, a variance-based classifier such as Eigenfaces can ignore their effect to a large extent by lowering the relevance of the area around the eyes within the face space.

          ARENA         Eigenfaces
Visible   0.860/0.765   0.639/0.319
LWIR      0.990/0.980   0.933/0.868

Table 7: Mean and minimum performance on experiments where the training and testing sets have different illuminations and expressions.

Overall, classification performance on LWIR imagery appears to be superior to that on visible imagery, even for testing/training pairs where there is no apparent reason for one to outperform the other. In the case of Eigenfaces, we can offer a plausible explanation for the superior performance in terms of the apparently lower intrinsic dimensionality of the data. Further experiments and data collections will be necessary to substantiate this conjecture, especially since the variance-based definition of 'intrinsic dimensionality' is a rather poor one. We intend to extend this investigation to the local dimensionality of LWIR face data as compared to its visible counterpart. Lower-dimensional face data would be a compelling reason for using LWIR imagery in face recognition systems. In essence, if we cannot beat the curse of dimensionality by statistical methods, we should perhaps be searching for alternative sources of lower-dimensional data with rich classification potential. LWIR face imagery may just be such a source.

It is possible that the high performance results we obtained are due to the nature of our database. While our data may contain enough challenging cases for visible face classifiers, it may not cover the challenging situations for LWIR face classifiers. We intend to expand our collection and analysis effort in order to answer this question. One should note, however, that the data on which this study is based is very representative of common and important face recognition scenarios, for instance indoor visitor identification and access to secure facilities and computer systems. The subjects we imaged were not prepared in any special way, and neither were the environmental conditions. Thus, though it may be possible to collect data which is much more challenging in the LWIR, it appears that this modality holds great promise under reasonable operating conditions. The main disadvantage of thermal-infrared-based biometric identification at the present time is the high price of the sensors. While the cost of thermal infrared sensors is significantly higher than that of visible ones, prices have been steadily declining over the last few years, and as volume increases they will continue to do so. This fact, in combination with their high performance, provides compelling reason for the deployment and continuing development of thermal biometric identification systems.

References

[1] F. J. Prokoski, "History, Current Status, and Future of Infrared Identification," in Proceedings IEEE Workshop on Computer Vision Beyond the Visible Spectrum: Methods and Applications, Hilton Head, 2000.

[2] Joseph Wilder, P. Jonathon Phillips, Cunhong Jiang, and Stephen Wiener, "Comparison of Visible and Infra-Red Imagery for Face Recognition," in Proceedings of 2nd International Conference on Automatic Face & Gesture Recognition, Killington, VT, 1996, pp. 182-187.

[3] P. Jonathon Phillips, Hyeonjoon Moon, Syed A. Rizvi, and Patrick J. Rauss, "The FERET Evaluation Methodology for Face-Recognition Algorithms," Tech. Rep. NISTIR 6264, National Institute of Standards and Technology, 7 Jan. 1999.

[4] Yael Adini, Yael Moses, and Shimon Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721-732, July 1997.

[5] M. Turk and A. Pentland, "Eigenfaces for Recognition," J. Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.

[6] Terence Sim, Rahul Sukthankar, Matthew D. Mullin, and Shumeet Baluja, "High-Performance Memory-based Face Recognition for Visitor Identification," in Proceedings of IEEE Conf. Face and Gesture Recognition, Grenoble, 2000.

[7] A. Waxman, A. Gove, D. Fay, et al., "Color night vision: Opponent processing in the fusion of visible and IR imagery," Neural Networks, vol. 10, no. 1, pp. 1-6, 1997.

[8] Z. Rahman, D. Jobson, and G. Woodell, "Multiscale retinex for color rendition and dynamic range compression," in SPIE Conference on Applications of Digital Image Processing XIX, Denver, Nov. 1996.

[9] Allan Borodin, Rafail Ostrovsky, and Yuval Rabani, "Lower Bounds for High Dimensional Nearest Neighbor Search and Related Problems," in Proceedings of ACM Symposium on Theory of Computing, 26 Apr. 1999, pp. 312-321.

[10] Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft, "When is "Nearest Neighbor" Meaningful?," in 7th International Conference on Database Theory, Jan. 1999.

[11] "Multi-Modal Person Authentication," in Proceedings of Face Recognition: From Theory to Applications, Stirling, 1997, NATO Advanced Study Institute.

[12] F. J. Prokoski, "Method and Apparatus for Recognizing and Classifying Individuals Based on Minutiae," 2001.
