
DTIC ADA478607: A Graph-based Foreground Representation and Its Application in Example Based People Matching in Video (PREPRINT) PDF


A GRAPH-BASED FOREGROUND REPRESENTATION AND ITS APPLICATION IN EXAMPLE BASED PEOPLE MATCHING IN VIDEO

By Kedar A. Patwardhan, Guillermo Sapiro, and Vassilios Morellas

IMA Preprint Series # 2154 (January 2007)

INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS
UNIVERSITY OF MINNESOTA
400 Lind Hall, 207 Church Street S.E., Minneapolis, Minnesota 55455–0436
Phone: 612/624-6066 Fax: 612/626-7370 URL: http://www.ima.umn.edu

Report Documentation Page (Standard Form 298, Rev. 8-98, prescribed by ANSI Std Z39-18; Form Approved OMB No. 0704-0188)
Report date: JAN 2007
Dates covered: 00-00-2007 to 00-00-2007
Title and subtitle: A Graph-based Foreground Representation and Its Application in Example Based People Matching in Video (PREPRINT)
Performing organization: University of Minnesota, Institute for Mathematics and its Applications, 207 Church Street SE, Minneapolis, MN, 55455-0436
Distribution/availability statement: Approved for public release; distribution unlimited
Security classification (report, abstract, this page): unclassified
Limitation of abstract: Same as Report (SAR)
Number of pages: 5

Kedar A. Patwardhan, Guillermo Sapiro
University of Minnesota, Department of Electrical and Computer Engineering
Minneapolis, MN 55455, {kedar,guille}@umn.edu

Vassilios Morellas
University of Minnesota, Department of Computer Science
Minneapolis, MN 55455

ABSTRACT

In this work, we propose a framework for foreground representation in video and illustrate it with a multi-camera people matching application. We first decompose the video into foreground and background. A low-level coarse segmentation of the foreground is then used to generate a simple graph representation. A vertex in the graph represents the “appearance” of a corresponding segment in the foreground, while the relationship between two segments is encoded by an edge between the corresponding vertices. This provides a simple yet powerful and general representation of the foreground, which can be very useful in problems such as people detection and tracking. We illustrate the effectiveness of this model using an “example based query” type of application for people matching in videos. Matching results are provided in multiple-camera situations and also under occlusion.

Index Terms: Video, Image matching, Image analysis, Machine vision.

1. INTRODUCTION AND PREVIOUS WORK

The efficient representation of foreground objects in videos is a significant component of important computer vision applications such as tracking, detection, matching, and video-indexing. Most often, the representations proposed in the literature are highly task or situation specific, involve computationally prohibitive offline training, and do not effectively handle changes in scale, pose, or background. In this paper we propose the use of a graphical model to represent foreground objects in a video. The automatically detected foreground is (coarsely) segmented into connected segments or “superpixels” (borrowing a term from [1]). Each of these segments is considered as a vertex in our graphical model, and the relationship between segments is represented by an edge. This model helps to capture the local information, i.e., appearance of the segments in the foreground, without any explicit shape model. The representation of interaction between segments using edges allows us to incorporate spatial inter-relationships without using an absolute point of reference.

[Figure: Example Frame, Query Frame, Matching Result.]

Previous work on the representation of foreground objects (people) in a video scene comes from two main areas in computer vision, namely, tracking and pose detection. The Hydra system, [2], represents foreground people as a combination of a “head-detector” and an intensity based template correlation. This requires the head of the person to always be part of the silhouette, and the appearance template uses the head center as a spatial origin. McKenna et al., [3], represent different people in the foreground using a color histogram of the person. As shown in [4], such histogram based representations cannot discriminate correctly between two objects (as they can have the same color distribution) without additional spatial information. In [5], blobs corresponding to people in the foreground are assigned to different body parts (head and hands) to track single individuals in a scene. The authors of [6] segment people (in a single camera) under occlusion by representing them as a group of 3 segments (head + torso + limbs), and then estimating the best arrangement for people in the scene using maximum-likelihood estimation. The head of the person is assumed to be visible throughout the occlusion in order to estimate the origin for the appearance model. Work in [7] reports the use of a person model learned offline to detect and represent the people in the foreground, and an appearance model of the person is learned online for tracking a particular person. Recent works on pose estimation like [8] utilize a loose limbed model to represent the 3-D pose of a person in the foreground. Motion capture data aligned with a coordinate frame of the calibrated cameras is used to estimate and detect the loosely connected limbs with a non-parametric belief propagation algorithm. In [9], the authors use a Bayesian framework to combine pictorial structure spatial models with hidden Markov temporal models to represent a person in a video.

In this paper we propose a model for foreground representation which combines low-level segmentation and spatial reasoning in a meaningful way. This framework is intuitively ideal and very flexible for high level tasks in foreground analysis, foreground indexing and retrieval, and other applications in multiple-camera scenarios. We use an example based people matching application to demonstrate the practical utility of our model. The remainder of this paper is organized as follows: Section 2 gives a brief description of the proposed graphical representation. The matching application is detailed in Section 3. Section 4 discusses the matching results and implementation issues, and finally we conclude with future directions in Section 5.

Work partially supported by ONR, NSF, NGA, DARPA, and the McKnight Foundation.

2. THE FOREGROUND MODEL

Encoding the local appearance as well as spatial relationships between objects in a scene has always been a very challenging problem in computer vision. In this paper we propose a graph structure to represent the foreground in a scene as a combination of local appearance models of image segments (vertices) along with a model of the relationship between them (edges). Figure 1 depicts the proposed model. The graphical model is generated after 2 preprocessing steps: (a) Foreground/Background separation, and (b) Foreground Segmentation.

[Figure 1 panels: original, segmented, graphical representation (vertices A, B, C, D; edges e_AB, e_BC, e_CD, e_BD).]
Fig. 1. Foreground from the original frame (left) is segmented (center) using the algorithm in [10], to generate a graphical model (right).

The video is first decomposed into “foreground + background” by using our layering based foreground detection technique detailed in [11]. This decomposition is very robust to motion in the background (moving trees, water ripples, shaky camera, etc.) and provides a real-time foreground detection capability. Once the foreground is detected, we perform a low-level color-based segmentation of the foreground objects using the algorithm proposed in [10] (other attributes beyond color could be used as well). Now, each segment in the foreground is represented by a vertex in our graphical model. The space (and/or time) relationships between the segments are modeled by graph edges (bi-directional). It should be noted that this representation is different than a typical graph: the edge is not just a connecting mechanism between two vertices carrying some weight; the edge actually models the relationship between the two segments, just like the vertex models the appearance of the segment. This framework can thus allow for inline learning and tracking of the various components in the scene. As a preliminary investigation, the following sections demonstrate the capability of this type of foreground representation in addressing the difficult problem of example based people matching in multiple camera scenarios.

3. ILLUSTRATIVE APPLICATION: EXAMPLE BASED PEOPLE MATCHING

Finding or matching a person viewed in one scene in the same or a completely different setting is an important problem in applications such as surveillance. In this work, we assume that we are given an “example” of an isolated person (marked by the security guard for example), and we want to search for the person viewed from the same or different (and non-overlapping) camera location. To this end, we use the graphical representation proposed above to convert this problem into a sub-graph matching task, Figure 2.

[Figure 2 panels: Foreground in Example Frame (vertices $A_1$, $A_2$; edge $e_{A_{12}}$) and Foreground in Query Frame (vertices $B_1$, $B_2$, $B_3$, $B_4$; edges $e_{B_{12}}$, $e_{B_{23}}$, $e_{B_{24}}$).]
Fig. 2. People matching viewed as a sub-graph matching problem.

3.1. Problem Setup and Similarity Measures

Consider the situation in Figure 2. The foreground in the example frame has been segmented into vertices $A_i$ connected by the edges $e_{A_{ij}}$, where $i, j \in (1, 2, \ldots, N_A)$ and $i \neq j$, $N_A$ being the number of vertices in the example (segments in the foreground). The edge connection rule is very simple: we connect two vertices by an edge only when the corresponding segments are connected. Each segment in the foreground is modeled using the color histogram^1 generated using non-parametric Kernel Density Estimation (refer to [12]). We also use the SIFT features (refer to [13]) detected inside the segment to model its appearance.

The similarity $S(A \to B)$ of the segment (vertex) $A$ in the example frame to a segment $B$ in the queried frame is computed as a product of the color and feature similarities:

$$S(A \to B) = \rho(A, B)\, f(A \to B), \tag{1}$$

where $\rho(A, B)$ is the Bhattacharyya coefficient, [14], and $f(A \to B)$ is the feature similarity:

$$\rho(A, B) = \int \sqrt{p_A(x)\, p_B(x)}\; dx, \tag{2}$$

and

$$f(A \to B) = \frac{1}{M_{A_{sift}}}\, e^{-d(sift_A,\, sift_B)}, \tag{3}$$

where $p_A(x)$ and $p_B(x)$ are the density estimates for pixel $x$ computed from the color distributions in segment $A$ and $B$ respectively, $d(sift_A, sift_B)$ is the sum of squared Euclidean distances between the features in $B$ that are the best matches for features in $A$, and $M_{A_{sift}}$ is the number of SIFT features in the segment $A$.

Apart from this similarity measure, we also use an edge or relationship similarity measure $\Psi(e_{A_{ij}} \to e_{B_{kl}})$. The spatial relationship between two connected segments/vertices in the example graph is modeled by the graph edge as a simple 2-D Gaussian $N(\mu_r, \mu_\theta; \sigma_r, \sigma_\theta)$, where $\mu_r$ is the magnitude of the vector connecting the centroids of the adjacent segments (vertices), $\mu_\theta$ is the angle between this vector and the horizontal axis, and $\sigma_r$ and $\sigma_\theta$ are the allowed variances respectively.^2 Thus, the similarity between the relationship between two vertices in the example frame ($e_{A_{ab}}$) with respect to the relationship between two vertices in the query frame ($e_{B_{cd}}$) is computed as:

$$\Psi(e_{A_{ab}} \to e_{B_{cd}}) = \frac{1}{2\pi \sigma_r \sigma_\theta} \exp\!\left( -\frac{(r_{cd} - \mu_{r_{ab}})^2}{2\sigma_r^2} - \frac{(\theta_{cd} - \mu_{\theta_{ab}})^2}{2\sigma_\theta^2} \right), \tag{4}$$

where $r_{cd}$ is the magnitude of the vector connecting the centroids of adjacent vertices corresponding to $e_{B_{cd}}$, and $\theta_{cd}$ is the angle of this vector with the horizontal axis.

^1 We use the r-g-S color-space as in [11].
^2 In all our experiments we have used $\sigma_r = 5$ pixels and $\sigma_\theta = \pi/4$ radians.

3.2. Matching Algorithm

We utilize a greedy algorithm to perform a subgraph search as depicted in Figure 2. Following is a brief pseudo-code of our matching algorithm (please refer to Figure 2):

• For each vertex, say $A_1$, in the example frame, compute the similarity $S(A_1 \to B_k)$ using Equation (1) to all the vertices $B_k$ in the query frame, and retain the best $M$ matches in a set of candidates $C_{A_1} = \{B_k\}$, $k \in (1, 2, \ldots, N_B)$.

• Let the set of adjacent vertices of $A_1$ be denoted by $Adj(A_1)$. In Figure 2, $Adj(A_1) = A_2$. For all vertices $B_k$ in $C_{A_1}$, find $B_l \in Adj(B_k)$ such that $B_l \in C_{A_2}$. Now, compute the Global Evidence for matching $A_1$ to $B_k$ as:

$$GE(A_1 \to B_k) = S(A_2 \to B_l)\, \Psi(e_{A_{12}} \to e_{B_{kl}}). \tag{5}$$

• Assign $A_1$ to that vertex $B_k \in C_{A_1}$ which maximizes the Global Evidence. The similarity score for this vertex assignment is computed as $S_{A_1} = S(A_1 \to B_k) + GE(A_1 \to B_k)$. Similarly compute the assignments for the remaining vertices in the example graph. The net similarity score for the matched subgraph is given by $\sum_{i=1}^{N_A} S_{A_i}$.

4. RESULTS AND COMPARISON

Figures 3 and 4 illustrate the example based people matching algorithm explained above. In Figure 3 the example person is matched to the foreground in query frames acquired from the same camera location. It should be noted that the greedy algorithm is able to make the correct matches in spite of variations in pose and considerable occlusion. In Figure 4, the example person is compared to the foreground in a completely different camera view, leading to significant differences in scale and illumination. The results show that our representation and matching algorithm can robustly handle variations in illumination, pose, scale, and camera-view. We also perform a coarse comparison with the covariance distance approach used in [15].^3 We first compute the average similarity score $S_{avg}$ and average covariance distance $D_{avg}$ of the example in Figure 4(a) with respect to the same person in the top three rows in Figure 4(b). We then compute the similarity ($S_{bad}$) and covariance distance ($D_{bad}$) to a bad match (last row in Figure 4(b)). Now, we can approximately compare the discriminative power of the two measures by comparing the ratios $P_S = S_{avg}/S_{bad}$ and $P_D = D_{bad}/D_{avg}$.^4 We observe that $P_S \approx 5.0$, whereas $P_D \approx 1.5$ for the example in Figure 4, indicating that our similarity measure is more discriminative and hence allows for more robust matching. These results (including foreground detection and segmentation) were achieved at run-times of less than 1 second per query frame, using non-optimized experimental code, on a standard laptop computer with a 1.8 GHz Centrino Processor. We used a lite version of the SIFT feature extraction, code provided by [16]. Color density estimation was performed using the Improved Fast Gauss Transform algorithm [12].

^3 It should be noted that the covariance tracker proposed in [15] does not separate foreground from background, which leads to less robust matching. Because of our particular foreground representation, we are able to correctly match the person along with giving exact segmentation for the matching region.
^4 Please note that we use a similarity measure, which is (conceptually) inversely related to the notion of distance. Hence the use of reciprocal ratios for $P_S$ and $P_D$.

[Figure 3 panels: (a) Example, Foreground, Segments; (b) query frames.]
Fig. 3. (a) Automatically detected foreground (center) from the example frame (left) is segmented (right) to get a 2 vertex graph representation. (b) The segmentation of the query frames (left column) is shown with random colors (center column) while the matching result is shown in the right column.

[Figure 4 panels: (a) Example, Foreground, Segments; (b) query frames.]
Fig. 4. (a) Foreground (center) from the example frame (left) is segmented (right) to get a 3 vertex graph representation. (b) Top 3 rows: Note that the example is compared with frames from a different camera location (left column); our matching results are shown in the center column. The column on the right is the rectangular window used to compute the covariance distance as used in [15]. Last row: We compute the similarity score and covariance distance between the example in (a) and last row of (b), in order to coarsely compare the discriminative power of our matching scheme with the covariance distance.

5. CONCLUSION AND FUTURE DIRECTIONS

We presented a novel scheme utilizing real-time foreground/background separation and low-level foreground segmentation to generate a graphical model of the appearance and relationship between objects (or object parts) in the foreground. The effectiveness of this representation was demonstrated with an example based query type of people matching algorithm, which along with its simple set-up, provides state-of-the-art results. We plan to further enhance the capability of our representation by improving the segments relationship model using more features and also incorporating information about temporal variations (temporal edges). Endeavors in these directions will be reported elsewhere.

6. REFERENCES

[1] X. Ren and J. Malik, “Learning a classification model for segmentation,” in ICCV ’03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, 2003, p. 10, IEEE Computer Society.
[2] I. Haritaoglu, D. Harwood, and L. S. Davis, “Hydra: Multiple people detection and tracking using silhouettes,” in Proceedings of the Second IEEE Workshop on Visual Surveillance, 1999, p. 6.
[3] S. McKenna, S. Jabri, Z. Duric, and A. Rosenfeld, “Tracking groups of people,” Computer Vision and Image Understanding, vol. 80, pp. 42–56, 2000.
[4] S. Savarese, J. Winn, and A. Criminisi, “Discriminative object class models of appearance and shape by correlatons,” in CVPR ’06, 2006, pp. 2033–2040.
[5] C. Richard Wren, A. Azarbayejani, T. Darrell, and Alex Pentland, “Pfinder: Real-time tracking of the human body,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780–785, 1997.
[6] A. Elgammal and L. S. Davis, “Probabilistic framework for segmenting people under occlusion,” in Proc. of IEEE 8th International Conference on Computer Vision, 2001, pp. 145–152.
[7] J. Lim and D. Kriegman, “Tracking humans using prior and learned representations of shape and appearance,” in IEEE International Conference on Automatic Face and Gesture Recognition, Los Alamitos, CA, USA, 2004, vol. 00, p. 869, IEEE Computer Society.
[8] L. Sigal, B. Sidharth, S. Roth, M. Black, and M. Isard, “Tracking loose-limbed people,” in IEEE Conf. On Computer Vision And Pattern Recognition, 2004.
[9] X. Lan and D. P. Huttenlocher, “A unified spatio-temporal articulated model for tracking,” in IEEE Conf. On Computer Vision And Pattern Recognition, 2004, vol. 01, pp. 722–729.
[10] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” Int. J. Comput. Vision, vol. 59, no. 2, pp. 167–181, 2004.
[11] K. A. Patwardhan, G. Sapiro, and V. Morellas, “A pixel layering framework for robust foreground detection in video,” Under Review, http://www.tc.umn.edu/~patw0007/videolayers/index.html, June 2006.
[12] C. Yang, R. Duraiswami, N. Gumerov, and L. Davis, “Improved fast gauss transform and efficient kernel density estimation,” in Int’l Conf. Computer Vision, 2003, pp. 464–471.
[13] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proc. of the International Conference on Computer Vision, 1999, pp. 1150–1157.
[14] T. Kailath, “The divergence and bhattacharyya distance measures in signal selection,” IEEE Transactions on Communications, vol. 15, no. 1, pp. 52–60, Feb 1967.
[15] F. Porikli, O. Tuzel, and P. Meer, “Covariance tracking using model update based on lie algebra,” in CVPR ’06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 728–735.
[16] A. Vedaldi, “Sift++: http://vision.ucla.edu/~vedaldi/code/siftpp/siftpp.html,” 2006.
