A GRAPH-BASED FOREGROUND REPRESENTATION AND ITS
APPLICATION IN EXAMPLE BASED PEOPLE MATCHING IN VIDEO
By
Kedar A. Patwardhan
Guillermo Sapiro
and
Vassilios Morellas
IMA Preprint Series # 2154
(January 2007)
INSTITUTE FOR MATHEMATICS AND ITS APPLICATIONS
UNIVERSITY OF MINNESOTA
400 Lind Hall
207 Church Street S.E.
Minneapolis, Minnesota 55455-0436
Phone: 612/624-6066   Fax: 612/626-7370
URL: http://www.ima.umn.edu
A GRAPH-BASED FOREGROUND REPRESENTATION AND ITS APPLICATION IN EXAMPLE BASED PEOPLE MATCHING IN VIDEO

Kedar A. Patwardhan, Guillermo Sapiro
University of Minnesota
Department of Electrical and Computer Engineering
Minneapolis, MN 55455, {kedar,guille}@umn.edu

Vassilios Morellas
University of Minnesota
Department of Computer Science
Minneapolis, MN 55455
[Figure: Example Frame, Query Frame, Matching Result.]

ABSTRACT

In this work, we propose a framework for foreground representation in video and illustrate it with a multi-camera people matching application. We first decompose the video into foreground and background. A low-level coarse segmentation of the foreground is then used to generate a simple graph representation. A vertex in the graph represents the "appearance" of a corresponding segment in the foreground, while the relationship between two segments is encoded by an edge between the corresponding vertices. This provides a simple yet powerful and general representation of the foreground, which can be very useful in problems such as people detection and tracking. We illustrate the effectiveness of this model using an "example based query" type of application for people matching in videos. Matching results are provided in multiple-camera situations and also under occlusion.

Index Terms — Video, Image matching, Image analysis, Machine vision.

1. INTRODUCTION AND PREVIOUS WORK

The efficient representation of foreground objects in videos is a significant component of important computer vision applications such as tracking, detection, matching, and video-indexing. Most often, the representations proposed in the literature are highly task or situation specific, involve computationally prohibitive offline training, and do not effectively handle changes in scale, pose, or background.

In this paper we propose the use of a graphical model to represent foreground objects in a video. The automatically detected foreground is (coarsely) segmented into connected segments or "superpixels" (borrowing a term from [1]). Each of these segments is considered as a vertex in our graphical model, and the relationship between segments is represented by an edge. This model helps to capture the local information, i.e., the appearance of the segments in the foreground, without any explicit shape model. The representation of interaction between segments using edges allows us to incorporate spatial inter-relationships without using an absolute point of reference.

Previous work on the representation of foreground objects (people) in a video scene comes from two main areas in computer vision, namely, tracking and pose detection. The Hydra system [2] represents foreground people as a combination of a "head-detector" and an intensity based template correlation. This requires the head of the person to always be part of the silhouette, and the appearance template uses the head center as a spatial origin. McKenna et al. [3] represent different people in the foreground using a color histogram of the person. As shown in [4], such histogram based representations cannot discriminate correctly between two objects (as they can have the same color distribution) without additional spatial information. In [5], blobs corresponding to people in the foreground are assigned to different body parts (head and hands) to track single individuals in a scene. The authors of [6] segment people (in a single camera) under occlusion by representing them as a group of 3 segments (head + torso + limbs), and then estimating the best arrangement for people in the scene using maximum-likelihood estimation. The head of the person is assumed to be visible throughout the occlusion in order to estimate the origin for the appearance model. Work in [7] reports the use of a person model learned offline to detect and represent the people in the foreground, and an appearance model of the person is learned online for tracking a particular person. Recent works on pose estimation like [8] utilize a loose-limbed model to represent the 3-D pose of a person in the foreground. Motion capture data aligned with a coordinate frame of the calibrated cameras is used to estimate and detect the loosely connected limbs with a non-parametric belief propagation algorithm. In [9], the authors use a Bayesian framework to combine pictorial structure spatial models with hidden Markov temporal models to represent a person in a video.

In this paper we propose a model for foreground representation which combines low-level segmentation and spatial reasoning in a meaningful way. This framework is intuitively ideal and very flexible for high level tasks in foreground analysis, foreground indexing and retrieval, and other applications in multiple-camera scenarios. We use an example based people matching application to demonstrate the practical utility of our model. The remainder of this paper is organized as follows: Section 2 gives a brief description of the proposed graphical representation. The matching application is detailed in Section 3. Section 4 discusses the matching results and implementation issues, and finally we conclude with future directions in Section 5.

Work partially supported by ONR, NSF, NGA, DARPA, and the McKnight Foundation.
Fig. 1. Foreground from the original frame (left) is segmented (center) using the algorithm in [10], to generate a graphical model (right). [The figure shows segments A, B, C, D becoming vertices connected by edges such as e_AB, e_BC, and e_BD.]

2. THE FOREGROUND MODEL

Encoding the local appearance as well as the spatial relationships between objects in a scene has always been a very challenging problem in computer vision. In this paper we propose a graph structure to represent the foreground in a scene as a combination of local appearance models of image segments (vertices) along with a model of the relationship between them (edges). Figure 1 depicts the proposed model. The graphical model is generated after two preprocessing steps: (a) foreground/background separation, and (b) foreground segmentation.

The video is first decomposed into "foreground + background" using our layering-based foreground detection technique detailed in [11]. This decomposition is very robust to motion in the background (moving trees, water ripples, shaky camera, etc.) and provides real-time foreground detection capability. Once the foreground is detected, we perform a low-level color-based segmentation of the foreground objects using the algorithm proposed in [10] (other attributes beyond color could be used as well). Now, each segment in the foreground is represented by a vertex in our graphical model. The space (and/or time) relationships between the segments are modeled by (bi-directional) graph edges. It should be noted that this representation is different from a typical graph: an edge is not just a connecting mechanism between two vertices carrying some weight; the edge actually models the relationship between the two segments, just as the vertex models the appearance of the segment. This framework can thus allow for in-line learning and tracking of the various components in the scene. As a preliminary investigation, the following sections demonstrate the capability of this type of foreground representation in addressing the difficult problem of example based people matching in multiple camera scenarios.
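To make the graph construction concrete, the sketch below builds such a foreground graph from a single frame and a foreground mask. It is a minimal illustration under stated assumptions, not the implementation used here: the Felzenszwalb-Huttenlocher segmentation is taken from scikit-image as a stand-in for the algorithm of [10], the foreground mask is assumed to come from a separate detector such as [11], and the per-vertex appearance models of Section 3.1 are left to be attached later.

```python
import numpy as np
import networkx as nx
from skimage.segmentation import felzenszwalb

def build_foreground_graph(frame_rgb, fg_mask):
    """Segment the foreground of a frame and return a graph whose vertices are
    foreground segments and whose edges link spatially adjacent segments."""
    # Low-level color segmentation (stand-in for the algorithm of [10]).
    labels = felzenszwalb(frame_rgb, scale=100, sigma=0.8, min_size=50)
    labels = np.where(fg_mask, labels + 1, 0)          # label 0 = background

    graph = nx.Graph()
    for seg_id in np.unique(labels):
        if seg_id == 0:
            continue
        ys, xs = np.nonzero(labels == seg_id)
        graph.add_node(int(seg_id),
                       centroid=(xs.mean(), ys.mean()),
                       pixels=frame_rgb[labels == seg_id])  # raw pixels; appearance models added later

    # Connect two vertices only when the corresponding segments touch:
    # compare each pixel's label with its right and lower neighbors.
    right = (labels[:, :-1], labels[:, 1:])
    down = (labels[:-1, :], labels[1:, :])
    for a, b in (right, down):
        touching = (a != b) & (a > 0) & (b > 0)
        for u, v in zip(a[touching], b[touching]):
            graph.add_edge(int(u), int(v))
    return graph
```

Any per-segment attribute (the color density, SIFT descriptors, or centroid used in Section 3.1) can then be stored on the corresponding vertex before matching.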
3. ILLUSTRATIVE APPLICATION: EXAMPLE BASED PEOPLE MATCHING

Finding or matching a person viewed in one scene, in the same or a completely different setting, is an important problem in applications such as surveillance. In this work, we assume that we are given an "example" of an isolated person (marked by the security guard, for example), and we want to search for the person viewed from the same or a different (and non-overlapping) camera location. To this end, we use the graphical representation proposed above to convert this problem into a sub-graph matching task (Figure 2).

3.1. Problem Setup and Similarity Measures

Consider the situation in Figure 2. The foreground in the example frame has been segmented into vertices A_i connected by the edges e_{Aij}, where i, j ∈ (1, 2, ..., N_A) and i ≠ j, N_A being the number of vertices in the example (segments in the foreground). The edge connection rule is very simple: we connect two vertices by an edge only when the corresponding segments are connected. Each segment in the foreground is modeled using the color histogram¹ generated using non-parametric kernel density estimation (refer to [12]). We also use the SIFT features (refer to [13]) detected inside the segment to model its appearance.

The similarity S(A → B) of the segment (vertex) A in the example frame to a segment B in the queried frame is computed as a product of the color and feature similarities:

S(A → B) = ρ(A, B) · f(A → B),   (1)

where ρ(A, B) is the Bhattacharyya coefficient [14], and f(A → B) is the feature similarity:

ρ(A, B) = ∫ √(p_A(x) p_B(x)) dx,   (2)

and

f(A → B) = (1 / M_{A,sift}) · exp(−d(sift_A, sift_B)),   (3)

where p_A(x) and p_B(x) are the density estimates for pixel x computed from the color distributions in segments A and B respectively, d(sift_A, sift_B) is the sum of squared Euclidean distances between the features in B that are the best matches for features in A, and M_{A,sift} is the number of SIFT features in the segment A.

¹ We use the r-g-S color-space as in [11].
in [11]. This decomposition is very robust to motion in the back- Apart from this similarity meassure, we also use an edge or
ground(movingtrees,waterripples,shakycamera,etc)andprovides relationship similarity measure Ψ(eAij →eBkl). The spatial re-
areal-timeforegrounddetectioncapability. Oncetheforegroundis lationship between two connected segments/vertices in the exam-
detected, we perform a low-level color-based segmentation of the plegraphismodeledbythegraphedgeasasimple2-DGaussian
foreground objects using the algorithm proposed in [10] (other at- N(µr,µθ;σr,σθ), where µr is the magnitude of the vector con-
tributesbeyondcolorcouldbeusedaswell). Now,eachsegmentin necting the centroid of the adjacent segments (vertices), µθ is the
theforegroundisrepresentedbyavertexinourgraphicalmodel.The anglebetweenthisvectorandthehorizontalaxis,andσrandσθare
space(and/ortime)relationshipsbetweenthesegmentsismodeled theallowedvariancesrespectively.2 Thus,thesimilaritybetweenthe
bygraphedges(bi-directional). Itshouldbenotedthatthisrepre- relationshipbetweentwoverticesintheexampleframe(eAab)with
sentationisdifferentthanatypicalgraph,theedgeisnotjustacon- respecttotherelationshipbetweentwoverticesinthequeryframe
nectingmechanismbetweentwoverticescarryingsomeweight,the (eBcd)iscomputedas:
edgeactuallymodelstherelationshipbetweenthetwosegments,just
likethevertexmodelstheappearanceofthesegment. Thisframe- exp(−(rcd−µrab)2 − (θcd−µθab)2)
workcanthusallowforinlinelearningandtrackingofthevarious Ψ(eAab →eBcd)= 2σr22πσ σ 2σθ2 , (4)
r θ
componentsinthescene.Asapreliminaryinvestigation,thefollow-
ing sections demonstrate the capability of this type of foreground wherercdisthemagnitudeofthevectorconnectingthecentroidof
representationinaddressingthedifficultproblemofexamplebased adjacentverticescorrespondingtoeBcd,andθcdistheangleofthis
peoplematchinginmultiplecamerascenarios. vectorwiththehorizontalaxis.
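A minimal sketch of the edge similarity of Equation (4) follows, assuming each edge is summarized by the centroids of the two segments it joins and using the σ_r = 5 pixels and σ_θ = π/4 values of footnote 2 as defaults; angle wrap-around at ±π is ignored here for brevity.

```python
import numpy as np

def edge_similarity(centroid_a1, centroid_a2, centroid_b1, centroid_b2,
                    sigma_r=5.0, sigma_theta=np.pi / 4):
    """Psi(e_A -> e_B) of Eq. (4): a 2-D Gaussian over the (length, angle) of the
    vector joining two segment centroids, centered on the example-frame edge."""
    def polar(c_from, c_to):
        dx, dy = c_to[0] - c_from[0], c_to[1] - c_from[1]
        return np.hypot(dx, dy), np.arctan2(dy, dx)

    mu_r, mu_theta = polar(centroid_a1, centroid_a2)     # example edge: (mu_r, mu_theta)
    r_cd, theta_cd = polar(centroid_b1, centroid_b2)     # query edge: (r_cd, theta_cd)

    norm = 1.0 / (2.0 * np.pi * sigma_r * sigma_theta)
    return norm * np.exp(-(r_cd - mu_r) ** 2 / (2 * sigma_r ** 2)
                         - (theta_cd - mu_theta) ** 2 / (2 * sigma_theta ** 2))
```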
3.2. Matching Algorithm

We utilize a greedy algorithm to perform a subgraph search, as depicted in Figure 2. The following is a brief pseudo-code of our matching algorithm (please refer to Figure 2); a code sketch of the procedure follows the list.

Fig. 2. People matching viewed as a sub-graph matching problem. [The figure shows the example-frame graph with vertices A_1, A_2 joined by e_{A12}, and the query-frame graph with vertices B_1-B_4 joined by e_{B12}, e_{B23}, and e_{B24}.]

• For each vertex, say A_1, in the example frame, compute the similarity S(A_1 → B_k) using Equation (1) to all the vertices B_k in the query frame, and retain the best M matches in a set of candidates C_{A1} = {B_k}, k ∈ (1, 2, ..., N_B).

• Let the set of adjacent vertices of A_1 be denoted by Adj(A_1). In Figure 2, Adj(A_1) = A_2. For all vertices B_k in C_{A1}, find B_l ∈ Adj(B_k) such that B_l ∈ C_{A2}. Now, compute the Global Evidence for matching A_1 to B_k as:

GE(A_1 → B_k) = S(A_2 → B_l) · Ψ(e_{A12} → e_{Bkl}).   (5)

• Assign A_1 to that vertex B_k ∈ C_{A1} which maximizes the Global Evidence. The similarity score for this vertex assignment is computed as S_{A1} = S(A_1 → B_k) + GE(A_1 → B_k). Similarly compute the assignments for the remaining vertices in the example graph. The net similarity score for the matched subgraph is given by Σ_{i=1}^{N_A} S_{Ai}.
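The sketch below strings the pieces together into the greedy search described above. It assumes the graph structure and the vertex/edge similarity helpers from the earlier sketches (networkx graphs whose vertices carry 'hist', 'sift', and 'centroid' attributes); it is offered as a reading aid, not the authors' implementation.

```python
import numpy as np

def greedy_subgraph_match(example, query, M=3):
    """Greedy sub-graph search sketched from Section 3.2: for every example
    vertex, pick the query candidate with the highest Global Evidence, Eq. (5)."""
    def S(a, b):
        # Vertex similarity of Eq. (1), using the helpers sketched earlier.
        return vertex_similarity(example.nodes[a]['hist'], query.nodes[b]['hist'],
                                 example.nodes[a]['sift'], query.nodes[b]['sift'])

    # First bullet: best M query candidates for every example vertex.
    cand = {a: sorted(query.nodes, key=lambda b: S(a, b), reverse=True)[:M]
            for a in example.nodes}

    assignment, net_score = {}, 0.0
    for a1 in example.nodes:
        best_ge, best_b = -np.inf, None
        for b_k in cand[a1]:
            # Second bullet: Global Evidence from adjacent example vertices, Eq. (5).
            ge = 0.0
            for a2 in example.neighbors(a1):
                for b_l in query.neighbors(b_k):
                    if b_l in cand[a2]:
                        ge = max(ge, S(a2, b_l) *
                                 edge_similarity(example.nodes[a1]['centroid'],
                                                 example.nodes[a2]['centroid'],
                                                 query.nodes[b_k]['centroid'],
                                                 query.nodes[b_l]['centroid']))
            if ge > best_ge:
                best_ge, best_b = ge, b_k
        if best_b is None:        # no candidates in the query graph
            continue
        # Third bullet: assign A_1 and accumulate S_{A1} = S + GE.
        assignment[a1] = best_b
        net_score += S(a1, best_b) + best_ge
    return assignment, net_score
```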
4. RESULTS AND COMPARISON

Fig. 3. (a) Automatically detected foreground (center) from the example frame (left) is segmented (right) to get a 2-vertex graph representation. (b) The segmentation of the query frames (left column) is shown with random colors (center column), while the matching result is shown in the right column.

Fig. 4. (a) Foreground (center) from the example frame (left) is segmented (right) to get a 3-vertex graph representation. (b) Top 3 rows: the example is compared with frames from a different camera location (left column); our matching results are shown in the center column. The column on the right is the rectangular window used to compute the covariance distance as in [15]. Last row: we compute the similarity score and covariance distance between the example in (a) and the last row of (b), in order to coarsely compare the discriminative power of our matching scheme with the covariance distance.

Figures 3 and 4 illustrate the example based people matching algorithm explained above. In Figure 3 the example person is matched to the foreground in query frames acquired from the same camera location. It should be noted that the greedy algorithm is able to make the correct matches in spite of variations in pose and considerable occlusion. In Figure 4, the example person is compared to the foreground in a completely different camera view, leading to significant differences in scale and illumination. The results show that our representation and matching algorithm can robustly handle variations in illumination, pose, scale, and camera-view. We also perform a coarse comparison with the covariance distance approach used in [15].³ We first compute the average similarity score S_avg and the average covariance distance D_avg of the example in Figure 4(a) with respect to the same person in the top three rows of Figure 4(b). We then compute the similarity (S_bad) and covariance distance (D_bad) to a bad match (last row of Figure 4(b)). Now, we can approximately compare the discriminative power of the two measures by comparing the ratios P_S = S_avg / S_bad and P_D = D_bad / D_avg.⁴ We observe that P_S ≈ 5.0, whereas P_D ≈ 1.5 for the example in Figure 4, indicating that our similarity measure is more discriminative and hence allows for more robust matching.

³ It should be noted that the covariance tracker proposed in [15] does not separate foreground from background, which leads to less robust matching. Because of our particular foreground representation, we are able to correctly match the person along with giving an exact segmentation for the matching region.
⁴ Please note that we use a similarity measure, which is (conceptually) inversely related to the notion of distance; hence the use of reciprocal ratios for P_S and P_D.
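For clarity, the reciprocal-ratio convention of footnote 4 can be written out as a tiny helper; the score values themselves would have to come from running the matcher and the covariance-distance method of [15], and the function below is only an illustration of how the two ratios are formed.

```python
def discriminative_ratios(S_avg, S_bad, D_avg, D_bad):
    """Similarity is (conceptually) inverse to distance, hence reciprocal ratios:
    both P_S and P_D exceed 1 when a measure separates good matches from bad ones."""
    P_S = S_avg / S_bad   # similarity to the true person vs. to a bad match
    P_D = D_bad / D_avg   # covariance distance to a bad match vs. to the true person
    return P_S, P_D
```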
These results (including foreground detection and segmentation) were achieved at run-times of less than 1 second per query frame, using non-optimized experimental code, on a standard laptop computer with a 1.8 GHz Centrino processor. We used a lite version of the SIFT feature extraction, code provided by [16]. Color density estimation was performed using the Improved Fast Gauss Transform algorithm [12].

5. CONCLUSION AND FUTURE DIRECTIONS

We presented a novel scheme utilizing real-time foreground/background separation and low-level foreground segmentation to generate a graphical model of the appearance of, and relationships between, objects (or object parts) in the foreground. The effectiveness of this representation was demonstrated with an example based query type of people matching algorithm, which, along with its simple set-up, provides state-of-the-art results. We plan to further enhance the capability of our representation by improving the segment relationship model using more features and also incorporating information about temporal variations (temporal edges). Endeavors in these directions will be reported elsewhere.
6. REFERENCES

[1] X. Ren and J. Malik, "Learning a classification model for segmentation," in ICCV '03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, 2003, p. 10, IEEE Computer Society.
[2] I. Haritaoglu, D. Harwood, and L. S. Davis, "Hydra: Multiple people detection and tracking using silhouettes," in Proceedings of the Second IEEE Workshop on Visual Surveillance, 1999, p. 6.
[3] S. McKenna, S. Jabri, Z. Duric, and A. Rosenfeld, "Tracking groups of people," Computer Vision and Image Understanding, vol. 80, pp. 42-56, 2000.
[4] S. Savarese, J. Winn, and A. Criminisi, "Discriminative object class models of appearance and shape by correlatons," in CVPR '06, 2006, pp. 2033-2040.
[5] C. Richard Wren, A. Azarbayejani, T. Darrell, and Alex Pentland, "Pfinder: Real-time tracking of the human body," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 780-785, 1997.
[6] A. Elgammal and L. S. Davis, "Probabilistic framework for segmenting people under occlusion," in Proc. of IEEE 8th International Conference on Computer Vision, 2001, pp. 145-152.
[7] J. Lim and D. Kriegman, "Tracking humans using prior and learned representations of shape and appearance," in IEEE International Conference on Automatic Face and Gesture Recognition, Los Alamitos, CA, USA, 2004, vol. 00, p. 869, IEEE Computer Society.
[8] L. Sigal, B. Sidharth, S. Roth, M. Black, and M. Isard, "Tracking loose-limbed people," in IEEE Conf. on Computer Vision and Pattern Recognition, 2004.
[9] X. Lan and D. P. Huttenlocher, "A unified spatio-temporal articulated model for tracking," in IEEE Conf. on Computer Vision and Pattern Recognition, 2004, vol. 01, pp. 722-729.
[10] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," Int. J. Comput. Vision, vol. 59, no. 2, pp. 167-181, 2004.
[11] K. A. Patwardhan, G. Sapiro, and V. Morellas, "A pixel layering framework for robust foreground detection in video," Under Review, http://www.tc.umn.edu/~patw0007/videolayers/index.html, June 2006.
[12] C. Yang, R. Duraiswami, N. Gumerov, and L. Davis, "Improved fast Gauss transform and efficient kernel density estimation," in Int'l Conf. Computer Vision, 2003, pp. 464-471.
[13] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. of the International Conference on Computer Vision, 1999, pp. 1150-1157.
[14] T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection," IEEE Transactions on Communications, vol. 15, no. 1, pp. 52-60, Feb 1967.
[15] F. Porikli, O. Tuzel, and P. Meer, "Covariance tracking using model update based on Lie algebra," in CVPR '06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, pp. 728-735.
[16] A. Vedaldi, "SIFT++: http://vision.ucla.edu/~vedaldi/code/siftpp/siftpp.html," 2006.