Noname manuscript No. (will be inserted by the editor)

Continuous Localization and Mapping of a Pan Tilt Zoom Camera for Wide Area Tracking

Giuseppe Lisanti · Iacopo Masi · Federico Pernici · Alberto Del Bimbo

Media Integration and Communication Center, University of Florence, Viale Morgagni 65, Florence, 50134, Italy, +39 055 275-1390

Received: March 18, 2015

arXiv:1401.6606v2 [cs.CV] 23 Mar 2015

Abstract Pan-tilt-zoom (PTZ) cameras are powerful to support object identification and recognition in far-field scenes. However, the effective use of PTZ cameras in real contexts is complicated by the fact that a continuous on-line camera calibration is needed and the absolute pan, tilt and zoom positional values provided by the camera actuators cannot be used because they are not synchronized with the video stream. So, accurate calibration must be directly extracted from the visual content of the frames. Moreover, the large and abrupt scale changes, the scene background changes due to the camera operation and the need of camera motion compensation make target tracking with these cameras extremely challenging. In this paper, we present a solution that provides continuous on-line calibration of PTZ cameras which is robust to rapid camera motion, changes of the environment due to illumination or moving objects and scales beyond thousands of landmarks. The method directly derives the relationship between the position of a target in the 3D world plane and the corresponding scale and position in the 2D image, and allows real-time tracking of multiple targets with high and stable degree of accuracy even at far distances and any zooming level.

Keywords Rotating and Zooming Camera · PTZ Sensor · Localization and Mapping · Multiple Target Tracking

1 Introduction

Pan-tilt-zoom (PTZ) cameras are powerful to support object identification and recognition in far-field scenes. They are equipped with adjustable optical zoom lenses that can be manually or automatically controlled to permit both wide area coverage and close-up views at high resolution. This capability is particularly useful in surveillance applications to permit tracking of targets in high resolution and zooming in on biometric details of parts of the body in order to resolve ambiguities and understand target behaviors.

However, the practical use of PTZ cameras in real contexts of operation is complicated due to several reasons. First, the geometrical relationship between the camera view and the 3D observed scene is time-varying and depends on camera calibration. Unfortunately, the absolute pan, tilt and zoom positional values provided by the camera actuators, even when they are sufficiently precise, in most cases are not synchronized with the video stream, and, for IP cameras, a constant frame rate cannot be assumed. So, accurate calibration must be extracted from the visual content of the frames. Second, the pan, tilt and zooming facility may determine large and abrupt scale changes. This prevents the assumption of smooth camera motion. Moreover, since the scene background is continuously changing, some adaptive representation of the scene under observation becomes necessary. All these facts have significant impact also on the possibility of having effective target detection and tracking in real-time. Due to this complexity, there is a small body of literature on tracking with PTZ cameras and most of the solutions proposed were limited to either unrealistic or simple and restricted contexts of application.

In the following, we present a novel solution that provides continuous adaptive calibration of a PTZ camera and enables real-time tracking of targets in 3D world coordinates in general contexts of application. We demonstrate that the method is effective and is robust over long time periods of operation.

The solution has two distinct stages. In the off-line stage, we collect a finite number of keyframes taken from different viewpoints, and for each keyframe detect and store the scene landmarks and the camera pose. In the on-line stage, we
perform camera calibration by estimating the homographic transformation between the camera view and the 3D world plane at each time instant from the matching between the current view and the keyframes. Changes in the scene that have occurred over time due to illumination or objects are accounted with an adaptive representation of the scene under observation by updating the uncertainty in landmark localization. The relationship between target position in the 3D world plane and its position in the 2D image allows us to estimate the scale of target in each frame, compensate camera motion and perform accurate multi-target detection and tracking in 3D world coordinates.

2 Related work

In the following, we review the research papers that are most relevant for the scope of this work. In particular, we review separately solutions for self-calibration and target tracking with moving and PTZ cameras.

PTZ camera self-calibration

Hartley et al. [1] were the first to demonstrate the possibility of performing self-calibration of PTZ cameras based on image content. However, since calibration is performed off-line, their method cannot be applied in real-time contexts of operation. The method was improved in [2] with a global optimization of the parameters.

Solutions for on-line self-calibration and pose estimation of moving and PTZ cameras were presented by several authors. Among them, the most notable contributions were in [3,4,5,6,7,8,9]. Sinha and Pollefeys in [3] used the method of [2] to obtain off-line a full mosaic of the scene. Feature matching and bundle adjustment were used to estimate the values of the intrinsic parameters for different pan and tilt angles at the lowest zooming level, and the same process is repeated until the intrinsic parameters are estimated for the full range of views and zoomings. In [4] the same authors suggested that on-line control of a PTZ camera in closed loop could be obtained by matching the current frame with the full mosaic. However, their paper does not include any evidence of the claims nor provides any evaluation of the accuracy of the on-line calibration. Civera et al. [7] proposed a method that exploits real-time sequential mosaicing of a scene. They used Simultaneous Localization and Mapping (SLAM) with Extended Kalman Filter (EKF) to estimate the location and orientation of a PTZ camera and included the landmarks of the scene in the filter state. This solution cannot scale with the number of scene landmarks. Moreover, they only considered the case of camera rotations, and did not account for zooming. Lovegrove et al. [8] obtained the camera parameters between consecutive images by whole image alignment. As an alternative to using EKF sequential filtering, they suggested to use keyframes to achieve scalable performance. They claimed to provide full PTZ camera self-calibration but did not demonstrate calibration with variable focal length. The main drawback of all these methods is that they assume that the scene is almost stationary and changes are only due to camera motion, which is a condition that is unlikely to happen in real contexts.

Wu and Radke [9] presented a method for on-line PTZ camera self-calibration based on a camera model that accounts for changes of focal length and lens distortion at different zooming levels. The authors claimed robustness to smooth scene background changes and drift-free operation, with higher calibration accuracy than [3,4] especially at high zoom levels. However, as reported by the authors, this method fails when a large component in the scene abruptly modifies its position or the background changes slowly. It is therefore mostly usable with stationary scenes. A similar strategy was also applied in [10], but accounts for pan and tilt camera movements only.

Other authors developed very effective methods for pose estimation of moving cameras with pre-calibrated internal camera parameters [5,6]. In [5], Klein and Murray applied on-line bundle adjustment to the five nearest keyframes sampled every ten frames of the sequence. In [6], Williams et al. used a randomized lists classifier to find the correspondences between the features in the current view and the (pre-calculated) features from all the possible views of the scene, with RANSAC refinement. However, both these approaches, if applied to a PTZ camera, are likely to produce over-fitting in the estimation of the camera parameters at progressive zoomings in.

Tracking with PTZ cameras

Solutions to perform general object tracking with PTZ cameras were proposed by a few authors. Hayman et al. [11] and Tordoff et al. [12] proposed solutions to adapt the PTZ camera focal length to compensate the changes of target size, assuming a single target in the scene and fixed scene background. In particular, in [11], the authors used the affine transform applied to lines and points of the scene background; in [12] the PTZ camera focal length is adjusted to compensate depth motion of the target. Kumar et al. [13] suggested to adapt the variance of the Kalman filter to the target shape changes. They performed camera motion compensation and implemented a layered representation of spatial and temporal constraints on shape, motion and appearance. However, the method is likely to fail in the presence of abrupt scale changes. In [14], Varcheie and Bilodeau addressed target tracking with IP PTZ cameras, in the presence of low and irregular frame rate. To follow the target, they commanded the PTZ motors with the predicted target position. A fuzzy classifier is used to sample the target likelihood in each frame. Since zooming is not managed, this approach can only be applied in narrow areas. The authors in [15] assumed that PTZ focal length is fixed and coarsely estimated from the camera CCD pixel size. They performed background subtraction by camera motion compensation to extract and track targets. This method is therefore unsuited for wide areas monitoring and highly dynamic scenes.

Solutions for tracking with PTZ cameras in specific domains of application were proposed in [16,17,18,19]. All these methods exploit context-specific fiducial markers to obtain an absolute reference and compute the time-varying relationship between the positions of the targets in the 2D image and those in the 3D world plane. In [17], the authors used the a-priori known circular shape of the hockey rink and playfield lines to locate the reference points needed to estimate the world-to-image homography and compute camera motion compensation. The hockey players were tracked using a detector specialized for hockey players trained with Adaboost and particle filtering based on the detector's confidence [16]. The changes in scale of the targets were managed with simple heuristics using windows slightly larger/smaller than the current target size. Similar solutions were applied in soccer games [18,19].

Beyond the fact that these solutions are domain-specific and have no general applicability, the main drawback is that fiducial markers are likely to be occluded and impair the quality of tracking.

2.1 Contributions and Distinguishing Features

The main contributions of the solution proposed are:

– We define a method for on-line PTZ camera calibration that jointly estimates the pose of the camera, the focal length and the scene landmark locations. Under reasonable assumptions, such estimation is Bayes-optimal, is very robust to zoom and camera motion and scales beyond thousands of scene landmarks. The method does not assume any temporal coherence between frames but only considers the information in the current frame.
– We provide an adaptive representation of the scene under observation that makes PTZ camera operations independent of the changes of the scene.
– From the optimally estimated camera pose we infer the expected scale of a target at any image location and compute the relationship between the target position in the 2D image and the 3D world plane at each time instant.

Differently from the other solutions published in the literature like [4], [7], [8] and [9], our approach allows performing on-line PTZ camera calibration also in dynamic scenes. Estimation of the relationship between positions in the 2D image and the 3D world plane permits more effective target detection, data association and real-time tracking.

Some of the ideas for calibration contained in this paper were presented with preliminary results under simplified assumptions in [20,21]. Targets were detected manually in the first frame of the sequence and the scene was assumed almost static through time. Therefore we could not maintain camera calibration over hours of activity, neither support rapid camera motion.

3 PTZ Camera Calibration

In the following, we introduce the scene model and define the variables used. Then we discuss the off-line stage, where a scene map is obtained from the scene landmarks of the keyframes, and the on-line stage, where we perform camera pose estimation and updating of the scene map.

3.1 Scene model

We consider an operating scenario where a single PTZ camera is allowed rotating around its nodal point and zooming, while observing targets that move over a planar scene. The following entities are defined as time-varying random variables:

– The camera pose $c$. Camera pose is defined in terms of the pan and tilt angles ($\psi$ and $\phi$, respectively), and focal length $f$ of the camera. Since the principal point is a poorly conditioned parameter, it is assumed to be constant in order to obtain a more precise calibration [2]. Radial distortion was not considered since it can be assumed to be negligible for zooming operations [4].
– The scene landmarks $u$. These landmarks account for salient points of the scene background. In the off-line stage SURF keypoints [22] are detected in keyframe images sampled at fixed intervals of pan, tilt and focal length. A SURF descriptor is associated to each landmark. These landmarks change during the on-line camera operation.
– The view map $m$ and scene map $M$. A view map is created for each keyframe that collects the scene landmarks (i.e. $m = \{u_i\}$). The scene map is obtained as the union of all the view maps and collects all the scene landmarks that have been detected at different pan, tilt and focal length values (i.e. $M = \{m_k\}$). Since the scene landmarks change through time, these maps will change accordingly.
– The landmark observations $v$. These landmarks account for the salient points that are detected in the current frame. They can either belong to the scene background or to targets. The SURF descriptors of the landmark observations $v$ are matched with the descriptors of the scene landmarks $u$, in order to estimate the camera pose and update the scene map.
– The target state $s$. The target state is represented in 3D world coordinates and includes both the position and speed of a target. It is assumed that targets move on a planar surface, i.e. $Z = 0$, so that $s = [X, Y, \dot{X}, \dot{Y}]$.
– The target observations in the current frame, $p$. This is a location in the current frame that is likely to correspond to the location of a target. At each time instant $t$ there is a non-linear and time varying function $g$ relating the position of the target in world coordinates $s$ to the location $p$ of the target in the image. Its estimation depends on the camera pose $c$ and the scene map $M$ at time $t$.

Fig. 1 provides an overview of the main entities of the scene model and their relationships.

Fig. 1: Main entities and their relationships: the current frame and the landmark observations extracted $v$; the view maps $m$ including the scene landmarks $u$; the initial scene map $M$ obtained from the union of the view maps; the 3D scene; the functions that represent the relationships between these entities.

3.2 Off-line Scene Map Initialization

In the off-line stage, image views (keyframes) are taken at regular samples of pan and tilt angles and focal length, and view maps $m_k$ are created so to cover the entire scene. SURF keypoints [22] are organized in a k-d tree for each view map.

Given a reference keyframe and the corresponding view map $m_r$, the homography that maps each $m_k$ to $m_r$ can be estimated as in the usual way of planar mosaicing [1]:

$$H_{rk} = K_r R_r R_k^{-1} K_k^{-1} \tag{1}$$

The optimal values of both the external camera parameter matrix $R_k$ and the internal camera parameter matrix $K_k$ are estimated by bundle adjustment for each keyframe $k$.

Differently from [5], we use bundle adjustment for off-line scene map initialization and use the whole set of keyframes of the scene at multiple zoomings. Since keyframes were taken by uniform sampling of the parameter space, over-fitting of camera parameters is avoided. This results in a more accurate on-line estimation of the PTZ parameters. The difference in the accuracy of the estimation is especially sensible in the case in which PTZ operates at high zooming. Fig. 2 shows an example of estimation of the focal length with the two approaches for a sample sequence with right panning and progressive zooming-in.

Fig. 2: Estimations of the camera focal length of the last frame of a sequence with right panning and progressive zooming in: a) using the on-line bundle adjustment of [5]; b) using our off-line solution with keyframes obtained by uniform sampling of the camera parameter space and the last frame. The focal length of the last frame is represented with a square box on the scene mosaic. Focal length estimation is respectively 741.174 pixels and 2097.5 pixels. The true focal length is 2085 pixels.

The pan, tilt, zoom values of the camera actuators are stored in order to uniquely identify each view map. The complete scene map $M$ is obtained as the union of all the view maps. Differently from [20], a forest of k-d trees is used for matching.

3.3 On-line camera pose estimation and mapping

The positional values provided by the camera actuators at each time instant, although not directly usable for on-line camera calibration, are nevertheless sufficiently precise to retrieve the view map $m_{k^\star}$ with the closest values of pan, tilt and focal length. This map is likely to have almost the same content as the current frame and many landmarks will match. The landmarks matched can be used to estimate the homography $H(t)$ from the current view to $m_{k^\star}(t)$. Matching is performed according to Nearest Neighbor distance ratio as in [23] and RANSAC. To reduce the computational effort of matching, only a subset of the landmarks in $m_{k^\star}$ is taken by random sampling. The descriptors of the landmarks matched are updated using a running average with a forgetting factor.

The optimal estimation of $H(t)$ on the basis of the correspondences between landmark observations $v_i(t)$ and scene landmarks $u_i(t)$ is fundamental for effective camera pose (pan, tilt, focal length) estimation and mapping in real conditions. However, changes of the visual environment due to illumination or to objects entering, leaving or changing position in the scene induce modifications of the original scene map as time progresses. Moreover, imprecisions in the detection and estimation process might affect scene landmark estimation and localization. To this end, under reasonable assumptions, we derive a linear measurement model that accounts for all the sources of error of landmark observations and permits the optimal localization of the scene landmarks. Permanent modifications of the scene are accounted through a landmark birth-death process that includes new landmarks and discards temporary changes.

Closed-form recursive estimation of scene landmarks

Camera pose estimation and mapping requires inference of the joint probability of the camera pose $c(t)$ and scene landmark locations in the map $M(t)$, given the landmark observations $v$ until time $t$ and the initial scene map $M(0)$:

$$p\big(c(t), M(t) \mid v(0{:}t), M(0)\big). \tag{2}$$

In order to make the problem scalable with respect to the number of landmarks, Eq. (2) is approximated by decoupling camera pose estimation from map updating:

$$\underbrace{p\big(c(t) \mid v(t), M(t-1)\big)}_{\text{camera pose estimation}} \; \underbrace{p\big(M(t) \mid v(t), c(t), M(t-1)\big)}_{\text{map updating}} \tag{3}$$

Considering the view map $m_{k^\star}$ with the closest values of pan, tilt and focal length and applying Bayes theorem to the map updating term, Eq. (3) can be rewritten as:

$$p\big(m_{k^\star}(t) \mid v(t), c(t), m_{k^\star}(t-1)\big) = p\big(v(t) \mid c(t), m_{k^\star}(t)\big)\, p\big(m_{k^\star}(t) \mid m_{k^\star}(t-1)\big), \tag{4}$$

where the term $p\big(m_{k^\star}(t) \mid m_{k^\star}(t-1)\big)$ indicates that view map $m_{k^\star}(t)$ at time $t$ depends only on $m_{k^\star}(t-1)$. Assuming that for each camera pose the observation landmarks $v_i$ that match the scene landmarks $u_i$ in $m_{k^\star}(t)$ are independent of each other, i.e.:

$$p\big(v(t) \mid c(t), m_{k^\star}(t)\big) = \prod_i p\big(v_i(t) \mid c(t), u_i(t)\big), \tag{5}$$

Eq. (4) becomes:

$$p\big(m_{k^\star}(t) \mid v(t), c(t), m_{k^\star}(t-1)\big) = \prod_i p\big(v_i(t) \mid c(t), u_i(t)\big)\, p\big(u_i(t) \mid u_i(t-1)\big), \tag{6}$$

where $p\big(u_i(t) \mid u_i(t-1)\big)$ is the prior pdf of the $i$-th scene landmark at time $t$ given its state at time $t-1$. Under the assumptions that both scene landmarks $u_i(t)$ and the keypoint localization error have a Gaussian distribution, and that Direct Linear Transform is used, the observation model $p\big(v_i(t) \mid c(t), u_i(t)\big)$ can be expressed as:

$$v_i(t) = H_i(t)\, u_i(t) + \lambda_i(t), \tag{7}$$

where $H_i(t)$ is the 2×2 matrix obtained by linearizing the homography $H(t)$ at $v_i(t)$ and $\lambda_i(t)$ is an additive Gaussian noise term with covariance $\Lambda_i(t)$ that represents the whole error in the landmark mapping process. This covariance can be expressed in closed form and in homogeneous coordinates as:

$$\Lambda_i(t) = B_i(t)\, \Sigma_i(t)\, B_i(t)^{\top} + \Lambda'_i + H(t)^{-1} P_i(t)\, H(t)^{-\top}, \tag{8}$$

where the three terms account respectively for the spatial distribution of the matched landmarks, the covariance of keypoint localization in the current frame and the uncertainty associated to the scene landmark positions in the view map. In Eq. (8), $\Sigma_i(t)$ is the 9×9 homography covariance matrix (calculated in closed form according to [24]) and $B_i(t)$ is the 3×9 block matrix of landmark observations; $\Lambda'_i$ models the keypoint detection error covariance; $P_i(t)$ is the covariance of the estimated landmark position on the nearest view map, and $H$ is obtained from the Direct Linear Transform. The 2×2 covariance $\Lambda_i(t)$ of Eq. (7) can be directly obtained as the 2×2 principal minor of the homogeneous-coordinate matrix $\Lambda_i(t)$ of Eq. (8).

The optimal localization of the scene landmarks is therefore obtained in closed form through multiple applications of the Extended Kalman Filter to each landmark observation, with the Kalman gain being computed as:

$$K_i(t) = P_i(t|t-1)\, H_i(t)^{-1} \big[ H_i(t)^{-1} P_i(t|t-1)\, H_i(t)^{-\top} + \Lambda_i(t) \big]^{-1}, \tag{9}$$

where $P_i$ is the Kalman covariance of the $i$-th scene landmark.

Birth-death of scene landmarks

Objects that enter or leave the scene introduce modifications of the original scene map. Their landmarks are not taken into account in the computation of $H(t)$ at the current time (they are the RANSAC outliers in the matching process), but are taken into account in the long term, in order to avoid that the representation of the original scene becomes drastically different from that of the current scene. We assume that new landmarks that persist in 20 consecutive frames and are closest to the already matched landmarks have higher probability of belonging to a new scene element (they have smaller covariance according to Eq. (8)). According to this, we implemented a proximity check (Fig. 3) that computes such probability as the ratio between the bounding box of the landmarks matched and the extended bounding box of the new landmark (respectively box A and B in Fig. 3). Such candidate landmarks are included in $m_{k^\star}$ using the homography $H(t)$. Landmarks are terminated when they are no more matched in consecutive frames.

Fig. 3: Proximity check for scene map updating. Current frame and its nearest keyframe in the scene map. Matched landmarks and a new landmark are shown in magenta and white, respectively, together with their bounding boxes.

Since the transformation between two near frames under pan, tilt and zoom can be locally approximated by a similarity transformation, the asymptotic stability of the updating procedure is guaranteed by the Multiplicative Ergodic Theorem [25]. Therefore, we can assume that no sensible drifting is introduced in the scene landmark updating.

Localization in world coordinates

Looking at Fig. 1, the time varying homography $G(t)$ (in homogeneous coordinates), mapping a target position in the world plane to its position $p$ in the current frame, can be represented as:

$$G(t) = \big( H_W\, H_{rk^\star}\, H(t) \big)^{-1}, \tag{10}$$

where $H_W$ is the stationary homography from the mosaic plane to the 3D world plane:

$$H_W = H_s H_p, \tag{11}$$

that can be obtained as the product of the rectifying homography $H_p$ (derived from the projections of the vanishing points by exploiting the single view geometry of the planar mosaic¹ [26]) and transformation $H_s$ from pixels in the mosaic plane to 3D world coordinates (estimated from the projection of two points at a known distance $L$ in the world plane onto two points in the mosaic plane as in Fig. 4).

¹ In the case of a PTZ sensor, the homography between each keyframe and the reference keyframe is the infinite homography $H_\infty$ that puts in relation vanishing lines and vanishing points between the images.

Fig. 4: The transformation from the 2D mosaic plane (Left) to the 3D world plane (Right). The vanishing points and the vanishing lines are used for the computation of matrix $H_p$. A pair of corresponding points to compute $H_s$ is shown.

4 Target tracking with PTZ cameras

We perform multi-target tracking in 3D world coordinates using the Extended Kalman Filter. Data association to discriminate between target trajectories is implemented according to the Cheap-JPDAF model [27].

The relationship between the image plane and the 3D world plane of Eq. (10) allows us to obtain the target scale and perform tracking in the 3D world plane. As it will be shown in Section 5, tracking in the 3D world plane allows a better discrimination between targets.

4.1 Target scale estimation

At each time instant $t$, the homography $G(t)$ permits to derive the homology relationship that directly provides the scale at which the target is observed in the current frame:

$$h(t) = W(t)\, p(t) \tag{12}$$

where $h(t)$ and $p(t)$ are respectively the position of the target top and bottom in the image plane and $W(t)$ is defined as:

$$W(t) = I + (\mu - 1)\, \frac{v_\infty(t) \cdot l_\infty^{\top}(t)}{v_\infty^{\top}(t) \cdot l_\infty(t)}, \tag{13}$$

where $I$ is the identity matrix, $l_\infty(t)$ is the world plane vanishing line, $v_\infty(t)$ is the vanishing point of the world normal plane direction, and $\mu$ is the cross-ratio. The vanishing point $v_\infty(t)$ is computed as $v_\infty(t) = K(t) K(t)^{\top} \cdot l_\infty(t)$, with $l_\infty(t) = G(t) \cdot [0, 0, 1]^{\top}$, and $K(t)$ is derived from $H(t)$ as in [20]. Estimation of the target scale allows us to apply the detector at a single scale instead of multiple scales and improve in both recall and computational performance for detection and tracking.

4.2 Multiple Target Tracking

The Extended Kalman filter observation model for each target is defined as:

$$p(t) = g\big(s(t), t\big) = \big[\, G(t) \;\; 0_{2\times 2} \,\big]\, s(t) + \zeta(t), \tag{14}$$

where $\zeta(t)$ is a Gaussian noise term with zero mean and diagonal covariance that models the target localization error in the current frame; $s(t)$ is the target state, represented in 3D world coordinates; $G(t)$ in Eq. (14) is the homography $G(t)$ linearized at the predicted target position and $0_{2\times 2}$ is the 2×2 zero matrix. Assuming constant velocity, the motion model in the 3D world plane is defined as:

$$p\big(s(t) \mid s(t-1)\big) = \mathcal{N}\big(s(t);\, A\, s(t-1),\, Q\big), \tag{15}$$

where $A$ is the 4×4 constant velocity transition matrix and $Q$ is the 4×4 process noise matrix. For multiple target tracking, $G(t)$ influences the target covariance of the Cheap-JPDAF respectively for the Kalman gain expression:

$$W(t) = P(t|t-1)\, G(t)\, S(t|t)^{-1}, \tag{16}$$

and the target covariance on the image plane:

$$S(t|t) = G(t)\, P(t|t-1)\, G(t)^{\top} + V(t), \tag{17}$$

where $V(t)$ is the covariance matrix of the measurement error of Eq. (14).

5 Experimental results

In this Section we report on an extensive set of experiments to assess the accuracy of our PTZ camera calibration method and its effective exploitation for real-time multiple target tracking.

5.1 PTZ camera calibration

In the following, we summarize the experiments that validate our approach for camera calibration. We justify the use of motor actuators to retrieve the closest scene map; we report on the precision of the off-line scene map initialization and the on-line camera pose estimation and mapping.

Accuracy of PTZ motor actuators

We validated the use of pan, tilt and zoom values provided by the camera motor actuators to retrieve the closest view map, by checking their precision with the same experiment as in [9]. We placed four checkerboard targets at different positions in a room. These positions corresponded to different pan, tilt and zoom conditions. A SONY SNC-RZ30P PTZ camera was moved to a random position every 30 seconds and returned at the initial positions every hour. For each image view the corners of the checkerboard were extracted and compared to the reference image. The errors were collected for 200 hours. We have measured an average error of 2 pixels at the lowest zooming and 9 pixels for the maximum zooming. Fig. 5 shows the plots of the errors and the initial and final camera view for each target.

Fig. 5: (a) Checkerboard images at the initial camera pose. (b) Average errors over 200 hours (one panel per target, zoom actuator values 6553, 13762, 15663 and 16384; average error in pixels versus time in hours). (c) Checkerboard images after the camera has returned in the same initial pose after 200 hours.

Scene map initialization

Off-line scene map initialization as discussed in Sect. 3.2 is accurate and produces repeatable results. Fig. 6 reports the mean and standard deviation of the focal length estimated during the scene map initialization. In this experiment, we acquired images of the same outdoor scene in 43 consecutive days at different time of the day, at 202 distinct values of pan, tilt and zoom. The PTZ camera was driven using motor actuators. We can notice that the standard deviation of the focal length that is estimated through off-line bundle adjustment increases almost proportionally with focal length. The maximum standard deviation value observed is 23 pixels at focal length of about 1700 pixels.

Fig. 6: Average (a) and standard deviation (b) of the bundle-adjusted focal length for the keyframes used in scene map initialization. Keyframes are ordered for increasing values of focal length.

On-line PTZ camera pose estimation and mapping

In this experiment, we report on the average reprojection error and calibration errors with our method. We discuss the influence of the number of landmarks and RANSAC inlier threshold on the reprojection error and the effectiveness of scene landmark updating.

As in [9], we recorded 10 outdoor video sequences of 8 hours each (80 hours in total). Due to the long period of observation, all the sequences include slow background changes due to shadows or illumination variations, as well as large changes due to moving objects entering or exiting the scene. The PTZ camera was moved continuously using the motor actuators and stopped for a few seconds at the same pan, tilt and zoom values, so to have a large number of keyframes at the same scene locations and different conditions, in all the sequences. On average we performed about 34000 measurements per sequence. For each keyframe, a grid of points was superimposed and the average reprojection error was measured between the grid points as obtained by the estimated homography and the same points by the off-line bundle adjustment.

Tab. 1 shows the average reprojection error, the errors in the estimation of pan, tilt and focal length and the improvements that are obtained with the proximity checking, for the outdoor sequences under test. As in [9], the errors in pan and tilt angles were computed as $e_\psi(t) = |\psi(t) - \psi_{rk}|$ and $e_\phi(t) = |\phi(t) - \phi_{rk}|$, respectively, and the focal length error as $e_f(t) = \left| \frac{f(t) - f_{rk}}{f_{rk}} \right|$ (in percentage). Pan and tilt angles estimated and those calculated with bundle adjustment were obtained from the rotation matrices $R(t) = K_r^{-1} H_{rk^\star} H(t) K_k$ (see Eq. (10)) and $R_{rk} = K_r^{-1} H_{rk} K_k$ (see Eq. (1)), respectively. The results confirm that proximity checking avoids selecting landmarks that introduce drifting in the homography estimation. It can be observed that errors in focal length measured with our method over a long period in an outdoor scenario are similar to those obtained in [9], and lower than those in [4] (as reported in [9]), for an indoor experiment with a few keyframes.

The reprojection error depends on both the number of landmarks extracted and the RANSAC threshold for inliers, as shown in Fig. 7 for one of the sequences under test (Sequence 1). It can be observed that a large reprojection error with high standard deviation (plotted at one sigma) is present below 200 landmarks. Instead, such error is low when the number of landmarks is between 200 and 1500 (Fig. 7(a)). Fig. 7(b) shows that a RANSAC threshold between 1 and 3 pixels for the inliers used in the homography estimation assures small reprojection errors. Values of 1000 and 3 pixels were used respectively for the number of landmarks extracted and RANSAC threshold in our experiments.

Fig. 7: Reprojection error as a function of (a) the number of landmarks extracted (b) inlier threshold in the RANSAC algorithm, for Sequence 1 under test.

Scene map updating significantly contributes to the robustness of our camera calibration to both slow and sudden variations of the scene, maintaining a high number of RANSAC inliers through time. Fig. 8(a) shows the cumulative sum of the inliers with and without scene landmark updating. It is possible to observe that without scene landmark updating the number of inliers decreases (the cumulative curve is almost flat) as the initial landmarks do not match anymore with the landmarks observed due to scene changes. Fig. 8(b) shows the distribution of the inliers in the two cases. With no scene landmark updating, typically only few of the original landmarks are taken as inliers for each keyframe, which is insufficient to assure a robust calibration over time. With scene landmark updating, a higher number of inliers is taken for each frame, including both the original and the new scene landmarks. As can be inferred from Fig. 8, in a dynamic scene few of the original scene landmarks survive at the end of the observation period. Fig. 9 highlights the scene landmark lifetime over a 20 minutes window, for one keyframe (randomly chosen). The scene landmarks with ID ∈ [0..2000] are the original landmarks. Landmarks with ID ≥ 2000 are those observed during the 20 minutes.

Our PTZ camera calibration keeps sufficiently stable over long periods of observation. Fig. 10 shows a typical plot of the reprojection error over 8-hour operation for a sam-

Table 1: Average reprojection error and calibration errors of pan, tilt and focal length with and without proximity check evaluated at the keyframes during the period of observation ("w/o p." = without proximity check).

Sequence  #measurements  Avg. reproj. error    Pan                   Tilt                  Focal Length
                         Ours    Ours w/o p.   Ours    Ours w/o p.   Ours    Ours w/o p.   Ours    Ours w/o p.
Seq. 1    34,209          2.83    2.96          1.18    1.55          0.39    0.42          0.96    1.06
Seq. 2    34,605          6.69    6.90          2.47    2.09          0.68    0.94          4.41    3.65
Seq. 3    33,102          3.26    3.30          1.26    1.17          0.33    0.33          0.84    0.91
Seq. 4    33,939          6.88    7.09          2.11    2.58          1.93    1.73          2.78    3.79
Seq. 5    33,974         22.54   60.04         11.14   11.53          9.51    9.85         12.49   14.21
Seq. 6    33,570          3.21    4.26          1.91    2.84          0.49    0.54          1.26    3.05
Seq. 7    34,157          3.62    3.59          1.71    1.27          0.35    0.43          1.81    2.15
Seq. 8    33,932         21.76   21.99          7.08    7.41         10.07    9.23         11.91   11.81
Seq. 9    34,558          8.78   12.26          3.35    5.48          1.37    2.70          3.47    4.80
Seq. 10   34,405          8.47    9.26          7.20    5.71          5.28    6.59          8.99    9.54
Average   34,032          8.80   13.17          3.94    4.16          3.04    3.28          4.89    5.50
10x 10N7o scene landmarks updating 3x 105 No scene landmarks updating Influenceofcameracalibration Scene landmarks updating Scene landmarks updating 8 2.5 To evaluate the impact of our PTZ calibration on tracking, Cumulative Sum 46 Frequency1.152 wwoerrkeicnogrddeadyaan8d-heoxutrrasceteqduetnhcreeeinviadepoasrkwinitghaorneea,dtwuroinagnda threetargets.Thisisadynamiccondition,withbothsmooth 2 0.5 andabruptscenechanges.Multi-targettrackingperformance 00 2 Fra4mes 6 x 1058 00 50 N1u0m0ber of In1l5ie0rs 200 250 wasevaluatedaccordingtoboththeCLEARMOT[29]and (a) (b) USCmetrics[30].TheCLEARMOTmetricsmeasurestrack- ingaccuracy(MOTA): Fig.8:(a)CumulativeSumofnumberofinliersasafunctionoftime: withoutandwithscenelandmarkupdating(dashedandsolidcurvere- spectively).(b)Distributionsofthenumberofinlierswithoutandwith (cid:80) (FN +FP +ID SW ) scenelandmarkupdating(greyandblackbinsrespectively). MOTA=1− t t (cid:80) t t (18) n t t e 12000 andprecision(MOTP): m eti 9000 (cid:80) mark Lif 36000000 MOTP= (cid:80)i,ttVTOPCti,t, (19) d n La 0 whereFN andFP arerespectivelythefalsenegativesand t t 0 2000 4000 6000 8000 10000 12000 14000 positives, ID SW are the identity switches, n is the num- Landmark ID t t beroftargetsandVOC istheVOCscoreofthei-thtarget i,t Fig.9:Lifetimeofscenelandmarksobservedforasamplekeyframe. at time t. The USC metric reports the ratio of the trajecto- riesthatweresuccessfullytrackedformorethan80%(MT), Error (pixels)2300 tt(hrPaeTck)raeatdniodfotohrfelmeasvosesrttalhygaenloc2sot0u%tnrtaoj(eMfcftLaol)rs,ieetshaeltahrrametsstwppeearrertfirsaaulmlcyceet(srFasfcAukFlel)yd. Reprojection 100 Wmaepmuepadsautriendg,thweipthernfoormpraonxciemfiotyrtchheecmkeinthgoadnwdifthorntohsecfeunlel 0 1 2 3 4 5 6 7 method. Frames x 105 From Tab. 2 it is apparent that scene map updating has Fig. 10: Reprojection error over 8-hour operation for a sample amajorinfluenceonthenumberoffalsenegativesandfalse keyframewithout(lightplot)andwith(darkplot)proximitychecking. 
positives and therefore on the tracking accuracy. Proximity checkinghasalsoapositiveimpactonthereductionoffalse positivesanddeterminesanaverageincreaseoftheaccuracy ple keyframe. Camera calibration at different time of the ofabout10%. daywithoutandwithscenelandmarkupdatingisshownin Fig. 11(a-b) for a few sample frames. It can be observed Influenceoftrackingin3Dworldcoordinates thatwithscenelandmarkupdating,cameracalibration(rep- resented by the superimposed grid of points) is still accu- Toanalyzetheeffectofusing3Dworldcoordinateswerun rate despite of the large illumination changes occurred in ourmethodin2Dimagecoordinates(notapplyingmapping thescene. in the 3D world plane). In this case, the target scale could not be evaluated directly and was estimated within a range fromthescaleatthepreviousframe.Tab.3reportstheper- formanceofourmulti-targettrackingperformedinthetwo cases. 5.2 Multi-TargetTrackingwithPTZcameras Itcanbeobservedthattrackingin3Dworldcoordinates lowersthenumberoffalsepositivesandcontributestoasen- Inthefollowing,wesummarizeexperimentsonmulti-target sibleimprovementinbothaccuracyandprecision,withre- trackingin3Dworldcoordinatesusingouron-linePTZcam- specttotrackinginthe2Dimageplane.Thisimprovement era calibration, and compare our method with a few meth- is even greater as the number of targets increases since the odsthatappearedintheliteratureonastandardPTZvideo trackerhastodiscriminatebetweenthem. sequence.Inourexperimentstargetsweredetectedautomat- Wecomparedourcalibrationandtrackingagainstthere- icallyusingthedetectorin[28]. sultsreportedbyafewauthors,namely[16],[32]and[33],
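
As an illustration of the evaluation protocol described above, the following is a minimal sketch of how the CLEAR MOT scores of Eqs. (18) and (19) and the USC ratios could be computed from per-frame counts. The `FrameStats` container, the function names, and the convention that the number of VOC scores stored for a frame equals the number of matched (true positive) targets in that frame are assumptions made for this example; they are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class FrameStats:
    """Per-frame tracking counts (hypothetical container for this sketch)."""
    fn: int                  # false negatives at time t
    fp: int                  # false positives at time t
    id_sw: int               # identity switches at time t
    n_targets: int           # ground-truth targets at time t
    voc_scores: List[float]  # VOC overlap of each matched target at time t


def mota(frames: Sequence[FrameStats]) -> float:
    """Accuracy as in Eq. (18): 1 - (sum of errors) / (sum of targets)."""
    total_targets = sum(f.n_targets for f in frames)
    if total_targets == 0:
        raise ValueError("no ground-truth targets in the sequence")
    errors = sum(f.fn + f.fp + f.id_sw for f in frames)
    return 1.0 - errors / total_targets


def motp(frames: Sequence[FrameStats]) -> float:
    """Precision as in Eq. (19): summed VOC scores over number of matches."""
    matches = sum(len(f.voc_scores) for f in frames)
    if matches == 0:
        raise ValueError("no matched targets in the sequence")
    return sum(sum(f.voc_scores) for f in frames) / matches


def usc_ratios(tracked_fractions: Sequence[float]) -> Dict[str, float]:
    """MT / ML / PT ratios from the tracked fraction of each trajectory."""
    n = len(tracked_fractions)
    if n == 0:
        raise ValueError("no trajectories given")
    mt = sum(1 for r in tracked_fractions if r > 0.8) / n  # mostly tracked
    ml = sum(1 for r in tracked_fractions if r < 0.2) / n  # mostly lost
    return {"MT": mt, "ML": ml, "PT": 1.0 - mt - ml}       # rest: partially tracked


if __name__ == "__main__":
    # Toy data, invented for the example (not results from the paper).
    frames = [
        FrameStats(fn=1, fp=0, id_sw=0, n_targets=2, voc_scores=[0.8]),
        FrameStats(fn=0, fp=1, id_sw=0, n_targets=2, voc_scores=[0.6, 0.7]),
    ]
    print("MOTA:", mota(frames))
    print("MOTP:", motp(frames))
    print("USC:", usc_ratios([0.9, 0.5, 0.1, 0.85]))
```

Accumulating the sums over the whole sequence before dividing, rather than averaging per-frame ratios, follows the form of Eqs. (18) and (19), where both numerator and denominator are summed over t.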
