Video Copy Detection: a Comparative Study

Julien Law-To (Institut National de l'Audiovisuel, Bry-sur-Marne, France), Li Chen (UCL Adastral Park Campus, Martlesham Heath, Ipswich, UK), Alexis Joly (INRIA, IMEDIA Team, Rocquencourt, France), Ivan Laptev (INRIA, VISTA Team, Rennes, France), Olivier Buisson (Institut National de l'Audiovisuel, Bry-sur-Marne, France), Valerie Gouet-Brunet (INRIA, IMEDIA Team, Rocquencourt, France), Nozha Boujemaa (INRIA, IMEDIA Team, Rocquencourt, France), Fred Stentiford (UCL Adastral Park Campus, Martlesham Heath, Ipswich, UK)

ABSTRACT

This paper presents a comparative study of methods for video copy detection. Different state-of-the-art techniques, using various kinds of descriptors and voting functions, are described: global video descriptors based on spatial and temporal features, and local descriptors based on spatial, temporal as well as spatio-temporal information. Robust voting functions are adapted to these techniques to enhance their performance and to compare them. A dedicated framework for evaluating these systems is then proposed. All the techniques are tested and compared within the same framework, by evaluating their robustness under single and mixed image transformations, as well as for different lengths of video segments. We discuss the performance of each approach according to the transformations and the applications considered. Local methods demonstrate their superior performance over global ones when detecting video copies subjected to various transformations.

Categories and Subject Descriptors: I.4.9 [Computing Methodologies]: Image Processing and Computer Vision—Applications

General Terms: Algorithms

Keywords: Content-Based Video Copy Detection

[Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIVR'07, July 9–11, 2007, Amsterdam, The Netherlands. Copyright 2007 ACM 978-1-59593-733-9/07/0007...$5.00.]

1. INTRODUCTION

Growing broadcasting of digital video content on different media brings the search for copies in large video databases to a new critical issue. Digital videos can be found on TV channels, Web-TV, video blogs and public video web servers. The massive capacity of these sources makes the tracing of video content a very hard problem for video professionals. At the same time, controlling the copyright of the huge number of videos uploaded every day is a critical challenge for the owners of popular video web servers. Content-Based Copy Detection (CBCD) presents an alternative to the watermarking approach for identifying video sequences and solving this challenge.

A crucial difficulty of CBCD concerns the fundamental difference between a copy and the notion of similarity encountered in Content-Based Video Retrieval (CBVR): a copy is not an identical or near-replicated video sequence but rather a transformed video sequence. Photometric or geometric transformations (gamma and contrast transformations, overlay, shift, etc.) can greatly modify the signal, and therefore copies can in fact be visually less similar than other kinds of videos that might be considered similar. CBVR applications aim to find similar videos in the same visual category, for example soccer games or episodes of soaps, but most of these detections would clearly represent false alarms in a CBCD application. Figure 1 illustrates the differences between CBCD and CBVR on two examples: a very similar non-copy video pair and a less similar pair of video copies. As already mentioned by X. Fang et al. in [6], another strong difference between CBCD and CBVR is the fact that, in a CBCD application, a query video can be an infinite stream with no specific boundaries, and only a small part of the query video may be a copy.

Figure 1: Copy / similarity. The robustness issue: two videos which are copies (source video: Système deux, C. Fayard 1975 (c) INA). The discriminability issue: two similar videos which are not copies (different ties).

2. STATE OF THE ART

For the protection of copyright, watermarking and CBCD are two different approaches. Watermarking inserts non-visible information into the media which can be used to establish ownership. However, very robust watermarking algorithms are not yet available. In a CBCD approach, the watermark is the media itself. Existing methods for CBCD usually extract a small number of pertinent features (called signatures or fingerprints) from images or a video stream and then match them against the database according to a dedicated voting function. Several kinds of techniques have been proposed in the literature. In [11], in order to find pirated videos on the Internet, Indyk et al. use temporal fingerprints based on the shot boundaries of a video sequence. This technique can be efficient for finding a full movie, but may not work well for short episodes with few shot boundaries. Oostveen et al. in [20] present the concept of video fingerprinting, or hash function, as a tool for video identification. They propose a spatio-temporal fingerprint based on the differential of luminance of partitioned grids in spatial and temporal regions. B. Coskun et al. in [4] propose two robust hash algorithms for videos, both based on the Discrete Cosine Transform (DCT), for identification of copies.

Hampapur and Bolle in [8] compare global descriptions of the video based on motion, color and the spatio-temporal distribution of intensities. The ordinal measure was originally proposed by Bhat and Nayar [2] for computing image correspondences, and adapted by Mohan in [19] for video purposes. Different studies use this ordinal measure [10, 15], and it has been proved robust to different resolutions, illumination shifts and display formats. Other approaches focus on exact copy detection for monitoring commercials, an example being Y. Li et al. in [18], who use a compact binary signature involving color histograms. The drawback of the ordinal measure is its lack of robustness with regard to logo insertion, shifting or cropping, which are very frequent transformations in TV post-production. In [12], the authors show that using local descriptors is better than the ordinal measure for video identification when captions are inserted.

When considering post-production transformations of different kinds, signatures based on points of interest have demonstrated their effectiveness for retrieving video sequences in very large video databases, as in the approach proposed in [14] and described in section 3.2. Using such primitives is mainly motivated by the observation that they provide a compact representation of the image content while limiting the correlation and redundancy between the features.

3. TECHNIQUES COMPARED

This section presents the different techniques used for the comparative evaluation in this paper. They use global descriptors of video like the ones described in section 3.1, with techniques based on the temporal activity, on the spatial distribution and on a spatio-temporal distribution. Section 3.2 presents techniques using local descriptions of the content. Section 3.3 defines the concept of a voting function and adapts it to the different video descriptors.

3.1 Global descriptors

[Temporal]. This method¹ defines a global temporal activity a(t) depending on the intensity I of each pixel (N is the number of pixels in each image):

    a(t) = \sum_{i=1}^{N} K(i) (I(i,t) - I(i,t-1))^2

where K(i) is a weight function enhancing the importance of the central pixels. A signature is computed around each maximum of the temporal activity a(t). Spectral analysis by a classical FFT leads to a 16-dimensional vector based on the phase of the activity.

¹Thanks to G. Daigneault from INA, who developed this fingerprint.

[Ordinal Measurement]. The ordinal intensity signature consists in partitioning the image into N blocks; these blocks are sorted by their average gray level and the signature S(t) uses the rank r_i of each block i:

    S(t) = (r_1, r_2, \ldots, r_N)

The distance D(t) is defined for computing the similarity of two videos (a reference R and a candidate C) at a time code t, where T is the length of the considered segment:

    D(t) = \frac{1}{T} \sum_{i=t-T/2}^{t+T/2} |R(i) - C(i)|

Hampapur and Bolle in [8] have described and tested this technique and shown that it performs better than motion and color features. Li Chen re-developed this technique for this study.

[Temporal Ordinal Measurement]. Instead of using the rank of the regions within an image, the method proposed by L. Chen and F. Stentiford in [3] uses the rank of regions along time. If each frame is divided into K blocks and \lambda^k is the ordinal measure of region k in a temporal window of length M, the dissimilarity D between a query video V_q and a reference video V_r at time code t is:

    D(V_q, V_r^p) = \frac{1}{K} \sum_{k=1}^{K} d^p(\lambda_q^k, \lambda_r^k)

where:

    d^p(\lambda_q^k, \lambda_r^k) = \frac{1}{C_M} \sum_{i=1}^{M} |\lambda_q^k(i) - \lambda_r^k(p+i-1)|

p is the temporal shift tested and C_M is a normalizing factor. The best temporal shift p is selected.

3.2 Local descriptors

[AJ]. This technique, described in [13] by A. Joly et al., is based on an improved version of the Harris interest point detector [9] and a differential description of the local region around each interest point. To increase compression, the features are not extracted in every frame of the video but only in key-frames corresponding to extrema of the global intensity of motion [5]. The resulting local features are 20-dimensional vectors in [0,255]^{D=20}, and the mean rate is about 17 local features per second of video (1000 hours of video are represented by about 60 million feature vectors). Let \vec{S} be one of the local features, defined as:

    \vec{S} = \left( \frac{\vec{s}_1}{\|\vec{s}_1\|}, \frac{\vec{s}_2}{\|\vec{s}_2\|}, \frac{\vec{s}_3}{\|\vec{s}_3\|}, \frac{\vec{s}_4}{\|\vec{s}_4\|} \right)

where the \vec{s}_i correspond to 5-dimensional sub-vectors computed at four different spatio-temporal positions distributed around the interest point. Each \vec{s}_i is the differential decomposition of the gray-level 2D signal I(x,y) up to second order:

    \vec{s}_i = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}, \frac{\partial^2 I}{\partial x \partial y}, \frac{\partial^2 I}{\partial x^2}, \frac{\partial^2 I}{\partial y^2} \right)

[ViCopT]. ViCopT, for Video Copy Tracking, is a system developed for finding copies, fully described in [17]. Harris points of interest are extracted in every frame and a signal description similar to the one used in [AJ] is computed, leading to 20-dimensional signatures. The difference is that the differential decomposition of the gray-level signal up to order 2 is computed around 4 spatial positions around the interest point (in the same frame). These points of interest are associated from frame to frame to build trajectories, with an algorithm similar to the KLT [22]. For each trajectory, the signal description finally kept is the average of each component of the local description. Using the properties of the built trajectories, a label of behavior can be assigned to the corresponding local description. For CBCD, two particular labels have been selected:

• label Background: motionless and persistent points along frames;
• label Motion: moving and persistent points.

The final signature for each trajectory is composed of a 20-dimensional vector, trajectory properties and a label of behavior.

[Space-Time Interest Points (STIP)]. This technique was developed by I. Laptev and T. Lindeberg in order to detect spatio-temporal events (see [16]). The space-time interest points correspond to points where the image values have significant local variation in both space and time. Previous applications of this detector concerned the classification of human actions and the detection of periodic motion. In this paper we are interested in applying this detector to CBCD. For now, the detector has not been optimized for the task of copy detection and the presented results are preliminary. Space-time interest points are described by the spatio-temporal third-order local jet, leading to a 34-dimensional vector:

    j = (L_x, L_y, L_t, L_{xx}, \ldots, L_{ttt})

where the L_{x^m y^n t^k} are spatio-temporal Gaussian derivatives normalized by the spatial detection scale \sigma and the temporal detection scale \tau:

    L_{x^m y^n t^k} = \sigma^{m+n} \tau^k (\partial_{x^m y^n t^k} g) * f

The L2 distance is used as a metric for the local signatures.

3.3 Voting functions

A candidate video sequence is defined as a set of n_f successive frames described by features (n_f global features or n_c local features). The features are searched in the database: a sequential search is performed for techniques with global features, and an approximate search described in [13] for local features. For each of the n_c candidate local features, the similarity search provides a set of matches characterized by an identifier V_h defining the referenced video clip to which the feature belongs, together with a temporal position for the techniques described in section 3.1 and a spatio-temporal position for the techniques described in section 3.2.

Once the similarity search has been done and some potential candidates from the database have been selected, the partial results must be post-processed to compute a global similarity measure and to decide whether the most similar documents are copies of the candidate document. Usually, this step is performed by a vote on the document identifier provided with each retrieved local feature [21]. In this study, we use a geometrically consistent matching algorithm which consists in keeping, for each retrieved document, only the matches which are geometrically consistent with a global transform model. The vote is then applied by counting only the matches that respect the model (registration + vote strategy). The choice of the model characterizes the tolerated transformations. A generic model, if we consider resize, rotation and translation for the spatial transformations and also slow/fast motion from the temporal point of view, is:

    \begin{pmatrix} x' \\ y' \\ t'_c \end{pmatrix} =
    \begin{pmatrix} r\cos\theta & -r\sin\theta & 0 \\ r\sin\theta & r\cos\theta & 0 \\ 0 & 0 & a_t \end{pmatrix}
    \begin{pmatrix} x \\ y \\ t_c \end{pmatrix} +
    \begin{pmatrix} b_x \\ b_y \\ b_t \end{pmatrix}    (1)

where (x', y', t'_c) and (x, y, t_c) are the spatio-temporal coordinates of two matching points. The transformation model parameters are estimated for each retrieved video clip V_h using the random sample consensus (RANSAC [7]) algorithm. Once the transformation model has been estimated, the final similarity measure m(V_h) related to a retrieved video clip V_h consists in counting the number of matching points that respect the model, up to a small temporal precision (for global features) or spatio-temporal precision (for local features).

For Temporal, Ordinal Measure and Temporal Ordinal Measure, we have used a temporal registration without considering slow/fast motion of the videos. For AJ, two different strategies taking into account the relative positions of the points are used in this study: a purely temporal method called AJ Temp and a spatio-temporal method called AJ Spatio Temporal. STIP has been associated with a temporal registration. A special strategy is used to merge the local features of ViCopT, because this system is asymmetric: the queries are points of interest while in the database the features are trajectories. The registration therefore consists in finding the spatio-temporal offset that maximizes the number of query points falling on the trajectories, and is fully explained in [17]. It is close to the model used by AJ Spatio Temporal.

4. FRAMEWORK OF THE EVALUATION

Evaluating and comparing different systems of video copy detection is not obvious. This section presents the framework used for this study. A good CBCD system should have high precision (low false positive rate) and should detect all copies in a video stream possibly subjected to complex transformations.
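The ordinal and temporal ordinal signatures of section 3.1 can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the grid size, the window length M, and the normalization C_M = M are assumptions made here for concreteness.

```python
import numpy as np

def block_means(frame, grid=3):
    """Mean gray level of each cell of a grid x grid partition of the frame."""
    h, w = frame.shape
    return np.array([
        frame[i * h // grid:(i + 1) * h // grid,
              j * w // grid:(j + 1) * w // grid].mean()
        for i in range(grid) for j in range(grid)
    ])

def spatial_ordinal_signature(frame, grid=3):
    """S(t): rank (1..N) of each block by average gray level (Ordinal Measurement)."""
    means = block_means(frame, grid)
    ranks = np.empty_like(means, dtype=int)
    ranks[np.argsort(means)] = np.arange(1, means.size + 1)
    return ranks

def temporal_ordinal_dissimilarity(query, ref, p):
    """D(V_q, V_r^p): per-block temporal ranks compared at shift p.

    query, ref: arrays of shape (frames, K) holding block means over time.
    Ranks are taken along the time axis within the window of length M.
    """
    M, K = query.shape
    lam_q = np.argsort(np.argsort(query, axis=0), axis=0)       # temporal ranks
    lam_r = np.argsort(np.argsort(ref[p:p + M], axis=0), axis=0)
    return np.abs(lam_q - lam_r).sum(axis=0).mean() / M         # C_M = M (assumed)

def best_shift(query, ref):
    """Select the temporal shift p minimizing the dissimilarity."""
    M = query.shape[0]
    return min(range(ref.shape[0] - M + 1),
               key=lambda p: temporal_ordinal_dissimilarity(query, ref, p))
```

Because ranks are invariant to monotonic intensity changes, a copy whose gray levels were rescaled (e.g. the contrast transformations of section 4.1) still yields a dissimilarity of zero at the correct shift.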
4.1 Image transformations

CBCD systems should be able to cope with transformations of the post-production process: insertion of logos, cropping, shifting. However, relevant transformations can also originate from involuntary events (re-encoding, cam-cording) which decrease the quality of the video and add blur and noise. Examples of transformations are shown in Figure 2, with examples of detections found on the Internet by ViCopT. The example in Figure 2(a) shows a low-quality re-encoded video affected by blur, noise and color changes. The example in Figure 2(b) presents a zoom with a shift and the insertion of a caption, while Figure 2(c) demonstrates an example of vertical stretching. To test the robustness of the different CBCD methods, we simulate these types of transformations.

Figure 2: Copies found on the Web. (a) La Télé des Inconnus, P. Lederman 1990 (c). (b) Celine Dion Music Video 2002 (c). (c) The Full Monty 1997 (c) 20th Century Fox.

4.2 Frameworks found in the literature

In the literature, evaluation usually consists in extracting a video segment from a video source. After different specific transformations, the video segment is used as a query. The goal is to find the best match and check whether it is the right original video. For example, in [8], the authors use a 2 hrs 12 mins video reference (with a resolution of 452 x 240) and video queries from 5.3 secs to 21.33 secs. Another evaluation of copy detection (see [18]) uses a database of 5000 videos with a constant length of 15 secs (total size 21 hrs), and the queries are also videos with a constant length. These evaluation methods do not account for the fact that in a monitoring process the queries are video streams without well-defined temporal boundaries. The fact that only a segment of the query video may be a segment of a video from the reference database has to be considered.

4.3 Proposed benchmark

We created a set of video copies as follows: video segments were randomly extracted from a video database and transformed. These transformed segments were inserted into a video stream composed of videos not contained in the reference database. Figure 3 illustrates this benchmark. The query videos are analyzed using the different methods with the goal of locating and identifying video segments corresponding to copies from the reference database. All the experiments were carried out on the BBC open news archives [1]: 79 videos (about 3.1 hours) covering different topics including conflicts and wars, disasters, personalities and leaders, politics, science and technology, and sports. Robustness to re-encoding is also tested, because to compute the query video, the video segment is re-encoded twice with different encoding systems.

Figure 3: Video benchmark building.

4.4 Evaluation criteria

To evaluate the systems, precision-recall curves are computed using the following formulas:

    Recall = \frac{N_{TruePositive}}{N_{AllTrue}}

    Precision = \frac{N_{TruePositive}}{N_{AllPositive}}

Another criterion used is the average precision (AveP). The precision and recall are based on the whole list of detections returned by the system. Average precision emphasizes returning relevant detections earlier. It is the average of the precisions computed after truncating the list after each of the relevant detections in turn:

    AveP = \frac{1}{N_{max}} \sum_{r=1}^{N_{detected}} (P(r) \times \delta(r))

where r is the rank, N_max the number of relevant detections, N_detected the number of detections, \delta(r) a binary function on the relevance of a given rank, and P(r) the precision at a given cut-off rank.

5. EXPERIMENTAL RESULTS

This section presents the results for the different CBCD methods on different test sets. Section 5.1 compares the different techniques under specific single transformations, while section 5.2 uses a benchmark with mixed transformations.
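The evaluation criteria of section 4.4 can be sketched directly from the formulas above. This is a minimal illustration assuming the ranked detection list is given as a boolean relevance vector, with \delta(r) encoded as the booleans themselves:

```python
def precision_recall(detections, n_all_true):
    """Running (precision, recall) points over a ranked detection list.

    detections: booleans in rank order, True where the detection is relevant.
    n_all_true: N_AllTrue, the total number of true copy segments.
    """
    points, tp = [], 0
    for r, relevant in enumerate(detections, start=1):
        tp += relevant
        points.append((tp / r, tp / n_all_true))  # P(r), Recall(r)
    return points

def average_precision(detections, n_relevant):
    """AveP: mean of the precisions P(r) taken at each relevant rank."""
    total, tp = 0.0, 0
    for r, relevant in enumerate(detections, start=1):
        if relevant:                 # delta(r) = 1
            tp += 1
            total += tp / r          # P(r) at this cut-off
    return total / n_relevant        # 1 / N_max
```

For example, a system returning a relevant, an irrelevant, then a relevant detection (with two relevant segments in total) gets AveP = (1/1 + 2/3) / 2 ≈ 0.83, rewarding the early correct detection.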
5.1 Single transformations

For these experiments, 9 query videos were computed according to the framework defined in section 4.3: one with no transformation and 8 with a single transformation among the following:

• contrast increased by 25%
• contrast decreased by 25%
• crop 5% with black window
• Gaussian radius-2 blur
• letter-box
• insertion of a small logo
• zoom 0.8 with black window
• zoom 1.2

Figure 4 illustrates these transformations. Each query video is 15 minutes long and contains 8 segments of simulated video copies (each segment 5 sec long).

Figure 4: Single transformations (exact copy, contrast 75, contrast 125, crop, blur, letter-box, insert, zoom 1.2, zoom 0.8).

Figure 5 presents the PR curves for the different techniques and for the different transformations. For all transformations, Temporal Ordinal Measure presents excellent results: all the segments are found with no false alarm. The Ordinal Measure presents poor results for the zooming, cropping and letter-box transformations.

Figure 5: PR curves for single transformations, one panel per transformation, comparing Ordinal Measurement, Temporal Ord. Meas., ViCopT, STIP, AJ_SpatioTemp, AJ_Temp and Temporal.

Two reasons can explain a missed detection of a sequence. The first is that some descriptions are intrinsically not robust to some transformations. Techniques based on Harris points of interest are not robust to a decrease of contrast, because the value of the corners can become too low. The ordinal measurement is by nature not robust to a crop, which changes the average gray level of each block and thus the spatial order sequence. Techniques using a strict spatio-temporal registration with no consideration of spatial deformation have poor results on resize and letter-box transformations. The difference in these examples between AJ Temp and AJ SpatioTemp illustrates this decrease of quality, because the spatial registration during the vote has eliminated some local descriptors that were correctly found during the search step.

The second reason for a missed detection is that the sequences themselves cannot be well described. If the query segments do not have much motion and activity, the temporal signature cannot describe the sequence well. The same holds for ViCopT: there is a sequence whose trajectories are hard to compute, and we observe that the missing segment is always the same. For these techniques, the quality therefore also depends on the sequences and not only on the transformations.

5.2 Random mixed transformations

In real cases, there are combinations of transformations which can result in a large modification of the video. Figure 6 presents examples of real cases with large transformations found by ViCopT in a previous work. In example (a), there is a large crop and captions inserted at the bottom but also on the persons. Example (b) shows a large inserted logo and a zoom, while example (c) presents a change of background creating a virtual duet for a TV show.

Figure 6: Complex transformations. (a) Source video: Alexandrie, 1978 (c). (b) Source video: Samedi et Compagnie, 1970 (c) ORTF. (c) Source video: Gala du Midem, G. Ulmer 1970 (c) INA.

To simulate this type of more complex transformation, we created another video as defined in section 4.3. The length of each segment is a random value between 1 s and 10 s. For each segment, the transformations use different parameters and a random combination of transformations occurs. The shift and a change of gamma were added to the possible transformations, but the zoom was not tested in this experiment. There are 60 segments inserted in the query video, which has a total length of 30 minutes. Examples of these simulated transformations are shown in Figure 7. In example (a), there is a horizontal and vertical shift, the insertion of a logo in the top-left corner and a change of gamma.

Figure 7: Combined transformations. (a) Source video: hkyachtclub (c) BBC. (b) Source video: sheepclone (c) BBC. (c) Source video: manhattanaftermath (c) BBC.

Figure 8 presents the PR curves for the different techniques and Table 1 gives the average precision values. A system for controlling the copyright of a large number of video files must have a very high precision to avoid too many false alarms that would need to be checked by a person, so we compare the results at precision 95%. At this precision, the best technique is ViCopT with a recall of 82%, followed by AJ SpatioTemp (recall 80%). The methods which use local features with only temporal registration have lower performance (55% for STIP and 51% for AJ Temp). As could be expected for this test with large transformations and some short segments, the methods with global features miss copies: Temporal has 49% recall, Temporal Ordinal Measure 41%, and Ordinal Measure 10%. The average precision values change this ranking a little: Temporal Ordinal Measure and AJ Temp perform better than STIP for this criterion.

Figure 8: PR curves for the mixed-transformation benchmark.

Table 1: Average precision for each technique.

Technique         AveP
ViCopT            0.86
AJ SpatioTemp     0.79
AJ Temp           0.68
Temp. Ord. Meas.  0.65
STIP              0.55
Temporal          0.51
Ord. Meas.        0.36

6. DISCUSSIONS

This section presents some limits of this evaluation and gives some reflections on the advantages of the different techniques tested in this paper.

6.1 What technique should be used?

The choice of a copy detection system strongly depends on what we are searching for and where we are searching for it. No single technique and no universal description seems to be optimal for all the applications that need video copy detection. We can identify different application cases for finding copies:

• exact copies in a stream, for statistics on commercials for example;
• a transformed full movie with no post-production and a possible decrease in quality (cam-cording);
• short segments in a TV stream with possibly large post-production transformations;
• short videos on the Internet with various transformations (which can be extracts from TV streams).

For the first item, all the techniques should be efficient, with a voting function that allows the detection to locate the boundaries precisely. For finding full movies, as the length is important, methods with global features like Temporal Ordinal Measurement are probably faster, with the same efficiency, than methods with local features. The last two items are the most difficult cases because the segments can be short and strongly transformed. Finding short segments in a video stream is a critical issue, for example for INA, which provides video archives from a very large database (300,000 hours of digital video) to TV channels. For this task AJ Temp has proved efficient and ViCopT presents a good improvement. For videos on the Internet, all the different difficulties are mixed and the solution depends on the quality required. The choice is still open, but we think that local features seem more promising.

A limit of this evaluation is the size of the database (3 hours). The robustness issue has been evaluated in these experiments, but the discriminability issue needs a larger database. Avoiding false positive detections is a crucial point for an automatic or semi-automatic system. AJ Temp has been tested on 30,000 hours in [13] with continuous TV streams as queries. It presents some false alarms when the similarity is too high, for example if the background is the same for a recurrent TV show. ViCopT has been tested on 300 hours in [17] and seems to be more efficient, with better discriminability.

6.2 Some values

A less fundamental but practical reason for using a particular method is its cost. This section gives some values related to the computational costs. The methods with global descriptions use a sequential search: it is fast but linearly dependent on the size of the database. The methods ViCopT and AJ use an index structure that allows the search to be very fast [14]; this search is therefore sub-linear. Table 2 gives some values measured during the tests. The number of features corresponds to the features computed to describe the whole BBC archive database (3 hours of video). The search speed corresponds to the time needed to perform the similarity search for a 15-minute query video.
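The linear vs. sub-linear contrast above can be sketched with a toy example. The bucketed index below is a stand-in illustration of why indexing beats sequential scanning, not the statistical similarity search of [14]; the cell size is an arbitrary assumption, and coarse bucketing misses near matches that fall across cell boundaries, which is precisely why real systems use approximate searches with controlled guarantees.

```python
import numpy as np
from collections import defaultdict

def sequential_search(db, query, radius):
    """Linear scan: the query is compared against every signature in the base."""
    return [i for i, sig in enumerate(db)
            if np.abs(sig - query).sum() <= radius]

class BucketIndex:
    """Toy sub-linear index: signatures hashed by a coarse quantization key.

    Only candidates sharing the query's bucket are compared, so the cost
    depends on the bucket population rather than on the database size.
    """
    def __init__(self, db, cell=32.0):
        self.db, self.cell = db, cell
        self.buckets = defaultdict(list)
        for i, sig in enumerate(db):
            self.buckets[self._key(sig)].append(i)

    def _key(self, sig):
        return tuple((sig // self.cell).astype(int))

    def search(self, query, radius):
        return [i for i in self.buckets.get(self._key(query), [])
                if np.abs(self.db[i] - query).sum() <= radius]
```

An exact duplicate of a stored signature is found by both strategies, but the indexed lookup only touches one bucket instead of the whole database.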
As the tests were not done on the same computers with the same OS, the results have to be analyzed with caution, but the computers used were all standard PCs. The values for STIP are very high because no optimization or point-selection study has yet been done to make the system efficient for industrial use.

Table 2: Sizes of the feature spaces and speed of the search.

Technique         Nb of Features   Search Speed
ViCopT            218,030          27 s
AJ                149,688          3 s
Temp. Ord. Meas.  279,000          40 min
STIP              2,264,994        3 h 30
Temporal          2,700            34 s
Ord. Meas.        279,000          37 min

7. CONCLUSIONS

In this paper, we have presented several solutions to the video copy detection issue, which is a very critical challenge for the protection of copyright. This work describes different techniques for characterizing videos by their content. All the techniques have been tested and compared within a relevant dedicated evaluation framework, which is representative of real cases. We first tested single transformations to measure the robustness of each technique. We then performed experiments on realistic cases where transformations are mixed and where copies can be very short segments. These experiments lead to the following conclusions: methods with local features have higher computational costs but present interesting results in terms of robustness. The Temporal Ordinal measure is very efficient for small transformations. However, additional tests need to be done. First, a bigger reference database should be used to test the discriminability of each technique. Secondly, a larger set of queries should be built to model other potential cases of use.

8. ACKNOWLEDGMENTS

This research has been conducted with the support of the European NoE MUSCLE, http://www.muscle-noe.org/ .

9. REFERENCES

[1] http://www.bbc.co.uk/calc/news.
[2] D. Bhat and S. Nayar. Ordinal measures for image correspondence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(4):415–423, 1998.
[3] L. Chen and F. W. M. Stentiford. Video sequence matching based on temporal ordinal measurement. Technical report no. 1, UCL Adastral, 2006.
[4] B. Coskun, B. Sankur, and N. Memon. Spatio-temporal transform-based video hashing. IEEE Transactions on Multimedia, 8(6):1190–1208, 2006.
[5] S. Eickeler and S. Müller. Content-based video indexing of TV broadcast news using hidden Markov models. In Proc. of Int. Conf. on Acoustics, Speech, and Signal Processing, pages 2997–3000, 1999.
[6] X. Fang, Q. Sun, and Q. Tian. Content-based video identification: a survey. In Int. Conf. Information Technology: Research and Education, 2003.
[7] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, 1981.
[8] A. Hampapur and R. Bolle. Comparison of sequence matching techniques for video copy detection. In Conference on Storage and Retrieval for Media Databases, pages 194–201, 2002.
[9] C. Harris and M. Stevens. A combined corner and edge detector. In 4th Alvey Vision Conference, pages 153–158, 1988.
[10] X.-S. Hua, X. Chen, and H.-J. Zhang. Robust video signature based on ordinal measure. In International Conference on Image Processing, 2004.
[11] P. Indyk, G. Iyengar, and N. Shivakumar. Finding pirated video sequences on the internet. Technical report, Stanford University, 1999.
[12] K. Iwamoto, E. Kasutani, and A. Yamada. Image signature robust to caption superimposition for video sequence identification. In International Conference on Image Processing, 2006.
[13] A. Joly, O. Buisson, and C. Frelicot. Content-based copy detection using distortion-based probabilistic similarity search. IEEE Transactions on Multimedia, 2007.
[14] A. Joly, C. Frelicot, and O. Buisson. Feature statistical retrieval applied to content-based copy identification. In International Conference on Image Processing, 2004.
[15] C. Kim and B. Vasudev. Spatiotemporal sequence matching techniques for video copy detection. IEEE Transactions on Circuits and Systems for Video Technology, 1(15):127–132, Jan. 2005.
[16] I. Laptev and T. Lindeberg. Space-time interest points. In International Conference on Computer Vision, pages 432–439, 2003.
[17] J. Law-To, O. Buisson, V. Gouet-Brunet, and N. Boujemaa. Robust voting algorithm based on labels of behavior for video copy detection. In ACM Multimedia, MM'06, 2006.
[18] Y. Li, L. Jin, and X. Zhou. Video matching using binary signature. In Int. Symposium on Intelligent Signal Processing and Communication Systems, pages 317–320, 2005.
[19] R. Mohan. Video sequence matching. In Int. Conference on Audio, Speech and Signal Processing, 1998.
[20] J. Oostveen, T. Kalker, and J. Haitsma. Feature extraction and a database strategy for video fingerprinting. In VISUAL '02: Proceedings of the 5th International Conference on Recent Advances in Visual Information Systems, pages 117–128, London, UK, 2002. Springer-Verlag.
[21] C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):530–534, May 1997.
[22] C. Tomasi and T. Kanade. Detection and tracking of point features. Technical report CMU-CS-91-132, Apr. 1991.
