Robust Method of Vote Aggregation and Proposition Verification for Invariant Local Features

Grzegorz Kurzejamski, Jacek Zawistowski, Grzegorz Sarwas
Lingaro Sp. z o.o., Puławska 99a, 02-595 Warsaw, Poland
{grzegorz.kurzejamski, jzawisto, grzegorz.sarwas}@gmail.com

arXiv:1601.00781v1 [cs.CV] 5 Jan 2016

Keywords: Computer Vision, Image Analysis, Multiple Object Detection, Object Localization, Pattern Matching

Abstract: This paper presents a method for analysing the vote space created by the local feature extraction process in a multi-detection system. The method stands in opposition to the classic clustering approach and gives a high level of control over cluster composition for further verification steps. The proposed method comprises a graphical vote space presentation, proposition generation, two-pass iterative vote aggregation and cascade filters for verification of the propositions. The cascade filters contain all of the minor algorithms needed for effective verification of object detections. The new approach does not share the drawbacks of the classic clustering approaches and gives substantial control over the detection process. The method exhibits an exceptionally high detection rate in conjunction with a low false detection chance in comparison to alternative methods.

1 INTRODUCTION

Object detection based on local features is well known in the computer vision field. Much research has produced different features and methods of scene analysis in search of a particular object. Recently developed feature points have proven to be well suited to detection of a specific object rather than to identification of a generalised object class. As many applications of local features have shown, the ability to describe selective elements of rich graphics is the main purpose of invariant local features in many fields. Amongst the most popular local features are SIFT, SURF, BRISK, FREAK and MSER. These are commonly described as feature points or feature regions. They are easy to manipulate and to match. The use of such local features gives not only the ability to describe an object in many ways, but also invariance to basic object and image transformations such as skew, rotation, blur and noise. Invariant local features used in conjunction with invariant characteristic region detectors, such as the Harris-Affine or SIFT detector, provide data for thorough scene-object analysis, leading to detection of an object in the scene. Given an appropriate set of features, one can determine the exact position of the object in the scene with a specified scale and rotation and even under minor linear deformations.

The straightforward approach to a detection task is to identify local features in the scene and in the pattern and match them against each other using an appropriate metric. Common evaluations of such systems use brute-force KNN classification or its derivatives, such as FLANN KNN search or BBF search. These give a good approximation of the theoretically ideal results at a substantially lower computational cost. Some applications make use of LSH hashing, but this approach does not provide the practical distance data needed for further analysis. Matching local features provides a set of correspondences that can be filtered later. Under the assumption that the scene contains one or no instance of the object, a correspondence can be put into either an object-related or a noise-related class. To distinguish which class a particular correspondence belongs to, a few methods have been developed. A frequently used method for this purpose is RANSAC, which gives very good results even with a high level of noise-related correspondences. After the classification of correspondences, the model can be assumed and a homography can be calculated. Well-designed parameters and filters can lead to a high detection rate and a low false detection chance. RANSAC and quantitative analysis of correspondence data become inefficient, however, when the contribution of noise-related correspondences in the whole correspondence group grows. Because of that, such an approach is not sufficient for scenes where the objects occupy a small share of the image. The methods mentioned above will not work well with multiple objects present in the image either.

Systems for multi-detection purposes incorporate a divide-and-conquer approach. Each correspondence can be assigned to one of N+1 classes, where N is the number of objects in the scene. There is no known, straightforward method of assigning correspondences. Most applications use correspondences as votes in a multidimensional space. The vote space can be clustered with common clustering algorithms, and each cluster can then be processed with a single-object detection algorithm. Clustering approaches can be divided into two groups. The first consists of sparse clustering, where each cluster should contain all the needed vote data of a specific class. Such methods show a low detection rate because of a far-from-ideal clustering process and a high level of parametrization. The second group is dense clustering, where clusters may contain only a small portion of a particular correspondence class. Their analysis leads to creating or supporting hypotheses of the object's occurrence in the scene. The flagships in this matter are Hough-like methods. Such approaches show a high detection rate, but a high false positive rate as well. There are also works that try to segment the scene with known context, as shown in the work of Iwanowski et al. (Iwanowski et al., 2014).

In our tests both clustering approaches lacked the ability to attain a very high detection rate and a very low false positive rate at the same time. For our test cases processing power is not a limitation and the images are of very high quality. Our test data contains from zero up to 100 objects per image and presents different environmental conditions. Most state-of-the-art publications do not test detection capabilities on such complex tasks. We found that current approaches cannot maintain a satisfactory ratio of detection rate to false positive rate in many real-life applications.

This paper presents a method of vote space analysis, a part of the invention shown in (Kurzejamski et al., 2014). The method can be adjusted to a vast variety of object detection purposes where effectiveness and a low false positive rate are crucial. The method has been developed to work well with huge amounts of feature data extracted from high-quality images. Most of the algorithms used in the new approach come with a logical justification. The new method uses iterative vote aggregation, starting from the propositions' positions. Propositions are generated from a graphical vote space analysis. Aggregated data undergoes analysis and filtration. The whole process has a two-pass model, which makes the method robust to some specific object positioning in the scene. A cascade of specially selected filter algorithms is utilized to reject most of the false positive detections.

2 RELATED WORK

Local features in the image can be tracked a long way back in the literature. We present state-of-the-art feature point extraction and description methods that can be used in our method, and similar frameworks for multi-object detection developed over the last years.

2.1 Feature points

Our method should be used in conjunction with scale-invariant and rotation-invariant features for the best results. Local features lacking any of these characteristics may require rejecting some parts of our method, but can be used nevertheless.

The best-known feature points, up to this point, are the SIFT points developed by Lowe (Lowe, 1999), which became a model for various local feature benchmarks. The closest alternative to SIFT is SURF (Bay et al., 2008), which comes with a lower dimensionality and, as a result, a higher computing efficiency. There are also known attempts to incorporate additional enhancements into SIFT and SURF, such as PCA-SIFT (Ke and Sukthankar, 2004) or Affine-SIFT (Morel and Yu, 2009). SIFT, SURF and their derivatives are computationally demanding during the matching process. In recent years there has been big development in feature points based on binary tests, which can be described and matched in a very fast manner. The flagships of this approach are BRIEF (Calonder et al., 2010), ORB (Rublee et al., 2011), BRISK (Leutenegger et al., 2011) and FREAK (Alahi et al., 2012). Most of the cited algorithms can be used to create a dense and highly discriminative voting space, which holds the substantial object correspondence data needed to accomplish many real-world detection tasks.

2.2 Frameworks

There are few approaches to multi-object multi-detection, meaning detecting multiple different objects in a scene where any object can be visible in multiple places. Viola and Jones (Viola and Jones, 2001) developed a cascade of boosted features that can efficiently detect multiple instances of the same object in one pass of the detection process. The method needs a time-consuming learning process with thousands of images, and has mostly been tested on general objects such as people, cars and faces. The most straightforward method for multi-detection is using all of the sliding windows, as used, for example, in Sarwas' and Skoneczny's work (Sarwas and Skoneczny, 2015). Such methods are unfortunately computationally expensive. High effectiveness can be achieved with Histograms of Oriented Gradients (Dalal and Triggs, 2005) and Deformable Part Models (Felzenszwalb et al., 2010). The biggest drawbacks for our application are that Deformable Part Models need a learning stage and Histograms of Oriented Gradients are not rotation invariant. Blaschko and Lampert in (Blaschko and Lampert, 2008) use an SVM to enhance the sliding window process. Efficient subwindow search has been used in (Lampert et al., 2008). In addition, branch-and-bound approaches, as in (Yeh et al., 2009), are promising for multi-detection purposes in conjunction with bag-of-words descriptors. Lowe (Lowe, 2004) proposed the generalized Hough transform for clustering a vote space with SIFT correspondence data. The authors of (Azad et al., 2009) created a 4D voting space and used a combination of Hough, RANSAC and least-squares homography estimation in order to detect and accept potential object instances. Zickler et al. (Zickler and Efros, 2007) used an angle difference criterion in addition to RANSAC mechanisms and a vote number threshold. Zickler et al. in (Zickler and Veloso, 2006) used a custom probabilistic model in addition to a Hough algorithm. In our system's application we could use only one generic pattern image per object, so we rejected most of the learning-based global descriptors.

3 ALGORITHM

Algorithm 1: Vote Data, Vote Image and propositions creation
  Data: Original Patterns (OPT), Scene Image (SCN)
  Result: Vote Data and propositions for objects' centres for each pattern.
  1  Feature points extraction on OPT and SCN;
  2  foreach pattern in OPT do
  3      Find correspondences (COR) between pattern and SCN feature points;
  4      foreach correspondence in COR do
  5          Reject if has low distance value;
  6          Reject if has high hue difference value;
  7          Calculate adjacency value;
  8      end
  9      Creation of vote space (VS) from COR;
  10     Creation of vote image (VI) from VS;
  11     Search for propositions (PR) in VI;
  12     Sort PR list;
  13 end

3.1 Vote image creation

The first part of our method is the vote space and vote image creation (lines 9 and 10 of Algorithm 1). The vote space consists of multiple dimensions: X, Y, Scale, Rotation and Distance. Each vote contains a specific X and Y position of the center of the object. The Distance may be the result of using a specific metric for particular feature points. For SIFT the standard procedure is to use the L2 distance on its feature vector, which contains gradient data from the area around the characteristic point. One may use additional information in the distance calculation, such as colour difference. One can also use a ranking method such as LSH hashing instead of the L2 metric.
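The projection in lines 9-10 of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes each vote is a (x, y, scale, rotation, adjacency) tuple standing in for a point of the 5-dimensional vote space.

```python
def build_vote_image(votes, width, height):
    """Project the adjacency data of a vote space onto the (X, Y) plane
    (lines 9-10 of Algorithm 1). Each vote is assumed to be a
    (x, y, scale, rotation, adjacency) tuple. Returns a single-channel
    image (list of rows) normalised to [0, 1]."""
    image = [[0.0] * width for _ in range(height)]
    for x, y, _scale, _rotation, adjacency in votes:
        image[int(y)][int(x)] += adjacency  # accumulate the adjacency sum cue
    peak = max(max(row) for row in image)
    if peak > 0.0:
        # Normalise so the strongest grouping maps to full intensity.
        image = [[v / peak for v in row] for row in image]
    return image
```

Local maxima of such an image then serve as candidate object centres for the proposition search described later in the paper.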
The algorithm presented by the authors is built upon two mechanisms: the vote space creation and a vote aggregation for each of the vote spaces created. A vote space is created for each pattern. Its adjacency data is projected onto the (X, Y) plane, creating vote images (one for each vote space). The vote images are analysed in search of object position propositions. This mechanism is shown in Algorithm 1. The aggregation process is performed for each vote space and for each of its propositions, starting from the proposition with the highest adjacency value. Aggregation consists of two passes with slightly different vote gathering approaches. The first pass is needed to estimate the detected object's area in the scene, so that the second aggregation pass gathers only votes considered to belong to that particular object instance. The structure of each pass is presented in Algorithm 2.

Algorithm 2: Vote aggregation and detection acceptance
  Data: VI, VS, PR
  Result: Occurrences (OCR) in the SCN for a particular pattern
  1  foreach Proposition in PR do
  2      Gather all votes in local area from VS;
  3      Unique filtering for gathered votes (V);
  4      Cascade filtering for V;
  5      if not rejected by cascade filtering then
  6          Estimate object's area;
  7          Gather all votes with a FloodFill algorithm;
  8          Unique filtering for new V;
  9          Cascade filtering for new V;
  10         if not rejected by cascade filtering then
  11             Calculate object's area;
  12             Create occurrence entry in OCR;
  13             Erase all vote data in occurrence's area in VS and VI;
  14         else
  15             reject proposition
  16         end
  17     else
  18         reject proposition
  19     end
  20 end

The vote image is the projection of the adjacency data available in the vote space onto the X and Y dimensions. The vote image has one intensity channel, created by normalizing the adjacency sum cue. Another approach would be to use the distance value instead of adjacency as the main cue; we found the L2 metric, as well as many other distance-based approaches, insufficient.

Votes in the vote spaces are built upon filtered correspondence sets. The distance threshold used in line 5 of Algorithm 1 was calculated as

    thr = (MIN(V) + MAX(V)) / 2,    (1)

where V is the votes group and the MIN and MAX operators return the distance value of the vote with the minimal and the maximal distance in the group. The rejection function D is presented in equation 2:

    D(v) = accept, if dist(v) <= thr;  reject, if dist(v) > thr.    (2)

We transform the distance value into a normalized adjacency value in the range from 0 to 1 (line 7 of Alg. 1). A value of 1 indicates a perfect match; 0 indicates a near-to-rejection difference between the feature points. We transformed distance values into adjacency (adj) values with the function

    adj(v) = 1 - (dist(v) / thr)^2.    (3)

Adjacency values are gathered in a single-channel, gray vote image. The vote image can optionally be normalized for visualization purposes; such a normalized image is shown in Figure 1. If the feature extraction and matching processes are highly discriminative, the object instances in the scene should be recognizable by a human. Manual verification of the vote image gives valuable insight into the dispersion of votes across the image and into the level of false vote groupings recognizable by a human.

Figure 1: Sample of a vote image generated while localizing a red herbal tea casing. (a) Vote image; (b) blurred and normalized vote image; (c) scene with tea detections.

The last step of Algorithm 1 is the search for propositions in a vote image. A proposition is a point in the vote image, and the corresponding part of the vote space, where a potential object's center is located. We used Good Features To Track by Shi and Tomasi (Shi and Tomasi, 1994) to detect multiple local maxima in the vote image and used them as the propositions. The number of propositions should be much higher than the number of objects in the image. It is trivial to set Good Features To Track to find all the important points in the image, but this leads to the generation of thousands of propositions. The number of propositions significantly impacts the algorithm's processing time, so the need for a trade-off in the detector's parameter adjustment cannot be ignored. For each proposition's X and Y position, the adjacency sum of the corresponding votes in the vote space is calculated and used for sorting in line 12 of Algorithm 1. The proposition with the highest adjacency value should be the first taken into the vote aggregation process. As the adjacency sum is proportional to the channel value in the vote image, the cue for the sorting stage is easy to compute. Sorting the propositions ensures that the strongest vote grouping is processed first. In the case of a positive object recognition, the vote data corresponding to the object's detection area is erased from the vote space and the vote image.

3.2 Vote aggregation

The second part of our approach contains the vote aggregation mechanism. Vote aggregation starts from a proposition's position, which should be the center of a local vote grouping in the vote image. The data of the vote groupings can vary significantly for different object instances in the scene. The best instances can be represented by hundreds of votes, while the weakest positive object response may be connected with only a few. Generic clustering may ignore such small clusters and merge them with bigger ones. Generic clustering algorithms have generic parameters that are hard to adjust with object-oriented logic or even intuition. Some clustering approaches tend to cluster all the available vote data, even if noise (false correspondences) fills most of the vote space.

We propose an iterative two-pass vote aggregation process for selective clustering purposes. In each pass the unique filtering and the cascade filtering take place, which reject false positive detections. The two-pass design prevents situations in which the aggregation area contains multiple objects. Pass one of the aggregation collects all the votes in the local area of the proposition's position (line 2 of Alg. 2). The size of the local area may be a function of the corresponding pattern's size. After gathering all the votes in the local area, the unique filtering is performed and the resulting group of votes is tested with a cascade of filters (lines 3 and 4 of Alg. 2).

In the second pass of the process the aggregation is conducted with a FloodFill algorithm, starting from the proposition's position (line 7 of Alg. 2). The FloodFill range is limited to a scaled-down object area. Such a limitation can be constructed with the scale and rotation estimation from the first pass of the aggregation. The limitation ensures that the aggregation process will not collect votes from neighbouring object instances. The second pass of the algorithm contains unique filtering and cascade filtering as well, as the vote collection may be different in this pass.

For each group of votes, a unique filtering should be performed in each pass (lines 3 and 8 of Alg. 2). Unique filtering preserves only the one vote with the highest adjacency corresponding to the same feature point in the pattern. We can do so because we want the aggregated votes to be connected with only one object. If multiple votes are connected with one specific feature in the pattern, we can assume that only the strongest vote is not noise.

Some of the false positive detections in our experiments originated as a group of feature points placed along simple, steep gradients and edges. For instance, when the scene presented product shelves, more than half of the false detections contained an edge of a shelf near the center, with their vote data present mostly along the shelf's edge. Most feature point detectors incorporate mechanisms for rejecting points located along edges. Unfortunately, these mechanisms work only at a micro scale. In high resolution, some graphical structures that seem like a straight edge to a human have a very complicated, uneven shape for a characteristic point detector. Characteristic points located along edges have similar features, so they may be matched with the same feature in the pattern. This leads to the generation of many false propositions, which can sometimes be accepted by the cascade filters.

3.3 Cascade filtering

Cascade filtering (lines 4 and 9 of Alg. 2) is a process of validating a vote group with a cascade of filters. Each filter can either accept the aggregated votes or reject them. Any rejection results in dropping the aggregation process and removing the processed proposition from the sorted proposition queue. No vote data is removed from the vote space or the vote image in that situation. If all the filters in the first pass accept the vote group, the process may estimate the size and rotation of the object represented by the majority of votes (line 11 of Alg. 2).

The cascade filters comprise: (1) vote count thresholding, (2) adjacency sum thresholding, (3) scale variance thresholding, (4) rotation variance thresholding, (5) a feature points binary test, and (6) global normalised luminance cross-correlation thresholding. The first pass of the vote aggregation uses filters (1), (2), (3) and (4). The second pass uses filters (3), (4), (5) and (6).

Vote count thresholding is a simple filter thresholding the number of votes in the aggregated group. Lowe in his work (Lowe, 2004) proposed the generalised Hough transform for object detection and assumed that only three votes are enough to identify an object. Unfortunately, such an assumption leads to many false positive detections, as three local features are not enough to describe complex, generic graphics. We tested vote count thresholding for values from 3 up to 20 and found 6 to be the optimal value for filtering out too-weak responses. If a vote grouping represents a real object instance and has fewer than 6 votes, it means that the prior algorithm stages have too low an effectiveness.

Adjacency sum thresholding rejects all groups of votes whose sum of adjacency values is less than a threshold value.
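Equations (1)-(3), the unique filtering and the first two cascade filters could be combined as in the following hedged sketch. The function names and the min_adj_sum default are ours, not the paper's; min_votes=6 is the value the authors report as optimal.

```python
def vote_adjacency(dist, thr):
    """Equation (3): map a correspondence distance to an adjacency in [0, 1]."""
    return 1.0 - (dist / thr) ** 2

def aggregate_and_filter(votes, min_votes=6, min_adj_sum=1.0):
    """Sketch of distance rejection (equations 1-2), unique filtering and
    cascade filters (1)-(2). Each vote is a dict with 'dist' (match distance)
    and 'feature_id' (index of the matched pattern feature). Returns the
    surviving votes, or None when the group is rejected."""
    dists = [v['dist'] for v in votes]
    thr = (min(dists) + max(dists)) / 2.0              # equation (1)
    kept = [v for v in votes if v['dist'] <= thr]      # equation (2): D(v)
    for v in kept:
        v['adj'] = vote_adjacency(v['dist'], thr)      # equation (3)
    # Unique filtering: keep only the strongest vote per pattern feature.
    best = {}
    for v in kept:
        cur = best.get(v['feature_id'])
        if cur is None or v['adj'] > cur['adj']:
            best[v['feature_id']] = v
    unique = list(best.values())
    # Cascade filter (1): vote count thresholding.
    if len(unique) < min_votes:
        return None
    # Cascade filter (2): adjacency sum thresholding.
    if sum(v['adj'] for v in unique) < min_adj_sum:
        return None
    return unique
```

In the paper's pipeline the rejection and adjacency computation happen per correspondence (Algorithm 1), while the unique and cascade filtering act on each aggregated group (Algorithm 2); they are folded together here only for compactness.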
Adjacency sum thresholding can in certain circumstances be used instead of vote count thresholding. Nevertheless, the rejection data from these two filters may give an insight into the vote certainty levels of the detection. Even huge vote groupings with more than 100 votes may have a very low adjacency sum value.

Scale variance thresholding rejects all groups of votes with a scale value variance higher than the threshold value. One may rebuild this filter into a mechanism separating the noise signal from the positive detection signal with a Gaussian model, but for our purposes such a method is computationally too expensive. Simple variance thresholding rejects many false detections and is easy to compute.

Rotation variance thresholding rejects all groups of votes with a rotation value variance higher than the threshold value. It works analogously to scale variance thresholding, but uses the rotation values. A rotation variance is not straightforward to compute. We set twelve buckets for the rotation values and choose the three buckets with the highest counts. Their resultant is taken as the average rotation. All the values are then rotated so that the average rotation is assigned to 180 degrees, and the variance with respect to 180 degrees is computed and used for thresholding.

The feature binary test uses the feature point correspondence data preserved in each vote. We created multiple luminance binary tests for random feature pairs in the scene which are represented by votes in the aggregated vote group, and identical tests for the corresponding feature points on the pattern side. Each set of binary tests provides a binary string, and the two strings can be compared with a Hamming distance. The normalised distance can then be thresholded.

Normalised luminance cross-correlation is used as the last filter. It needs the exact object graphics patch extracted from the scene. It is computationally expensive, but can filter out many false positive detections that cannot be filtered by the previous filters. The images are resized to 50x50 pixels before the calculation of the cross-correlation. This filtering is conducted only in the second pass of the aggregation process, where the theoretical object frame can be calculated from the data of the first pass.

4 EXPERIMENTS

Our testing platform, incorporating the method described in this paper, has been developed to search for product logos and casings in scenes presenting market shelves and displays. The database used for the tests for this paper consists of 120 shelf photos taken at 12 MPx resolution and scaled down to 3 MPx for testing purposes. The pattern group consists of 60 generic patterns of logos and product wrappings. Each shelf photo was tested with each one of the patterns, giving 7200 detection processes. The photos usually contained three classes of products, so most of the patterns could generate only false positive detections. The average number of products presented in the scenes was 23.6. Patterns were scaled so that their bigger dimension was between 512 and 256 pixels. In the application of product search on market shelves, we describe high-quality images as photos bigger than 2 MPx, with a minimum of ten thousand pixels for the smallest searched object and all of the logo text readable by a human. Our aggregation approach bases its effectiveness upon the chosen local features. We used a SIFT implementation for the main experiments, as SIFT is a state-of-the-art feature detector and descriptor. The main advantage of our method lies in filtering out false detections while processing all possible occurrences.

Our tests showed that 100% of the actual object instances were processed through our cascade filtering with a proper proposition location, thanks to the dense proposition detection and the straightforward vote image creation.

Detection effectiveness lies in proper vote group filtering. The number of positive detections rejected during cascade filtering results from all the computer vision algorithms incorporated into the detection system, and can hardly be used to measure aggregation effectiveness without proper comparisons with similar methods in the same application field. The false detection rate yields more analytical data. We found no false positive detections during our tests that were the fault of insufficient description capability of the feature descriptor. All of the false detections were the result of too-loose parameters, which were needed for a very high positive detection rate. Nevertheless, we came across 203 false detections in 129 of the 7200 detection processes, resulting in a more than 1% (Table 1) false detection chance per detection process.
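The rotation-variance cue of filter (4) in Section 3.3 is the least obvious of the cascade filters to implement. The following sketch is our reading of the twelve-bucket procedure; details such as using a circular mean as the buckets' resultant are assumptions, not the authors' stated implementation.

```python
import math

def rotation_variance(rotations_deg):
    """Sketch of the rotation-variance cue: histogram the vote rotations into
    twelve 30-degree buckets, take the circular mean (resultant) of the three
    most populated buckets as the average rotation, re-centre all votes on
    180 degrees and return the variance about that centre."""
    buckets = [[] for _ in range(12)]
    for r in rotations_deg:
        buckets[int(r % 360 // 30)].append(r % 360)
    top = sorted(buckets, key=len, reverse=True)[:3]
    members = [r for b in top for r in b]
    # Circular mean (resultant direction) of the dominant buckets.
    sx = sum(math.cos(math.radians(r)) for r in members)
    sy = sum(math.sin(math.radians(r)) for r in members)
    mean = math.degrees(math.atan2(sy, sx)) % 360
    # Shift so the average rotation maps to 180 degrees, then take variance.
    shifted = [(r - mean + 180.0) % 360 for r in rotations_deg]
    return sum((s - 180.0) ** 2 for s in shifted) / len(shifted)
```

Re-centring on 180 degrees before computing the variance avoids the wrap-around at 0/360 degrees for any grouping whose votes agree to within half a turn.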
A false detection chance above 1% seems low, but at the same time it means a 66.2% chance that a false detection will take place when looking for any product instance from our patterns database.

Figure 2: Sample of detection results. (a) Pattern; (b) scene.

During the experiments with product casings we encountered a number of problems with the association of detections to a specific result group. Some products are very similar, with only slight local graphical differences. This is particularly true for the same brand with different aromas or casing sizes. Figure 2 presents one such case, where a tea casing has an identical logo for its few variations, with one being visually very different from the others. We decided to interpret only the visually-off tea as a false detection. In the retail field the rest of the detections should be processed further to discriminate the different variations of the products. One can use partial patterns with a bag-of-words approach on top of our aggregation method to do so.

Our method has been compared to a method using the HOG descriptor. For the training stage we generated a set of 60 derivative images for each pattern through small affine transformations, and used all other patterns as negative images. We used an implementation of the HOG method called Classifier Tool For OpenCV and FANN (HOG, 2014). Our method achieved an only slightly better detection rate, but a significantly lower chance of false detections. The average number of false detections was almost two times higher for the HOG approach (Table 2).

  Method   Detection Rate   False Detection Chance
  Ours     81.3%            1.79%
  HOG      73.6%            21.42%
Table 1: Detection rate and false detection chance for our tests.

  Method   Average Number of False Detections
  Ours     1.57
  HOG      3.18
Table 2: Average number of false detections per process in which a false detection occurred.

5 CONCLUSIONS

In this paper a method of vote aggregation designed for use in multi-object multi-detection systems has been introduced. The aggregation process yields promising results in tests, leading to analysis of each potential object in the image. The unique filtering leaves out many false object occurrence propositions and the cascade filtering rejects most of the false positive detections, which is crucial for the presented application. A system built upon the aggregation method can achieve a more than 80% detection rate with a false detection chance below 2%. This is still far from industrial standards, but there are many avenues for improvement as well.

The presented method is designed to analyze very high quality images.
The images processed in our tests were taken by hand, resulting in a high amount of blurred and skewed visual data; the method of image acquisition should be analyzed further. In future work we will estimate the best parameters for the presented method as well as resolve simple parametrization dependencies. We are going to test the system with a two-phase approach, where the second phase of the detection would use patterns extracted directly from the scene. The pattern size has too much impact on the detection rate, as the feature points approach works best when the objects in the scene and in the pattern images have the same size. We are going to evaluate resizing options for better detection results.

ACKNOWLEDGEMENTS

This work was co-financed by the European Union within the European Regional Development Fund.

REFERENCES

HOG (2014). Classifier Tool for OpenCV and FANN v. 4.11.8. http://classifieropencv.codeplex.com/. Accessed: 2014-12-18.

Alahi, A., Ortiz, R., and Vandergheynst, P. (2012). FREAK: Fast retina keypoint. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 510-517.

Azad, P., Asfour, T., and Dillmann, R. (2009). Combining Harris interest points and the SIFT descriptor for fast scale-invariant object recognition. In Intelligent Robots and Systems (IROS), 2009 IEEE/RSJ International Conference on, pages 4275-4280.

Bay, H., Ess, A., Tuytelaars, T., and Van Gool, L. (2008). Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3):346-359.

Blaschko, M. and Lampert, C. (2008). Learning to localize objects with structured output regression. In Computer Vision - ECCV 2008, volume 5302 of Lecture Notes in Computer Science, pages 2-15. Springer Berlin Heidelberg.

Calonder, M., Lepetit, V., Strecha, C., and Fua, P. (2010). BRIEF: Binary robust independent elementary features. In Computer Vision - ECCV 2010, volume 6314 of Lecture Notes in Computer Science, pages 778-792. Springer Berlin Heidelberg.

Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition (CVPR), 2005 IEEE Computer Society Conference on, volume 1, pages 886-893.

Felzenszwalb, P., Girshick, R., McAllester, D., and Ramanan, D. (2010). Object detection with discriminatively trained part-based models. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(9):1627-1645.

Iwanowski, M., Zieliński, B., Sarwas, G., and Stygar, S. (2014). Identification of products on shop-racks by morphological preprocessing and feature-based detection. In Computer Vision and Graphics, volume 8671 of Lecture Notes in Computer Science, pages 286-293. Springer International Publishing.

Ke, Y. and Sukthankar, R. (2004). PCA-SIFT: A more distinctive representation for local image descriptors. In Computer Vision and Pattern Recognition (CVPR), 2004 IEEE Computer Society Conference on, volume 2, pages II-506-II-513.

Kurzejamski, G., Zawistowski, J., and Sarwas, G. (2014). Apparatus and method for multi-object detection in a digital image. EU Patent 14461566.3.

Lampert, C. H., Blaschko, M., and Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In Computer Vision and Pattern Recognition (CVPR), 2008 IEEE Conference on, pages 1-8.

Leutenegger, S., Chli, M., and Siegwart, R. (2011). BRISK: Binary robust invariant scalable keypoints. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2548-2555.

Lowe, D. (1999). Object recognition from local scale-invariant features. In Computer Vision, 1999 IEEE International Conference on, volume 2, pages 1150-1157.

Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91-110.

Morel, J.-M. and Yu, G. (2009). ASIFT: A new framework for fully affine invariant image comparison. SIAM Journal on Imaging Sciences, 2(2):438-469.

Rublee, E., Rabaud, V., Konolige, K., and Bradski, G. (2011). ORB: An efficient alternative to SIFT or SURF. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564-2571.

Sarwas, G. and Skoneczny, S. (2015). Object localization and detection using variance filter. In Image Processing & Communications Challenges 6, volume 313 of Advances in Intelligent Systems and Computing, pages 195-202. Springer International Publishing.

Shi, J. and Tomasi, C. (1994). Good features to track. In Computer Vision and Pattern Recognition (CVPR), 1994 IEEE Computer Society Conference on, pages 593-600.

Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition (CVPR), 2001 IEEE Computer Society Conference on, volume 1, pages 511-518.

Yeh, T., Lee, J., and Darrell, T. (2009). Fast concurrent object localization and recognition. In Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on, pages 280-287.

Zickler, S. and Efros, A. (2007). Detection of multiple deformable objects using PCA-SIFT. In Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2, AAAI'07, pages 1127-1132. AAAI Press.

Zickler, S. and Veloso, M. M. (2006). Detection and localization of multiple objects. In Humanoid Robots, 2006 6th IEEE-RAS International Conference on, pages 20-25.
