Deep Region Hashing for Efficient Large-scale Instance Search from Images

arXiv:1701.07901v1 [cs.CV] 26 Jan 2017

Jingkuan Song, Columbia University, jingkuan.song@gmail.com
Tao He, University of Electronic Science and Technology of China, tao.he@uestc.edu.cn
Lianli Gao, University of Electronic Science and Technology of China, lianli.gao@uestc.edu.cn
Xing Xu, University of Electronic Science and Technology of China, xing.xu@uestc.edu.cn
Heng Tao Shen, University of Electronic Science and Technology of China, shenhengtao@hotmail.com

Abstract

Instance search (INS) is a fundamental problem for many applications, and it is more challenging than traditional image search since relevancy is defined at the instance level. Existing works have demonstrated the success of many complex ensemble systems that typically first generate object proposals and then extract handcrafted and/or CNN features of each proposal for matching. However, object bounding box proposal and feature extraction are often conducted in two separate steps, which limits the effectiveness of these methods. Also, due to the large number of generated proposals, matching speed becomes the bottleneck that limits their application to large-scale datasets. To tackle these issues, in this paper we propose an effective and efficient Deep Region Hashing (DRH) approach for large-scale INS using an image patch as the query. Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation. DRH shares the full-image convolutional feature map with the region proposal network, thus enabling nearly cost-free region proposals. Also, each high-dimensional, real-valued region feature is mapped onto a low-dimensional, compact binary code for efficient object-region-level matching on large-scale datasets. Experimental results on four datasets show that our DRH can achieve even better performance than the state-of-the-art in terms of mAP, while the efficiency is improved by nearly 100 times.

1. Introduction

Large-scale instance search (INS) is to efficiently retrieve the images containing a specific instance from a large-scale image dataset, given a query image of that instance. It has long been a hot research topic due to its many applications such as image classification and detection [33, 35]. Early research [20] usually extracts image-level handcrafted features such as color, shape and texture to serve a user query. Recently, due to the development of deep neural networks, using the outputs of the last fully-connected network layers as global image descriptors to support image classification and retrieval has demonstrated an advantage over the prior state-of-the-art [18]. More recently, we have witnessed the research attention shift from features extracted from the fully-connected layers to deep convolutional layers [1], and it turns out that convolutional layer activations have superior performance in image retrieval. However, using image-level features may fail to find instances in images. As illustrated in Fig. 1, sometimes a target region is only a small portion of a database image. Directly comparing the query image with a whole database image may result in an inaccurate match. Therefore, it is necessary to consider local region information.

Recently, many successful INS approaches [29, 17] were proposed by combining fast filtering with more computationally expensive spatial verification and query expansion. More specifically, all images in a database are first ranked according to their distances to the query, then spatial verification is applied to re-rank the search results, and finally query expansion is applied to improve the search performance. An intuitive way to perform spatial verification is the sliding window strategy: it generates sliding windows at different scales and aspect ratios over an image, and each window is then compared to the query instance in order to find the optimal location that contains the query. A more advanced strategy is to use object proposal algorithms [32, 7], which evaluate many image locations and determine whether they contain the object or not. Therefore, the query instance only needs to be compared with the object proposals instead of all the image locations.
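To make the cost of exhaustive region-level matching concrete, the following sketch performs the naive comparison with real-valued descriptors; all sizes and the random data are illustrative, not the paper's setup. Scoring one query touches every region feature, i.e., on the order of N×M×d operations:

```python
import numpy as np

# Toy exhaustive region matching: N images, M candidate regions each,
# d-dimensional real-valued descriptors (illustrative sizes only).
rng = np.random.default_rng(0)
N, M, d = 100, 40, 512
regions = rng.standard_normal((N, M, d))               # region descriptors per image
query = regions[7, 3] + 0.01 * rng.standard_normal(d)  # query close to one planted region

# One query touches every region feature: ~N*M*d multiply-adds.
dists = np.linalg.norm(regions - query, axis=2)        # (N, M) distance matrix
best_image, best_region = np.unravel_index(dists.argmin(), dists.shape)
print(best_image, best_region)  # recovers the planted region: 7 3
```

Binary codes compared by Hamming distance, as proposed in this paper, replace exactly this dense floating-point scan.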
Existing works [29, 30] have demonstrated the success of many complex ensemble systems that first generate object bounding box proposals and then extract handcrafted and/or CNN features for each bounding box. Because region detection and feature extraction are conducted in two separate steps, the detected regions may not be optimal for feature extraction, leading to suboptimal results.

Figure 1. Purely using global hash codes to calculate the similarity between a query instance and database images is not accurate enough. (a) is a query instance with a deep hash code representation H_q; (c) is a database image with a deep global region hash code H_g obtained by taking the whole image as a region; (b) is a region pooled from (c) with a deep region hash code H_l. It is clear that conducting INS using only the global region information of (c) is inaccurate, and deep region hashing should be considered as well. Thus, in this paper, we aim to learn a deep region hashing to support fast and efficient INS.

Faster R-CNN [26] introduces a region proposal network which integrates feature extraction and region proposal in a single network. The detection network shares the full-image convolutional features, thus enabling nearly cost-free region proposals. Inspired by this, if a neural network can simultaneously extract the convolutional features and propose object candidates, we can perform instance search on these candidates. However, efficiency is a big issue: both selective search and the RPN still generate hundreds to thousands of candidate regions. If a dataset has N images, each image contains M regions, and each region is represented by a d-dimensional feature vector, then an exhaustive search for each query requires N×M×d operations to measure the distances to the candidate regions. Therefore, despite taking advantage of deep convolutional layer activations and region proposals, the expensive computational cost of nearest neighbor search in high-dimensional data limits the application of these methods to large-scale datasets.

In this paper we propose a fast and effective Deep Region Hashing (DRH) approach for visual instance search in large-scale datasets with an image patch as the query. Specifically, DRH is an end-to-end deep neural network which consists of layers for feature extraction, object proposal, and hash code generation. DRH can generate nearly cost-free region proposals because the region proposal network shares the full-image convolutional feature map. Also, the hash code generation layer learns binary codes to represent large-scale images and regions to accelerate INS. It is worthwhile to highlight the following aspects of DRH:

• We propose an effective and efficient end-to-end Deep Region Hashing (DRH) approach, which consists of object proposal, feature extraction, and hash code generation, for large-scale INS with an image patch as the query. Given an image, DRH can automatically generate the hash codes for the whole image and for the object candidate regions.

• We design different strategies for object proposals based on the convolutional feature map. The region proposal is nearly cost-free, and we also integrate a hashing strategy into our approach to enable efficient INS on large-scale datasets.

• Extensive experimental results on four datasets show that our generated hash codes can achieve even better performance than the state-of-the-art using real-valued features in terms of mAP, while the efficiency is improved by nearly 100 times.

2. Related Work

Our work is closely related to instance retrieval, object proposals and hashing. We briefly review the related work for each of them.

Instance retrieval. Most approaches to INS rely on the bag-of-words (BoW) model [23, 20, 17], which is one of the most effective content-based approaches for large-scale image and visual object retrieval. Typically, they are based on local features such as RootSIFT [29], and few are based on CNN features. For instance, [17] proposes to conduct INS by encoding the convolutional features of a CNN using a BoW aggregation scheme. More recently, research has shown that VLAD [11] and Fisher vectors [19] perform better than BoW [29, 12, 30]. Specifically, BoW encodes a local representation by indexing the nearest visual word, while VLAD and the Fisher vector quantize the local descriptor with an extra residual vector obtained by subtracting the mean of the visual word or a Gaussian component fit to all observations [29, 30]. However, these embedding methods have drawbacks, e.g., inhibiting true positive matches between local features, involving a large number of parameters, and suffering from overfitting [30].

Several works go beyond handcrafted features for image retrieval and INS. [13, 5] trained deep CNNs and adopted features extracted from the fully-connected layers to enhance image retrieval.
Figure 2. The overview of DRH. It contains two parts, i.e., the deep region hashing network part and the instance search part. The first part trains a network for feature extraction, object proposal and hash code generation, while the second part utilizes DRH to get the query results.
Recently, research attention has shifted from extracting deep fully-connected features to deep convolutional features [25, 36, 4]. For instance, [1] aggregates local deep features to produce compact global descriptors for image retrieval. [25] provides an extensive study on the suitability of image representations based on convolutional networks for INS. In addition, [2] demonstrates that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image; also, PCA-compressed neural codes outperform compact descriptors computed on handcrafted features such as SIFT. Furthermore, [27] explores the suitability for INS of image- and region-wise representations pooled from a trained or fine-tuned Faster R-CNN with supervised information, combining re-ranking and query expansions.

Object proposals. Object proposals have received much attention recently. Proposal methods can be broadly divided into two categories: superpixel-based methods (e.g., Selective Search [32] and Multiscale Combinatorial Grouping [22]) and sliding window based approaches (e.g., EdgeBoxes and Faster R-CNN [26]). More specifically, Selective Search adopts the image structure to guide the sampling process and captures all possible locations by greedily merging superpixels based on engineered low-level features. Due to its good performance, it has been widely used as an initial step for detection, such as in R-CNN [7], and for INS tasks [29]. Even though MCG achieves the best accuracy, it takes about 50 seconds per image to generate region proposals. By contrast, Faster R-CNN proposes a region proposal network, which takes an image as input, shares a common set of convolutional layers with the object detection network, and finally outputs a set of rectangular object proposals, each with an objectness score. This region proposal network is trained on the PASCAL VOC dataset and achieves high accuracy. A comprehensive survey and evaluation protocol for object proposal methods can be found in [3].

Hashing. Existing hashing methods can be classified into two categories: unsupervised hashing [34, 6] and supervised hashing [34, 15, 28, 6, 37]. A comprehensive survey and comparison of hashing methods is presented in [35]. CNNs have achieved great success in various tasks such as image classification, retrieval, tracking and object detection [3, 4, 2, 5, 18, 36], yet there are few hashing methods that adopt deep models. For instance, Zhao et al. [37] proposed a deep semantic ranking based method for learning hash functions that preserve multilevel semantic similarity between multi-label images. Lin et al. [14] propose an effective semantics-preserving hashing method, which transforms the semantic affinities of training data into a probability distribution. To date, most deep hashing methods are supervised, leveraging supervised information (e.g., class labels) to learn compact bitwise representations, while few methods focus on the unsupervised setting, which uses unlabeled data to learn a set of hash functions. Moreover, existing deep unsupervised hashing methods focus on global image representations without considering the locality information which we believe to be important for INS.
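A practical note on why binary codes are attractive for matching: with bit-packed codes, the Hamming distance used throughout the hashing literature reduces to an XOR and a popcount. A minimal sketch (the packing scheme and the 16-bit toy codes are illustrative):

```python
import numpy as np

def hamming(packed_a, packed_b):
    """Hamming distance between binary codes packed into uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(packed_a, packed_b)).sum())

# Two toy 16-bit codes that differ in exactly 3 bit positions.
a = np.packbits(np.array([1,0,1,1,0,0,1,0, 1,1,0,0,1,0,1,1], dtype=np.uint8))
b = np.packbits(np.array([1,0,0,1,0,0,1,0, 1,1,0,1,1,0,1,0], dtype=np.uint8))
print(hamming(a, b))  # 3
```

A 1024-bit code occupies 128 bytes and is compared with a handful of integer operations, versus thousands of floating-point operations for a real-valued descriptor of the same dimensionality.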
3. Methodology

This paper explores instance search from images using hash codes of image regions detected by an object detection CNN or a sliding window. In our setup, query instances are defined by a bounding box over the query image. The framework of our DRH-based instance search is shown in Fig. 2, and it has two phases, i.e., an offline DRH training phase and an online instance search phase. In this section, we describe these two phases in detail.

3.1. Deep Region Hashing

As shown in Fig. 2, the DRH training phase consists of three components, i.e., CNN-based feature map generation, region proposal generation and region hash code generation. Next, we illustrate each of them.

3.1.1 CNN-based Representations

As mentioned above, we focus on leveraging convolutional networks for hash learning. We adopt the well-known architecture of [5] as our basic framework. As shown in Fig. 2, our network has 5 groups of convolution layers, 4 max convolution-pooling layers, 1 Region of Interest (RoI) pooling layer and 1 hashing layer. Following [5], we use 64, 128, 256, 512 and 512 filters in the 5 groups of convolutional layers, respectively. More specifically, our network first processes an arbitrary image with several convolutional (conv) and max pooling layers to produce a conv feature map. Next, for each region proposal, a region of interest (RoI) pooling layer extracts a fixed-length (i.e., 512-dimensional) feature from the feature map. Each pooled feature is then fed into the region hash layer, which finally outputs the hash code representing the corresponding RoI.

3.1.2 Region Pooling Layer

There are two ways to generate region proposals: 1) a sliding window (where the number of region proposals is determined by a sliding window overlapping parameter λ), and 2) the Region Proposal Network (RPN) proposed by Faster R-CNN [26]. The latter provides a set of object proposals, and we use them as our region proposals by assuming that the query image always contains some objects.

• Sliding window. Given a raw image of size W×H, our network first generates a conv feature map of size W_c×H_c, where W, W_c and H, H_c are the widths and heights of the original image and the conv map, respectively. Next, we choose windows of all possible combinations of widths {W_c, W_c/2, W_c/3} and heights {H_c, H_c/2, H_c/3}. We use a sliding window strategy directly on the conv map with overlap in both directions, where the percentage of overlap is measured by an overlapping parameter λ. Next, we utilize a simple filtering strategy to discard windows: if W/H ≤ th, we discard the scale W_c/3; if H/W ≤ th, we discard H_c/3; otherwise we keep the original settings. When the width and height are set to W_c and H_c, we obtain a global region image descriptor, which will be used as the input for global region hash code generation.

• RPN. Faster R-CNN [26] proposed a Region Proposal Network (RPN), which takes the conv feature map as input and outputs a set of rectangular object proposals, each with an objectness score. Given a conv feature map of size W_c×H_c, it generates W_c×H_c×9 (typically 21,600) regions. Specifically, the RPN proposed by Faster R-CNN is able to propose object regions and extract the convolutional activations for each of them. Therefore, for each region proposal, it can compose a descriptor by aggregating the activations of that window in the RoI pooling layer, giving rise to region-wise descriptors. In order to conduct image-level operations, we obtain a global region by setting the whole conv map as a region. To conduct instance-level search, we obtain local regions which are smaller than the global region.

After region proposal generation, using either the sliding window based approach or the RPN approach, for each image we obtain a set of regions which includes one global region and several local regions. Next, we apply max pooling to the corresponding region window on the conv map to generate the RoI feature for each region.

3.1.3 Region Hashing Layer

For instance search, directly using RoI features to compare the query feature with all the RoI features is effective, and it should work well on small-scale datasets. However, on large-scale datasets this approach is computationally expensive and requires large memory, since it involves huge vector operations. In this paper, we aim to improve both INS accuracy and efficiency. Inspired by the success of hashing in terms of search accuracy, search time cost and space cost, we propose a region hashing layer to generate hash codes for the region proposals.

Here, we convert each RoI feature to a binary hash code. Given a RoI feature x_r ∈ R^{1×512}, the latent region hash layer output is h_r = σ(W_r x_r + b_r), where W_r is the hash function and b_r is the bias. Next, we binarize the output h_r ∈ R^{1×L_r} to obtain a binary code y_r = sgn(σ(W_r x_r + b_r)) ∈ {0,1}^{L_r}. To obtain a qualified binary hash code, we propose the following optimization problem:

min_{W_r, b_r} ℓ = (1/2) g(y_r, h_r) − (α/2) t(h_r) + (β/2) r(W_r) + (η/2) o(b_r)    (1)

where g(y_r, h_r) is the penalty function that minimizes the loss between the relaxed output h_r and the binary code y_r; t(h_r), r(W_r) and o(b_r) are regularization terms for h_r, W_r and b_r; and α, β and η are trade-off parameters.
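Before the individual terms are specified, the forward pass just described, a sigmoid projection followed by thresholding (reading sgn over the sigmoid output as a 0.5 threshold), can be sketched in NumPy; the random initialization, the code length and the input here are purely illustrative, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
Lr, d = 32, 512                             # illustrative code length; d=512 per the paper
W_r = 0.01 * rng.standard_normal((Lr, d))   # hash projection (illustrative random init)
b_r = np.zeros(Lr)                          # bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_r = rng.standard_normal(d)                # a pooled RoI feature
h_r = sigmoid(W_r @ x_r + b_r)              # relaxed (real-valued) hash layer output
y_r = (h_r > 0.5).astype(np.uint8)          # binary code in {0,1}^Lr
print(y_r.shape, set(y_r.tolist()) <= {0, 1})  # (32,) True
```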
We define g(y_r, h_r) and t(h_r) as below:

g(y_r, h_r) = ‖y_r − h_r‖_F^2
t(h_r) = tr(h_r h_r^T)    (2)

The regularization terms r(W_r) and o(b_r) are defined as:

r(W_r) = ‖W_r‖_F^2
o(b_r) = ‖b_r‖_F^2    (3)

To solve this optimization problem, we employ stochastic gradient descent to learn the parameters W_r and b_r. The gradients of the objective function in Eq. (1) with respect to the different parameters are computed as follows:

∂ℓ/∂W_r = ((h_r − y_r − α h_r) ⊙ σ′(W_r x_r + b_r)) x_r^T + β W_r
∂ℓ/∂b_r = (h_r − y_r − α h_r) ⊙ σ′(W_r x_r + b_r) + η b_r    (4)

where ⊙ is element-wise multiplication. The parameters W_r and b_r are updated with the gradient descent algorithm until convergence.

3.1.4 Network Training

We train our model using stochastic gradient descent with momentum, with back-propagation used to compute the gradient of the objective function ∇Loss(W_r, b_r) with respect to all parameters W_r and b_r. More specifically, we randomly initialize the RoI pooling layer and the region hashing layer by drawing weights from a zero-mean Gaussian distribution with standard deviation 0.01. The shared convolutional layers are initialized by pre-training a model [24]. Next, we tune only the RoI layer and the hashing layer to conserve memory. In particular, for RPN training we fix the convolutional layers and only update the RPN layer parameters. The PASCAL VOC 2007 dataset is used to train the RPN, and for this study we choose the top 300 regions as input to the region hash layer. For sliding window based region proposals, we use the different scales to directly extract regions and then input them into the region hash layer. At this stage, the parameters of the conv layers and the region pooling layer are obtained. Next, we fix the parameters of the conv layers and the region pooling layer to train the region hash layer. The learning rate is set to 0.01 in the experiments, and the parameters α, β and η were empirically set to 100, 0.001 and 0.001, respectively.

3.2. Instance Search using DRH

This section describes the INS pipeline built on the DRH approach. The pipeline consists of a Global Region Hashing (gDRH) search, a Local Region Hashing (lDRH) re-ranking, and two Region Hashing Query Expansions (QE).

Global Region Hashing Search (gDRH). This step computes the similarity between the hash code of a query and the global region hash codes of the dataset images. For each item in the dataset, we use the hash code generated from the whole image region. The image list is then sorted by the Hamming distance of the dataset items to the query. After gDRH, we choose the top M images to form the first ranking list X^rank = {x_1^rank, ..., x_M^rank}.

Local Region Hashing Re-search (lDRH). After obtaining X^rank, we perform the local region hashing search by computing the Hamming distance between the query instance hash code and the hash codes of each local region in the top M images. (The hashing query expansions introduced below are optional.) For each image, the distance score is the minimal Hamming distance between the hash code of the query instance and all the local region hash codes within that image. Finally, the images are re-ordered based on these distances, giving the final query list X^rank′ = {x_1^rank′, ..., x_M^rank′}.

Region Hashing Query Expansions (QE). In this study, we further investigate two query expansion strategies based on the global and local hash codes.

• Global Region Hashing Query Expansion (gQE). After the global region search, we conduct the global hashing query expansion, which is applied after gDRH and before lDRH. Specifically, we first choose the global hash codes of the top q images within X^rank and calculate the similarity between each of them and each global hash code of the X^rank images. For x_i^rank, the similarity score is the maximum similarity between x_i^rank and the top q images' global region hash codes. Next, X^rank is re-ordered into X^rankq = {x_1^rankq, ..., x_M^rankq}, and finally we conduct lDRH on the X^rankq list.

• Local Region Hashing Query Expansion (lQE). To conduct local region hashing query expansion, we initially need to conduct gDRH and lDRH to get X^rank′ (here gQE is optional). Next, we take the top q best-matched region hash codes from the images within the X^rank′ list and compare these q region hash codes with all the regions of X^rank′. We choose the maximum value as the similarity score for each image, and finally X^rank′ is re-ranked based on those similarity scores.

4. Experiment

We evaluate our algorithm on the task of instance search. First, we study the influence of the different components of our framework. Then, we compare DRH with state-of-the-art algorithms in terms of efficiency and effectiveness on two standard datasets.
4.1. Datasets and Evaluation Metric

We consider two publicly available datasets that have been widely used in previous work. 1) Oxford Buildings [20]. This dataset contains two sub-datasets: Oxford 5k, which contains 5,062 high-resolution (1024×768) images, including 11 different landmarks (i.e., particular parts of buildings), each represented by several possible queries; and Oxford 105k, which combines Oxford 5k with 100,000 distractors to allow for an evaluation of scalability. 2) Paris Buildings [21], which also contains two sub-datasets: Paris 6k, including 6,300 high-resolution (1024×768) images of Paris landmarks; and Paris 106k, which is generated by adding 100,000 Flickr distractor images to the Paris 6k dataset. Compared with Oxford Buildings, it contains images of buildings with some similarities in architectural style. The hashing layer is trained on a dataset composed of the Oxford 5k and Paris 6k training examples.

For each dataset, we follow the standard procedure to evaluate retrieval performance. In this paper, we use mean Average Precision (mAP) as the evaluation metric.

4.2. Performance using Different Settings

Several different settings affect the performance of our algorithm. To comprehensively study the performance of DRH, we examine its different components, including: 1) the hash code length L; 2) the region proposal component and its parameters; and 3) query expansion and its settings.

4.2.1 The Effect of Hash Code Length L

First, we investigate the influence of the hash code length by using gDRH to conduct INS. We report the mAP of gDRH with different code lengths L, and also the mAP of INS using max pooling on the conv5 feature. The results are shown in Tab. 1.

Table 1. The performance (mAP) variance with different code lengths L using gDRH.

Method | Oxford 5k | Paris 6k
gDRH (128 bits) | 0.425 | 0.531
gDRH (256 bits) | 0.583 | 0.629
gDRH (512 bits) | 0.668 | 0.724
gDRH (1024 bits) | 0.748 | 0.773
gDRH (4096 bits) | 0.783 | 0.815
Max conv5 feature (512-D) | 0.797 | 0.824

These results show that as the hash code length L increases from 128 to 4096, the performance of gDRH increases from 42.5% to 78.3% on the Oxford 5k dataset, while the mAP increases from 53.1% to 81.5% on the Paris 6k dataset. Comparing L = 1024 to L = 4096, the code length increases by a factor of 4, but the performance is only improved by 3.5% and 4.2%. Therefore, to balance search efficiency and effectiveness, we use L = 1024 as the default setting. On the other hand, the best performance of gDRH (i.e., L = 4096) is slightly lower than that of using max pooling on conv5 features, with decreases of 1.4% and 0.9%, respectively. This is due to the information loss of hash codes.

4.2.2 The Effect of Region Proposals

In this sub-experiment, we first explore the two ways of generating region proposals: the sliding window based and the RPN based approaches. Both can generate different numbers of object proposals. For the RPN, we choose 300 regions to generate region hash codes for lDRH. For the sliding window, we tune λ ∈ {0.4, 0.5, 0.6, 0.7} to get different numbers of object proposals. We further study the mAP for different numbers of top M results. gDRH produces the initial retrieval results, and in this subsection we refine them using lDRH, i.e., gDRH+lDRH. The reported results (Tab. 2) are on the Oxford 5k dataset. From these results, we can make several observations.

• Compared with RPN based DRH, the sliding window based DRH performs slightly better. It also requires fewer region boxes and does not need the training of an RPN, so we adopt sliding window based DRH as the default approach in the following experiments. The RPN is supposed to generate better region proposals than a sliding window, but it achieves worse performance in this experiment. One potential reason is that the RPN is trained on the PASCAL VOC dataset, which does not make it robust for proposing 'building' objects. On the other hand, the objects in Oxford Buildings are usually very large, so they can be readily captured by sliding windows.

• For sliding window based DRH, λ affects the performance. In general, the performance is not very sensitive to λ. The best performance is achieved when λ = 0.6, which generates only about 40 regions per image. In the following experiments, the default setting for λ is 0.6.

• The number of top M results has no significant impact on the mAP of DRH. In general, the performance increases with M. When M = 800, the algorithm achieves the best results; when M = 400, the performance is 0.3% lower than with M = 800. Considering the memory cost, we choose M = 400 for the following experiments.

4.2.3 The Effect of Query Expansions

In this subsection, we study the effect of the query expansions. This study aims to explore the effect of lDRH, the effect of q for gQE, and the effect of gQE and lQE for DRH, as well as several combinations. The experimental results are shown in Tab. 3, from which we make the observations below:
Figure 3. Qualitative results of DRH search on the Oxford 5k dataset. Seven queries (i.e., radcliffe_camera, all_souls, ashmolean, balliol, bodleian, christ_church and hertford) and the corresponding search results are shown, one per row. The query is shown on the left, with a selection of the top 9 ranked retrieved images shown on the right.
Table 2. The influence of top M, λ, and the way of region generation. This experiment is conducted on the Oxford 5k dataset with L = 1024.

gDRH+lDRH | Sliding window based DRH, λ=0.4 (~21 boxes) | λ=0.5 (~36 boxes) | λ=0.6 (~40 boxes) | λ=0.7 (~60 boxes) | RPN-DRH (~300 boxes)
M=100 | 0.773 | 0.774 | 0.776 | 0.772 | 0.772
M=200 | 0.779 | 0.780 | 0.781 | 0.778 | 0.778
M=400 | 0.779 | 0.781 | 0.783 | 0.780 | 0.780
M=800 | 0.782 | 0.784 | 0.786 | 0.783 | 0.783

Table 3. The effect of query expansions (gQE and lQE).

Settings | gDRH | gQE | lDRH | lQE | Oxford 5k | Paris 6k
No QE | yes | no | no | no | 0.745 | 0.773
No QE | yes | no | yes | no | 0.783 | 0.801
gQE (different q) | yes | yes, q=4 | no | no | 0.813 | 0.823
gQE (different q) | yes | yes, q=5 | no | no | 0.809 | 0.828
gQE (different q) | yes | yes, q=6 | no | no | 0.815 | 0.835
gQE (different q) | yes | yes, q=7 | no | no | 0.789 | 0.842
gQE+lQE (q=6) | yes | yes | no | no | 0.815 | 0.835
gQE+lQE (q=6) | yes | yes | yes | no | 0.804 | 0.831
gQE+lQE (q=6) | yes | no | yes | yes | 0.833 | 0.819
gQE+lQE (q=6) | yes | yes | yes | yes | 0.851 | 0.849
gQE+lQE (q=7) | yes | yes | no | no | 0.789 | 0.842
gQE+lQE (q=7) | yes | yes | yes | no | 0.804 | 0.834
gQE+lQE (q=7) | yes | no | yes | yes | 0.826 | 0.821
gQE+lQE (q=7) | yes | yes | yes | yes | 0.838 | 0.854

• lDRH improves INS. In terms of mAP, it yields increases of 3.8% and 2.8% on the Oxford 5k and Paris 6k datasets, respectively.

• gQE improves the performance of DRH. With q = 6 it performs best (i.e., 85.1%) on the Oxford 5k dataset, and with q = 7 it obtains the best performance on the Paris 6k dataset. In the following experiments, q is set to 6.

• The experimental results show that by combining gQE and lQE, DRH performs best (i.e., 85.1% on Oxford 5k and 85.4% on Paris 6k). Therefore, the query expansion strategies improve INS.

4.3. Comparing with State-of-the-art Methods

To evaluate the performance of DRH, we compare our method with state-of-the-art deep feature based methods. These methods can be divided into four categories:

• Feature encoding approaches [8, 9, 18]. These aim to reduce the number of vectors against which the query is compared. [9] designed a dictionary learning approach for global descriptors obtained from both SIFT and CNN features. [18] extracted convolutional features from different layers of the networks and adopted VLAD encoding to encode the features into a single vector per image.

• Aggregated deep features [25, 1, 31, 12, 17, 27, 29, 30].
Table4.Comparisontostate-of-the-artCNNrepresentation. Table5.Thetimecost(ms)fordifferentalgorithms
Oxford Oxford Paris Paris CNNfeature[31] DRH DRH
Methods Datasets
5K 105K 6K 106K 512-Dconv5 512-Dhash 1024-Dhash
CNN Iscenetal.-Keans[8] 0.656 0.612 0.797 0.757
Oxford105K 1078 3 12
Feature Iscenetal.-Rand[8] 0.251 0.437 0.212 0.444
Encod- Ngetal.[18] 0.649 - 0.694 - Paris106K 1137 3 12
ing Iscenetal.[9] 0.737 0.655 0.853 0.789
Razavianetal.[25] 0.556 - 0.697 - improved by integrating re-ranking and query expan-
Bebenkoetal.[1] 0.657 0.642 - - sion strategies. On the other hand, hashing methods
CNN Toliasetal.[31] 0.669 0.616 0.830 0.757
performstheworstintermsofmAP.Thisisprobably
Featues Kalantidisetal.[12] 0.684 0.637 0.765 0.691
duetotheinformationlossofhashcodes.
Aggre- Mohedanoetal.[17] 0.739 0.593 0.820 0.648
gation Salvadoretal.[27] 0.710 - 0.798 -
30]. The works of [25, 1, 31, 12] focus on aggregating deep convolutional features for image or instance retrieval. In [17], Mohedano et al. propose a simple instance retrieval pipeline based on encoding the convolutional features of a CNN using the bag-of-words (BoW) aggregation scheme. [27] explores the suitability of image- and region-wise descriptors pooled from Faster R-CNN for instance retrieval. The works of [29, 30] focus on generic instance search.

• Integrating with Locality and QE. Some of the previous methods are extended by integrating re-ranking and/or query expansion techniques [12, 31, 17, 27].

• Hashing methods. Lots of hashing methods have been proposed for image retrieval [10, 16].

The comparison results are shown in Tab. 4, and we also show some qualitative results in Fig. 3.

Tab. 4: Comparison with state-of-the-art methods on the Oxford and Paris datasets.

Group                              Method                               Oxford 5k  Oxford 105k  Paris 6k  Paris 106k
                                   Tao et al. [29]                      0.765      -            -         -
                                   Tao et al. [30]                      0.722      -            -         -
Integration with Locality and QE   Kalantidis et al. + GQE [12]         0.749      0.706        0.848     0.794
                                   Tolias et al. + AML + QE [31]        0.773      0.732        0.865     0.798
                                   Mohedano et al. [17] + Rerank + LQE  0.788      0.651        0.848     0.641
                                   Salvador et al. [27] + CS-SR + QE    0.786      -            0.842     -
Hashing                            Jégou et al. [10]                    0.503      -            0.491     -
                                   Liu et al. [16]                      0.518      -            0.511     -
Ours                               DRH (gDRH + lDRH)                    0.783      0.754        0.801     0.733
                                   DRH All                              0.851      0.825        0.849     0.802

From these results, we have the following observations.

• First, our approach performs the best on both large-scale datasets, Oxford 105k and Paris 106k, with 0.825 and 0.802, respectively. In addition, the performance on Oxford 105k is higher than that of Tolias et al. + AML + QE [31], with a 9.3% increase.

• On the Paris 6k dataset, our performance is slightly lower than that of Tolias et al. + AML + QE [31] by 1.1%, but on the Oxford 5k dataset, it outperforms Mohedano et al. + Rerank + LQE [17] by 6.3%.

• The performance of CNN feature aggregation is better than raw CNN features in general, and it can be further improved by re-ranking and query expansion.

• The qualitative results show that DRH can retrieve instances precisely, even in cases where the query image is different from the whole target image but similar to a small region of it.

4.4. Efficiency Study

In this subsection, we compare the efficiency of our DRH to the state-of-the-art algorithms. The time cost for instance search usually consists of two parts: filtering and re-ranking. For our DRH, the re-ranking is conducted using M = 400 candidates, each of which has around 40 local regions. Therefore, the time cost for re-ranking can be ignored compared with the filtering step. The comparing algorithms have different re-ranking strategies, and some of them do not have this step at all. Therefore, we only report the time for the filtering step of these comparing algorithms.

We choose [31] as the representative non-hashing algorithm. To make a fair comparison, we implement the linear scan of the conv5 features [31] and of our hash codes using Python and C++, respectively, without any accelerating strategies. All the experiments are done on a server with an Intel Core i7-6700K CPU @ 4.00GHz × 8; the graphics card is a GeForce GTX TITAN X/PCIe/SSE2. The search time is the average number of milliseconds to return the top M = 400 results for a query, and the time costs are reported in Tab. 5.

These results show that directly using max-pooling features extracted from conv5 to conduct INS search is much slower than using our hash codes. It takes 1078 ms on the Oxford 105k dataset and 1137 ms on the Paris 106k dataset for [31]. By contrast, when the code length is 512, our DRH is more than 300 times faster than [31], taking only 3 ms on both the Oxford 105k and Paris 106k datasets. Even when the code length increases to 1024, DRH is still nearly 100 times faster than [31]. Therefore, our method has the ability to work on large-scale datasets.

5. Conclusion

In this work, we propose an unsupervised deep region hashing (DRH) method, a fast and efficient approach for large-scale INS with an image patch as the query input. We firstly utilize the deep conv feature maps to extract nearly 40 local regions per image, which are hashed to generate hash codes to represent each image. Next, the
gDRH search, gQE, lDRH and lQE are applied to further improve the INS search performance. Experiments on four datasets demonstrate the superiority of our DRH compared to others in terms of both efficiency and effectiveness.

References

[1] A. Babenko and V. S. Lempitsky. Aggregating deep convolutional features for image retrieval. CoRR, abs/1510.07493, 2015.
[2] A. Babenko, A. Slesarev, A. Chigorin, and V. S. Lempitsky. Neural codes for image retrieval. CoRR, abs/1404.1777, 2014.
[3] N. Chavali, H. Agrawal, A. Mahendru, and D. Batra. Object-proposal evaluation protocol is 'gameable'. In CVPR, 2016.
[4] M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. In CVPR, 2015.
[5] A. Conneau, H. Schwenk, L. Barrault, and Y. LeCun. Very deep convolutional networks for natural language processing. CoRR, abs/1606.01781, 2016.
[6] V. Erin Liong, J. Lu, G. Wang, P. Moulin, and J. Zhou. Deep hashing for compact binary codes learning. In CVPR, 2015.
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[8] A. Iscen, T. Furon, V. Gripon, M. G. Rabbat, and H. Jégou. Memory vectors for similarity search in high-dimensional spaces. CoRR, abs/1412.3328, 2014.
[9] A. Iscen, M. Rabbat, and T. Furon. Efficient large-scale similarity search using matrix factorization. In CVPR, 2016.
[10] H. Jégou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, pages 304–317, 2008.
[11] H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In CVPR, 2010.
[12] Y. Kalantidis, C. Mellina, and S. Osindero. Cross-dimensional weighting for aggregated deep convolutional features. CoRR, abs/1512.04065, 2015.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
[14] Z. Lin, G. Ding, M. Hu, and J. Wang. Semantics-preserving hashing for cross-view retrieval. In CVPR, 2015.
[15] X. Liu, J. He, B. Lang, and S. F. Chang. Hash bit selection: A unified solution for selection problems in hashing. In CVPR, pages 1570–1577, 2013.
[16] Z. Liu, H. Li, W. Zhou, R. Zhao, and Q. Tian. Contextual hashing for large-scale image search. IEEE Trans. Image Processing, 23(4):1606–1614, 2014.
[17] E. Mohedano, A. Salvador, K. McGuinness, F. Marqués, N. E. O'Connor, and X. Giró-i-Nieto. Bags of local convolutional features for scalable instance search. CoRR, abs/1604.04653, 2016.
[18] J. Y. Ng, F. Yang, and L. S. Davis. Exploiting local features from deep networks for image retrieval. CoRR, abs/1504.05133, 2015.
[19] F. Perronnin, Y. Liu, J. Sánchez, and H. Poirier. Large-scale image retrieval with compressed Fisher vectors. In CVPR, 2010.
[20] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In CVPR, 2007.
[21] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
[22] J. Pont-Tuset, P. Arbeláez, J. Barron, F. Marques, and J. Malik. Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016.
[23] D. Qin, C. Wengert, and L. Van Gool. Query adaptive similarity for large scale object retrieval. In CVPR, pages 1610–1617, 2013.
[24] F. Radenović, G. Tolias, and O. Chum. CNN image retrieval learns from BoW: Unsupervised fine-tuning with hard examples. CoRR, abs/1604.02426, 2016.
[25] A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson. Visual instance retrieval with deep convolutional networks. CoRR, abs/1412.6574, 2014.
[26] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
[27] A. Salvador, X. Giró-i-Nieto, F. Marques, and S. Satoh. Faster R-CNN features for instance search. In CVPR Workshops, 2016.
[28] D. Song, W. Liu, R. Ji, D. A. Meyer, and J. R. Smith. Top rank supervised binary coding for visual search. In ICCV, 2015.
[29] R. Tao, E. Gavves, C. G. M. Snoek, and A. W. M. Smeulders. Locality in generic instance search from one example. In CVPR, 2014.
[30] R. Tao, A. W. M. Smeulders, and S. F. Chang. Attributes and categories for generic instance search from one example. In CVPR, 2015.
[31] G. Tolias, R. Sicre, and H. Jégou. Particular object retrieval with integral max-pooling of CNN activations. CoRR, abs/1511.05879, 2015.
[32] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. Selective search for object recognition. IJCV, 104(2):154–171, 2013.
[33] A. Vedaldi and A. Zisserman. Sparse kernel approximations for efficient classification and detection. In CVPR, pages 2320–2327, 2012.
[34] J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12):2393–2406, 2012.
[35] J. Wang, T. Zhang, J. Song, N. Sebe, and H. T. Shen. A survey on learning to hash. CoRR, abs/1606.00185, 2016.
[36] L. Wang, W. Ouyang, X. Wang, and H. Lu. Visual tracking with fully convolutional networks. In ICCV, 2015.
[37] F. Zhao, Y. Huang, L. Wang, and T. Tan. Deep semantic ranking based hashing for multi-label image retrieval. In CVPR, 2015.
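Appendix: the filtering comparison in Sec. 4.4 (a linear scan over float conv5 descriptors versus a scan over binary hash codes) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the database size, the 512-bit code length, the NumPy-based Hamming computation, and all variable names are assumptions chosen for the sketch; the speed gap in the paper additionally comes from its C++ implementation and the smaller memory footprint of the codes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_db, dim, n_bits = 10_000, 512, 512  # toy database; the paper filters ~105k images

# Float descriptors (e.g. max-pooled conv5 features), L2-normalized.
feats = rng.standard_normal((n_db, dim)).astype(np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# Binary codes packed 8 bits per byte: 512 bits -> 64 bytes per image,
# versus 2048 bytes for a 512-d float32 descriptor.
codes = np.packbits(rng.integers(0, 2, (n_db, n_bits)).astype(np.uint8), axis=1)

def filter_linear_scan(query_feat, top_m=400):
    # Baseline: cosine similarity via a matrix-vector product over floats.
    sims = feats @ query_feat
    return np.argsort(-sims)[:top_m]

def filter_hamming(query_code, top_m=400):
    # Hash-code filtering: XOR then popcount over the packed bytes.
    dists = np.unpackbits(codes ^ query_code, axis=1).sum(axis=1)
    return np.argsort(dists)[:top_m]

# Use database item 0 as its own query in both representations.
cand_f = filter_linear_scan(feats[0])
cand_h = filter_hamming(codes[0])
assert cand_f.shape == (400,) and cand_h.shape == (400,)
assert cand_f[0] == 0 and cand_h[0] == 0  # the query itself ranks first
```

Both functions return the top M = 400 candidate indices for re-ranking; the Hamming scan touches 32× less memory per comparison, which is the main source of the filtering speed-up the paper reports.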