ROBUST AND REAL-TIME DEEP TRACKING VIA MULTI-SCALE DOMAIN ADAPTATION

Xinyu Wang^1, Hanxi Li^1*, Yi Li^2, Fumin Shen^3, Fatih Porikli^4
^1 Jiangxi Normal University, China; ^2 Toyota Research Institute of North America, USA; ^3 University of Electronic Science and Technology of China, China; ^4 Australian National University, Australia

arXiv:1701.00561v1 [cs.CV] 3 Jan 2017

ABSTRACT

Visual tracking is a fundamental problem in computer vision. Recently, some deep-learning-based tracking algorithms have been achieving record-breaking performances. However, due to the high complexity of deep learning, most deep trackers suffer from low tracking speed and are thus impractical in many real-world applications. Some new deep trackers with smaller network structures achieve high efficiency, but at the cost of a significant decrease in precision. In this paper, we propose to transfer the features for image classification to the visual tracking domain via convolutional channel reductions. The channel reduction can simply be viewed as an additional convolutional layer with a specific task. It not only extracts useful information for object tracking but also significantly increases the tracking speed. To better accommodate the useful features of the target at different scales, the adaptation filters are designed with different sizes. The resulting visual tracker is real-time and achieves state-of-the-art accuracy in experiments involving two well-adopted benchmarks with more than 100 test videos.

Index Terms: visual tracking, deep learning, real-time

Fig. 1. The high-level concept of the proposed MSDAT tracker. Left: most of the deep neural network is pre-trained for image classification, where the learning algorithm focuses on object classes. Right: an adaptation is performed to transfer the classification features to the visual tracking domain, where the learning algorithm treats each individual object independently (examples in the figure: a car and a human face).

1. INTRODUCTION

Visual tracking is one of the long-standing computer vision tasks.
During the last decade, with the surge of deep learning, more and more tracking algorithms have benefited from deep neural networks, e.g., Convolutional Neural Networks [1, 2] and Recurrent Neural Networks [3, 4]. Despite this well-admitted success, a dilemma still exists in the community: deep learning increases tracking accuracy, but at the cost of high computational complexity. As a result, most well-performing deep trackers suffer from low efficiency [5, 6]. Recently, some real-time deep trackers have been proposed [7, 8]. They achieve very fast tracking speeds but cannot beat the shallow methods in some important evaluations, as we illustrate later.

In this paper, a simple yet effective domain adaptation algorithm is proposed. The resulting tracking algorithm, termed the Multi-Scale Domain Adaptation Tracker (MSDAT), transfers the features from the classification domain to the tracking domain, where the individual objects, rather than the image categories, serve as the learning subjects. In addition, the adaptation can also be viewed as a dimension-reduction process that removes information redundant for tracking and, more importantly, significantly reduces the channel number. This leads to a considerable improvement in tracking speed. Figure 1 illustrates the adaptation procedure. To accommodate the various features of the target object at different scales, we train filters of different sizes in the domain adaptation layer, as proposed in the Inception network [9]. Our experiments show that the proposed MSDAT algorithm runs at around 35 FPS while achieving tracking accuracy very close to that of the state-of-the-art trackers. To the best of our knowledge, MSDAT is the best-performing real-time visual tracker.

2. RELATED WORK

Similar to other fields of computer vision, in recent years more and more state-of-the-art visual trackers have been built on deep learning. [1] is a well-known pioneering work that learns deep features for visual tracking. The DeepTrack method [10, 2] learns a deep model from scratch, updates it online, and achieves higher accuracy. [11, 12] adopt similar learning strategies, i.e., learning the deep model offline with a large number of images while updating it online for the current video sequence. [13] achieves real-time speed by replacing the slow model update with a fast inference process.

The HCF tracker [5] extracts hierarchical convolutional features from the VGG-19 network [14] and then feeds the features into correlation filters to regress the response map. It can be considered a combination of deep learning and the fast shallow trackers based on correlation filters. It achieves high tracking accuracy, but its speed is only around 10 fps. Hyeonseob Nam et al. proposed to pre-train deep CNNs in multiple domains, with each domain corresponding to one training video sequence [6]. The authors claim that there exist some common properties that are desirable for target representations in all domains, such as robustness to illumination changes. To extract these common features, the authors separate domain-independent information from domain-specific layers. The resulting tracker, termed MD-Net, achieves excellent tracking performance, but its tracking speed is only 1 fps.

Recently, some real-time deep trackers have also been proposed. In [7], David Held et al. learn a deep regressor that predicts the location of the current object based on its appearance in the last frame. The tracker obtains a much faster tracking speed (over 100 fps) compared with conventional deep trackers. Similarly, in [8] a fully-convolutional Siamese network is learned to match the object template in the current frame; it also achieves real-time speed. Even though these real-time deep trackers show high tracking accuracy, there is still a clear performance gap between them and the state-of-the-art deep trackers.

3. THE PROPOSED METHOD

In this section, we introduce the details of the proposed tracking algorithm, i.e., the Multi-Scale Domain Adaptation Tracker (MSDAT).

3.1. Network structure

In HCF [5], deep features are first extracted from multiple layers of the VGG-19 network [14], and a set of KCF [15] trackers is run on those features, respectively. The final tracking prediction is obtained in a weighted-voting manner. Following the setting in [5], we also extract deep features from the conv3_5, conv4_5 and conv5_5 layers of the VGG-19 model. However, the VGG-19 network is pre-trained using the ILSVRC dataset [16] for image classification, where the learning algorithm usually focuses on the object categories. This is different from visual tracking tasks, where the individual objects are distinguished from other ones (even those from the same category) and from the background. Intuitively, it is therefore better to transfer the classification features into the visual tracking domain.

In this work, we propose to perform the domain adaptation in a simple way. A "tracking branch" is "grafted" onto each feature layer, as shown in Fig. 2. The tracking branch is simply a convolutional layer that reduces the channel number by 8 times while keeping the feature-map size unchanged. The convolutional layer is then learned by minimizing a loss function tailored for tracking, as introduced below.

Fig. 2. The network structure of the proposed MSDAT tracker (backbone sizes: input 3@224x224; conv1 64@224x224; conv2 128@112x112; conv3 256@56x56; conv4 512@28x28; conv5 512@14x14; adapted outputs: conv3_5 32@56x56, conv4_5 64@28x28, conv5_5 64@14x14). Three layers, namely conv3_5, conv4_5 and conv5_5, are selected as the feature sources. The domain adaptation (shown in yellow lines) reduces the channel number by 8 times and keeps the feature-map size unchanged; the adapted features are fed to KCF to produce the response. Better viewed in color.

3.2. Learning strategy

The parameters of the aforementioned tracking branches are learned in a manner similar to the Single Shot MultiBox Detector (SSD), a state-of-the-art detection algorithm [17]. During training, the original layers of VGG-19 (i.e., those before conv*_5) are fixed and each tracking branch is trained independently. The flowchart of the learning procedure for one tracking branch (based on conv3_4) is illustrated in the upper row of Figure 3, compared with the learning strategy of MD-Net [6] (the bottom row). To obtain a complete training circle, the adapted feature in conv3_5 is used to regress the objects' locations and their objectness scores (shown in the dashed block). Please note that the deep learning stage in this work is purely offline, and the additional part in the dashed block is abandoned before tracking.

In SSD, a number of "default boxes" are generated for regressing the object rectangles. Furthermore, to accommodate objects of different scales and shapes, the default boxes also vary in size and aspect ratio. Let m_{i,j} ∈ {0, 1} be an indicator for matching the i-th default box to the j-th ground-truth box. The loss function of SSD writes:

    L(m, c, l, g) = (1/N) (L_conf(m, c) + α L_loc(m, l, g))    (1)

where c is the category of the default box, l is the predicted bounding box and g is the ground truth of the object box, if applicable. For the i-th default box and the j-th ground truth, the location loss L_loc^{i,j} is calculated as

    L_loc^{i,j}(l, g) = Σ_{u ∈ {x,y,w,h}} m_{i,j} · smooth_L1(l_i^u − ĝ_j^u)    (2)

where ĝ_j^u, u ∈ {x, y, w, h}, is one of the geometry parameters of the normalized ground-truth box.

However, the task of visual tracking differs significantly from detection. We thus tailor the loss function for the KCF algorithm, where both the object size and the KCF window size are fixed. Recalling that the KCF window plays a role similar to that of the default boxes in SSD, we only need to generate one type of default box, and the location loss L_loc^{i,j}(l, g) is simplified as

    L_loc^{i,j}(l, g) = Σ_{u ∈ {x,y}} m_{i,j} · smooth_L1(l_i^u − g_j^u)    (3)

In other words, only the displacement {x, y} is taken into consideration, and there is no need for ground-truth box normalization.

Note that the concept of domain adaptation in this work is different from that defined in MD-Net [6], where different video sequences are treated as different domains and multiple fully-connected layers are learned to handle them (see Figure 3). This is mainly because MD-Net samples the training instances in a sliding-window manner: an object labeled negative in one domain could be selected as a positive sample in another domain. Given that the number of training videos is C and the dimension of the last convolutional layer is d_c, MD-Net learns C independent d_c × 2 fully-connected layers alternately using C soft-max losses, i.e.,

    M_fc^i : R^{d_c} → R^2, ∀ i = 1, 2, ..., C    (4)

where M_fc^i, ∀ i ∈ {1, 2, ..., C}, denotes the C fully-connected layers that transfer the common visual domain to the individual object domains, as shown in Figure 3.

Differing from MD-Net, the domain in this work refers to a general visual tracking domain or, more specifically, the KCF domain. It is designed to mimic the KCF input in visual tracking (see Figure 3). In this domain, different tracking targets are treated as one category, i.e., objects. During training, the object's location and confidence (with respect to the objectness) are regressed to minimize the smoothed l1 loss. Mathematically, we learn a single mapping function M_msdat(·) as

    M_msdat : R^{d_c} → R^4    (5)

where the R^4 space is composed of one R^2 space for the displacement {x, y} and one label space R^2.

Compared with Equation (4), the training complexity of Equation (5) decreases and the corresponding convergence becomes more stable. Our experiment proves the validity of the proposed domain adaptation.

Fig. 3. The flowcharts of the training processes of MSDAT (upper row: input 3@224x224, conv1 64@224x224, conv2 128@112x112, conv3 256@56x56, conv3_5 32@56x56, with a location branch trained with a smooth-l1 loss and a class branch trained with a soft-max loss) and the MD-Net tracker (bottom row: input 3@107x107, conv1 96@51x51, conv2 256@11x11, conv3 512@3x3, fc4 512, fc5 512, followed by C domain-specific fc6 layers with soft-max cross-entropy losses). Note that the network parts inside the dashed blocks are only used for training and are abandoned before tracking. Better viewed in color.

3.3. Multi-scale domain adaptation

As introduced above, the domain adaptation in our MSDAT method is essentially a convolutional layer. To design this layer, an immediate question is how to select a proper size for the filters. According to Figure 2, the feature maps from different layers vary significantly in size, and it is hard to find one optimal filter size for all the feature layers. Inspired by the success of the Inception network [9], we propose to simultaneously learn adaptation filters at different scales. The response maps of the different filter sizes are then concatenated accordingly, as shown in Figure 4.
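As a rough illustration of this adaptation layer, the sketch below implements a "same"-padded convolution and the two-scale concatenation in NumPy. This is only a hypothetical stand-in: the paper's released code is MATLAB/Caffe, and the function names, random filters and shapes here are illustrative, not the learned filter banks. Each bank maps the K input channels to K/16 output channels without changing the spatial size, so concatenating two banks reproduces the 8:1 channel reduction.

```python
import numpy as np

def conv_same(x, w):
    """'Same'-padded 2-D convolution: x is (C_in, H, W), w is
    (C_out, C_in, k, k) with odd k. The output is (C_out, H, W),
    i.e. the spatial size of the feature map is unchanged."""
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            # contract the filter bank with the (C_in, k, k) patch
            out[:, i, j] = np.tensordot(w, xp[:, i:i + k, j:j + k], axes=3)
    return out

def multi_scale_adapt(feat, w3, w5):
    """Multi-scale domain adaptation: two filter banks (3x3 and 5x5),
    each producing K/16 channels, are applied to the same feature map
    and their responses concatenated along the channel axis."""
    return np.concatenate([conv_same(feat, w3), conv_same(feat, w5)], axis=0)

# Hypothetical shapes mimicking conv5_5 of VGG-19 (512 channels, 14x14):
K = 512
rng = np.random.default_rng(0)
feat = rng.standard_normal((K, 14, 14))
w3 = rng.standard_normal((K // 16, K, 3, 3)) * 0.01
w5 = rng.standard_normal((K // 16, K, 5, 5)) * 0.01
adapted = multi_scale_adapt(feat, w3, w5)
print(adapted.shape)  # (64, 14, 14): 512 channels reduced to 512/8
```

The concatenated output can then be fed to the KCF tracker exactly as a single-scale adapted feature map would be, since only the channel count, not the spatial layout, has changed.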
In this way, the input of the KCF tracker involves the deep features from different scales. In practice, we use 3×3 and 5×5 filters for all three feature layers. Given that the original channel number is K, each type of filter generates K/16 channels, so the channel reduction ratio is still 8:1.

Fig. 4. Learning the adaptation layer using three different types of filters (7×7, 5×5 and 3×3; illustrated on conv3_4, 256@56x56).

3.4. Making the tracker real-time

3.4.1. Channel reduction

One important advantage of the proposed domain adaptation is the improvement in tracking speed. It is easy to see that the speed of the KCF tracker drops dramatically as the channel number increases. In this work, the channel number after adaptation is shrunk by 8 times, which accelerates the tracker by 2 to 2.5 times.

3.4.2. Lazy feed-forward

Another effective way to increase the tracking speed is to reduce the number of feed-forwards of the VGG-19 network. In HCF, the feed-forward process is conducted twice at each frame, once for prediction and once for the model update [5]. However, we notice that the displacement of the moving object is usually small between two frames. Consequently, if we make the input window slightly larger than the KCF window, we can reuse the feature maps in the updating stage whenever the new KCF window (defined by the predicted location of the object) still resides inside the input window. We thus propose a lazy feed-forward strategy, which is depicted in Figure 5.

In this work, we generate the KCF window using the same rules as the HCF tracker [5], and the input window is 10% larger than the KCF window in terms of both width and height. Facilitated by the lazy feed-forward strategy, feed-forward is conducted only once in more than 60% of the video frames in the proposed algorithm. This gives us another 50% speed gain.

Fig. 5. The illustration of the lazy feed-forward strategy. To predict the location of the object (the boy's head), a part of the image (green window) is cropped for generating the network input. Note that the green window is slightly larger than the red block, i.e., the KCF window for predicting the current location. If the predicted location (shown in yellow) still resides inside the green lines, one can reuse the deep features by cropping the corresponding feature maps accordingly.

4. EXPERIMENT

4.1. Experiment setting

In this section, we report the results of a series of experiments involving the proposed tracker and some state-of-the-art approaches. Our MSDAT method is compared with some well-performing shallow visual trackers, including the KCF tracker [15], TGPR [18], Struck [19], MIL [20], TLD [21] and SCM [22]. Some recently proposed deep trackers, including MD-Net [6], HCF [5], GOTURN [7] and the Siamese tracker [8], are also compared. All the experiments are implemented in MATLAB with the MatCaffe [23] deep learning interface, on a computer equipped with an Intel i7 4770K CPU, an NVIDIA GTX 1070 graphics card and 32 GB of RAM.

The code of our algorithm is published on Bitbucket via https://bitbucket.org/xinke_wang/msdat; please refer to the repository for the implementation details.

4.2. Results on OTB-50

Similar to its prototype [24], the Object Tracking Benchmark 50 (OTB-50) [25] consists of 50 video sequences and involves 51 tracking tasks. It has been one of the most popular tracking benchmarks since the year 2013. The evaluation is based on two metrics: center location error and bounding-box overlap ratio. The one-pass evaluation (OPE) is employed to compare our algorithm with HCF [5], GOTURN [7], the Siamese tracker [8] and the aforementioned shallow trackers. The result curves are shown in Figure 6.

From Figure 6 we can see that the proposed MSDAT method beats all the competitors in the overlap evaluation and ranks second in the location-error test, with a trivial inferiority (around 1%) to its prototype, the HCF tracker. Recalling that MSDAT beats HCF by a similar margin in the overlap test and runs 3 times faster, one can consider MSDAT a super variation of HCF that maintains its accuracy at a much higher speed. From the perspective of real-time tracking, our method performs the best in both evaluations. To the best of our knowledge, the proposed MSDAT method is the best-performing real-time tracker in this well-accepted test.

Fig. 6. The location-error plots and the overlap-accuracy plots of the involved trackers, tested on the OTB-50 dataset. Precision scores (at a 20-pixel threshold): HCFT (11 fps) 89.07, ours (32 fps) 88.01, DeepTrack (3 fps) 82.60, SiamFC (58 fps) 81.53, TGPR (0.66 fps) 76.61, KCF (245 fps) 74.24, Struck (10 fps) 65.61, SCM (0.37 fps) 64.85, GOTURN (165 fps) 62.51, TLD (22 fps) 60.75, MIL (28 fps) 47.47. Success scores (AUC): ours 61.41, SiamFC 61.22, HCFT 60.47, DeepTrack 58.92, TGPR 52.94, KCF 51.64, SCM 49.90, Struck 47.37, GOTURN 45.01, TLD 43.75, MIL 35.91.

4.3. Results on OTB-100

The Object Tracking Benchmark 100 (OTB-100) is the extension of OTB-50 and contains 100 video sequences. We test our method under the same experimental protocol as OTB-50, comparing with all the aforementioned trackers. The test results are reported in Table 1.
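As a brief aside before the results, the reuse test at the heart of the lazy feed-forward strategy of Section 3.4.2 amounts to a rectangle-containment check. A minimal Python sketch, with hypothetical helper names and (x, y, w, h) windows whose (x, y) is the top-left corner (the released implementation is in MATLAB), could look like:

```python
def enlarge(window, margin=0.10):
    """Input window `margin` (10% by default) larger than the KCF window
    in width and height, centred on it. Hypothetical helper: the paper
    only fixes the 10% margin, not this exact parameterization."""
    x, y, w, h = window
    return (x - margin * w / 2, y - margin * h / 2,
            w * (1 + margin), h * (1 + margin))

def inside(inner, outer):
    """True if rectangle `inner` lies completely inside `outer`, i.e.
    the cached feature maps can be re-cropped instead of running
    another feed-forward of the VGG-19 network."""
    x, y, w, h = inner
    ox, oy, ow, oh = outer
    return ox <= x and oy <= y and x + w <= ox + ow and y + h <= oy + oh

# One tracking step: feed-forward is needed only when the predicted KCF
# window escapes the enlarged input window of the previous feed-forward.
last_input = enlarge((100.0, 100.0, 50.0, 40.0))
assert inside((101.0, 100.5, 50.0, 40.0), last_input)      # small motion: reuse
assert not inside((140.0, 100.0, 50.0, 40.0), last_input)  # large motion: recompute
```

Since the object displacement between consecutive frames is usually small, this cheap check succeeds in the majority of frames, which is what yields the reported 50% speed gain.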
Tracker       Ours    HCF    MD-Net  SiamFC  GOTURN  KCF    Struck  MIL    SCM    TLD
DP rate (%)   83.0    83.7   90.9    75.2    56.39   69.2   63.5    43.9   57.2   59.2
OS (AUC)      0.567   0.562  0.678   0.561   0.424   0.475  0.459   0.331  0.445  0.424
Speed (FPS)   34.8    11.0   1       58      165     243    9.84    28.0   0.37   23.3

Table 1. Tracking accuracies of the compared trackers on OTB-100.

As can be seen in the table, the proposed MSDAT algorithm keeps its superiority over all the other real-time trackers and maintains an accuracy similar to that of HCF. The best-performing MD-Net (to the best of our knowledge) enjoys a remarkable performance gap over all the other trackers, while running at around 1 fps.

4.4. The validity of the domain adaptation

To better verify the proposed domain adaptation, here we run another variation of the HCF tracker. For each feature layer (conv3_4, conv4_4, conv5_4) of VGG-19, one randomly selects one eighth of the channels from this layer. In this way, the input channel numbers to KCF are identical to those of the proposed MSDAT, and thus the algorithm complexities of this "random HCF" and our method are nearly the same. The comparison of MSDAT, HCF and random HCF on OTB-50 is shown in Figure 7.

From the curves one can see a large gap between the randomized HCF and the other two methods. In other words, the proposed domain adaptation not only reduces the channel number, but also extracts the features useful for the tracking task.

Fig. 7. The location-error plots and the overlap-accuracy plots of the three versions of the HCF tracker: the original HCF, the MSDAT and the random HCF method. Tested on the OTB-50 dataset; better viewed in color. Precision scores: HCFT 89.07, ours 88.01, random 72.54. Success scores (AUC): ours 61.41, HCFT 60.47, random 50.68.

5. CONCLUSION AND FUTURE WORK

In this work, we propose a simple yet effective algorithm for transferring features from the classification domain to the visual tracking domain. The yielded visual tracker, termed MSDAT, is real-time and achieves tracking accuracies comparable to those of the state-of-the-art deep trackers. The experiments verify the validity of the proposed domain adaptation.

Admittedly, updating the neural network online can lift the tracking accuracy significantly [2, 6]. However, the existing online updating schemes result in a dramatic speed reduction. One possible future direction could be to simultaneously update the KCF model and a certain part of the neural network (e.g., the last convolutional layer). In this way, one could strike a balance between accuracy and efficiency and thus obtain a better tracker.

6. REFERENCES

[1] Naiyan Wang and Dit-Yan Yeung, "Learning a deep compact image representation for visual tracking," in NIPS, pp. 809-817, 2013.
[2] Hanxi Li, Yi Li, and Fatih Porikli, "DeepTrack: Learning discriminative feature representations online for robust visual tracking," IEEE Transactions on Image Processing (TIP), vol. 25, no. 4, pp. 1834-1848, 2016.
[3] Anton Milan, Seyed Hamid Rezatofighi, Anthony Dick, Konrad Schindler, and Ian Reid, "Online multi-target tracking using recurrent neural networks," arXiv preprint arXiv:1604.03635, 2016.
[4] Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, and Haohong Wang, "Spatially supervised recurrent convolutional neural networks for visual object tracking," arXiv preprint arXiv:1607.05781, 2016.
[5] Chao Ma, Jia-Bin Huang, Xiaokang Yang, and Ming-Hsuan Yang, "Hierarchical convolutional features for visual tracking," in ICCV, 2015, pp. 3074-3082.
[6] Hyeonseob Nam and Bohyung Han, "Learning multi-domain convolutional neural networks for visual tracking," arXiv preprint arXiv:1510.07945, 2015.
[7] David Held, Sebastian Thrun, and Silvio Savarese, "Learning to track at 100 fps with deep regression networks," arXiv preprint arXiv:1604.01802, 2016.
[8] Luca Bertinetto, Jack Valmadre, João F. Henriques, Andrea Vedaldi, and Philip H. S. Torr, "Fully-convolutional siamese networks for object tracking," in ECCV, 2016, pp. 850-865.
[9] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich, "Going deeper with convolutions," in CVPR, 2015, pp. 1-9.
[10] Hanxi Li, Yi Li, and Fatih Porikli, "DeepTrack: Learning discriminative feature representations by convolutional neural networks for visual tracking," in BMVC, 2014.
[11] Naiyan Wang, Siyi Li, Abhinav Gupta, and Dit-Yan Yeung, "Transferring rich feature hierarchies for robust visual tracking," arXiv preprint arXiv:1501.04587, 2015.
[12] Seunghoon Hong, Tackgeun You, Suha Kwak, and Bohyung Han, "Online tracking by learning discriminative saliency map with convolutional neural network," in ICML, 2015, pp. 597-606.
[13] Kaihua Zhang, Qingshan Liu, Yi Wu, and Ming-Hsuan Yang, "Robust tracking via convolutional networks without learning," arXiv preprint arXiv:1501.04505, 2015.
[14] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," CoRR, vol. abs/1409.1556, 2014.
[15] João F. Henriques, Rui Caseiro, Pedro Martins, and Jorge Batista, "High-speed tracking with kernelized correlation filters," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 37, no. 3, pp. 583-596, 2015.
[16] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei, "ImageNet Large Scale Visual Recognition Challenge," International Journal of Computer Vision (IJCV), vol. 115, no. 3, pp. 211-252, 2015.
[17] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, and Scott Reed, "SSD: Single shot multibox detector," arXiv preprint arXiv:1512.02325, 2015.
[18] Jin Gao, Haibin Ling, Weiming Hu, and Junliang Xing, "Transfer learning based visual tracking with Gaussian processes regression," in ECCV, pp. 188-203, 2014.
[19] Sam Hare, Amir Saffari, and Philip H. S. Torr, "Struck: Structured output tracking with kernels," in ICCV, 2011, pp. 263-270.
[20] Boris Babenko, Ming-Hsuan Yang, and Serge Belongie, "Visual tracking with online multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), pp. 1619-1632, 2011.
[21] Zdenek Kalal, Jiri Matas, and Krystian Mikolajczyk, "P-N learning: Bootstrapping binary classifiers by structural constraints," in CVPR, 2010, pp. 49-56.
[22] Wei Zhong, Huchuan Lu, and Ming-Hsuan Yang, "Robust object tracking via sparsity-based collaborative model," in CVPR, 2012, pp. 1838-1845.
[23] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell, "Caffe: Convolutional architecture for fast feature embedding," in ACM MM, 2014, pp. 675-678.
[24] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang, "Online object tracking: A benchmark," in CVPR, 2013, pp. 2411-2418.
[25] Yi Wu, Jongwoo Lim, and Ming-Hsuan Yang, "Object tracking benchmark," IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 37, no. 9, pp. 1834-1848, 2015.
