ebook img

Region Based CNN for Foreign Object Debris Detection on Airfield Pavement PDF

14 Pages·2017·5.13 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Region Based CNN for Foreign Object Debris Detection on Airfield Pavement

sensors Article Region Based CNN for Foreign Object Debris Detection on Airfield Pavement XiaoguangCao1,†,PengWang1,†,CaiMeng1,*,XiangzhiBai1,2,* ID,GuopingGong1, MiaomingLiu1andJunQi1 1 ImageProcessingCenter,BeijingUniversityofAeronauticsandAstronautics,Beijing100191,China; [email protected](X.C.);[email protected](P.W.);[email protected](G.G.); [email protected](M.L.);[email protected](J.Q.) 2 StateKeyLaboratoryofVirtualRealityTechnologyandSystems, BeijingUniversityofAeronauticsandAstronautics,Beijing100191,China * Correspondence:[email protected](C.M.);[email protected](X.B.); Tel.:+86-010-8233-8578(C.M.);+86-010-8231-6544(X.B.) † Thesetwoauthorscontributedequallytothiswork. Received:10November2017;Accepted:29January2018;Published:1March2018 Abstract: Inthispaper,anovelalgorithmbasedonconvolutionalneuralnetwork(CNN)isproposed todetectforeignobjectdebris(FOD)basedonopticalimagingsensors. Itcontainstwomodules, theimprovedregionproposalnetwork(RPN)andspatialtransformernetwork(STN)basedCNN classifier. IntheimprovedRPN,someextraselectrulesaredesignedanddeployedtogeneratehigh qualitycandidateswithfewernumbers. Moreover,theefficiencyofCNNdetectorissignificantly improvedbyintroducingSTNlayer. ComparedtofasterR-CNNandsingleshotmultiBoxdetector (SSD), the proposed algorithm achieves better result for FOD detection on airfield pavement in theexperiment. Keywords: foreign object debris; object detection; convolutional neural network; vehicular imagingsensors 1. Introduction AfteraplanecrashedinParisdeGaulle,thedetectionofforeignobjectdebris(FOD)onairfield pavement has become more and more important. Actually, FOD detecting is one of the crucial technologiesforintelligentvehicularsystems. Thesafetyandconvenienceofmoderntransportation couldbesignificantlyimprovedbyadoptinganefficientFODdetectingsystem. However,complex airfieldenvironmentmakesthedetectionofFODachallengingtask.Duetothevariationofbackground andinfluenceofautomotiveimagingsystem,FODscouldbehardlydetectedandrecognizedonairfield pavementbytraditionalfeatures,suchasscaleinvariantfeaturetransform(SIFT)[1],histogramsof orientedgradients(HOG)[2]andlocalbinarypatterns(LBP)[3]. Deep learning is one of the rapidly developing technologies in the area of big data. Since [4] significantly improved the effect of image classification, convolutional neural network (CNN) has been widely introduced into computer vision applications, such as image classification [5–8], face verification [9–11], semantic segmentation [12–15], object detection [16–18] and image annotation [19–21]. Moreover, some CNN based algorithms are proposed to solve transportation problems[22–24]. BasedonvariouspublicavailabledatasetssuchasImageNet[25],PascalVOC[26] andCOCO[27],CNNalgorithmshavebeenprovedtoperformbetterindetectionandrecognition than traditional feature methods. Compared with these manually designed features, CNN based featureshavebetterresolutionandrobustnessforFODdetection. Actually,theFODproblemconsists oftwotasks: targetlocationandobjectclassificationonpavement. Aimedatthesetwotasks,anovel Sensors2018,18,737;doi:10.3390/s18030737 www.mdpi.com/journal/sensors Sensors2018,18,737 2of14 two-stage framework is designed and introduced in this paper. In the first stage, region proposal network (RPN) [28], as a kind of fully convolutional network (FCN) [12], is trained end-to-end to generateFODlocationproposals. Inthesecondstage,CNNclassifierbasedonspatialtransformer network (STN) [29] is applied to obtain the parameters of scale, rotation and warping. Due to the performance of STN, FODs could be correctly identified by generated features, regardless of imagedistortion. Apreliminaryversionofthispaperwaspresentedin[30]. Thehighlightsandextensionsinthis papercouldbesummarizedasfollows. 1. AnewFODdetectionframeworkbasedonCNNmodelsforFODdetectionisproposedwith improvedregionproposalnetworkandspatialtransformernetwork. 2. RPN is firstly introduced and improved to generate high quality region proposals for FOD detectiononairfieldpavement. Inaddition,somecandidateselectrulesaredesignedtoreduce quantityandimprovequalityofregionproposals. 3. The STN based CNN classifier is proved to be effective for FOD classification. Moreover, the proposed framework achieves better result than other detection algorithms, such as faster-RCNN[28]andSSD[17],forFODdetectiononairfieldpavement. 4. Thevehicularimagingsystem,includingDGPS,cameras,alarm,FODmanagementsystemand remotequerysystem,ispresentedanddiscussedindetail. Therestofthispapercontainsfourmain sections. InSection2,somerecentworksareanalyzed, includingtraditionalfeaturebasedalgorithmsandsomeCNNalgorithms. InSection3,theoverall detectionframeworkisgivenindetails,includinglocationbasedonRPNandclassificationbasedon STN.InSection4,comparisonwithotheralgorithmsisconductedanddiscussed. Conclusionsare givenanddiscussedinSection5. 2. RelatedWork To solve FOD detection problem, some effective algorithms are proposed recently [31–36]. Thealgorithmsbasedondifferentsensors,suchasactivelyscanningLiDARsystem[31],mm-wave FMCWradar[34]andwideband96GHzMillimeter-WaveRadar[35],couldachievegoodresultsin differentenvironments. Acosecansquaredbeampatterninelevationandapencil-beampatternin azimuth,generatedthroughfoldedreflectarrayantenna(FRA)byphaseonlycontrol,isanalyzedto detectobjectsonground[36]. Amulti-sensorsystem,basedoninherentfeatureofFOD,isintroduced todetectandrecognizeFOD[32]. Basedonalargeamountofpriorknowledgeandmanuallydesigned feature extractor, these methods could transform pixel values of an image into a suitable internal representation.Thus,thesemethodsareeffectivefordetectionofFODswithlessnoise,butnotworking forFODswithcomplexbackgroundandnoise. Actually, CNN based detection has becoming more and more popular. There are two basic researchmethods: regionproposalbasedandnon-regionproposalbasedalgorithms. Regionproposal basedalgorithm,suchasfasterR-CNN[28],R-FCN[37]andMaskR-CNN[38],consistsofaregion ofinterest(ROI)generatorandanobjectclassifier. Regionproposalnetwork(RPN)isdesignedin fasterR-CNN[28],byintroducingtheclassificationfeaturesintoROIextractionmodule. Inthisway, better region proposals could be generated in less time. Moreover, this leads to high recognition accuracyandgoodreal-timeperformance. SemanticsegmentationmethodisintroducedinR-FCN[37], whichcouldcalculateclassificationpossibilityofallregionproposalsinoneimageatthesametime. This enlightening strategy greatly reduces the running time of ROI classification. Finally, object detection, semantic segmentation and key point detection are solved by one CNN algorithm [38]. Non-regionproposalbasedalgorithm,suchasYOLO[16]andSSD[17],usuallyhasbetterreal-time performancewithlowerdetectionaccuracy. Objectdetectionisdescribedasaregressionproblemin YOLO[16],withanend-to-endnetwork. YOLOv2andYOLO9000[39]aretwoimprovedversions withfasterspeedandhigheraccuracy. However, YOLOcouldhardlydetectsmallobjectssuchas Sensors2018,18,737 3of14 FODsonpavement. SimilartoYOLO,SSD[17]isalsobasedonregressionoftargetboundingbox. Toachievehigheraccuracy,featuremapsfrommulti-layersareintroducedtothelocatingandscoring module. Thisstrategyhasbeenprovedtoachievebetterperformance. TodetectFODsonairportpavement,improvedRPNandSTNbasedCNNclassifieraredesigned asgiveninSection3. 3. Algorithm TheFODdetectionframeworkisshowninFigure1,whichcontainstwostages. Firstly,region proposalnetwork(RPN)[28]isadoptedtogenerateasetoforiginalobjectproposals,whicharetaken asFODcandidates. Moreover,someselectrulesaredesignedtoreducecomputationalcostandfix imagesize. Secondly,spatialtransformernetwork(STN)[29]isintroducedtoadjustthetargetsin regionproposalsfromtheinfluenceofscale,rotationandwarping. Then,theseadjustedproposal imagesarefedtoCNNclassifiertoextractfeaturesandidentifyFODs. Figure1. TheframeworkoftheFODdetectioninthispaper. 3.1. LocateFODCandidateswithImprovedRPN Generally,slidingwindowonimagesisawidelyusedstrategyfortargetdetection.Withwell-designed features, such as SIFT [1] and HoG [2], the window with target inside could be identified from background. Inthisway,thetargetdetectionproblemcouldbedescribedasaclassificationproblem in sliding windows. However, sliding window is a low efficiency method, which costs much computing time and memory. Recently, a series of region-based convolutional neural networks (R-CNN)[28,37,38,40,41]isproposedtodetecttargetsusingdeeplearningmethod. Oneofthemost meaningfulideasfromR-CNNalgorithmsistheregionproposalstrategytolocatetargetcandidates frombackground. Ontheonehand,regionproposalalgorithmismuchfasterthanslidingwindowon wholeimagebecauseregionproposalalgorithmreducesthenumberofcandidatesfrommillionsto hundredsorthousands. Ontheotherhand,regionproposalalgorithmhashigherrecallrateforfinding all objects in an image, which could improve object location efficiency significantly. In this paper, somepriorknowledgeisintroducedinregionproposalstoreducethenumberofFODcandidatesfor betterlocation. The algorithm to generate region proposals is based on region proposal network (RPN) [28]. Actually,RPNtakesanimageasinputandgeneratesaseriesofrectangularobjectcandidateswith correspondingobjectnessscores. TogenerateregionproposalsasshowninFigure2,asmallnetwork isslidovertheconvolutionfeaturemap. Ineachslidingwindow, kreferenceboxes(anchors) [28] isdefinedindifferentscalesandaspectratiostopredictvariousregionproposals. Inotherwords, RPNisdesignedwiththehelpofconvolutionfeaturesfromdetectionmodule. Theintersectionover union(IoU)betweenregionproposalsandground-truthboundingbox,ischosenasevaluateindexof theproposedFODdetectionframework. ForFODdetectioninourdataset,targetsinaverageshape Sensors2018,18,737 4of14 80×80locatein2048×2048inputimage. TomatchthescaleofFODs,theanchorboxesaresetwith areaof1002andaspectratiosof1:1,1:2and2:1. Figure2. TheRPNframeworkforFODlocation. Although there are much fewer FOD candidates generated by RPN than by common sliding windowparadigm,thecandidatenumberisstillverylargeforclassification. TofurtherreduceFOD candidates, anovelseriesofselectrulesisdesignedandintroducedinRPNframework. Actually, candidateFODboundingboxesbyoriginalRPNvaryindifferentsizes. Forallregionproposals,FODs suchasscrewsandstoneshavedifferentshapescalesfromfalsealarms. Therefore,threeselectrules basedonpriorknowledgeareproposedandlistedasbelow. Rule1: RegionproposalsbyRPNarefilteredbyhighaspectratioRatio,whichisexpressedas max(w,h) Ratio = , min(w,h) (1) Ratio < T . ratio w and h are the width and height of the image, respectively, and T is a threshold with a ratio constantvalue. Rule2:TheareasofproposalscontainingFODsshouldbeinrange[T ,T ],whichisexpressedas min max Area=w×h, (2) T ≤ Area≤T . min max Areaistheareaofregionproposal. Rule3:TheproposalswithhigherobjectnessscoresO generatedbyRPNarepickedout,whichis l expressedas O >T . (3) l objectness T isaconstantthresholdforjudgingobjectness. objectness Sensors2018,18,737 5of14 ThethresholdsforthesethreerulesaresetasT =1.5,T =602,T =1002andT =0.8. ratio min max objectness Bydeployingtheserules,thenumberofregionproposalsbyimprovedRPNcouldbegreatlyreduced byabout60%. Inthisway,moreaccuratecandidateproposalswithfewernumbercouldbegenerated. Moreover,theefficiencyofFODclassificationcouldbegreatlyimproved. 3.2. SpatialTransformerNetwork STN[29]isaneffectiveCNNframeworktolearnscale,rotationandwarpingofimages. Witha predicted affine transformation by STN, an input image I(x,y) could be adjusted to a rectified image I(cid:48)(x,y). As shown in Figure 3, affine matrix is firstly regressed via a localization network. Then,predictedtransformationparametersareusedtogenerateasamplinggrid. Thesamplinggridis asetofpointsfrominputmapforaffinetransformation. Finally,featuremapandthesamplinggrid areutilizedasinputstobesampled. ThearchitectureofSTNisshowninFigure4. DetailsofSTN algorithmsaregivenbelow. Localization Grid Network Generator Screw CNN Stone Classification Bilinear Interpolation V Background Input Image I Rectified Image I Figure3. TheFODclassificationframework. Conv1 Conv2 Conv5 Input Image ReLU1 ReLU3 Conv3 Conv4 ReLU5 I Max Pooling Max Pooling ReLU3 ReLU4 Max Pooling FFCC67 44009966 ST Layer Output Image 224x224 CoKn=v 7P,Sar=a2m: CoKn=v 5P,Sar=a2m: Conv Param: Conv Param: CKo=n3v,S P=a1r,Pam=1: FC8 6 I Pooling Param: Pooling Param: K=3,S=1,P=1 K=3,S=1,P=1 Pooling Param: K=3,S=2 K=3,S=2 K=3,S=2 Figure4. Thearchitectureofspatialtransformernetwork. 3.2.1. LocalizationNetwork Thelocalizationnetworktakesfeaturemapwithwidthw,heighthandchannelcasinputU.Then, thetransformationparameterscouldbegeneratedbyO= f (U). Inthispaper,theeffectofvehicular loc imagingsystemisassumedasaffinetransformation. Therefore,outputoflocalizationnetworkOisa six-dimensionalvectorfromfully-connectlayer. Actually,therearefiveconvolutionlayersandthree fully-connectlayersinthelocalizationnetwork. 3.2.2. GridGenerator Toperformawarpingofinputfeaturemap,eachoutputpixeliscomputedbylocatingasampling kernelinaparticularareaoftheinputfeaturemap. Ingeneral, theoutputpixelsaredefinedona regulargridG =(xt,yt),whichformanoutputfeaturemapP. Therelationshipbetweeninputand i i i outputfeaturemapsisexpressedasbelow.     (cid:34) (cid:35) xt (cid:34) (cid:35) xt xs i θ θ θ i i =Γ  yt = 11 12 13  yt  (4) ys θ i  θ θ θ  i  i 1 21 22 23 1 Sensors2018,18,737 6of14 InEquation(4),Γ signifiesaffinetransformationmatrixforimagedistortion. θ 3.2.3. Sampler Inthesamplermodule,pixelsinP(cid:48) arecalculatedthroughregionalbilinearinterpolationonthe inputimageP. Withallpixelvaluesobtained,therectifiedimage Icouldbegeneratedas: I(cid:48) =V(P,I) (5) V represents the bilinear interpolation. Specifically, V is also a differentiable module for backpropagation. 3.3. FODClassificationwithConvolutionalNeuralNetwork InFODclassificationframework,eachextractedregionproposalisidentifiedasbackground,stone orscrew. Aconvolutionalneuralnetwork(CNN)classifierisadoptedforthisapplication,aspresented inFigure5. ThenetworkisbasedonthetypeCofvisualgeometrygroup(VGG)model[5],whichis one of the most famous CNN algorithm. There are thirteen convolution layers followed by three fully-connectlayersinVGGmodel. TheFODclassificationnetworkisfine-tunedfrompre-trained modelsofVGGonILSVRC2012dataset[25],withtheCaffetoolbox[42]inourexperiments. Conv3-128 Conv3-512 Conv3-512 Screw Conv3-64 Conv3-128 Input Image Conv3-64 Conv3-128 Conv3-128 Conv3-512 Conv3-512 FC6 4096 I Max Pooling Max Pooling MCaoxn Pv1o-o2l5in6g MCaoxn Pv1o-o5l1in2g MCaoxn Pv1o-o5l1in2g FFCC78 4 0 9 63 Stone 224x224 Pooling Param: Pooling Param: Softmax Pooling Param: Pooling Param: Pooling Param: K=2,S=2 K=2,S=2 K=2,S=2 K=2,S=2 K=2,S=2 Background Figure5. TheFODclassificationarchitecture. Actually,thelastlayerofVGGmodelisafully-connectlayerwith1000outputneurons. Tomatch ourapplicationwiththepre-trainedVGGmodel,thefirstfifteenlayersofVGGmodelaretakenas feature extractor. The last layer of VGG model is replaced by a classifier to predict the candidate rectangularregionsasbackground,screworstone. Therefore,onlythreeoutputneuronsarenecessary inthelastlayerofclassificationnetwork. 4. Experiments The airfield pavement images for experiment are sampled by a vehicular imaging system in Tianjin Binhai International Airport (ZBTJ). Actually, there are already some automatic vehicular imagingandprocessingsystems,suchasPCES,PAVUEandARAN.TheframeworkofvehicularFOD detectionsystemonairfieldpavementinthispaperisshowninFigure6. Therearemainlytwoparts: FODdetectingcarandofflinedatabaseinthisframework. Indetail,someimportantmodulesarelisted asbelow. DGPS:ThedifferentialGPS(DGPS)basestationandmobilestationcouldprovidethereal-time positionoftheFODdetectingcarandthedetectedFODs. Cameras: TherearefourGT2050Ccameraswith2048×2048resolutionscanningfor5minwidth atthesametime. Alarm: TheinformationofpredictedFODs,includingFODclassesandaccuratelocationsfrom theimageandDGPS,issentasalarmandsavedinofflinedatabase. FODManagementSystem: ThedetectionandcleaningofFODsaremanagedandupgradedby themanagementsysteminofflinedatabase. RemoteQuerySystem: Thereal-timestatusofFODsindatabasecouldberemotelyinquiredby others,suchasairfieldpavementcleaningsystemandplaneoperatingsystem. Sensors2018,18,737 7of14 Figure6. TheframeworkofvehicularFODdetectionsystemonairfieldpavement. Tomeetthe25fpsreal-timesamplingfrequency,thespeedofFODdetectioncarshouldbeless than31.25m/sandtheprocessingtimeforeachframeshouldbelessthan40ms. WithGTX1080ti gpu,theproposeddetectionalgorithmcouldachieve26fpswithhighaccuracy,whichisfasterthan theoriginalfasterR-CNN[28]with14fpsandslowerthanSSD[17]with31fps. ForFODsonpavement,lampcovers,markerlines,dilapidationsandtiremarksarethemainnoise andfalsealarms. ThismakesFODdetectionacomplexandchallengingtask. Actually,screwsand stonesaremajortargetstoberecognizedforairfieldpavementFODdetection. Thesetwokindsof FODsandbackgroundmakeupthefinalthreepredictclasses. 4.1. TheDatasetandTraining Theairfieldpavementimagedatasetcontains12,231imageswithshape2048×2048sampledby thevehicularimagingsysteminourdataset. Intheseimages,thereare3562screwsand4202stones. Thenumberofforeignobjectdebrisinoneimagevariesfromzerotofour. Theshapeofboundary boxforscrewsandstonesis80×80onaverage, inthedatasetgroundtruth. Regionproposalsby RPNareemployedtofine-tunetheFODdetectorwithSTN.TheregionproposalswithIoU≥0.7over ground-truthboxesaretreatedaspositive,whiletheregionproposalswithIoU≤0.3arelabeledas negative,asshowninFigure7. Thetop100scoreregionproposalsarechosenastrainingsamples. Thismakesthenumberofnegativesamplesmuchlargerthanthatofpositivesamplesinourapplication. Toobtainareasonablenumberofcandidates,16stonewindows,16screwwindowsand32background windowsareusedineachtrainingbatch. (a) (b) Figure7. (a)PositiveFODsampleswithIoU≥0.7;and(b)negativeFODsampleswithIoU≤0.3. Sensors2018,18,737 8of14 TotraintheFODDetector,60%ofimagesaretakenastrainingdataset,another20%aretakenas validatedatasetandtherestfortest. TheCaffeToolbox[42]arechosentotrainournetworks,with StochasticGradientDescent(SGD)[43]foroptimization. AsoneofthemostwidelyusedCNNmodels, VGGmodelforfine-tuningistrainedon1.2millionimagesfromILSVRC2012datasetwith1000classes. Beforefine-tuning,parametersofalllayersexceptFC8layerareinitializedwiththepre-trainedtypeC VGGmodel. Inthisway,thetrainingtimecouldbeapparentlyreducedandabetterfine-tunedmodel couldbegenerated. Thenumberoftrainingiterationsissetas50,000. Thelearningrateisinitiallyset as0.001anddecayingby0.1aftereach10,000iterations. 4.2. TheExperimentsofLocation Due to the sequential structure of the most target detection algorithms, location efficiency significantlyaffectsclassificationresults. Candidateswithmoreaccuratelocationscouldbemoreeasily classifiedastargetsorbackground,especiallyforCNN-basedclassifier. Actually,RPNisimprovedby addingpriorknowledgebasedselectrules,toobtainFODcandidateswithhighquality. Toevaluate generatedproposals,Figure8showsthecandidaterectanglesextractedbySelectiveSearch[40,41,44], originalRPNinfasterR-CNN[28]andimprovedRPNforimagesinourdataset. (a) (b) (c) (d) (e) (f) Figure8. (a,d)FODlocationresultsbySelectiveSearch;(b,e)resultsoforiginalRPN;and(c,f)results ofimprovedRPN.ThegreenboxesaregeneratedFODcandidatesbythesetwomethodsrespectively, whileredboxesaregroundtruths. The patches generated by Selective Search and original RPN vary in different sizes and may containmultipleobjects. ThisuncertaintyaffectstheaccuracyandrobustnessofFODdetectorand producesmorefalsepositiveresults. TheimprovedRPNgeneratesmuchlessregionproposalswith more accurate locations. To compare the quality and quantity of generated proposals, proposal numbersandrecallratesindifferentIoUlevelsareshowninTable1. Sensors2018,18,737 9of14 Table1.TherecallofSelectiveSearchandRPN. Recall Total Recall Average Methods IoU Num Num Rate Num Selective IoU>0.5 2108 2469 85.37% 800 Search IoU>0.6 1875 2469 75.94% 800 IoU>0.5 2263 2469 91.65% Top5 Region IoU>0.6 2253 2469 91.25% Top5 Proposal IoU>0.5 2399 2469 97.16% Top10 Network IoU>0.6 2394 2469 96.96% Top10 (RPN) IoU>0.5 2462 2469 99.72% Top20 IoU>0.6 2461 2469 99.60% Top20 In general, the number of candidate boxes generated by RPN is less than that generated by SelectiveSearch,buttherecallrateisabout10%higher. ItindicatesthatregionproposalsbyRPNare withhighqualityandeasytobeidentifiedbythefollowedFODclassifier. 4.3. TheExperimentsofClassification TheSTNisproposedtolearnandrectifyimagedistortion,suchasscale,rotationandwarping. Actually,CNNclassifierwithSTNcouldimproveclassificationaccuracy,especiallyonsmalldataset. TomeasuretheeffectofSTN,fourcomparativeexperimentsareconducted. Thesefourexperiments areFODdetectorwithoutfine-tune,STNbasedFODdetectorwithoutfine-tune,FODdetectorwith fine-tuneandSTNbasedFODdetectorwithfine-tune. TheseresultsarepresentedinTable2. Table2.Theresultsofclassification. FODDetector RecallRate FODclassification(nofine-tune) 94.52% STN+FODclassification(nofine-tune) 96.31% FODclassification+fine-tune 96.45% STN+FODclassification+fine-tune 97.67% As presented in Table 2, the recall rates of FOD classification are significantly increased by introducing STN. This is because STN decreases the effect of image distortion while sampling. ComparedtoImageNetorPascalVOC,thesamplenumberismuchsmallerinourdataset,whichthe variousimagedistortionsmaynotbefullyrepresentedbysamples. Withfine-tuningfromVGGmodel, STNbasedFODclassifiercouldachievehighaccuracyforclassification. 4.4. ComparisonwithOtherAlgorithms To test the performance of proposed method, three other object detection algorithms are introducedforcomparison. ThesealgorithmsarefasterR-CNN[28],SingleShotMultiBoxDetector (SSD)[17]andSelectiveSearch[44]withFODdetector. FasterR-CNNisthethirdgenerationofthe regionproposalbasedCNNalgorithms,inwhichRPNisproposedtoenhancethereal-timedetection performance. RPNisdesignedtogenerateregionproposalsthroughconvolutionfeaturemapsinstead oforiginalimage. Withthehelpofsharedfeaturesfromclassificationnetwork,RPNcouldachieve high recall rate with less running time. As region proposals by RPN have good robustness for varioustargetscales,fasterR-CNNareoneofthemostpopulardetectionCNNalgorithms. Inour experiment,theZeilerandFergusmodel[45]basedfasterR-CNNisdeployedforcomparison. SSD, asanend-to-endfullyconvolutionalnetwork,isbasedonregressionoftargetboundingbox. Usually, regression based detection algorithm, such as [16,17], has good real-time performance with lower Sensors2018,18,737 10of14 accuracy,comparingtoregionbasedCNNs. Toincreasetheaccuracyoflocationandclassification, featuresfrommulti-layersareintroducedtothelocatingandscoringmoduleinSSDframework.Asone ofthemostfamousregressionbaseddetectionCNNs,SSD512modelisdeployedforcomparisonwith theproposedalgorithm. ForcomparedfasterR-CNNandSSD,parametersaresetasthesametothe originalalgorithms[17,28],exceptforobjectclassnumbers. theMoreover,SelectiveSearchmethod withtheproposedSTNbasedFODdetectorisalsoconductedtoevaluatedetectionefficiency. Forallexperiments,ifthecorrectlyidentifiedcandidatehasagreatervaluethan0.5intersection over union (IoU) overlap with a ground-truth box, this will be recorded as a successful recall. All wrongly detected candidates, including wrongly sorted targets, are counted as false alarms. Indetail,falsealarmrate(FAR)andrecallrate(RR)are,respectively,definedas: TN FAR = , TP+TN (6) TP RR = . TP+FP InEquation (6),TPisthenumberoftruepositivesamples,TN isthenumberoftruenegative samples, FPrepresentsthenumberoffalsepositivesamples,and FN signifiesthenumberoffalse negativesamples. FARs of four algorithms are shown in Table 3, while RRs for screws and stones are shown in Table4. ItindicatesthattheproposedalgorithmcouldpredictFODswiththeleastfalsealarmsandthe highestrecallrates. FasterR-CNNusesRPNascandidatedetectionnetwork,whichcouldachievehigh recallratesforallclassesregardlessofFODscales. However,morebackgroundsarealsointroducedas regionproposals. Thisisbecausethesamplenumberofdatasetisnotlargeenoughtotrainthemodel withgoodcandidateselectingcapability. AlthoughSSDgreatlyenhancesthereal-timeperformance, detectionforsmallobjectsisnoteffective. Actually,theaveragelengthratiobetweenFODsandwhole imagesisabout0.04,whichisusuallyconsideredassmalltargetdetectionproblem[46–49]. Itmeans thatconvolutionfeaturescouldpossiblybelostthroughpoolingoperationwithnofully-connectlayers. ThisleadstohighFARandlowPRofSSDalgorithmforsmallobjects,suchasFODs. SelectiveSearch algorithm generates more FOD candidates as shown in Figure 8, which lead to high FAR usually. However,SelectiveSearchwithFODdetectorachieveslowFARduetotheproposedhighefficiency classifier. Inotherwords,theproposedSTNbasedclassifiercorrectlyidentifiedmostofthecandidates bySelectiveSearch. Moreover,RRsforbothclassesarelimitedbythelowrecallrateofSelectiveSearch, asshowninTable1inSection4.2. Table3.ThedetectionevaluationsbyFAR. Methods FAR fasterR-CNN 11.02% SSD 8.19% SelectiveSearch+FODDetector 1.21% RPN+FODDetector 0.66% Table4.Therecallratesofscrewandstone. Methods ScrewRR StoneRR fasterR-CNN 83.51% 93.84% SSD 87.72% 88.63% SelectiveSearch+FODDetector 80.63% 81.46% RPN+FODDetector 96.90% 96.40%

Description:
detection, semantic segmentation and key point detection are solved by one CNN algorithm [38]. Non-region proposal based algorithm, such as YOLO [16] and SSD [17], usually has better real-time performance with lower detection accuracy. Object detection is described as a regression problem in.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.