Sensors

Article

Region Based CNN for Foreign Object Debris Detection on Airfield Pavement

Xiaoguang Cao 1,†, Peng Wang 1,†, Cai Meng 1,*, Xiangzhi Bai 1,2,*, Guoping Gong 1, Miaoming Liu 1 and Jun Qi 1

1 Image Processing Center, Beijing University of Aeronautics and Astronautics, Beijing 100191, China; [email protected] (X.C.); [email protected] (P.W.); [email protected] (G.G.); [email protected] (M.L.); [email protected] (J.Q.)
2 State Key Laboratory of Virtual Reality Technology and Systems, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
* Correspondence: [email protected] (C.M.); [email protected] (X.B.); Tel.: +86-010-8233-8578 (C.M.); +86-010-8231-6544 (X.B.)
† These two authors contributed equally to this work.

Received: 10 November 2017; Accepted: 29 January 2018; Published: 1 March 2018

Abstract: In this paper, a novel algorithm based on a convolutional neural network (CNN) is proposed to detect foreign object debris (FOD) with optical imaging sensors. It contains two modules: an improved region proposal network (RPN) and a spatial transformer network (STN) based CNN classifier. In the improved RPN, extra select rules are designed and deployed to generate fewer candidates of higher quality. Moreover, the efficiency of the CNN detector is significantly improved by introducing an STN layer. Compared to faster R-CNN and the single shot multibox detector (SSD), the proposed algorithm achieves better results for FOD detection on airfield pavement in the experiments.

Keywords: foreign object debris; object detection; convolutional neural network; vehicular imaging sensors

1. Introduction

After a plane crashed at Paris Charles de Gaulle Airport, the detection of foreign object debris (FOD) on airfield pavement has become more and more important. FOD detection is one of the crucial technologies for intelligent vehicular systems. The safety and convenience of modern transportation could be significantly improved by adopting an efficient FOD detection system.
However, the complex airfield environment makes FOD detection a challenging task. Due to background variation and the influence of the automotive imaging system, FODs can hardly be detected and recognized on airfield pavement by traditional features, such as the scale invariant feature transform (SIFT) [1], histograms of oriented gradients (HOG) [2] and local binary patterns (LBP) [3].

Deep learning is one of the rapidly developing technologies in the area of big data. Since [4] significantly improved image classification, convolutional neural networks (CNNs) have been widely introduced into computer vision applications, such as image classification [5–8], face verification [9–11], semantic segmentation [12–15], object detection [16–18] and image annotation [19–21]. Moreover, some CNN based algorithms have been proposed to solve transportation problems [22–24]. On various publicly available datasets such as ImageNet [25], Pascal VOC [26] and COCO [27], CNN algorithms have been proved to perform better in detection and recognition than traditional feature methods. Compared with these manually designed features, CNN based features have better resolution and robustness for FOD detection. The FOD problem consists of two tasks: target location and object classification on pavement. Aimed at these two tasks, a novel two-stage framework is designed and introduced in this paper. In the first stage, a region proposal network (RPN) [28], a kind of fully convolutional network (FCN) [12], is trained end-to-end to generate FOD location proposals. In the second stage, a CNN classifier based on a spatial transformer network (STN) [29] is applied to obtain the parameters of scale, rotation and warping. Owing to the STN, FODs can be correctly identified from the generated features, regardless of image distortion.

Sensors 2018, 18, 737; doi:10.3390/s18030737 www.mdpi.com/journal/sensors

A preliminary version of this paper was presented in [30]. The highlights and extensions in this paper can be summarized as follows.

1. A new FOD detection framework based on CNN models is proposed, with an improved region proposal network and a spatial transformer network.
2. RPN is first introduced and improved to generate high quality region proposals for FOD detection on airfield pavement. In addition, some candidate select rules are designed to reduce the quantity and improve the quality of region proposals.
3. The STN based CNN classifier is proved to be effective for FOD classification. Moreover, the proposed framework achieves better results than other detection algorithms, such as faster R-CNN [28] and SSD [17], for FOD detection on airfield pavement.
4. The vehicular imaging system, including DGPS, cameras, alarm, FOD management system and remote query system, is presented and discussed in detail.

The rest of this paper contains four main sections. In Section 2, some recent works are analyzed, including traditional feature based algorithms and some CNN algorithms. In Section 3, the overall detection framework is given in detail, including location based on RPN and classification based on STN. In Section 4, comparison with other algorithms is conducted and discussed. Conclusions are given and discussed in Section 5.

2. Related Work

To solve the FOD detection problem, some effective algorithms have been proposed recently [31–36]. Algorithms based on different sensors, such as an actively scanning LiDAR system [31], mm-wave FMCW radar [34] and wideband 96 GHz millimeter-wave radar [35], can achieve good results in different environments. A cosecant squared beam pattern in elevation and a pencil-beam pattern in azimuth, generated through a folded reflectarray antenna (FRA) by phase-only control, is analyzed to detect objects on the ground [36]. A multi-sensor system, based on inherent features of FOD, is introduced to detect and recognize FOD [32]. Based on a large amount of prior knowledge and manually designed feature extractors, these methods transform the pixel values of an image into a suitable internal representation. Thus, these methods are effective for detecting FODs with little noise, but do not work for FODs with complex background and noise.

CNN based detection has become more and more popular.
There are two basic research directions: region proposal based and non-region proposal based algorithms. Region proposal based algorithms, such as faster R-CNN [28], R-FCN [37] and Mask R-CNN [38], consist of a region of interest (ROI) generator and an object classifier. The region proposal network (RPN) is designed in faster R-CNN [28] by introducing the classification features into the ROI extraction module. In this way, better region proposals can be generated in less time. Moreover, this leads to high recognition accuracy and good real-time performance. A semantic segmentation method is introduced in R-FCN [37], which calculates the classification probability of all region proposals in one image at the same time. This enlightening strategy greatly reduces the running time of ROI classification. Finally, object detection, semantic segmentation and key point detection are solved by one CNN algorithm [38].

Non-region proposal based algorithms, such as YOLO [16] and SSD [17], usually have better real-time performance with lower detection accuracy. Object detection is described as a regression problem in YOLO [16], with an end-to-end network. YOLOv2 and YOLO9000 [39] are two improved versions with faster speed and higher accuracy. However, YOLO can hardly detect small objects such as FODs on pavement. Similar to YOLO, SSD [17] is also based on regression of the target bounding box. To achieve higher accuracy, feature maps from multiple layers are introduced to the locating and scoring module. This strategy has been proved to achieve better performance.

To detect FODs on airport pavement, an improved RPN and an STN based CNN classifier are designed, as given in Section 3.

3. Algorithm

The FOD detection framework is shown in Figure 1 and contains two stages. First, a region proposal network (RPN) [28] is adopted to generate a set of original object proposals, which are taken as FOD candidates. Moreover, some select rules are designed to reduce computational cost and fix image size. Second, a spatial transformer network (STN) [29] is introduced to correct the targets in region proposals for the influence of scale, rotation and warping.
Then, these adjusted proposal images are fed to the CNN classifier to extract features and identify FODs.

Figure 1. The framework of the FOD detection in this paper.

3.1. Locate FOD Candidates with Improved RPN

Generally, a sliding window over the image is a widely used strategy for target detection. With well-designed features, such as SIFT [1] and HOG [2], a window with a target inside can be distinguished from the background. In this way, target detection can be described as a classification problem over sliding windows. However, the sliding window is a low-efficiency method that costs much computing time and memory. Recently, a series of region-based convolutional neural networks (R-CNN) [28,37,38,40,41] has been proposed to detect targets using deep learning. One of the most meaningful ideas from R-CNN algorithms is the region proposal strategy to separate target candidates from the background. On the one hand, a region proposal algorithm is much faster than a sliding window over the whole image because it reduces the number of candidates from millions to hundreds or thousands. On the other hand, a region proposal algorithm has a higher recall rate for finding all objects in an image, which can improve object location efficiency significantly. In this paper, some prior knowledge is introduced into region proposals to reduce the number of FOD candidates for better location.

The algorithm to generate region proposals is based on the region proposal network (RPN) [28]. RPN takes an image as input and generates a series of rectangular object candidates with corresponding objectness scores. To generate region proposals as shown in Figure 2, a small network is slid over the convolutional feature map. In each sliding window, k reference boxes (anchors) [28] are defined at different scales and aspect ratios to predict various region proposals. In other words, RPN is designed with the help of convolutional features from the detection module. The intersection over union (IoU) between region proposals and the ground-truth bounding box is chosen as the evaluation index of the proposed FOD detection framework.
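Since IoU is the evaluation index used throughout, a minimal sketch of the computation may be helpful. The corner-based box layout `(x_min, y_min, x_max, y_max)` and the function name `iou` are assumptions made for illustration; the paper does not specify a box encoding.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max) tuples. This corner layout is an
    assumption of the sketch, not something stated in the paper.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

For example, two 80 × 80 boxes shifted horizontally by 40 pixels overlap in a 40 × 80 strip, which gives an IoU of 3200 / 9600 = 1/3.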
For FOD detection in our dataset, targets with an average shape of 80 × 80 are located in 2048 × 2048 input images. To match the scale of FODs, the anchor boxes are set with an area of $100^2$ and aspect ratios of 1:1, 1:2 and 2:1.

Figure 2. The RPN framework for FOD location.

Although far fewer FOD candidates are generated by RPN than by the common sliding window paradigm, the candidate number is still very large for classification. To further reduce FOD candidates, a novel series of select rules is designed and introduced into the RPN framework. Candidate FOD bounding boxes from the original RPN vary in size. Among all region proposals, FODs such as screws and stones have different shape scales from false alarms. Therefore, three select rules based on prior knowledge are proposed and listed below.

Rule 1: Region proposals by RPN with a high aspect ratio $Ratio$ are filtered out, which is expressed as

$$Ratio = \frac{\max(w,h)}{\min(w,h)}, \qquad Ratio < T_{ratio}. \tag{1}$$

$w$ and $h$ are the width and height of the proposal image, respectively, and $T_{ratio}$ is a threshold with a constant value.

Rule 2: The areas of proposals containing FODs should be in the range $[T_{min}, T_{max}]$, which is expressed as

$$Area = w \times h, \qquad T_{min} \le Area \le T_{max}. \tag{2}$$

$Area$ is the area of the region proposal.

Rule 3: The proposals with higher objectness scores $O_l$ generated by RPN are picked out, which is expressed as

$$O_l > T_{objectness}. \tag{3}$$

$T_{objectness}$ is a constant threshold for judging objectness.

The thresholds for these three rules are set as $T_{ratio} = 1.5$, $T_{min} = 60^2$, $T_{max} = 100^2$ and $T_{objectness} = 0.8$. By deploying these rules, the number of region proposals from the improved RPN is reduced by about 60%. In this way, fewer and more accurate candidate proposals are generated. Moreover, the efficiency of FOD classification is greatly improved.

3.2. Spatial Transformer Network

STN [29] is an effective CNN framework to learn scale, rotation and warping of images. With an affine transformation predicted by STN, an input image $I(x,y)$ can be adjusted to a rectified image $I'(x,y)$. As shown in Figure 3, the affine matrix is first regressed via a localization network.
Then, the predicted transformation parameters are used to generate a sampling grid. The sampling grid is a set of points from the input map for the affine transformation. Finally, the feature map and the sampling grid are used as inputs to the sampler. The architecture of STN is shown in Figure 4. Details of the STN algorithm are given below.

Figure 3. The FOD classification framework.
[Diagram: the input image I passes through the localization network, the grid generator and the bilinear interpolation V to give the rectified image I′, which the CNN classifier labels as screw, stone or background.]

Figure 4. The architecture of spatial transformer network.
[Diagram: 224×224 input image I; Conv1 (K=7, S=2) and Conv2 (K=5, S=2), each with ReLU and max pooling (K=3, S=2); Conv3, Conv4 and Conv5 (K=3, S=1, P=1) with ReLU, followed by max pooling (K=3, S=2); fully-connected layers FC6 (4096), FC7 (4096) and FC8 (6); ST layer; output image.]

3.2.1. Localization Network

The localization network takes a feature map with width $w$, height $h$ and $c$ channels as input $U$. Then, the transformation parameters are generated by $O = f_{loc}(U)$. In this paper, the effect of the vehicular imaging system is assumed to be an affine transformation. Therefore, the output $O$ of the localization network is a six-dimensional vector from a fully-connected layer. There are five convolution layers and three fully-connected layers in the localization network.

3.2.2. Grid Generator

To perform a warping of the input feature map, each output pixel is computed by locating a sampling kernel in a particular area of the input feature map. In general, the output pixels are defined on a regular grid $G_i = (x_i^t, y_i^t)$, which form an output feature map $P$. The relationship between input and output feature maps is expressed as below.

$$\begin{pmatrix} x_i^s \\ y_i^s \end{pmatrix} = \Gamma_\theta \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \begin{pmatrix} x_i^t \\ y_i^t \\ 1 \end{pmatrix} \tag{4}$$

In Equation (4), $\Gamma_\theta$ signifies the affine transformation matrix for image distortion.

3.2.3. Sampler

In the sampler module, pixels in $P'$ are calculated through regional bilinear interpolation on the input image $P$.
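The grid generator of Equation (4) and the bilinear sampler can be illustrated with a small, dependency-free sketch. Several choices here are assumptions made only for illustration: coordinates are raw pixel indices (the original STN formulation normalizes them to [-1, 1]), samples that fall outside the image contribute zero, the image is a single-channel nested list, and the function names are hypothetical.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map every output pixel (x_t, y_t) to a source location (x_s, y_s).

    theta is the 2x3 affine matrix of Equation (4). Pixel coordinates are
    used directly, which is a simplifying assumption of this sketch.
    """
    (t11, t12, t13), (t21, t22, t23) = theta
    return [[(t11 * x + t12 * y + t13, t21 * x + t22 * y + t23)
             for x in range(out_w)]
            for y in range(out_h)]


def bilinear_sample(image, xs, ys):
    """Bilinearly interpolate a 2-D list `image` at the real-valued point (xs, ys).

    Neighbours outside the image are treated as zero (an assumption).
    """
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(xs), math.floor(ys)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= xx < w and 0 <= yy < h:
                # Each of the four neighbours is weighted by its closeness.
                weight = (1 - abs(xs - xx)) * (1 - abs(ys - yy))
                value += weight * image[yy][xx]
    return value


def spatial_transform(image, theta, out_h, out_w):
    """Resample `image` on the affine grid defined by `theta`."""
    grid = affine_grid(theta, out_h, out_w)
    return [[bilinear_sample(image, xs, ys) for (xs, ys) in row] for row in grid]
```

With the identity matrix `[[1, 0, 0], [0, 1, 0]]` the image is returned unchanged, while `[[1, 0, 0.5], [0, 1, 0]]` samples half a pixel to the right, so each output value is the average of two horizontal neighbours. Because every output value is a weighted sum of input pixels, the operation is differentiable with respect to both the image and the sampling coordinates.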
With all pixel values obtained, the rectified image $I'$ can be generated as:

$$I' = V(P, I) \tag{5}$$

$V$ represents the bilinear interpolation. Specifically, $V$ is also a differentiable module for backpropagation.

3.3. FOD Classification with Convolutional Neural Network

In the FOD classification framework, each extracted region proposal is identified as background, stone or screw. A convolutional neural network (CNN) classifier is adopted for this application, as presented in Figure 5. The network is based on type C of the visual geometry group (VGG) model [5], which is one of the most famous CNN algorithms. There are thirteen convolution layers followed by three fully-connected layers in the VGG model. The FOD classification network is fine-tuned from VGG models pre-trained on the ILSVRC 2012 dataset [25], with the Caffe toolbox [42] in our experiments.

Figure 5. The FOD classification architecture.
[Diagram: 224×224 input image I; five blocks of convolution layers, each followed by max pooling (K=2, S=2); fully-connected layers FC6 (4096), FC7 (4096) and FC8 (3); softmax over screw, stone and background.]

The last layer of the VGG model is a fully-connected layer with 1000 output neurons. To match our application with the pre-trained VGG model, the first fifteen layers of the VGG model are taken as the feature extractor. The last layer of the VGG model is replaced by a classifier that predicts the candidate rectangular regions as background, screw or stone. Therefore, only three output neurons are necessary in the last layer of the classification network.

4. Experiments

The airfield pavement images for the experiments are sampled by a vehicular imaging system at Tianjin Binhai International Airport (ZBTJ). There are already some automatic vehicular imaging and processing systems, such as PCES, PAVUE and ARAN. The framework of the vehicular FOD detection system on airfield pavement in this paper is shown in Figure 6. There are mainly two parts in this framework: the FOD detecting car and the offline database.
In detail, some important modules are listed below.

DGPS: The differential GPS (DGPS) base station and mobile station provide the real-time position of the FOD detecting car and the detected FODs.
Cameras: There are four GT2050C cameras with 2048 × 2048 resolution scanning a width of 5 m at the same time.
Alarm: The information of predicted FODs, including FOD classes and accurate locations from the image and DGPS, is sent as an alarm and saved in the offline database.
FOD Management System: The detection and cleaning of FODs are managed and updated by the management system in the offline database.
Remote Query System: The real-time status of FODs in the database can be remotely queried by others, such as the airfield pavement cleaning system and the plane operating system.

Figure 6. The framework of vehicular FOD detection system on airfield pavement.

To meet the 25 fps real-time sampling frequency, the speed of the FOD detection car should be less than 31.25 m/s and the processing time for each frame should be less than 40 ms. With a GTX 1080 Ti GPU, the proposed detection algorithm achieves 26 fps with high accuracy, which is faster than the original faster R-CNN [28] with 14 fps and slower than SSD [17] with 31 fps.

For FODs on pavement, lamp covers, marker lines, dilapidations and tire marks are the main noise and false alarms. This makes FOD detection a complex and challenging task. Screws and stones are the major targets to be recognized for airfield pavement FOD detection. These two kinds of FODs and background make up the final three predicted classes.

4.1. The Dataset and Training

The airfield pavement image dataset contains 12,231 images with shape 2048 × 2048 sampled by the vehicular imaging system. In these images, there are 3562 screws and 4202 stones. The number of foreign object debris in one image varies from zero to four. The bounding boxes of screws and stones in the dataset ground truth are 80 × 80 on average. Region proposals by RPN are employed to fine-tune the FOD detector with STN. The region proposals with IoU ≥ 0.7 over ground-truth boxes are treated as positive, while the region proposals with IoU ≤ 0.3 are labeled as negative, as shown in Figure 7. The top 100 score region proposals are chosen as training samples.
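The labeling scheme just described can be sketched as follows. The IoU of each proposal with its best-matching ground-truth box is assumed to be computed beforehand. The handling of proposals with IoU strictly between 0.3 and 0.7 is not stated in the text; leaving them out of training (label "ignored") is an assumption of this sketch, as are the function names and the dictionary layout of a proposal.

```python
def label_proposal(best_iou, pos_thresh=0.7, neg_thresh=0.3):
    """Label a region proposal from its best IoU with any ground-truth box."""
    if best_iou >= pos_thresh:
        return "positive"
    if best_iou <= neg_thresh:
        return "negative"
    # Assumption: proposals in the ambiguous band are not used for training.
    return "ignored"


def top_scoring(proposals, k=100):
    """Keep the k proposals with the highest objectness score.

    Each proposal is a dict with a "score" key (hypothetical layout).
    """
    return sorted(proposals, key=lambda p: p["score"], reverse=True)[:k]
```

A proposal with IoU 0.75 would thus become a positive sample, one with IoU 0.10 a negative sample, and one with IoU 0.50 would be skipped under the stated assumption.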
With this labeling, the number of negative samples is much larger than that of positive samples in our application. To obtain a reasonable number of candidates, 16 stone windows, 16 screw windows and 32 background windows are used in each training batch.

Figure 7. (a) Positive FOD samples with IoU ≥ 0.7; and (b) negative FOD samples with IoU ≤ 0.3.

To train the FOD detector, 60% of the images are taken as the training dataset, another 20% as the validation dataset and the rest for testing. The Caffe toolbox [42] is chosen to train our networks, with stochastic gradient descent (SGD) [43] for optimization. As one of the most widely used CNN models, the VGG model for fine-tuning is trained on 1.2 million images from the ILSVRC 2012 dataset with 1000 classes. Before fine-tuning, the parameters of all layers except the FC8 layer are initialized with the pre-trained type C VGG model. In this way, the training time is clearly reduced and a better fine-tuned model can be generated. The number of training iterations is set to 50,000. The learning rate is initially set to 0.001 and decays by a factor of 0.1 after every 10,000 iterations.

4.2. The Experiments of Location

Due to the sequential structure of most target detection algorithms, location efficiency significantly affects classification results. Candidates with more accurate locations are more easily classified as targets or background, especially by a CNN-based classifier. RPN is improved by adding prior knowledge based select rules, to obtain FOD candidates of high quality. To evaluate the generated proposals, Figure 8 shows the candidate rectangles extracted by Selective Search [40,41,44], the original RPN in faster R-CNN [28] and the improved RPN for images in our dataset.

Figure 8. (a,d) FOD location results by Selective Search; (b,e) results of original RPN; and (c,f) results of improved RPN. The green boxes are the FOD candidates generated by the respective methods, while red boxes are ground truths.

The patches generated by Selective Search and the original RPN vary in size and may contain multiple objects. This uncertainty affects the accuracy and robustness of the FOD detector and produces more false positive results.
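The improved RPN differs from the original RPN by the three select rules of Section 3.1. A minimal sketch of that filtering step is given here, using the thresholds stated in the paper. The function name and the representation of a proposal by its width, height and objectness score are assumptions for illustration.

```python
# Thresholds as stated in Section 3.1.
T_RATIO = 1.5
T_MIN = 60 ** 2
T_MAX = 100 ** 2
T_OBJECTNESS = 0.8


def passes_select_rules(w, h, objectness):
    """Return True if a proposal of width w and height h satisfies Rules 1 to 3."""
    if w <= 0 or h <= 0:
        return False  # degenerate boxes cannot contain FOD
    ratio = max(w, h) / min(w, h)   # Rule 1: aspect ratio
    area = w * h                    # Rule 2: area
    return (ratio < T_RATIO
            and T_MIN <= area <= T_MAX
            and objectness > T_OBJECTNESS)  # Rule 3: objectness score
```

An 80 × 80 proposal with objectness 0.9 passes all three rules, whereas a 120 × 60 proposal is rejected by Rule 1 (ratio 2.0) and a 50 × 50 proposal by Rule 2 (area 2500).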
The improved RPN generates far fewer region proposals with more accurate locations. To compare the quality and quantity of generated proposals, proposal numbers and recall rates at different IoU levels are shown in Table 1.

Table 1. The recall of Selective Search and RPN.

| Methods | IoU | Recall Num | Total Num | Recall Rate | Average Num |
|---|---|---|---|---|---|
| Selective Search | IoU > 0.5 | 2108 | 2469 | 85.37% | 800 |
| Selective Search | IoU > 0.6 | 1875 | 2469 | 75.94% | 800 |
| Region Proposal Network (RPN) | IoU > 0.5 | 2263 | 2469 | 91.65% | Top 5 |
| Region Proposal Network (RPN) | IoU > 0.6 | 2253 | 2469 | 91.25% | Top 5 |
| Region Proposal Network (RPN) | IoU > 0.5 | 2399 | 2469 | 97.16% | Top 10 |
| Region Proposal Network (RPN) | IoU > 0.6 | 2394 | 2469 | 96.96% | Top 10 |
| Region Proposal Network (RPN) | IoU > 0.5 | 2462 | 2469 | 99.72% | Top 20 |
| Region Proposal Network (RPN) | IoU > 0.6 | 2461 | 2469 | 99.60% | Top 20 |

In general, the number of candidate boxes generated by RPN is less than that generated by Selective Search, but the recall rate is about 10% higher. This indicates that region proposals by RPN are of high quality and easy to identify by the subsequent FOD classifier.

4.3. The Experiments of Classification

The STN is proposed to learn and rectify image distortion, such as scale, rotation and warping. A CNN classifier with STN can improve classification accuracy, especially on a small dataset. To measure the effect of STN, four comparative experiments are conducted: FOD detector without fine-tuning, STN based FOD detector without fine-tuning, FOD detector with fine-tuning and STN based FOD detector with fine-tuning. The results are presented in Table 2.

Table 2. The results of classification.

| FOD Detector | Recall Rate |
|---|---|
| FOD classification (no fine-tune) | 94.52% |
| STN + FOD classification (no fine-tune) | 96.31% |
| FOD classification + fine-tune | 96.45% |
| STN + FOD classification + fine-tune | 97.67% |

As presented in Table 2, introducing STN increases the recall rate of FOD classification from 94.52% to 96.31% without fine-tuning and from 96.45% to 97.67% with fine-tuning. This is because STN decreases the effect of image distortion while sampling. Compared to ImageNet or Pascal VOC, the sample number is much smaller in our dataset, so the various image distortions may not be fully represented by the samples. With fine-tuning from the VGG model, the STN based FOD classifier achieves high accuracy for classification.

4.4. Comparison with Other Algorithms

To test the performance of the proposed method, three other object detection algorithms are introduced for comparison. These algorithms are faster R-CNN [28], the Single Shot MultiBox Detector (SSD) [17] and Selective Search [44] with the FOD detector. Faster R-CNN is the third generation of the region proposal based CNN algorithms, in which RPN is proposed to enhance the real-time detection performance. RPN is designed to generate region proposals from convolutional feature maps instead of the original image. With the help of shared features from the classification network, RPN achieves a high recall rate with less running time. As region proposals by RPN are robust to various target scales, faster R-CNN is one of the most popular detection CNN algorithms. In our experiment, the Zeiler and Fergus model [45] based faster R-CNN is deployed for comparison. SSD, as an end-to-end fully convolutional network, is based on regression of the target bounding box. Usually, regression based detection algorithms, such as [16,17], have good real-time performance with lower accuracy compared to region based CNNs. To increase the accuracy of location and classification, features from multiple layers are introduced to the locating and scoring module in the SSD framework. As one of the most famous regression based detection CNNs, the SSD512 model is deployed for comparison with the proposed algorithm. For the compared faster R-CNN and SSD, parameters are set the same as in the original algorithms [17,28], except for the number of object classes. Moreover, the Selective Search method with the proposed STN based FOD detector is also evaluated to assess detection efficiency.

For all experiments, if a correctly identified candidate has an intersection over union (IoU) overlap greater than 0.5 with a ground-truth box, it is recorded as a successful recall. All wrongly detected candidates, including wrongly sorted targets, are counted as false alarms. In detail, false alarm rate (FAR) and recall rate (RR) are, respectively, defined as:

$$FAR = \frac{TN}{TP + TN}, \qquad RR = \frac{TP}{TP + FP}. \tag{6}$$

In Equation (6), $TP$ is the number of true positive samples, $TN$ is the number of true negative samples, $FP$ represents the number of false positive samples, and $FN$ signifies the number of false negative samples.

FARs of the four algorithms are shown in Table 3, while RRs for screws and stones are shown in Table 4. They indicate that the proposed algorithm predicts FODs with the fewest false alarms and the highest recall rates. Faster R-CNN uses RPN as the candidate detection network, which achieves high recall rates for all classes regardless of FOD scales. However, more background regions are also introduced as region proposals. This is because the sample number of the dataset is not large enough to train the model with good candidate selecting capability. Although SSD greatly enhances the real-time performance, its detection of small objects is not effective. The average length ratio between FODs and whole images is about 0.04, which is usually considered a small target detection problem [46–49]. It means that convolution features could possibly be lost through pooling operations with no fully-connected layers. This leads to high FAR and low RR of the SSD algorithm for small objects, such as FODs. The Selective Search algorithm generates more FOD candidates, as shown in Figure 8, which usually leads to high FAR. However, Selective Search with the FOD detector achieves low FAR due to the proposed high efficiency classifier. In other words, the proposed STN based classifier correctly identifies most of the candidates from Selective Search. Moreover, RRs for both classes are limited by the low recall rate of Selective Search, as shown in Table 1 in Section 4.2.

Table 3. The detection evaluations by FAR.

| Methods | FAR |
|---|---|
| faster R-CNN | 11.02% |
| SSD | 8.19% |
| Selective Search + FOD Detector | 1.21% |
| RPN + FOD Detector | 0.66% |

Table 4. The recall rates of screw and stone.

| Methods | Screw RR | Stone RR |
|---|---|---|
| faster R-CNN | 83.51% | 93.84% |
| SSD | 87.72% | 88.63% |
| Selective Search + FOD Detector | 80.63% | 81.46% |
| RPN + FOD Detector | 96.90% | 96.40% |
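The counting rule behind these tables (a detection is a successful recall when its class is correct and its IoU with a ground-truth box exceeds 0.5, and a false alarm otherwise) can be sketched as follows. The dictionary layout of a detection and the function name are hypothetical. How these counts map onto the symbols of Equation (6) is not spelled out in the text, so the sketch stops at the raw counts.

```python
def count_outcomes(detections, iou_thresh=0.5):
    """Count successful recalls and false alarms among FOD detections.

    Each detection is a dict (hypothetical layout) with:
      "label":    predicted class, e.g. "screw" or "stone"
      "gt_label": class of the best-overlapping ground-truth box, or None
      "iou":      IoU with that ground-truth box (0.0 if there is none)
    """
    recalls = 0
    false_alarms = 0
    for det in detections:
        if det["label"] == det["gt_label"] and det["iou"] > iou_thresh:
            recalls += 1
        else:
            # Wrong class, insufficient overlap, or no ground truth at all.
            false_alarms += 1
    return recalls, false_alarms
```

Under this rule a stone predicted as a screw counts as a false alarm even when the box is well localized, which matches the statement that wrongly sorted targets are counted as false alarms.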