IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. XXX, NO. XXX, XXXXX 2014
arXiv:1501.02741v1 [cs.CV] 5 Jan 2015

Salient Object Detection: A Benchmark

Ali Borji, Ming-Ming Cheng, Huaizu Jiang and Jia Li

A. Borji is with the Computer Science Department, University of Wisconsin, Milwaukee, WI 53211. E-mail: [email protected]
M.M. Cheng is with the Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ. E-mail: [email protected]
H. Jiang is with the Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, China. E-mail: [email protected]
J. Li is with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University. He is also with the International Research Institute for Multidisciplinary Science (IRIMS) at Beihang University, Beijing, China. E-mail: [email protected]
An earlier version of this work has been published in ECCV 2012 [1]. First two authors contributed equally. Manuscript received xx 2014.

Abstract: We extensively compare, qualitatively and quantitatively, 40 state-of-the-art models (28 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over 6 challenging datasets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows consistent, rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted just two years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity on model performance, which, along with the hard cases for state-of-the-art models, provide useful hints towards constructing more challenging large-scale datasets and better saliency models. Finally, we propose probable solutions for tackling several open problems such as evaluation scores and dataset bias, which also suggest future research directions in the rapidly growing field of salient object detection.

Index Terms: Salient object detection, saliency, explicit saliency, visual attention, regions of interest, objectness, segmentation, interestingness, importance, eye movements

I. INTRODUCTION

Visual attention, the astonishing capability of the human visual system to selectively process only the salient visual stimuli in detail, has been investigated by multiple disciplines such as cognitive psychology, neuroscience, and computer vision [2]-[5]. Following cognitive theories (e.g., feature integration theory (FIT) [6], guided search model [7], [8]) and early attention models (e.g., Koch and Ullman [9] and Itti et al. [10]), hundreds of computational saliency models have been proposed to detect salient visual subsets from images and videos.

Despite the psychological and neurobiological definitions, the concept of visual saliency is becoming vague in the field of computer vision. Some visual saliency models (e.g., [3], [10]-[16]) aimed to predict human fixations as a way to test their accuracy in saliency detection, while other models [17]-[19], which were often driven by computer vision applications such as content-aware image resizing and photo visualization [20], attempted to identify salient regions/objects and used explicit saliency judgments for evaluation [21]. Although both types of saliency models are expected to be applicable interchangeably, their generated saliency maps actually demonstrate remarkably different characteristics due to the distinct purposes in saliency detection. For example, fixation prediction models usually pop out sparse blob-like salient regions, while salient object detection models often generate smooth connected areas. On the one hand, detecting large salient areas often causes severe false positives for fixation prediction. On the other hand, popping out only sparse salient regions causes massive misses in detecting salient regions and objects.

To separate these two types of saliency models, in this study we provide a precise definition and suggest an appropriate treatment of salient object detection. Generally, a salient object detection model should first detect the salient attention-grabbing objects in a scene, and second, segment the entire objects. Usually, the output of the model is a saliency map where the intensity of each pixel represents its probability of belonging to salient objects. From this definition, we can see that this problem in its essence is a figure/ground segmentation problem, and the goal is to segment only the salient foreground object from the background. Note that it slightly differs from the traditional image segmentation problem that aims to partition an image into perceptually coherent regions.

The value of salient object detection models lies in their applications in many areas such as computer vision, graphics, and robotics. For instance, these models have been successfully applied in many applications such as object detection and recognition [22]-[29], image and video compression [30], [31], video summarization [32]-[34], photo collage/media re-targeting/cropping/thumb-nailing [20], [35], [36], image quality assessment [37]-[39], image segmentation [40]-[43], content-based image retrieval and image collection browsing [44]-[47], image editing and manipulating [48]-[51], visual tracking [52]-[58], object discovery [59], [60], and human-robot interaction [61], [62].

The field of salient object detection develops very fast. Many new models and benchmark datasets have been proposed since our earlier benchmark conducted two years ago [1]. Yet, it is unclear how the new algorithms fare against previous models and new datasets. Are there any real improvements in this field, or are we just fitting models to datasets? It is also interesting to test the performance of old high-performing models on the new benchmark datasets. A recent exhaustive review of salient object detection models can be found in [28].

In this study, we compare and analyze models from three categories: 1) salient object detection, 2) fixation prediction, and 3) object proposal generation (footnote 1). The reason to include the latter two types of models is to conduct across-category comparison and to study whether models specifically designed for salient object detection show actual advantage over models for fixation prediction and object proposal generation. This is particularly important since these models have different objectives and generate visually distinctive maps. We also include a baseline model to study the effect of center bias in model comparison.

Footnote 1: Object proposal generation is a recently emerging trend which attempts to detect image regions that may contain objects from any object category (i.e., category independent object proposals).

In summary, we hope that such a benchmark not only allows researchers to compare their models with other algorithms but also helps identify the chief factors affecting the performance of salient object detection models.

II. SALIENT OBJECT DETECTION BENCHMARK

In this benchmarking, we focus on evaluating models whose input is a single image. This is due to the fact that salient object detection on a single input image is the main research direction, while the comprehensive evaluation of models working on multiple input images (e.g., co-salient object detection) lacks public benchmark datasets.

A. Compared Models

In this study, we run 40 models in total (28 salient object detection models, 10 fixation prediction models, 1 objectness proposal model, and 1 baseline) whose codes or executables were accessible (see Fig. 1 for a complete list). The baseline model, denoted as "Average Annotation Map (AAM)," is simply the average of ground-truth annotations of all images on each dataset. Note that AAM often has a larger activation at the image center (see Fig. 2), and we can thus study the effect of center bias in model comparison.

#    Model       Pub         Year   Code   Time (s)
Salient object detection
1    LC [63]     MM          2006   C      .009
2    AC [64]     ICVS        2008   C      .129
3    FT [18]     CVPR        2009   C      .072
4    CA [65]     CVPR        2010   M+C    40.9
5    MSS [66]    ICIP        2010   C      .076
6    SEG [67]    ECCV        2010   M+C    10.9
7    RC [68]     CVPR        2011   C      .136
8    HC [68]     CVPR        2011   C      .017
9    SWD [69]    CVPR        2011   M+C    .190
10   SVO [70]    ICCV        2011   M+C    56.5
11   CB [71]     BMVC        2011   M+C    2.24
12   FES [72]    Img.Anal.   2011   M+C    .096
13   SF [73]     CVPR        2012   C      .202
14   LMLC [74]   TIP         2013   M+C    140.
15   HS [75]     CVPR        2013   EXE    .528
16   GMR [76]    CVPR        2013   M      .149
17   DRFI [77]   CVPR        2013   C      .697
18   PCA [78]    CVPR        2013   M+C    4.34
19   LBI [79]    CVPR        2013   M+C    251.
20   GC [80]     ICCV        2013   C      .037
21   CHM [81]    ICCV        2013   M+C    15.4
22   DSR [82]    ICCV        2013   M+C    10.2
23   MC [83]     ICCV        2013   M+C    .195
24   UFO [84]    ICCV        2013   M+C    20.3
25   MNP [50]    Vis.Comp.   2013   M+C    21.0
26   GR [85]     SPL         2013   M+C    1.35
27   RBD [86]    CVPR        2014   M      .269
28   HDCT [87]   CVPR        2014   M      4.12
Fixation prediction
1    IT [10]     PAMI        1998   M      .302
2    AIM [88]    JOV         2006   M      8.66
3    GB [89]     NIPS        2007   M+C    .735
4    SR [90]     CVPR        2007   M      .040
5    SUN [91]    JOV         2008   M      3.56
6    SeR [92]    JOV         2009   M      1.31
7    SIM [93]    CVPR        2011   M      1.11
8    SS [94]     PAMI        2012   M      .053
9    COV [95]    JOV         2013   M      25.4
10   BMS [96]    ICCV        2013   M+C    .575
Object proposal generation
1    OBJ [97]    CVPR        2010   M+C    3.01
Baseline
1    AAM         -           -      -      -

Fig. 1. Compared salient object detection, fixation prediction, object proposal generation, and baseline models sorted by their publication year {M = Matlab, C = C/C++, EXE = executable}. The average running time is tested on the MSRA10K dataset (typical image resolution 400×300) using a desktop machine with a Xeon E5645 2.4 GHz CPU and 8 GB RAM. We evaluate those models whose codes or executables are available.

B. Datasets

Since there exist many datasets that differ in number of images, number of objects per image, image resolution and annotation form (bounding box or accurate region mask), it is likely that models may rank differently across datasets. Hence, to come up with a fair comparison, it is necessary to run models over multiple datasets so as to draw objective conclusions. A good model should perform well over almost all datasets. Toward this end, six datasets were chosen for model comparison, including: 1) MSRA10K [98], 2) ECSSD [75], 3) THUR15K [98], 4) JuddDB [99], 5) DUT-OMRON [76], and 6) SED2 [1], [100]. These datasets were selected based on the following four criteria: 1) being widely used, 2) containing a large number of images, 3) having different biases (e.g., number of salient objects, image clutter, center bias), and 4) potential to be used as benchmarks in future research.

MSRA10K is a descendant of the MSRA dataset [17]. It contains 10,000 annotated images that cover all the 1,000 images in the popular ASD dataset [18]. THUR15K and DUT-OMRON are used to compare models on a large scale. ECSSD contains a large number of semantically meaningful but structurally complex natural images. The reason to include JuddDB was to assess performance of models over scenes with multiple objects and high background clutter. Finally, we also evaluate models over SED2 to check whether salient object detection algorithms can perform well on images containing more than one salient object (i.e., two in SED2). Fig. 2 shows the AAM model output of six benchmark datasets to illustrate their different center biases. See Fig. 3 for representative images and annotations from each dataset.

Fig. 2. Average annotation maps of six datasets used in benchmarking. Panels: (a) MSRA10K, (b) ECSSD, (c) THUR15K, (d) DUT-OMRON, (e) JuddDB, (f) SED2.

Fig. 3. Images and pixel-level annotations from six salient object datasets. Panels: (a) MSRA10K, (b) ECSSD, (c) JuddDB, (d) DUT-OMRON, (e) THUR15K, (f) SED2.

We illustrate in Fig. 4 the statistics of the six chosen datasets. In Fig. 4(a), we show the normalized distances from the centroid of salient objects to the corresponding image centers. We can see that salient objects in ECSSD have the shortest distance to image centers, while salient objects in SED2 have the longest distances. This is reasonable since images in SED2 usually have two objects aligned around opposite image borders. Moreover, we can see that the spatial distribution of salient objects in JuddDB has a larger variety than in other datasets, indicating that this dataset has smaller positional bias (i.e., center bias of salient objects and border bias of background regions).

In Fig. 4(b), we aim to show the complexity of images in six benchmark datasets. Toward this end, we apply the segmentation algorithm by Felzenszwalb et al. [101] to see how many super-pixels (i.e., homogeneous regions) can be obtained on average from salient objects and background regions of each image, respectively. In this manner, we can use this measure to reflect how challenging a benchmark is, since a large number of super-pixels often indicates complex foreground objects and cluttered background. From Fig. 4(b), we can see that JuddDB is the most challenging benchmark since it has an average of 493 super-pixels from the background of each image. On the contrary, SED2 contains fewer super-pixels in foreground and background regions, indicating that images in this benchmark often contain uniform regions and are easy to process.

In Fig. 4(c), we demonstrate the average object sizes of these benchmarks, where the size of each object is normalized by the size of the corresponding image. We can see that the MSRA10K and ECSSD datasets have larger objects while SED2 has smaller ones. In particular, we can see that some benchmarks contain a limited number of image regions with large foreground objects. By jointly considering the center-bias property, it becomes very easy to achieve a high precision on these images.

Fig. 4. Statistics of the benchmark datasets. a) distribution of normalized object distance from image center, b) distribution of super-pixel number on salient objects (dashed lines) and image background (solid lines), and c) distribution of normalized object size. Vertical axes: probability density.

C. Evaluation Measures

There are several ways to measure the agreement between model predictions and human annotations [21]. Some metrics evaluate the overlap between a tagged region while others try to assess the accuracy of drawn shapes with object boundary. In addition, some metrics have tried to consider both boundary and shape [102].

Here, we use three universally agreed, standard, and easy-to-understand measures for evaluating a salient object detection model. The first two evaluation metrics are based on the overlapping area between subjective annotation and saliency prediction, including the precision-recall (PR) and the receiver operating characteristics (ROC). From these two metrics, we also report the F-Measure, which jointly considers recall and precision, and AUC, which is the area under the ROC curve. Moreover, we also use a third measure which directly computes the mean absolute error (MAE) between the estimated saliency map and ground-truth annotation. For the sake of simplification, we use $S$ to represent the predicted saliency map normalized to [0, 255] and $G$ to represent the ground-truth binary mask of salient objects. For a binary mask, we use $|\cdot|$ to represent the number of non-zero entries in the mask.

Precision-recall (PR). For a saliency map $S$, we can convert it to a binary mask $M$ and compute Precision and Recall by comparing $M$ with ground-truth $G$:

$$\mathrm{Precision} = \frac{|M \cap G|}{|M|}, \qquad \mathrm{Recall} = \frac{|M \cap G|}{|G|} \tag{1}$$

From this definition, we can see that the binarization of $S$ is the key step in the evaluation. Usually, there are three popular ways to perform the binarization. In the first solution, Achanta et al. [18] proposed the image-dependent adaptive threshold for binarizing $S$, which is computed as twice the mean saliency of $S$:

$$T_a = \frac{2}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} S(x, y), \tag{2}$$

where $W$ and $H$ are the width and the height of the saliency map $S$, respectively.

The second way to binarize $S$ is to use a fixed threshold which changes from 0 to 255. On each threshold, a pair of precision/recall scores is computed, and the pairs are finally combined to form a precision-recall (PR) curve to describe the model performance in different situations.

The third way of binarization is to use the SaliencyCut algorithm [68].
SaliencyCut first applies a loose threshold, which typically results in good recall but relatively poor precision, to generate the initial binary mask. Then the method iteratively uses the GrabCut segmentation method [103] to gradually refine the binary mask. The final binary mask is used to re-compute the precision-recall value.

F-measure. Usually, neither Precision nor Recall can comprehensively evaluate the quality of a saliency map. To this end, the F-measure is proposed as a weighted harmonic mean of them with a non-negative weight $\beta$:

$$F_\beta = \frac{(1+\beta^2)\,\mathrm{Precision} \times \mathrm{Recall}}{\beta^2\,\mathrm{Precision} + \mathrm{Recall}}. \tag{3}$$

As suggested by many salient object detection works (e.g., [18], [68], [73]), $\beta^2$ is set to 0.3 to give more importance to the Precision value. The reason for weighting precision more than recall is that recall rate is not as important as precision (see also [104]). For instance, 100% recall can be easily achieved by setting the whole region to foreground.

According to the different ways of saliency map binarization, there exist two ways to compute the F-Measure. When the adaptive threshold or GrabCut algorithm is used for the binarization, we can generate a single $F_\beta$ for each image and the final F-Measure is computed as the average $F_\beta$. When using fixed thresholding, the resulting PR curve can be scored by its maximal $F_\beta$, which is a good summary of the detection performance (as suggested in [105]). As defined in (3), the F-Measure is the weighted harmonic mean of precision and recall, and thus shares the same value bounds as precision and recall, i.e., [0, 1].

Receiver operating characteristics (ROC) curve. In addition to Precision, Recall and $F_\beta$, we can also report the false positive rate (FPR) and true positive rate (TPR) when binarizing the saliency map with a set of fixed thresholds:

$$\mathrm{TPR} = \frac{|M \cap G|}{|G|}, \qquad \mathrm{FPR} = \frac{|M \cap \bar{G}|}{|\bar{G}|} \tag{4}$$

where $\bar{M}$ and $\bar{G}$ denote the complements of the binary mask $M$ and the ground truth $G$, respectively. The ROC curve is the plot of TPR versus FPR obtained by varying the threshold $T_f$.

Fig. 5. PR and ROC curves for BMS [96] and GB [89] over ECSSD. Vertical axis: precision / true positive rate; horizontal axis: recall / false positive rate.

Area under ROC curve (AUC) score. While ROC is a two-dimensional representation of a model's performance, the AUC distills this information into a single scalar. As the name implies, it is calculated as the area under the ROC curve. A perfect model will score an AUC of 1, while random guessing will score an AUC around 0.5.

Mean absolute error (MAE) score. The overlap-based evaluation measures introduced above do not consider the true negative saliency assignments, i.e., the pixels correctly marked as non-salient. This favors methods that successfully assign saliency to salient pixels but fail to detect non-salient regions over methods that successfully detect non-salient pixels but make mistakes in determining the salient ones [73], [80]. Moreover, in some application scenarios [106] the quality of the weighted, continuous saliency maps may be of higher importance than the binary masks. For a more comprehensive comparison we therefore also evaluate the mean absolute error (MAE) between the continuous saliency map $\bar{S}$ and the binary ground truth $\bar{G}$, both normalized in the range [0, 1] (here the bar denotes normalization, not the complement used in (4)). The MAE score is defined as:

$$\mathrm{MAE} = \frac{1}{W \times H} \sum_{x=1}^{W} \sum_{y=1}^{H} |\bar{S}(x, y) - \bar{G}(x, y)| \tag{5}$$

Note that these scores sometimes do not agree with each other. For example, Fig. 5 shows a comparison of two models over ECSSD using PR and ROC metrics. While there is not a big difference in ROC curves (thus about the same AUC), one model clearly scores better using the PR curve (thus having higher $F_\beta$). Such disparity between the ROC and PR measures has been extensively studied in [107]. Note that the number of negative examples (non-salient pixels) is typically much bigger than the number of positive examples (salient object pixels) in evaluating salient object detection models. Therefore, PR curves are more informative than ROC curves, which can present an overly optimistic view of an algorithm's performance [107]. Thus we mainly base our conclusions on the PR curve scores (i.e., F-Measure scores), and also report other scores for comprehensive comparisons and for facilitating specific application requirements. It is worth mentioning that active research is ongoing to figure out better ways of measuring salient object detection and segmentation models (e.g., [108]).

Fig. 6. Precision (vertical axis) and recall (horizontal axis) curves of saliency methods on 6 popular benchmark datasets. Panels: (a) MSRA10K, (b) ECSSD, (c) JuddDB, (d) DUT-OMRON, (e) THUR15K, (f) SED2.

D. Quantitative Comparison of Models

We evaluate saliency maps produced by different models on six datasets by using all evaluation metrics:
1) Fig. 6 and Fig. 7 show PR and ROC curves;
2) Fig. 8 and Fig. 9 demonstrate AUC and MAE scores;
3) Fig. 10 shows the $F_\beta$ scores of all models (footnote 2).

Footnote 2: Three segmentation methods are used, including adaptive threshold, fixed threshold, and SaliencyCut algorithm. The influence of segmentation methods will be discussed in Sect. III-A.

In terms of both PR and ROC curves, the DRFI model surprisingly outperforms all other models on six benchmark datasets with large margins. Besides, RBD, DSR and MC (solid lines with blue, yellow, and magenta colors, respectively) achieve close performance and perform slightly better than other models.

Using the F-measure (i.e., $F_\beta$), the five best models are: DRFI, MC, RBD, DSR, and GMR, where the DRFI model consistently wins over 5 datasets. MC ranks the second best over 2 datasets and the third best over 2 datasets. SR and SIM models perform the worst.

With respect to the AUC score, DRFI again ranks the best over all six datasets. Following DRFI, the DSR model ranks second over 4 datasets. RBD ranks second on 1 dataset and third on 2 datasets. While PCA ranks third on 1 dataset in terms of AUC score, it is not on the list of top three contenders using the $F_\beta$ measure. IT, LC, and SR achieve the worst performance. It is worth mentioning that all the models perform well above chance level (AUC = 0.5) on six benchmark datasets.

Fig. 7. ROC curves of models on 6 benchmarks. False and true positive rates are shown on the x and y axes, respectively. Panels: (a) MSRA10K, (b) ECSSD, (c) JuddDB, (d) DUT-OMRON, (e) THUR15K, (f) SED2.

Rankings of models using MAE are more diverse than those using either $F_\beta$ or AUC scores. DSR, RBD and DRFI rank at the top, but none of them is among the top three models over JuddDB. MC, which performs well in terms of $F_\beta$ and AUC, is not included in the top three models on any dataset. PCA performs the best on JuddDB but worse on others. SIM and SVO models perform the worst.

On average, the compared fixation prediction and object proposal generation models perform worse than salient object detection models. As two outliers, COV and BMS outperform several salient object detection models in terms of all evaluation metrics, implying that they are suitable for detecting salient proto-objects. Additionally, Fig. 11 shows the distribution of $F_\beta$, ROC and MAE scores of all salient object detection models versus all fixation prediction models over all benchmark datasets. We can see a sharp separation of models, especially for the $F_\beta$ score, where most of the top models are salient object detection models. This result is consistent with the conclusion in [1] that fixation prediction models perform lower than salient object detection models. Though stemming from fixation prediction, research in salient object detection has its own unique properties and has truly added to what traditional saliency models focusing on fixation prediction already offer.

In particular, most of the 28 salient object detection models outperform the baseline AAM model. Among these 28 models, AAM only outperforms 2 models over MSRA10K, 8 over ECSSD, 4 on THUR15K, 12 on JuddDB, and 4 on DUT-OMRON in terms of $F_\beta$. Interestingly, the AAM model does not outperform any model over SED2, which means that indeed there is less center bias in this dataset and salient object detection models can detect off-center objects. Notice that AAM ranks lowest on SED2 compared to other datasets. Please notice that it does not necessarily mean that models below AAM are not good, as taking advantage of the location prior may further enhance their performance (e.g., LC and FT).

On average, over all models and scores, the performances were lower on JuddDB, DUT-OMRON and THUR15K, implying that these datasets were more challenging. The low model performance on JuddDB can be caused by both less center bias and small objects in images. Noisy labeling of the DUT-OMRON dataset might also be a reason for low model performance. By investigating some images of these two datasets for which models performed poorly, we found that there are several objects that can potentially be the most salient one. This makes the generation of ground truth quite subjective and challenging, although the most salient object in JuddDB has objectively been defined to be the most looked-at one measured from eye movement data.

E. Qualitative Comparison of Models

Fig. 12 shows output maps of all models for a sample image with relatively complex background. Dark blue areas are less salient while dark red indicates higher saliency values. Compared with other models, top contenders like DRFI and DSR suppress most of the background well while detecting almost the whole salient object. They thus generate higher precision scores and lower false positive rates. Some models that include a center-bias component also result in appealing maps, e.g., CB. Interestingly, region-based approaches, e.g., RC, HS, DRFI, GMR, CB, and DSR, always preserve the object boundary well compared with other pixel-based or patch-based models.

We can also clearly see the distinctness of different categories of models. Salient object detection models try to highlight the whole salient object and suppress the background. Fixation prediction models often produce blob-like and sparse saliency maps corresponding to the fixation areas of humans on scenes. The objectness map is a rough indication of the salient object. The output of the latter two types of models may not be well suited to segmenting the whole salient object.

Model   THUR15K   JuddDB   DUT-OMRON   SED2   MSRA10K   ECSSD
HDCT    .878      .771     .869        .898   .941      .866
RBD     .887      .826     .894        .899   .955      .894
GR      .829      .747     .846        .854   .925      .831
MNP     .854      .768     .835        .888   .895      .820
UFO     .853      .775     .839        .845   .938      .875
MC      .895      .823     .887        .877   .951      .910
DSR     .902      .826     .899        .915   .959      .914
CHM     .910      .797     .890        .831   .952      .903
GC      .803      .702     .796        .846   .912      .805
LBI     .876      .792     .854        .896   .910      .842
PCA     .885      .804     .887        .911   .941      .876
DRFI    .938      .851     .933        .944   .978      .944
GMR     .856      .781     .853        .862   .944      .889
HS      .853      .775     .860        .858   .933      .883
LMLC    .853      .724     .817        .826   .936      .849
SF      .799      .711     .803        .871   .905      .817
FES     .867      .805     .848        .838   .898      .860
CB      .870      .760     .831        .839   .927      .875
SVO     .865      .784     .866        .875   .930      .857
SWD     .873      .812     .843        .845   .901      .857
HC      .735      .626     .733        .880   .867      .704
RC      .896      .775     .859        .852   .936      .892
SEG     .818      .747     .825        .796   .882      .808
MSS     .813      .726     .817        .871   .875      .779
CA      .830      .774     .815        .853   .872      .784
FT      .684      .593     .682        .820   .790      .661
AC      .740      .548     .721        .831   .756      .668
LC      .696      .586     .654        .827   .771      .627
OBJ     .839      .750     .822        .870   .907      .818
BMS     .879      .788     .856        .852   .929      .865
COV     .883      .826     .864        .833   .904      .879
SS      .792      .754     .784        .826   .823      .725
SIM     .797      .727     .783        .833   .808      .734
SeR     .778      .746     .786        .835   .813      .695
SUN     .746      .674     .708        .789   .778      .623
SR      .741      .676     .688        .769   .736      .633
GB      .882      .815     .857        .839   .902      .865
AIM     .814      .719     .768        .846   .833      .730
IT      .623      .586     .636        .682   .640      .577
AAM     .849      .797     .814        .736   .857      .863

Fig. 8. AUC: area under ROC curve (Higher is better. The top three models are highlighted in red, green and blue).

Model   THUR15K   JuddDB   DUT-OMRON   SED2   MSRA10K   ECSSD
HDCT    .177      .209     .164        .162   .143      .199
RBD     .150      .212     .144        .130   .108      .173
GR      .256      .311     .259        .189   .198      .285
MNP     .255      .286     .272        .215   .229      .307
UFO     .165      .216     .173        .180   .150      .207
MC      .184      .231     .186        .182   .145      .204
DSR     .142      .196     .139        .140   .121      .173
CHM     .153      .226     .152        .168   .142      .195
GC      .192      .258     .197        .185   .139      .214
LBI     .239      .273     .249        .207   .224      .280
PCA     .198      .181     .206        .200   .185      .248
DRFI    .150      .213     .155        .130   .118      .166
GMR     .181      .243     .189        .163   .126      .189
HS      .218      .282     .227        .157   .149      .228
LMLC    .246      .303     .277        .269   .163      .260
SF      .184      .218     .183        .180   .175      .230
FES     .155      .184     .156        .196   .185      .215
CB      .227      .287     .257        .195   .178      .241
SVO     .382      .422     .409        .348   .331      .404
SWD     .288      .292     .310        .296   .267      .318
HC      .291      .348     .310        .193   .215      .331
RC      .168      .270     .189        .148   .137      .187
SEG     .336      .354     .337        .312   .298      .342
MSS     .178      .204     .177        .192   .203      .245
CA      .248      .282     .254        .229   .237      .310
FT      .241      .267     .250        .206   .235      .291
AC      .186      .239     .190        .206   .227      .265
LC      .229      .277     .246        .204   .233      .296
OBJ     .306      .359     .323        .269   .262      .337
BMS     .181      .233     .175        .184   .151      .216
COV     .155      .182     .156        .210   .197      .217
SS      .267      .301     .277        .266   .266      .344
SIM     .414      .412     .429        .384   .388      .433
SeR     .345      .379     .352        .290   .310      .404
SUN     .310      .319     .349        .307   .306      .396
SR      .175      .200     .181        .220   .232      .266
GB      .229      .261     .240        .242   .222      .263
AIM     .298      .331     .322        .262   .286      .339
IT      .199      .200     .198        .245   .213      .273
AAM     .248      .343     .288        .405   .260      .276

Fig. 9. MAE: Mean Absolute Error (Smaller is better. The top three models are highlighted in red, green and blue).

III. PERFORMANCE ANALYSIS

Based on the performances reported above, we also conduct several experiments to provide a detailed analysis of all the benchmarking models and datasets.

A. Analysis of Segmentation Methods

In many computer vision and graphics applications, segmenting regions of interest is of great practical importance [36], [44], [47]-[49], [109], [110]. The simplest way of segmenting a salient object is to binarize the saliency map using a fixed threshold, which might be hard to choose. In this section, we extensively evaluate two additional, most commonly used salient object segmentation methods: adaptive threshold [18] and SaliencyCut [68].
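As a rough illustration of how per-image scores of the kind reported under "Fixed" and "AdpT" in Fig. 10, and the MAE of (5), can be obtained, the following pure-Python sketch may help. It is an assumption-laden sketch rather than the benchmark's implementation: all function names are hypothetical, the mask is taken as `S >= threshold`, empty masks score 0, and the maximal $F_\beta$ is taken per image here, whereas a benchmark would typically aggregate over a whole dataset. SaliencyCut is not reproduced.

```python
from typing import Sequence, Tuple

# Row-major 2-D map; saliency values in [0, 255], ground truth in {0, 1}.
Map = Sequence[Sequence[float]]


def precision_recall(saliency: Map, truth: Map, threshold: float) -> Tuple[float, float]:
    """Eq. (1) for the mask M = {S >= threshold} (inclusive comparison is an assumption)."""
    mask_size = truth_size = overlap = 0
    for s_row, g_row in zip(saliency, truth):
        for s, g in zip(s_row, g_row):
            in_mask, in_truth = s >= threshold, g > 0
            mask_size += in_mask
            truth_size += in_truth
            overlap += in_mask and in_truth
    precision = overlap / mask_size if mask_size else 0.0
    recall = overlap / truth_size if truth_size else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta2: float = 0.3) -> float:
    """Eq. (3) with beta^2 = 0.3; returns 0 when both inputs are 0."""
    denom = beta2 * precision + recall
    return (1.0 + beta2) * precision * recall / denom if denom else 0.0


def f_adaptive(saliency: Map, truth: Map) -> float:
    """F_beta after binarizing with the adaptive threshold of Eq. (2)."""
    values = [v for row in saliency for v in row]
    threshold = 2.0 * sum(values) / len(values)
    return f_beta(*precision_recall(saliency, truth, threshold))


def f_max_fixed(saliency: Map, truth: Map) -> float:
    """Maximal F_beta over the fixed thresholds 0..255 (per image, a simplification)."""
    return max(f_beta(*precision_recall(saliency, truth, t)) for t in range(256))


def mae(saliency: Map, truth: Map) -> float:
    """Eq. (5): mean absolute difference after normalizing both maps to [0, 1]."""
    total, count = 0.0, 0
    for s_row, g_row in zip(saliency, truth):
        for s, g in zip(s_row, g_row):
            total += abs(s / 255.0 - (1.0 if g > 0 else 0.0))
            count += 1
    return total / count


# Toy 2x2 example: the top row is the salient object.
saliency = [[200, 180], [20, 0]]
truth = [[1, 1], [0, 0]]
score_adaptive = f_adaptive(saliency, truth)
score_fixed = f_max_fixed(saliency, truth)
score_mae = mae(saliency, truth)
```

On the toy map the adaptive threshold (200) drops one object pixel, so its $F_\beta$ is lower than the best fixed-threshold $F_\beta$, which mirrors the gap between threshold choices that this section examines.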
Model THUR15K           JuddDB            DUT-OMRON         SED2              MSRA10K           ECSSD
      Fixed AdpT  SCut  Fixed AdpT  SCut  Fixed AdpT  SCut  Fixed AdpT  SCut  Fixed AdpT  SCut  Fixed AdpT  SCut
HDCT  .602  .571  .636  .412  .378  .422  .609  .572  .643  .822  .802  .758  .837  .807  .877  .705  .669  .740
RBD   .596  .566  .618  .457  .403  .461  .630  .580  .647  .837  .825  .750  .856  .821  .884  .718  .680  .757
GR    .551  .509  .546  .418  .338  .378  .599  .540  .580  .798  .753  .639  .816  .770  .830  .664  .583  .677
MNP   .495  .523  .603  .367  .337  .405  .467  .486  .576  .621  .778  .765  .668  .724  .822  .568  .555  .709
UFO   .579  .557  .610  .432  .385  .433  .545  .541  .593  .742  .781  .729  .842  .806  .862  .701  .654  .739
MC    .610  .603  .600  .460  .420  .434  .627  .603  .615  .779  .803  .630  .847  .824  .855  .742  .704  .745
DSR   .611  .604  .597  .454  .421  .410  .626  .614  .593  .794  .821  .632  .835  .824  .833  .737  .717  .703
CHM   .612  .591  .643  .417  .368  .424  .604  .586  .637  .750  .750  .658  .825  .804  .857  .722  .684  .735
GC    .533  .517  .497  .384  .321  .342  .535  .528  .506  .729  .730  .616  .794  .777  .780  .641  .612  .593
LBI   .519  .534  .618  .371  .353  .416  .482  .504  .609  .692  .776  .764  .696  .714  .857  .586  .563  .738
PCA   .544  .558  .601  .432  .404  .368  .554  .554  .624  .754  .796  .701  .782  .782  .845  .646  .627  .720
DRFI  .670  .607  .674  .475  .419  .447  .665  .605  .669  .831  .839  .702  .881  .838  .905  .787  .733  .801
GMR   .597  .594  .579  .454  .409  .432  .610  .591  .591  .773  .789  .643  .847  .825  .839  .740  .712  .736
HS    .585  .549  .602  .442  .358  .428  .616  .565  .616  .811  .776  .713  .845  .800  .870  .731  .659  .769
LMLC  .540  .519  .588  .375  .302  .397  .521  .493  .551  .653  .712  .674  .801  .772  .860  .659  .600  .735
SF    .500  .495  .342  .373  .319  .219  .519  .512  .377  .764  .794  .509  .779  .759  .573  .619  .576  .378
FES   .547  .575  .426  .424  .411  .333  .520  .555  .380  .617  .785  .174  .717  .753  .534  .645  .655  .467
CB    .581  .556  .615  .444  .375  .435  .542  .534  .593  .730  .704  .657  .815  .775  .857  .717  .656  .761
SVO   .554  .441  .609  .414  .279  .419  .557  .407  .609  .744  .667  .746  .789  .585  .863  .639  .357  .737
SWD   .528  .560  .649  .434  .386  .454  .478  .506  .613  .548  .714  .737  .689  .705  .871  .624  .549  .781
HC    .386  .401  .436  .286  .257  .280  .382  .380  .435  .736  .759  .646  .677  .663  .740  .460  .441  .499
RC    .610  .586  .639  .431  .370  .425  .599  .578  .621  .774  .807  .649  .844  .820  .875  .741  .701  .776
SEG   .500  .425  .580  .376  .268  .393  .516  .450  .562  .704  .640  .669  .697  .585  .812  .568  .408  .715
MSS   .478  .490  .200  .341  .324  .089  .476  .490  .193  .743  .783  .298  .696  .711  .362  .530  .536  .203
CA    .458  .494  .557  .353  .330  .394  .435  .458  .532  .591  .737  .565  .621  .679  .748  .515  .494  .625
FT    .386  .400  .238  .278  .250  .132  .381  .388  .259  .715  .734  .436  .635  .628  .472  .434  .431  .257
AC    .410  .431  .068  .227  .199  .049  .354  .383  .040  .684  .729  .140  .520  .566  .014  .411  .410  .038
LC    .386  .408  .289  .264  .246  .156  .327  .353  .243  .683  .752  .486  .569  .589  .432  .390  .396  .219
OBJ   .498  .482  .593  .368  .282  .413  .481  .445  .578  .685  .723  .731  .718  .681  .840  .574  .456  .698
BMS   .568  .578  .594  .434  .404  .416  .573  .576  .580  .713  .760  .627  .805  .798  .822  .683  .659  .690
COV   .510  .587  .398  .429  .427  .315  .486  .579  .373  .518  .724  .212  .667  .755  .394  .641  .677  .413
SS    .415  .482  .523  .344  .321  .397  .396  .443  .502  .533  .696  .641  .572  .642  .675  .467  .441  .574
SIM   .372  .429  .568  .295  .292  .384  .358  .402  .539  .498  .685  .725  .498  .585  .794  .433  .391  .672
SeR   .374  .419  .536  .316  .285  .388  .385  .411  .532  .521  .714  .702  .542  .607  .755  .419  .391  .596
SUN   .387  .432  .486  .303  .291  .285  .321  .360  .445  .504  .661  .613  .505  .596  .670  .388  .376  .478
SR    .374  .457  .002  .279  .270  .001  .298  .363  .000  .504  .700  .002  .473  .569  .001  .381  .385  .001
GB    .526  .571  .650  .419  .396  .455  .507  .548  .638  .571  .746  .695  .688  .737  .837  .624  .613  .765
AIM   .427  .461  .559  .317  .260  .360  .361  .377  .495  .541  .718  .693  .555  .575  .750  .449  .357  .571
IT    .373  .437  .005  .297  .283  .000  .378  .449  .005  .579  .697  .008  .471  .586  .158  .407  .414  .003
AAM   .458  .569  .620  .392  .367  .411  .406  .514  .534  .388  .524  .640  .580  .692  .779  .597  .627  .756

Fig. 10.
Fβ statistics on each dataset, using varying fixed thresholds, adaptive threshold, and SaliencyCut (higher is better; the top three models are highlighted in red, green, and blue).

Average Fβ scores for salient object segmentation results on six benchmark datasets are shown in Fig. 10. Each segmentation algorithm was fed with saliency maps produced by all 40 compared models.

Except on the JuddDB and SED2 datasets, the best segmentation results are all achieved by the SaliencyCut method combined with a sophisticated salient object detection model (e.g., DRFI, RBD, MNP). This suggests that enforcing label consistency through graph-based segmentation and global appearance statistics benefits salient object segmentation. The default SaliencyCut [68] program outputs only the most dominant salient object. This makes the results on the SED2 and JuddDB benchmarks less than optimal, as images in these two datasets (see Fig. 3) do not follow the “single non-ambiguous salient object assumption” made in [68].

As also observed in most of the image segmentation literature, nearby pixels with similar appearance tend to have similar object labels. To validate this, we show in Fig. 13(a) some improved segmentation results obtained by further enforcing label consistency among nearby and similar pixels. Enforcing such label consistency often helps improve pixel labeling, especially when the majority of the salient object pixels have been highlighted in the detection phase. Challenging examples still exist, however, such as complex object topology, spindle components, and appearance similar to the image background. More results of the best combination, DRFI saliency maps and SaliencyCut segmentation, are shown in Fig. 13(b) for images of various complexities.

A failure case of SaliencyCut segmentation, along with intermediate results, is also shown in the last row of Fig. 13(a). Due to the complex topology of the salient object, the label consistency in a local range considered by the SaliencyCut algorithm may not work well. Additionally, the appearance of the object looks very distinct due to the existence of shading and reflection, which makes the segmentation of the whole object very challenging. Therefore, only a part of the object is finally segmented.

Fig. 11. Histogram of AUC, MAE, and mean Fβ scores for salient object detection models (blue) versus fixation prediction models (red), collapsed over all six datasets.

Fig. 12. Estimated saliency maps from various salient object detection models, an object proposal generation model, the average annotation map, and fixation prediction models.

Fig. 13. Samples of salient object segmentation results. (a) Left to right: image, saliency map, AdpT, SCut, and gTruth. (b) DRFI model output fed to the SaliencyCut algorithm.

B. Analysis of Center Bias

In this section, we study the center-bias challenge, since it has caused a major problem in fixation prediction and salient object detection models. Some studies add a Gaussian center prior to models when comparing them. This might not be fair, as several salient object detection models already contain center bias at different levels. Instead, we randomly choose 1000 images with no or less center bias from the MSRA10K dataset. First, the distance from the salient object centroid to the image center is computed for each image. Images for which this distance is larger than a threshold are then chosen. Some sample images with no or less center bias, as well as an illustration of the threshold used to choose images, are shown in Fig. 14. The average annotation of the less center-biased images shows two peaks, on the left and on the right of the image, which makes this subset suitable for testing the performance of salient object detection models on off-center images.
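The selection procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code. The function names `centroid_offset` and `select_off_center` are hypothetical, and normalizing coordinates to [0, 1] on each axis is an assumption, since the text does not state the units of the distance. The default threshold of 0.247 is the value reported in the Fig. 14 caption; whether it applies under this normalization is likewise assumed.

```python
import numpy as np


def centroid_offset(mask: np.ndarray) -> float:
    """Distance from the salient-object centroid to the image center.

    Coordinates are normalized to [0, 1] on each axis (an assumption),
    so the largest possible offset is about 0.707.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("mask contains no salient pixels")
    h, w = mask.shape
    # Use pixel centers, then normalize by the image size.
    cy = (ys.mean() + 0.5) / h
    cx = (xs.mean() + 0.5) / w
    return float(np.hypot(cy - 0.5, cx - 0.5))


def select_off_center(masks, threshold=0.247, count=1000, seed=0):
    """Randomly pick up to `count` indices whose offset exceeds `threshold`."""
    offsets = np.array([centroid_offset(m) for m in masks])
    candidates = np.flatnonzero(offsets > threshold)
    k = min(count, candidates.size)
    if k == 0:
        return np.array([], dtype=int)
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(candidates, size=k, replace=False))


# Toy ground-truth masks: one centered object, one at the left edge.
centered = np.zeros((100, 100), dtype=bool)
centered[40:60, 40:60] = True
shifted = np.zeros((100, 100), dtype=bool)
shifted[40:60, 0:20] = True

print(select_off_center([centered, shifted]))
```

With the two toy masks, only the second one has a centroid far enough from the image center to be selected. On a real dataset, `masks` would be the binary ground-truth annotations, and the returned indices would identify the less center-biased subset.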
We evaluate all 40 compared models on these 1000 images. PR and ROC curves, as well as Fβ, AUC, and MAE scores, are all shown in Fig. 15. DRFI and DSR again perform the best. Overall, the performance of most models decreases when tested on images with no or less center bias (e.g., the AUC score of MC declines from 0.951 to 0.888), while a few others improve. For example, the AUC score of SVO rises from 0.930 to 0.942, and it takes the second ranking. Some models, e.g., HS (ranked second in terms of Fβ score), perform better according to their rank changes with respect to the whole MSRA10K dataset. DRFI still wins over the other models here by a large margin. The differences in Fβ, AUC, and MAE scores for this model between the whole dataset and the 1000 less center-biased images are not very big (0.05, 0.05, and 0.009, respectively). This means that this model does not take much advantage of center bias. In contrast, the CB model relies heavily on a location prior, which is why its performance drops heavily when applied to these images (the differences are 0.122, 0.122, and 0.029, respectively).

Additionally, it can be observed from Fig. 2(f) that there is less center bias in the SED2 dataset, as there is less activation in the center of its average annotation map. We can therefore also study center bias on it. Similarly, DRFI and DSR outperform the other models in terms of Fβ, AUC, and MAE scores, indicating that they are more robust to variations in the location of salient objects. HS again ranks second according to the Fβ score. Fig. 16 shows the best and worst un-centered stimuli for the DRFI and DSR models.

Overall, all the models perform well above chance level on both the less center-biased subset of MSRA10K and SED2. It is also worth noting that the AAM model performs significantly worse on these two datasets, as well as on JuddDB, validating our motivation for studying center bias on them.

Fig. 14. Left: Histogram of object center over all images, threshold (red line = 0.247), and annotation map over 1000 less center-biased images in the MSRA10K dataset. Right: Four less center-biased images. The overlaid circle illustrates the center-bias threshold.

Method  Max   AUC   MAE
HDCT    .822  .941  .122
RBD     .811  .943  .106
GR      .791  .925  .183
MNP     .661  .912  .188
UFO     .805  .929  .128
MC      .764  .888  .171
DSR     .776  .938  .117
CHM     .746  .920  .138
GC      .697  .860  .164
LBI     .685  .910  .197
PCA     .750  .928  .162
DRFI    .831  .964  .127
GMR     .754  .886  .148
HS      .815  .918  .150
LMLC    .720  .896  .201
SF      .747  .885  .150
FES     .621  .839  .160
CB      .693  .872  .207
SVO     .792  .942  .325
SWD     .521  .813  .291
HC      .700  .898  .176
RC      .744  .855  .177
SEG     .629  .828  .300
MSS     .666  .868  .167
CA      .620  .896  .199
FT      .671  .843  .183
AC      .521  .800  .177
LC      .569  .797  .192
OBJ     .708  .915  .243
BMS     .739  .879  .146
COV     .463  .805  .176
SS      .571  .852  .225
SIM     .515  .858  .363
SeR     .546  .849  .273
SUN     .498  .795  .276
SR      .444  .750  .184
GB      .590  .850  .208
AIM     .540  .836  .265
IT      .460  .655  .165
AAM     .328  .716  .406

Fig. 15. Results of center-bias analysis over 1000 less center-biased images chosen from the MSRA10K dataset. Top: ROC and PR curves. Bottom: Mean Fβ, AUC, and MAE scores for all models.

C. Analysis of Salient Object Existence

Almost all existing salient object detection models assume that there is at least one salient object in the input image. This impractical assumption might lead to suboptimal performance on “background images”, which do not contain any dominant salient objects, as studied in [111]. To verify the effectiveness of models on back-
