Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study

Yi-Ling Chen^1,2   Tzu-Wei Huang^3   Kai-Han Chang^2   Yu-Chen Tsai^2   Hwann-Tzong Chen^3   Bing-Yu Chen^2
^1 University of California, Davis   ^2 National Taiwan University   ^3 National Tsing Hua University

arXiv:1701.01480v1 [cs.CV] 5 Jan 2017

Abstract

Automatic photo cropping is an important tool for improving the visual quality of digital photos without resorting to tedious manual selection. Traditionally, photo cropping is accomplished by determining the best proposal window through visual quality assessment or saliency detection. In essence, the performance of an image cropper highly depends on the ability to correctly rank a number of visually similar proposal windows. Despite the ranking nature of automatic photo cropping, little attention has been paid to learning-to-rank algorithms for tackling this problem. In this work, we conduct an extensive study on traditional approaches as well as ranking-based croppers trained on various image features. In addition, a new dataset consisting of high-quality cropping and pairwise ranking annotations is presented to evaluate the performance of various baselines. The experimental results on the new dataset provide useful insights into the design of better photo cropping algorithms.

1. Introduction

Photo cropping is an important operation for improving the visual quality of photos. It is mainly performed to remove unwanted scene content or irrelevant details by cutting away the outer parts of the image. Nowadays, it is mostly performed on digital images but remains a tedious manual selection process and requires experience to obtain quality crops. Therefore, many computational techniques have been proposed to automate this process

Figure 1. A new image cropping dataset is presented in this work. Example images of an existing database [37] (upper) and ours (bottom). Note that [37] contains more images of canonical views. We attempt to build a new dataset containing more images of non-canonical perspectives and richer contextual information.
[41,37,10,40]. Automaticphotocroppingiscloselyrelatedtootherap- plications like image thumbnail generation [32, 25], view of visually similar proposal windows. Traditionally, auto- findingandrecommendation[4,6,31]. Inanutshell,these matic photo cropping techniques follow two mainstreams, approachesshareonecorecapabilityincommon–finding i.e.attention-based[30]andaesthetics-basedmethods[28], anoptimalsubviewintermsofitsaestheticsorcomposition which aim to search for a crop window covering the most within a larger scene. In other words, their performance visually significant objects or assess the visual quality of highly depends on the ability to correctly rank a number the candidate windows according to certain photographic guidelines,respectively. However,inspiteofitsnatureofa ofimagesispredictablebyusingglobalimagedescriptors. rankingproblem, tothebestofourknowledgenoneofthe Inrecentyears,deepconvolutionalneuralnetwork(DCNN) existing researches have adopted the learning-to-ranking has been proven to gain tremendous success in various vi- approaches to accomplish this task, which is proven to be sualrecognitiontasksandseveralworksalsoexploiteditas useful and widely used in many information retrieval sys- themachinerytolearneffectivefeaturesforaestheticspre- tems. Themaingoalofthisworkisthustostudytheeffec- diction [15, 22, 23]. In [16], Karayev et al. compare the tivenessofapplyingrankingalgorithmsonimagecropping performance of different image features for style recogni- andviewfindingproblems. tionandshowthatCNNfeaturesgenerallyoutperformother We believe that the ability of ranking pairwise views in featuresevenwhentrainedonobjectclasslabels. thesamecontext isessentialforevaluatingphotocropping 2.2.PhotoCroppingandViewFindingMethods techniques. Therefore,webuildanewdatasetconsistingof 1,743imageswithhumanlabeledcropwindowsand31,430 Generally, automatic photo cropping techniques can be pairs of subviews with visual preference annotations. 
To categorized into two lines of researches: attention-based obtainqualityannotations,wecarefullydesignedanimage and aesthetics-based approaches. The basic principle of collection and annotation pipeline which extensively ex- attention-based methods is to place the crop window over ploitedacrowd-sourcingplatformtovalidatetheannotated themostvisuallysignificantregionsinanimageaccording images. Weconductextensiveevaluationontraditionalap- tocertainattentionscores,e.g.saliencymap[32,30],orby proaches and a variety of machine learned rankers trained resorting to eye tracker [29], face detector [42] to find the on the AVA dataset [27] and our dataset with various aes- regions of interest. In [25], a classifier trained on an an- thetic features [26, 9]. Experimental validations show that notateddatabaseforsaliencypredictionisusedtofacilitate ranking based image croppers consistenly achieve higher imagethumbnailextraction. Recently,Chenetal.[5]con- croppingaccuracyinboththeimagecroppingdataset[37] duct a complexity study of several different formulations andourdataset. Additionally,italsosuggeststhatranking- ofoptimalwindowsearchundertheattention-basedframe- based algorithms still have great potential to further im- work. Althoughtheattention-basedapproachescanusually provetheirperformanceonautomaticimagecroppingwith determine a crop receiving the most human attention, the moreeffectivefeatures. Thedatasetpresentedinthiswork croppedimagesarenotnecessarilyvisuallypleasingdueto ispubliclyavailable1. littleconsiderationofimagecomposition. The aesthetics-based methods accomplish the cropping 2.PreviousWork task mainly by analyzing the attractiveness of the cropped imagewiththehelpofaqualityclassifier[28,10],andare 2.1.AestheticAssessmentandModeling thus closely related to photo quality assessment [7, 8, 24]. The main goal of aesthetic visual analysis is to imitate Recently, Yan et al. [37] proposed a cropping technique humaninterpretationofthebeautyofnaturalimages. 
Tra- that employs features designed to capture the changes be- ditionally, aesthetics visual analysis mainly focuses on the tweentheoriginalandcroppedimages. In[4],findinggood binary classification problem of predicting high- and low- subviews in a panoramic scene is achieved by analyzing quality images [7, 8, 24]. To this end, researchers design thestructuralfeaturesandlayoutofvisualsaliencylearned variousfeaturestocapturetheaestheticpropertiesofanim- from reference images of professional photographs. Sev- agecompliantwithphotographicrulesorpractices,suchas eral other related works achieve view finding by learning theruleofthirdsandvisualbalance. Forexample,thespa- aesthetic features based on position relationships between tial distribution of edges is exploited as a feature to model regions[6]orimagedecomposition[31]. thephotographicruleof“simplicity”[17]. Somephotore- compositiontechniquesattempttoenhanceimagecomposi- 2.3.DatasetsforComputationalAesthetics tionbyrearrangingthevisualelements[1], applyingcrop- Datasets play an important role for computer vision re- and-retarget operations [21] or providing on-site aesthetic searches since they provide a means to train and evalu- feedback[38]toimprovetheaestheticsscoreofthemanip- ate algorithms. There are already several publicly avail- ulatedimage. able databases containing aesthetic annotations, such as Instead of using “hand-crafted” features highly related [7, 17, 24, 27]. Among them, AVA [27] is a large-scale tothebestphotographicpractices, Marchesottietal.show dataset which takes advantage of community-shared data that generic image descriptors previously used for image (e.g. dpchallenge) to provide a rich collection of aes- classification are also capable of capturing aesthetic prop- thetic and semantic annotations. Despite all these efforts, erties[26]. In[12],Isolaetal.showthatthememorability thereisstillnotastandardbenchmarkforevaluatingauto- 1https://github.com/yiling-chen/ maticphotocroppingalgorithms. 
In[37],theauthorsbuilt flickr-cropping-dataset a dataset consisting of 950 images, which are divided into Figure2.Examplesofcroppairgeneration. Ontheleftarethesourceimagesandfourcorrespondingcroppairsareshownontheright. Eachpairofcropwindowswererandomlygeneratedwiththeguidanceofasaliencymaptopreventfromtoomuchunimportantcontents. TheaestheticspreferencerelationshipbetweenthecroppairsweredeterminedbytherankingresultsfromAMTworkers. seven categories and individually cropped by three profes- thatmostunwantedregionshadbeenalreadycutaway sionalexperts.Weseetwodeficienciesofthisdataset.First, beforetheywereuploaded. Therefore,itisessentialto theselectedimagearefromadatabaseoriginallyforphoto searchfor“raw”imagesforannotation. quality assessment. Some images of professional quality 3. Sometimes people do agree others’ crops are good and compositions are also included and cropped. These eventhoughtheyaredifferentfromtheirowncrops.To cropsmaynotfaithfullyreflecttheunwantedregionsofthe images. Second, many images in the database are iconic obtainqualitycrops, wedecidedtoresorttoacrowd- sourcing platform to review all the cropping annota- objectorsceneimageswhichweretakenfromacanonical perspective,particularlyintheanimal,architecture, tions and adopt only the highly ranked ones as final static categories, which may lack non-canonical views results. and contextual information. To provide a more general In manual image cropping, human typically iterates the benchmark,wechoosetobuildanewdatasetfromscratch procedure of moving and adjusting the position, size and with a carefully designed image collection and annotation aspect ratio of the crop window, and examining the visual pipeline. qualitybeforeandafterthemanipulationuntilanidealcrop window is obtained. It is essentially a problem of ranking 3.DatasetConstruction anumberofpairwisesubviewsthatarevisuallysimilar. 
In- Inthissectionwedescribehowthecandidateimagesare spired by the aforementioned process, we also build anno- selectedandthedesignprinciplesoftheannotationpipeline. tationsindicatingsuchpreferencerelationships. Webelieve that this type of data will be beneficial for researchers to 3.1.DesignPrinciples more faithfully evaluating the performance of image crop- While designing the image annotation procedure, a pi- pingtechniques. lotstudywascarriedoutamongtheauthors. Werandomly 3.2.ImageCollection downloadedasmallnumberoftestimagesandhadtheau- thorstoannotatetheidealcropwindowsindividually. Sev- In the image collection stage, we aimed to collect as eralobservationswereobtainedafterthepilotstudy. many non-iconic images as possible for better generaliza- tion capability [33]. Following the strategy suggested in 1. Photocroppingissometimesverysubjective. Particu- [20], we chose Flickr as our data source, which tends to larly, it is extremely difficult to define an appropriate have fewer iconic images. In addition, we searched Flickr crop for photos of both professional and poor quality with many combinations of a pre-defined set of keywords, sincethereareno“obvious”answers. bywhichmorenon-iconicimageswithrichercontextualin- 2. Most online images are post-processed which means formation are more likely to be returned. The above pro- cess resulted in an initial set of candidate images consist- source images can also be treated as ranking annota- ing of 31,888 images, which were then passed through a tions. data cleaning process. We employed workers on Amazon Tosummarize,ourdatasetiscomposedof3,413cropped MechanicalTurk(AMT)tofilteroutinappropriateimages, images and 34,130 crop pairs generated from the corre- suchascollage,computer-generatedimagesorimageswith sponding images. 
All the source/crop and crop/crop pairs post-processedframes.Particularly,wealsoaskedtheAMT werereviewedbyanumberofhumanworkerstoderivethe workerstopickthephotosofexcellentquality,becausethey pairwiseaestheticsrelationshipsastherankingannotation. arepotentiallynotnecessaryforcroppingandthusnotsuit- Finally,1,743outofthe3,413humancroppedimageswere ableforannotation. Afterdatacleaning,18,925imagesre- selectedasthefinalcroppingannotationofourdataset. mainedtoenterthenextstageofdataannotation. 4.AlgorithmicAnalysis 3.3.ImageAnnotation Inthissection,wefirstdescribetheexperimentalsettings We collected two types of annotation through crowd- and baseline algorithms, and then demonstrate the experi- sourcinginourdataset. mentalvalidationresults. • Croppingannotation:Webuiltaweb-basedinterface 4.1.ExperimentalSettings for performing the image cropping tasks. The users 4.1.1 Datasets were recruited from the photography club in our uni- versity by invitation. In our task design, we allowed WeadoptboththeAVA[27]andourdatasettotrainvarious userstoskiptheimageswhichwerejudgedtobeun- image croppers to be compared in this study. The average necessaryforcropping. Weeventuallyretrieved3,413 aestheticscoreassociatedwitheachimageinAVAareused cropped images after the human labeling process was toselectasetofhighandlowqualityimagestotrainaphoto finished. Forvalidation, wegroupedpairsofcropped qualityclassifier(Section4.2.2). Additionally,wealsoex- image and its corresponding source image as Human ploittheaestheticscorestoformrelativerankingconstraints Intelligence Tasks (HITs) and assigned each of them totrainranking-basedimagecroppers(Section4.2.3). to 7 distinct workers on AMT. It is worth noting that For our dataset, we split the cropping and ranking an- a qualification test consisting of 10 pictorial ranking notationsintotrainingandtestsetwitharoughly4:1ratio. questions was given to each worker. 
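The crowd-sourced validation of the cropping annotations (Section 3.3) reduces to a simple vote count: a crop enters the final annotation only if at least 4 of its 7 AMT reviewers preferred it over the source image. A minimal sketch, with illustrative names (`keep_crops` is ours, not from the paper's code):

```python
def keep_crops(votes, min_votes=4):
    """votes: dict mapping image_id -> number of workers (out of 7) who
    preferred the crop over its source. Returns the accepted image ids."""
    return [img for img, v in votes.items() if v >= min_votes]

votes = {"img1": 6, "img2": 3, "img3": 4}
print(keep_crops(votes))  # ['img1', 'img3']
```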
Only the work- Specifically,348outofthe1,743imageswithhighlyranked ers who correctly answered at least 8 questions were cropsareadoptedasgroundtruthforevaluatingtheperfor- allowedtotaketheHITs. ForeachHIT,theorderthat manceofimagecroppers. Therankingannotationsarealso thesourceandcroppedimageappearedintheHITwas usedtotrainranking-basedimagecroppers(Section4.2.3). randomized and the workers were asked to pick the Finally, the image cropping annotations in [37] is also morepreferableoneintermsoftheiraesthetics. Into- usedtoevaluatetheperformanceofimagecroppers. tal,1,743outofthe3,413croppedimageswereranked aspreferablebyatleast4workersandtheyconstitute 4.1.2 EvaluationProtocol thefinalcroppingannotationofourdataset. For fair comparison, we take the strategy of evaluating all • Ranking annotation: Besides the cropping annota- baseline algorithms on a number of sliding windows. For tion,wewanttoenrichthedatasetwithpairwiserank- simplicity, we set the size of search window to each scale ingrelationshipsbetweensubviewsinthesameimage. among[0.5,0.6,...,0.9]oftheoriginalimageandslidethe Foreachimagewithhumanlabeling,10pairsofcrop searchwindowovera5×5uniformgrid. Theoptimalcrop windowswererandomlygeneratedandthenrankedby windowsdeterminedbyimagecroppersarecomparedtothe 5workerswithasimilarprocessasthecroppinganno- groundtruthtoevaluatetheirperformance. tations. Topreventthecropwindowsfromcontaining Weadoptthesameevaluationmetricsasin[37],i.e.,av- toomuchunimportantcontents,weutilizedasaliency erageoverlappedratioandaverageboundarydisplacement map [34] to guide crop selection. The size of crop errortomeasurethecroppingaccuracyofimagecroppers. windowsvariedtoimitatetheeffectofzoomin/zoom Theaverageoverlappedratioiscomputedby out and each pair of crop windows possessed suffi- N cientoverlapping. Figure2illustratessomeexamples 1 (cid:88) area(Wg∩Wc)/area(Wg∪Wc), (1) ofthegeneratedcroppairsforranking. 
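The sliding-window evaluation protocol of Section 4.1.2 (search windows at scales 0.5–0.9 of the source image, slid over a 5×5 uniform grid) can be sketched as follows; `generate_candidates` is an illustrative name, not from the authors' code:

```python
# Enumerate candidate crop windows: 5 scales x 25 grid positions = 125 windows.
def generate_candidates(img_w, img_h, scales=(0.5, 0.6, 0.7, 0.8, 0.9), grid=5):
    """Return candidate crop windows as (x, y, w, h) tuples."""
    candidates = []
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)
        # Slide the window's top-left corner over a uniform grid x grid lattice
        for i in range(grid):
            for j in range(grid):
                x = int((img_w - w) * i / (grid - 1))
                y = int((img_h - h) * j / (grid - 1))
                candidates.append((x, y, w, h))
    return candidates

cands = generate_candidates(640, 480)
print(len(cands))  # 125
```

Each baseline then scores all 125 windows and reports the best-scoring one as its crop.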
Weeventually N i i i i i=1 obtained a collection of totally 34,130 pairs of crop windowswithaestheticspreferenceinformation. Note where Wg and Wc denote the ground-truth crop window i i thatthehumancroppedimagesandthecorresponding andthecropwindowdeterminedbythebaselinealgorithms forthei-thtestimage,respectively. N isthenumberoftest Method Overlap Disp. images. Theboundarydisplacementerrorisgivenby RankSVM[14] 0.6019 0.1060 RankNet[2] 0.6015 0.1058 (cid:88) ||Bjg−Bjc||/4, RankBoost[11] 0.5017 0.1383 j={l,r,b,u} LambdaMART[36] 0.5451 0.1217 whereBig andBic denotethefourcorrespondingedgesbe- Table 1. Benchmarking of various learning-to-rank algorithms. tween Wg and Wc. Note that the boundary displacements DeCAF7 featureisusedtotraintheimagerankers. Thecrop- havetobenormalizedbythewidthorheightoftheoriginal pingaccuracyisevaluatedonthe348testimagesofourdataset. image. Thebestresultsarehighlightedinbold. Weoptionallyreporttheswaperrorevaluatedonthetest setofAVAandourdataset. Itistheratioofswappedpairs windowandtheouterregionoftheimage(MaxDiff). The averagedoverallqueries,whichmeasurestherankingaccu- saliency maps are generated by the implementation of the racyofimagecropperstocorrectlyrankpairwisesubviews. originalauthorswiththedefaultparametersettings. 4.1.3 AestheticFeatures 4.2.2 Aesthetics-BasedMethod For all the learning-based image croppers, we adopt the The second category of comparison techniques represent “deep”activationfeatures[9]toaccomplishaestheticspre- the research line of aesthetics-based methods, which ex- dictionassuggestedin[16]. Forfeatureextraction,weex- ploitaqualityclassifierthatmeasureswhetherthecropped ploit the implementation of AlexNet [18] provided by the regionisvisuallyattractivetousers[28,10]. Insteadofthe Caffe library [13]. Each training sample is resized to 227- low-levelfeaturesusedinthepreviousworks,weadoptthe by-227 pixels and forward-propagated into the network. 
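The two cropping metrics of Section 4.1.2 can be sketched directly from their definitions: the overlap ratio is the intersection-over-union of the ground-truth and predicted windows, and the boundary displacement error averages the offsets of the four edges, normalized by the image width or height. Windows are (x, y, w, h); the helper names are ours:

```python
def overlap_ratio(wg, wc):
    """IoU between ground-truth window wg and candidate window wc."""
    xg, yg, gw, gh = wg
    xc, yc, cw, ch = wc
    ix = max(0, min(xg + gw, xc + cw) - max(xg, xc))
    iy = max(0, min(yg + gh, yc + ch) - max(yg, yc))
    inter = ix * iy
    union = gw * gh + cw * ch - inter
    return inter / union

def boundary_displacement(wg, wc, img_w, img_h):
    """Mean normalized offset of the left/right/top/bottom edges."""
    xg, yg, gw, gh = wg
    xc, yc, cw, ch = wc
    dl = abs(xg - xc) / img_w          # left edges, normalized by width
    dr = abs((xg + gw) - (xc + cw)) / img_w
    du = abs(yg - yc) / img_h          # top/bottom edges, normalized by height
    db = abs((yg + gh) - (yc + ch)) / img_h
    return (dl + dr + du + db) / 4.0

print(overlap_ratio((0, 0, 100, 100), (0, 0, 100, 100)))  # 1.0
```

Averaging each metric over the N test images gives the numbers reported in Tables 1–3.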
moreadvancedDeCAF features[9]toachieveaesthetics Theactivationsofthelastfully-connectedlayerareretained 7 recognition. Atotalof52,000imageswiththehighestand as the aesthetic features (DeCAF ), which are of 4,096- 7 lowest aesthetics scores are selected from the AVA dataset dimension. [27] as the training (67%) and testing (33%) samples. We We optionally train the ranking-based image croppers thustrainabinarySVMclassifierwithRBFkernels,which with generic image descriptors [26] to inspect the perfor- predictsaphotoashighorlowquality. Theparametersof mance variations. Specifically, Fisher vectors of SIFT de- theclassifierareobtainedthrough5-foldcrossvalidationon scriptors with spatial pyramid (SIFT-FV) and Fisher vec- the training set and the testing accuracy achieved 80.27%. tors of color descriptors with spatial pyramid (Color-FV) To use the binary classifier as an image cropper, we take areconsidered. ForSIFT-FVandColor-FV,thecardinal- advantage of the method described in [19] to compute the ityofvisualwordsis256,andtheimagedescriptoriscon- posteriorclassprobabilityastheaestheticsscoretopickthe structedbyconcatenatingthefeaturesextractedfrom8sub- bestcropamongallcandidatewindows. imagelayouts:“1×1”(wholeimage),“3×1”(upper,center, bottom),“2×2”(quadrant). Thefeaturepointsaredensely evaluated every 4 pixels, resulting in 262,144-dimension 4.2.3 Ranking-BasedMethods featurevectors. Thethirdcategoryofcomparisontechniquesareafamilyof 4.2.BaselineAlgorithms aesthetics-aware image rankers. To choose an appropriate training algorithm, we have test several pairwise learning- 4.2.1 Attention-BasedMethods to-rankingalgorithmstotraintheimagerankers,including The first category of methods to be compared are the RankSVM [14], RankNet [2], RankBoost [11] and Lamb- extension of the attention-based photo cropping methods daMART[36]. 
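The two attention-based search criteria of Section 4.2.1 can be sketched as below, assuming a saliency map already produced by a detector such as BMS or eDN; the scoring helper and toy example are ours, not the authors' implementation:

```python
import numpy as np

def score_window(sal, win, criterion="MaxAvg"):
    """MaxAvg: mean saliency inside the window.
    MaxDiff: mean saliency inside minus mean saliency of the outer region."""
    x, y, w, h = win
    inside = sal[y:y + h, x:x + w]
    avg_in = inside.mean()
    if criterion == "MaxAvg":
        return avg_in
    outer_area = sal.size - inside.size
    avg_out = (sal.sum() - inside.sum()) / outer_area
    return avg_in - avg_out

sal = np.zeros((100, 100))
sal[40:60, 40:60] = 1.0  # a single salient blob
wins = [(0, 0, 50, 50), (30, 30, 50, 50)]
best = max(wins, key=lambda w: score_window(sal, w, "MaxDiff"))
print(best)  # (30, 30, 50, 50): the window covering the blob wins
```

The cropper simply evaluates this score over all sliding-window candidates and keeps the maximizer.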
Weexploittheimplementationoftheabove [32, 30], which take advantage of the saliency map ac- algorithms provided by SVMrank2 and RankLib3 libraries companying the original image to search for an optimal forourexperiments.Theimagerankersaretrainedbyusing crop window with the highest average saliency. Instead of thetrainingsetofourdatasetwithmanydifferentconfigu- the outdated saliency detection methods used in the pre- rations of the individual algorithms. The best-performing vious works, we adopt two state-of-the-art methods, i.e., modelsaredeterminedby5-foldcrossvalidation. Assum- BMS[39]andeDN[34], withleadingperformanceonthe marized in Table 1, RankSVM and RankNet achieve very CAT2000 dataset from MIT Saliency Benchmark [3]. In 2https://www.cs.cornell.edu/people/tj/svm_ addition to the aforementioned search strategy (MaxAvg), light/svm_rank.html wefurtherimplementanothersearchcriterion,whichmax- 3https://people.cs.umass.edu/˜vdang/ranklib. imizesthedifferenceofaveragesaliencybetweenthecrop html competitive performance in terms of cropping accuracy. Method Overlap Disp. Swap However, sinceRankNetrankerstakemuchlongertimeto eDN(MaxAvg) 0.3573 0.1729 0.6366 train,wethuschooseRankSVMasthetrainingmethodfor eDN(MaxDiff) 0.4857 0.1372 0.4534 the rest of the experiments in this study. Specifically, all BMS(MaxAvg) 0.3427 0.1815 0.5775 the SVM rankers are trained with a linear kernel and use BMS(MaxDiff) 0.3905 0.1674 0.4962 L1-norm penalty for the slack variables. The loss is mea- SVM+DeCAF 0.5154 0.1325 0.4201 7 sured by the total number of swapped pairs summed over AVA1-1+DeCAF 0.5223 0.1294 0.1317 all queries. The parameter C, which controls the trade-off 7 AVA2-2+DeCAF 0.5069 0.1346 0.2379 betweentrainingerrorandmargin,isdeterminedvia5-fold 7 AVA5-5+DeCAF 0.4931 0.1384 0.2775 crossvalidation. 
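The pairwise constraints behind RankSVM-style training (Section 4.2.3) can be illustrated with a toy linear ranker: each annotated pair (preferred view, rejected view) yields a constraint w·x_pref > w·x_rej, enforced via a hinge loss on the feature difference. The paper uses the SVMrank and RankLib implementations; the subgradient-descent fit below is only a sketch of the underlying objective, with illustrative names and synthetic data:

```python
import numpy as np

def fit_ranker(pairs, dim, C=1.0, lr=0.01, epochs=200):
    """pairs: list of (x_pref, x_rej) feature vectors. Returns weights w."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for xp, xr in pairs:
            d = xp - xr
            if w @ d < 1.0:        # violated or margin-deficient constraint
                w += lr * C * d    # hinge-loss subgradient step
        w -= lr * w                # L2 regularization (margin term)
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])      # hidden "aesthetics" direction
xs = rng.normal(size=(40, 3))
pairs = []
for i in range(0, 40, 2):                # orient each pair by the hidden score
    a, b = xs[i], xs[i + 1]
    pairs.append((a, b) if true_w @ a > true_w @ b else (b, a))
w = fit_ranker(pairs, dim=3)
swap_err = sum((w @ (xp - xr)) <= 0 for xp, xr in pairs) / len(pairs)
print(swap_err)  # low swap error on the training pairs
```

This also makes the swap-error metric of Section 4.1.2 concrete: it is the fraction of preference pairs the learned scorer orders the wrong way.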
7 Our+SIFT-FV 0.5917 0.1084 0.3068 4.3.EvaluationsandAnalysis Our+Color-FV 0.5042 0.1405 0.3692 Our+DeCAF 0.6019 0.1060 0.3225 1)Comparisonoftraditionalmethods: AsshowninTa- 7 ble2,thefirstfiverowssummarizetheperformancesofthe Table 2. Summarization of performance evaluation. The middle fourvariantsofattention-basedmethodsandtheaesthetics- two columns measure the cropping accuracy on the 348 testing based method. One can see that the search strategy of imagesofourdataset.Thebestresultsarehighlightedinbold. MaxDiffconsistentlyoutperformsMaxAvgforeithertype ofsaliencymaps.ThepossiblereasonisthatMaxDifftends toincludemoresalientregionsintothecropwindowinor- classification. Although SIFT-FV achieves comparable dertolowerthetotalsaliencyscoreoftheouterregion. Un- croppingaccuracywithDeCAF ,thelaterobviouslypro- 7 like MaxAvg which usually only concentrates on a single videsamuchmorecompactfeaturerepresentationofvisual salientregion,MaxDiffismorelikelytoobtainacropwin- aesthetics. dowthatformsagoodcomposition. 3) Comparison of different datasets: In this experiment Theperformanceofattention-basedmethodsarehighly we examine the effectiveness of training image rankers on dependentontheunderlyingsaliencydetectionscheme.Al- theAVAdataset[27]. Sameastheaesthetics-basedmethod, though eDN [34] and BMS [39] possess comparable per- 52,000imageswiththehighestandlowestaestheticsscores formancein[3], theirperformancegreatlyvariedinimage arefirstselected.AconfigurationofAVAn-nmeansthatwe cropping. It suggests that a standard benchmark is essen- repeatedly select n images from the high- and low-ranked tial to choose the best saliency detection method for au- group,respectively,andgenerateallcombinationsofthese- tomatic image cropping. A hybrid method that optimizes lected images to form the ranking constraints. 
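One plausible reading of the AVA n-n configuration described in Section 4.3 (repeatedly selecting n images from the high- and low-scored ends and generating all combinations as ranking constraints) is sketched below; `ava_nn_pairs` is an illustrative name, not from the authors' code:

```python
from itertools import product

def ava_nn_pairs(scored_ids, n):
    """scored_ids: list of (image_id, aesthetic_score) tuples.
    Repeatedly pull n images from the high end and n from the low end and
    emit every (high, low) combination as a preference pair."""
    ranked = sorted(scored_ids, key=lambda t: t[1], reverse=True)
    pairs = []
    while len(ranked) >= 2 * n:
        highs, ranked = ranked[:n], ranked[n:]
        lows, ranked = ranked[-n:], ranked[:-n]
        pairs.extend((h[0], l[0]) for h, l in product(highs, lows))
    return pairs

imgs = [("a", 9.1), ("b", 8.7), ("c", 8.2), ("d", 3.0), ("e", 2.5), ("f", 1.9)]
print(len(ava_nn_pairs(imgs, 1)))  # 3 rounds x 1 combination = 3 constraints
print(len(ava_nn_pairs(imgs, 2)))  # 1 round x 4 combinations = 4 constraints
```

Under this reading, larger n multiplies the number of constraints (n² per round), which matches the paper's observation that AVA 2-2 and AVA 5-5 add many pairs without adding useful ordering information.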
Note that the compositional layout of salient objects might be less the characteristics of pairwise ranking constraints formed sensitive to the selection of saliency maps, such as [42]. by AVA and our dataset are very different since AVA dif- In general, attention-based methods performed poorly in ferentiates the visual preferences between distinct images determining the aesthetics preferences between crop pairs whileourdatasetranksvisuallysimilarsubviewswithinthe (45.34% – 63.66% swap error). We believe that this phe- sameimages. nomenoncouldbeaccountedforthelackofaestheticscon- Row 6-8 in Table 2 give the performances of three siderations in this family of methods. Note that the swap rankerstrainedonAVAusingDeCAF feature. AVA1-1 7 errorsarecalculatedbytheattentionscoresreceivedbythe performsbestbothincroppingandrankingaccuracy. How- croppairs. ever, surprisingly, increasingrankingconstraints(AVA2-2 Comparingwithattention-basedmethods,theaesthetics- andAVA5-5)causedtheperformancetoconsiderablydrop based method (SVM+DeCAF7) achieved better perfor- instead. Itindicatesthatonlyasparsesetofpairwiserank- mance in all evaluation metrics. However, although the ing constraints defined by the aesthetic scores are useful SVMclassifiershowedgoodcapabilityofpredictinghigh- for image ranking and naively pairing images would not andlow-qualityimages, itdidnotperformwellinranking improve the ranking accuracy. Besides, although AVA n- pairwiseviews(i.e.,42%swaperror),resultinginmoderate n+DeCAF rankers generally outperform the traditional 7 performanceinimagecroppingaccuracy. methods in ranking accuracy, it does not reflect on their 2)Comparisonofvariousaestheticfeatures: The9-thto croppingcapability. Forexample,thecroppingaccuracyof 11-th rows of Table 2 compare the performance of image AVA 1-1+DeCAF outperforms the best-performing tra- 7 rankerstrainedbydifferentaestheticfeaturesusingournew ditional method (i.e., SVM+DeCAF ) with only an in- 7 dataset. 
DeCAF achieves the best accuracy in all met- significantmargineventhoughithasamuchgreaterrank- 7 rics. Thisresultisconsistentwiththefindingsreportedby ing accuracy. One possible reason is that the training data [16],i.e.,DeCAF generalizeswelltoothervisualrecog- ofAVAdonotreflectthevisualpreferenceamongvisually 7 nition tasks even though the DCNN was trained for object similarviews,whichisessentialforimagecropping. (a) thathighercroppingaccuracyisreportedin[37]. Sincethis Method Overlap Disp. is a comparative study, no optimization on the parameters eDN(MaxDiff) 0.4636 0.1578 ofcropwindows(i.e.,x,y,wandh)wasperformedforfair SVM+DeCAF7 0.5005 0.1444 comparison.Webelievethattheperformanceoftheranking AVA1-1+DeCAF 0.5142 0.1399 basedimagecropperscanbefurtherenhancedbyincorpo- 7 Our+DeCAF 0.6643 0.092 ratingappropriatecropselectionprocedures. 7 (b) Tosummarize,thefindingsofthisstudyleadtothemost Method Overlap Disp. important insight of this work: ranking pairwise views is eDN(MaxDiff) 0.4399 0.1651 crucial for image cropping. To the best of our knowl- SVM+DeCAF 0.4918 0.1483 edge, all existing methods attempted to tackle this prob- 7 AVA1-1+DeCAF 0.5034 0.1443 lem by visual saliency detection or learning an aesthetics- 7 Our+DeCAF 0.6556 0.095 aware model from distinct images. However, according 7 (c) to our experimental study, these approaches do not neces- Method Overlap Disp. sarily perform well in differentiating pairwise views with substantial overlaps, which is crucial for image cropping. eDN(MaxDiff) 0.437 0.1659 Figure 3 demonstrates several examples of comparing the SVM+DeCAF 0.4882 0.1491 7 groundtruthandthebestcropwindowsdeterminedbyvar- AVA1-1+DeCAF 0.4939 0.147 7 ious methods. To maximize the performance of machine Our+DeCAF 0.6439 0.099 7 learnedimagerankers,twopossibledirectionscanbecon- sidered: 1)adoptingmoreeffectivefeaturerepresentations Table3.Crossdatasetvalidation. 
(a)-(c)summarizethecropping accuracyofthebestperformingimagecroppersofeachcategory learnedfrompairwiserankingofimagesubviews;2)devel- showninTable2,whichareevaluatedonthethreedifferentsets oping effective crop selection method to determine poten- ofannotationsinthedatabaseof[37]. Thebestresultsarehigh- tiallygoodcandidatewindowsforimageranking. lightedinbold. 5.Conclusions Such observation can be further validated by compar- Inthispaper,wepresentedanewdatasetwhichaimsto ing to the rankers trained on our dataset. Our+DeCAF provide a benchmarking platform for photo cropping and 7 achievessignificantimprovementincroppingaccuracyus- viewfindingalgorithms. Withcarefullydesigneddatacol- ing the same feature. It is also interesting to note that lectionpipeline,wewereabletocollecthighqualityanno- Our+DeCAF does not perform well in its ranking ac- tations. Onesignificantdifferencebetweenourdatasetand 7 curacy. The reason for the low ranking accuracy could be otherdatabasesistheintroductionofpairwiseviewranking explained as follows: Since DeCAF is trained for the annotations. Inspired by the procedure of iteratively com- 7 purposeofobjectrecognition,itisthusverylikelythatthe paring in manual image cropping, we argue that learning- DeCAF featuresextractedfromsimilarviewscontaining to-rank approaches possess great potential in this problem 7 thesame“object”tobealsosimilar.Thesamephenomenon domain, whichhavebeenoverlookedbymostpreviousre- canalsobeobservedinotheraestheticfeatures,i.e.,SIFT- searchers. Weconductedextensivestudyonevaluatingthe FVandColor-FV. Itsuggeststhatthereisstillgreatpo- performancesoftraditionalimagecroppingtechniquesand tentialtoimproveranking-basedimagecroppersbyjointly several machine learned image rankers. The experimental learningthefeaturerepresentationandsemanticallymean- resultsshowedthatimagerankerstrainedonpairwiseview ingfulembeddingofimagesimilaritywithDCNN[35]in- rankingannotationsoutperformthetraditionalmethods. steadofdirectlyusingDeCAF . 
4) Cross-dataset validation: In this experiment, we select the best performing image croppers from each category shown in Table 2 and directly apply them on the image cropping dataset of [37]. This dataset is composed of 950 images, each annotated by three different users. Since this dataset contains only cropping annotations, it is used solely to evaluate cropping accuracy. A sliding window approach similar to that described in Section 4.1.2 is adopted for evaluation. As shown in Table 3, Our+DeCAF7 consistently achieves the highest accuracy on all annotation sets, which further validates the effectiveness of ranking pairwise subviews in image cropping.

Figure 3. Example image cropping results: (a) Ground Truth, (b) eDN (MaxDiff), (c) SVM+DeCAF7, (d) Our+DeCAF7. The optimal crop windows determined by various baselines are drawn as green rectangles.

Acknowledgement

This work was supported in part by the Ministry of Science and Technology, National Taiwan University, and Intel Corporation under Grants MOST-105-2633-E-002-001 and NTU-ICRP-105R104045. T.-W. Huang and H.-T. Chen are partially supported by MOST 103-2221-E-007-045-MY3.

References

[1] S. Bhattacharya, R. Sukthankar, and M. Shah. A framework for photo-quality assessment and enhancement based on visual aesthetics. In ACM Multimedia, 2010.
[2] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML, pages 89–96, 2005.
[3] Z. Bylinskii, T. Judd, A. Borji, L. Itti, F. Durand, A. Oliva, and A. Torralba. MIT Saliency Benchmark.
[4] Y.-Y. Chang and H.-T. Chen. Finding good composition in panoramic scenes. In IEEE ICCV, 2009.
[5] J. Chen, G. Bai, S. Liang, and Z. Li. Automatic image cropping: A computational complexity study. In IEEE CVPR, pages 507–515, 2016.
[6] B. Cheng, B. Ni, S. Yan, and Q. Tian. Learning to photograph. In ACM Multimedia, pages 291–300, 2010.
[7] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Studying aesthetics in photographic images using a computational approach. In ECCV, pages 288–301, 2006.
[8] S. Dhar, V. Ordonez, and T. L. Berg. High level describable attributes for predicting aesthetics and interestingness. In IEEE CVPR, pages 1657–1664, 2011.
[9] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML, pages 647–655, 2013.
[10] C. Fang, Z. Lin, R. Mech, and X. Shen. Automatic image cropping using visual composition, boundary simplicity and content preservation models. In ACM Multimedia, pages 1105–1108, 2014.
[11] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, Dec. 2003.
[12] P. Isola, J. Xiao, D. Parikh, A. Torralba, and A. Oliva. What makes a photograph memorable? IEEE TPAMI, 36(7):1469–1482, 2014.
[13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, pages 675–678, 2014.
[14] T. Joachims. Training linear SVMs in linear time. In ACM KDD, pages 217–226, 2006.
[15] L. Kang, P. Ye, Y. Li, and D. Doermann. Convolutional neural networks for no-reference image quality assessment. In IEEE CVPR, pages 1733–1740, 2014.
[16] S. Karayev, M. Trentacoste, H. Han, A. Agarwala, T. Darrell, A. Hertzmann, and H. Winnemoeller. Recognizing image style. In BMVC, pages 1–20, 2014.
[17] Y. Ke, X. Tang, and F. Jing. The design of high-level features for photo quality assessment. In IEEE CVPR, pages 419–426, 2006.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
[19] H.-T. Lin, C.-J. Lin, and R. C. Weng. A note on Platt's probabilistic outputs for support vector machines. Machine Learning, 68(3):267–276, 2007.
[20] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740–755, 2014.
[21] L. Liu, R. Chen, L. Wolf, and D. Cohen-Or. Optimizing photo composition. Computer Graphics Forum (Proc. of Eurographics '10), 29(2):469–478, 2010.
[22] X. Lu, Z. Lin, H. Jin, J. Yang, and J. Z. Wang. RAPID: Rating pictorial aesthetics using deep learning. In ACM Multimedia, pages 457–466, 2014.
[23] X. Lu, Z. Lin, X. Shen, R. Mech, and J. Z. Wang. Deep multi-patch aggregation network for image style, aesthetics, and quality estimation. In IEEE ICCV, pages 990–998, 2015.
[24] W. Luo, X. Wang, and X. Tang. Content-based photo quality assessment. In IEEE ICCV, pages 2206–2213, 2011.
[25] L. Marchesotti, C. Cifarelli, and G. Csurka. A framework for visual saliency detection with applications to image thumbnailing. In IEEE ICCV, pages 2232–2239, 2009.
[26] L. Marchesotti, F. Perronnin, D. Larlus, and G. Csurka. Assessing the aesthetic quality of photographs using generic image descriptors. In IEEE ICCV, pages 1784–1791, 2011.
[27] N. Murray, L. Marchesotti, and F. Perronnin. AVA: A large-scale database for aesthetic visual analysis. In IEEE CVPR, pages 2408–2415, 2012.
[28] M. Nishiyama, T. Okabe, Y. Sato, and I. Sato. Sensation-based photo cropping. In ACM Multimedia, pages 669–672, 2009.
[29] A. Santella, M. Agrawala, D. DeCarlo, D. Salesin, and M. Cohen. Gaze-based interaction for semi-automatic photo cropping. In ACM CHI, pages 771–780, 2006.
[30] F. Stentiford. Attention based auto image cropping. In ICVS Workshop on Computational Attention and Applications, 2007.
[31] H.-H. Su, T.-W. Chen, C.-C. Kao, W. H. Hsu, and S.-Y. Chien. Preference-aware view recommendation system for scenic photos based on bag-of-aesthetics-preserving features. IEEE Transactions on Multimedia, 14(3-2):833–843, 2012.
[32] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs. Automatic thumbnail cropping and its effectiveness. In ACM UIST, pages 95–104, 2003.
[33] A. Torralba and A. A. Efros. Unbiased look at dataset bias. In IEEE CVPR, pages 1521–1528, 2011.
[34] E. Vig, M. Dorr, and D. Cox. Large-scale optimization of hierarchical features for saliency prediction in natural images. In IEEE CVPR, pages 2798–2805, 2014.
[35] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In IEEE CVPR, pages 1386–1393, 2014.
[36] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting boosting for information retrieval measures. Journal of Information Retrieval, 13(3):254–270, June 2010.
[37] J. Yan, S. Lin, S. B. Kang, and X. Tang. Learning the change for automatic image cropping. In IEEE CVPR, pages 971–978, 2013.
[38] L. Yao, P. Suryanarayan, M. Qiao, J. Z. Wang, and J. Li. Oscar: On-site composition and aesthetics feedback through exemplars for photographers. International Journal of Computer Vision, 96(3):353–383, 2012.
[39] J. Zhang and S. Sclaroff. Saliency detection: A boolean map approach. In IEEE ICCV, pages 153–160, 2013.
[40] L. Zhang, Y. Gao, R. Ji, Y. Xia, Q. Dai, and X. Li. Actively learning human gaze shifting paths for semantics-aware photo cropping. IEEE Transactions on Image Processing, 23(5):2235–2245, 2014.
[41] L. Zhang, M. Song, Q. Zhao, X. Liu, J. Bu, and C. Chen. Probabilistic graphlet transfer for photo cropping. IEEE Transactions on Image Processing, 22(2):802–815, 2013.
[42] M. Zhang, L. Zhang, Y. Sun, L. Feng, and W. Ma. Auto cropping for digital photographs. In ICME, pages 2218–2221, 2005.
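The cross-dataset evaluation protocol described above (a sliding window generates candidate crops, a cropper scores them, and the top-scoring window is compared against the ground-truth annotation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window scales, step size, overlap measure (IoU), and the 0.75 acceptance threshold are all assumptions, and `score_fn` stands in for any of the evaluated croppers.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def sliding_windows(width, height, scales=(0.5, 0.6, 0.7, 0.8, 0.9), step=0.1):
    """Yield candidate crop windows at several scales across the image.
    Scale set and stride are illustrative assumptions, not the paper's values."""
    for s in scales:
        w, h = int(width * s), int(height * s)
        dx, dy = max(1, int(width * step)), max(1, int(height * step))
        for x in range(0, width - w + 1, dx):
            for y in range(0, height - h + 1, dy):
                yield (x, y, x + w, y + h)

def cropping_accuracy(score_fn, images, threshold=0.75):
    """Fraction of images whose best-scoring candidate window overlaps the
    ground-truth crop by at least `threshold` IoU (hypothetical criterion)."""
    hits = 0
    for width, height, gt_crop in images:
        best = max(sliding_windows(width, height), key=score_fn)
        hits += iou(best, gt_crop) >= threshold
    return hits / len(images)
```

Because the dataset of [37] provides three annotation sets per image, this accuracy would be computed once per annotator and reported separately, as in Table 3.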