Super-resolution Using Constrained Deep Texture Synthesis

Libin Sun*                          James Hays†
Brown University                    Georgia Institute of Technology

arXiv:1701.07604v1 [cs.CV] 26 Jan 2017

Abstract

Hallucinating high frequency image details in single image super-resolution is a challenging task. Traditional super-resolution methods tend to produce oversmoothed output images due to the ambiguity in mapping between low and high resolution patches. We build on recent success in deep learning based texture synthesis and show that this rich feature space can facilitate successful transfer and synthesis of high frequency image details to improve the visual quality of super-resolution results on a wide variety of natural textures and images.

Keywords: detail synthesis, texture transfer, image synthesis, super-resolution

1 Introduction

Single image super-resolution (SISR) is a challenging problem due to its ill-posed nature: there exist many high resolution images (output) that could downsample to the same low resolution input image. Given moderate scaling factors, high contrast edges might warrant some extent of certainty in the high resolution output image, but smooth regions are impossible to recover unambiguously. As a result, most methods aim to intelligently hallucinate image details and textures while being faithful to the low resolution image [Freeman et al. 2002; Sun and Tappen 2010; HaCohen et al. 2010; Sun and Hays 2012]. While recent state-of-the-art methods [Yang and Yang 2013; Timofte et al. 2014; Dong et al. 2014; Wang et al. 2015] are capable of delivering impressive performance in terms of PSNR/SSIM metrics, the improvement in visual quality compared to earlier successful methods such as [Yang et al. 2008] is not as apparent. In particular, the amount of image textural detail is still lacking in these leading methods. We build on traditional and recent deep learning based texture synthesis approaches to show that reliable texture transfer can be achieved in the context of single image super-resolution and hallucination.

To achieve sharpness in the upsampled image, successful methods usually learn a statistical mapping between low resolution (LR) and high resolution (HR) image patches. The mapping itself can be non-parametric [Freeman et al. 2002; Huang et al. 2015], sparse coding [Yang et al. 2008], regression functions [Kim and Kwon 2010; Yang and Yang 2013], random forests [Schulter et al. 2015], or convolutional neural networks [Dong et al. 2014; Wang et al. 2015; Johnson et al. 2016]. There are pros and cons to both parametric and non-parametric representations. Parametric methods typically offer much faster performance at test time and produce higher PSNR/SSIM scores. But no matter how carefully one engineers the loss function during training, the learned mapping will suffer from the inherent ambiguity in low to high resolution patch mapping (many-to-one), and end up with a conservative mapping to minimize loss (typically MMSE). This regression-towards-the-mean problem suppresses high frequency details in the HR output. Non-parametric methods are bound to the available example patch pairs in the training process, and are hence unable to synthesize new image content beyond simple blending of patches. As a result, more artifacts can be found in the output image due to misalignment of image content in overlapping patches. However, non-parametric methods tend to be more aggressive in inserting image textures and details [HaCohen et al. 2010; Sun and Hays 2012].
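The regression-towards-the-mean effect described above can be made concrete with a toy computation. The sketch below is illustrative only and not from the paper (NumPy; the "HR candidates" are synthetic signals): an MMSE-style estimator that averages equally plausible HR explanations of one LR observation loses most of their high frequency energy.

```python
import numpy as np

# Many plausible HR signals consistent with one LR observation: they share
# the same low-frequency content but differ in high-frequency phase.
rng = np.random.default_rng(0)
t = np.arange(64)
low = np.sin(2 * np.pi * 2 * t / 64)                  # LR-visible structure
candidates = np.stack([
    low + 0.5 * np.sin(2 * np.pi * 24 * t / 64 + p)   # plausible HR detail
    for p in rng.uniform(0.0, 2.0 * np.pi, size=50)
])

# An MMSE regressor outputs the conditional mean over plausible HR signals.
mmse = candidates.mean(axis=0)

def hf_energy(x):
    """Energy in the upper half of the spectrum."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    return spec[len(spec) // 2:].sum()

# Averaging the random-phase details cancels them: the mean retains far
# less high-frequency energy than any individual plausible HR signal.
print(hf_energy(mmse) < hf_energy(candidates[0]))     # True
```

Each candidate is a legitimate sharp explanation of the same smooth observation; their pointwise mean is not, which is the "conservative mapping" the text refers to.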
Being able to model and represent natural image content is often a required first step towards recovering and hallucinating image details. Natural image models and priors have come a long way, from simple edge representations to more complex patch based models. Image restoration applications such as super-resolution, deblurring, and denoising share a similar philosophy in their respective frameworks to address the ill-posed nature of these tasks. A common strategy is to introduce image priors as a constraint in conjunction with the image formation model. Natural image content spans a broad range of spatial frequencies, and it is typically easy to constrain the restoration process to reliably recover information in the low frequency bands. These typically include smoothly varying regions without large gradients (edges, sky). In fact, a Gaussian or Laplacian prior would suit well for most image restoration tasks. This family of image priors has been shown to work in a variety of settings, in [Fergus et al. 2006; Levin and Weiss 2007; Levin et al. 2009; Cho and Lee 2009; Xu and Jia 2010], to name a few. More advanced prior models have also been developed, such as FRAME [Zhu et al. 1998], the Fields of Experts model [Roth and Black 2009], and the GMM model [Zoran and Weiss 2011]. It is known that the filters learned in these higher order models are essentially tuned high-pass filters [Weiss and Freeman 2007]. As a result, no matter how these priors are formulated, they work under the same principle of penalizing high frequency image content, imposing the constraint that "images should be smooth" unless required by the image reconstruction term. When these priors are applied universally to every pixel location in the image, they are bound to yield over-smoothed output. But smoothness is just another form of blur, which is exactly what we are trying to avoid in the solution space of super-resolution.

More recently, deep learning based approaches have been adopted with great success in many image restoration and synthesis tasks. The key is to use well-established deep networks as an extremely expressive feature space to achieve high quality results. In particular, a large body of work on image and texture synthesis has emerged and offers promising directions for single image super-resolution. By constraining the Gram matrix at different layers in a large pre-trained network, Gatys et al. showed that it is possible to synthesize a wide variety of natural image textures with almost photo-realistic quality [Gatys et al. 2015b]. Augmenting the same constraint with another image similarity term, they showed that artistic styles can be transferred [Gatys et al. 2015a; Gatys et al. 2016] from paintings to photos in the same efficient framework. Recent work [Sajjadi et al. 2016; Johnson et al. 2016] shows that by training to minimize perceptual loss in the feature space, superior visual quality can be achieved for SISR. However, their success at synthesizing natural textures is still limited, as shown in their examples.

In this work, we build on the same approach from [Gatys et al. 2015a] and adapt it to handle SISR. We focus on the synthesis and transfer aspects of natural image textures, and show that high frequency details can be reliably transferred and hallucinated from example images to render convincing HR output.

*e-mail: [email protected]
†e-mail: [email protected]

2 Related Work

2.1 Single Image Super-resolution (SISR)

Single image super-resolution is a longstanding challenge in computer vision and image processing due to its extremely ill-posed nature. However, it has attracted much attention in recent research due to new possibilities introduced by big data and deep learning. Unlike traditional multi-frame SR, it is impossible to unambiguously restore high frequencies in a SISR framework.
As a result, existing methods hallucinate plausible image content by relying on carefully engineered constraints and optimization procedures.

Over the past decade, SISR methods have evolved from interpolation based and edge oriented methods to learning based approaches. Such methods learn a statistical model that maps low resolution (LR) patches to high resolution (HR) patches [Yang et al. 2008; Kim and Kwon 2010; Yang and Yang 2013; Timofte et al. 2013; Timofte et al. 2014; Schulter et al. 2015], with deep-learning frameworks being the state of the art [Dong et al. 2014; Wang et al. 2015]. While these methods perform well in terms of PSNR/SSIM, high frequency details such as textures are still challenging to hallucinate because of the ambiguous mapping between LR and HR image patches. In this respect, non-parametric patch-based methods have shown promising results [Freeman et al. 2002; Sun et al. 2010; HaCohen et al. 2010; Sun and Hays 2012; Huang et al. 2015]. These methods introduce explicit spatial [Freeman et al. 2002] and contextual [Sun et al. 2010; HaCohen et al. 2010; Sun and Hays 2012] constraints to insert appropriate image details using external example images. On the other hand, internal image statistics based methods have also shown great success [Freedman and Fattal 2011; Glasner et al. 2009; Yang et al. 2013; Huang et al. 2015]. These methods directly exploit self-similarity within and across spatial scales to achieve high quality results.

More recently, new SISR approaches have emerged with an emphasis on synthesizing image details via deep networks to achieve better visual quality. Johnson et al. [Johnson et al. 2016] show that the style transfer framework of [Gatys et al. 2015a] can be made real-time, and show that networks trained on a perceptual loss in the feature space can produce superior super-resolution results. Sajjadi et al. [Sajjadi et al. 2016] consider the combination of several loss functions for training deep networks and compare their visual quality for SISR.

2.2 Texture and Image Synthesis

In texture synthesis, the goal is to create an output image that matches the textural appearance of an input texture, minimizing perceptual differences. Early attempts took a parametric approach [Heeger and Bergen 1995; Portilla and Simoncelli 2000] by matching statistical characteristics in a steerable pyramid. Non-parametric methods [Bonet 1997; Efros and Leung 1999; Efros and Freeman 2001; Kwatra et al. 2003; Wei and Levoy 2000; Kwatra et al. 2005] completely sidestep statistical representations of texture, and synthesize textures by sampling pixels or patches in a nearest neighbor fashion. More recently, Gatys et al. [Gatys et al. 2015b] proposed Gram matrix based constraints in the rich and complex feature space of the well-known VGG network [Simonyan and Zisserman 2014], and show impressive synthesized results on a diverse set of textures and images. This deep learning based approach shares many connections with earlier parametric models such as [Heeger and Bergen 1995; Portilla and Simoncelli 2000], but relies on orders of magnitude more parameters, and is hence capable of a more expressive representation of textures.

Synthesizing an entire natural image from scratch is an extremely difficult task. Yet, recent advances in deep learning have shown promising success. Goodfellow et al. [Goodfellow et al. 2014] introduced the Generative Adversarial Network (GAN), pairing a discriminative and a generative network to train deep generative models capable of synthesizing realistic images. Follow-up works [Denton et al. 2015; Radford et al. 2016; Nguyen et al. 2016] extended the GAN framework to improve the quality and resolution of generated images. However, the focus of this line of work has been to generate realistic images consistent with semantic labels such as object and image classes, in which low and mid level image features typically play a more crucial role, whereas the emphasis on high resolution image details and textures is not the primary goal.

2.3 Image Style and Detail Transfer

Many works exist in the domain of style and detail transfer between images. [Johnson et al. 2010] enhance the realism of computer generated scenes by transferring color and texture details from real photographs. [Shih et al. 2013] consider the problem of hallucinating the time of day for a single photo by learning local affine transforms in a database of time-lapse videos. [Laffont et al. 2014] utilize crowdsourcing to establish an annotated webcam database to facilitate transferring high level transient attributes among different scenes. Style transfer for specific image types such as portraits is also explored by [Shih et al.], in which multi-scale local transforms in a Laplacian pyramid are used to transfer contrast and color styling from exemplar professional portraits.

More recently, [Gatys et al. 2015a] propose a style transfer system using the 19-layer VGG network [Simonyan and Zisserman 2014]. The key constraint is to match the Gram matrix of numerous feature layers between the output image and a style image, while high level features of the output are matched to those of a content image. In this way, textures of the style image are transferred to the output image as if painted over the content image, similar to Image Quilting [Efros and Freeman 2001]. Drawing inspiration from texture synthesis methods, [Li and Wand 2016] propose to combine an MRF with a CNN for image synthesis. This CNNMRF model adds additional layers in the network to enable resampling of 'neural patches': each local window of the output image should be similar, in a nearest neighbor sense, to some patch of the style image in feature space. This has the benefit of more coherent details, should the style image be sufficiently representative of the content image. However, this copy-paste resampling mechanism is unable to synthesize new content. In addition, this method is prone to producing 'washed out' artifacts due to the blending/averaging of neural patches. This is a common problem for patch-based synthesis methods [Efros and Freeman 2001; Freeman et al. 2002; Kwatra et al. 2005]. Other interesting deep learning based applications such as view synthesis [Zhou et al. 2016] and generative visual manipulation [Zhu et al. 2016] have also been proposed. These methods allow us to better understand how to manipulate and transfer image details without sacrificing visual quality.

3 Method

Our method is based on [Gatys et al. 2015a; Gatys et al. 2015b], which encode the feature correlations of an image in the VGG network via the Gram matrix. The VGG network is a 19-layer CNN that rivals human performance on the task of object recognition. The network consists of 16 convolutional layers, 5 pooling layers, and a series of fully connected layers for softmax classification.
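The Gram-matrix statistic at the heart of this framework can be sketched in a few lines. The following is an illustrative NumPy stand-in (random arrays in place of real VGG activations), following the definition G^l_ij = Σ_k X^l_ik X^l_jk used in Section 3:

```python
import numpy as np

def gram(features):
    """Gram matrix of one layer's feature maps.
    features: (N_l, H, W) activations of the N_l filters in layer l.
    Returns G in R^{N_l x N_l} with G_ij = sum_k F_ik F_jk over the
    M_l = H*W spatial positions k."""
    n_l = features.shape[0]
    f = features.reshape(n_l, -1)   # vectorize each feature map
    return f @ f.T                  # inner products between filter pairs

# Stand-in for a VGG activation: 64 maps of size 32x32 (not real features).
rng = np.random.default_rng(1)
x_l = rng.standard_normal((64, 32, 32))
g = gram(x_l)
print(g.shape)                      # (64, 64)

# The Gram matrix is invariant to spatial shuffling: it records which
# filters co-activate but discards where, which is exactly why it captures
# texture while losing spatial layout (the gap CNNMRF tries to fill).
perm = rng.permutation(32 * 32)
shuffled = x_l.reshape(64, -1)[:, perm].reshape(64, 32, 32)
assert np.allclose(gram(shuffled), g)
```

This spatial invariance motivates both the global adaptation below and the masked, localized variant of Section 3.2.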
A latent image x is to be estimated given constraints such as content similarity and style similarity. We assume a style or example image s is available for the transfer of appropriate textures from s to x, and that x should stay similar to a content image c in terms of mid to high level image content. The feature space representations of these images within the network are X, S, and C, respectively. At each layer l, a non-linear filter bank of N_l filters is convolved with the previous layer's feature map to produce an encoding in the current layer, which can be stored in a feature matrix X^l in R^{N_l x M_l}, where M_l is the number of elements in the feature map (height times width). We use X^l_ij to denote the activation of the i-th filter at position j in layer l generated by image x.

[Figure 1 panels: bicubic x3, ScSR, SRCNN, ground truth, CNNMRF, Gatys transfer, our global transfer, example image; two example rows in each of (a) and (b).]
Figure 1: A sample comparison of various algorithms applied to upsampling texture images by a factor of x3. Two example images are provided in both (a) and (b) for the example-based approaches. It can be seen that the example image has a significant impact on the appearance of the hallucinated details in the output images, indicating the effectiveness of the texture transfer process.

In [Gatys et al. 2015a], the goal is to solve for an image x that is similar to a content image c but takes on the style or textures of s. Specifically, the following objective function is minimized via gradient descent to solve for x:

    x = argmin_x ( α E_content(c, x) + β E_style(s, x) )    (1)

where E_content is defined as:

    E_content(c, x) = (1/2) Σ_l Σ_{i,j} ( C^l_ij − X^l_ij )²    (2)

The content similarity term is simply an L2 loss on the difference between the feature map of the latent image in layer l and the corresponding feature map of the content image.

The definition of E_style is based on the L2 loss between the Gram matrices of the latent image and the style image in a set of chosen layers. The Gram matrix encodes the correlations between the filter responses via the inner product of vectorized feature maps. Given a feature map X^l for image x in layer l, the Gram matrix G(X^l) in R^{N_l x N_l} has entries G^l_ij = Σ_k X^l_ik X^l_jk, where i, j index through pairs of feature maps, and k indexes through positions in each vectorized feature map. The style similarity component of the objective function is then defined as:

    E_style(s, x) = Σ_l [ w_l / (4 N_l² M_l²) ] Σ_{i,j} ( G(S^l)_ij − G(X^l)_ij )²    (3)

where w_l is a relative weight given to a particular layer l. The derivatives of the above energy terms can be found in [Gatys et al. 2015a]. To achieve the best effect, the energy components are typically enforced over a set of layers in the network. For example, the content layer can be the single conv4_2 layer, while the style layers can be the larger set {conv1_1, conv2_1, conv3_1, conv4_1, conv5_1} to allow consistent texture appearance across all spatial frequencies.

This feature space constraint has been shown to excel at representing natural image textures for texture synthesis, style transfer, and super-resolution. We introduce a few adaptations for the task of single image super-resolution and examine its effectiveness in terms of transferring and synthesizing natural textures.

3.1 Basic Adaptation to SR

The objective function in Equation 1 consists of a content similarity term and a style term. The content term is analogous to the faithfulness term in SISR frameworks. The style term can be seen as a natural image prior derived from a single example image, which is assumed to represent the desired image statistics. A first step in our experiments is to replace the content similarity term E_content with a faithfulness term E_faithfulness = |G ∗ x ↓_f − c|², where f is the downsampling factor, G a Gaussian lowpass filter, and c the low resolution input image that we would like to upsample. The variables associated with the downsampling process are assumed known a priori (non-blind SR). In the subsequent discussion, we refer to this basic adaptation as "our global", since the Gram matrix constraint is applied globally to the whole image. Formally, our global method solves the following objective via gradient descent:

    x = argmin_x ( α E_faithfulness(c, x) + β E_style(s, x) )    (4)

We further make the following changes to the original setup:

• All processing is done in grayscale. The original work of [Gatys et al. 2015a] computes the feature maps using RGB images. However, this requires strong similarity among color channel correlations between the example and input image, which is hard to achieve. For transferring artistic styles this is not a problem. We drop the color information to allow better sharing of image statistics between the image pair.

• We use the layers {conv1_1, pool1, pool2, pool3, pool4, pool5} to capture the statistics of the example image for better visual quality, as done in [Gatys et al. 2015b].

We show that the above setup, while simple and basic, is capable of transferring texture details reliably for a wide variety of textures (see Fig. 1 and Fig. 6), even when the textures are structured and regular (see Fig. 5). However, for general natural scenes, this adaptation falls short and produces painterly artifacts or inappropriate image details in smooth image regions, because the global image statistics of the two images no longer match each other.

3.2 Local Texture Transfer via Masked Gram Matrices

Natural images are complex in nature, usually consisting of a large number of segments and parts, some of which might contain homogeneous and stochastic textures. Clearly, globally matching image statistics for such complex scenes cannot be expected to yield good results. However, with carefully chosen local correspondences, we can selectively transfer image details by pairing image parts of the example s with corresponding parts of the output x. To achieve this, we introduce two sets of binary masks {m_x^k} and {m_s^k}, k = 1..K, to mark the corresponding pairs of common texture content, and aggregate the style similarity term over the masks (see Eq. (5)):

    E_stylelocal(s, x) = Σ_k E_style(s ⊗ m_s^k, x ⊗ m_x^k)
                       = Σ_k Σ_l [ w_l / (4 N_l² |R_x^l(m_x^k)|²) ] Σ_{i,j} ( G(S^l ⊗ R_s^l(m_s^k))_ij − G(X^l ⊗ R_x^l(m_x^k))_ij )²    (5)

In this setup, R_x^l is an image resizing operator that resamples an image (a binary mask in this case) to the resolution of feature map x^l using nearest neighbor interpolation. The normalization constant also reflects that we are aggregating image statistics over a subset of pixels in the images. The parameter β from Eq. 1 is divided by the number of masks K to ensure the same relative weight between E_faithfulness and E_stylelocal. Note that these binary masks are not necessarily exclusive; pixels can be explained by multiple masks if need be.

The sparse correspondences are non-trivial to obtain. We examine two cases for the correspondence via masks: manual masks, and automatic masks via the PatchMatch [Barnes et al. 2009] algorithm.

Manual Masks. For moderately simple scenes with large areas of homogeneous textures such as grass, trees, sky, etc., we manually generate 2 to 3 masks per image at the full resolution to test the local texture transfer. We refer to this setup as "our local manual". A visualization of the images and masks can be found in Figure 2.

PatchMatch Masks. To automatically generate the masks, we apply the PatchMatch algorithm to the LR input image c and an LR version of the style image s obtained by applying the same downsampling process used to generate c. Both images are grayscale. Once the nearest-neighbor field (NNF) is computed at the lower resolution, we divide the output image into cells, and pool and dilate the interpolated offsets at the full resolution to form the mask pairs. Each m_x^k contains a square cell of 1's, and its corresponding mask m_s^k will be the union of numerous binary patches.

Figure 3: Visualization of the masks automatically generated using the PatchMatch algorithm. PatchMatch is applied to the low resolution grayscale input and example images to compute a dense correspondence. The HR output image is divided into cells, and all correspondences contained in an input cell are aggregated to form the example image mask.
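A one-layer sketch of the masked style energy in Eq. (5), again in NumPy with random stand-ins for VGG features. The nearest-neighbor mask resampler is a simplified stand-in for the operator R^l, and the PatchMatch/pooling machinery that produces the masks is omitted:

```python
import numpy as np

def resize_mask_nn(mask, h, w):
    """Stand-in for R^l: nearest-neighbor resample of a binary mask
    to the (h, w) resolution of a feature map."""
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    return mask[rows][:, cols]

def masked_gram(features, mask):
    """Gram matrix with positions outside the mask zeroed out.
    Returns (G, |R^l(m)|): the masked Gram and the mask cardinality."""
    n_l, h, w = features.shape
    m = resize_mask_nn(mask, h, w)
    f = (features * m).reshape(n_l, -1)
    return f @ f.T, m.sum()

def e_style_local(s_feats, x_feats, masks_s, masks_x, w_l=1.0):
    """Single-layer version of Eq. (5): sum over the K mask pairs of the
    squared masked-Gram difference, normalized by 4 N_l^2 |R^l(m_x^k)|^2.
    Masks are assumed non-empty at feature resolution."""
    n_l = x_feats.shape[0]
    total = 0.0
    for m_s, m_x in zip(masks_s, masks_x):
        g_s, _ = masked_gram(s_feats, m_s)
        g_x, card = masked_gram(x_feats, m_x)
        total += w_l * np.sum((g_s - g_x) ** 2) / (4.0 * n_l**2 * card**2)
    return total

# Identical features under identical masks give zero local style energy.
rng = np.random.default_rng(2)
feats = rng.standard_normal((8, 6, 6))
mask = np.ones((12, 12))
print(e_style_local(feats, feats, [mask], [mask]))   # 0.0
```

Restricting each Gram matrix to a masked subset of positions is what lets texture statistics be matched region by region instead of over the whole image.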
We refer to this variation as "our local". A sample visualization is given in Figure 3.

4 Experimental Results

4.1 Baseline Methods

For comparison, we first describe several baseline methods from the recent literature on super-resolution and texture transfer. These baseline methods are representative of state-of-the-art performance in their respective tasks, and form the basis of comparison for Section 4.2.

ScSR [Yang et al. 2008; Yang et al. 2010] is one of the most widely used methods for comparison in the recent SISR literature. It is a sparse coding based approach, using a dictionary of 1024 atoms learned over a training set of 91 natural images. Sparse coding is a well studied framework for image reconstruction and restoration, in which the output signal is assumed to be a sparse linear activation of atoms from a learned dictionary. We use the Matlab implementation provided by the authors [1] as a baseline method for comparison.

SRCNN [Dong et al. 2014] is a CNN based SISR method that produces state-of-the-art performance in PSNR/SSIM measures among recent methods. It combines insights from sparse coding approaches with findings in deep learning. A 3-layer CNN architecture is proposed as an end-to-end system. We can view this representation as a giant non-linear regression system in neural space, mapping LR to HR image patches. For subsequent comparisons, we use the version of SRCNN learned from 5 million 33x33 subimages randomly sampled from ImageNet. The Matlab code package can be found on the authors' website [2].

Gatys [Gatys et al. 2015a; Gatys et al. 2015b] first considered reformulating the texture synthesis problem within a CNN framework. In both works, the VGG network is used for feature representation and for modeling image space, and the correlation of feature maps at each layer is the key component in encoding textures and structures across spatial frequencies. The Gram matrix representation is compact and extremely effective at synthesizing a wide variety of textures [Gatys et al. 2015b]. We use a Lasagne and Theano based implementation of [Gatys et al. 2015a] as a baseline method for comparison [3].

CNNMRF [Li and Wand 2016] addresses the loss of spatial information due to the Gram matrix representation by introducing an MRF style layer on top of the VGG hidden layers to constrain local similarity of neural patches, where each local window in the output image feature map is constrained to be similar to its nearest neighbor in the corresponding layer of the style image feature maps. We use the torch based implementation from the authors [4].

[1] We use the Matlab ScSR code package from http://www.ifp.illinois.edu/~jyang29/codes/ScSR.rar
[2] We use the SRCNN code package from http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
[3] Our implementation is adapted from the art style transfer recipe from Lasagne: https://github.com/Lasagne/Recipes/tree/master/examples/styletransfer
[4] Chuan Li's CNNMRF implementation is available at: https://github.com/chuanli11/CNNMRF

To adapt the code from Gatys et al. and CNNMRF to our experiments, we upsample the LR input image bicubically to serve as the content image. All other processing remains identical to the respective implementations.

We show a sample comparison of these methods in Figure 1, where a low resolution texture image is upsampled by a factor of 3. For the example based methods [Gatys et al. 2015a; Li and Wand 2016] and ours, we provide two example images to test each algorithm's ability to transfer textures. Some initial observations can be made:

• ScSR [Yang et al. 2008] and SRCNN [Dong et al. 2014] produce nearly identical results qualitatively, even though their model complexities are orders of magnitude apart. This represents half a decade of progress in the SISR literature.

• CNNMRF [Li and Wand 2016] produces painterly artifacts due to averaging in neural space. The highest frequencies among different color channels can be misaligned and appear as colored halos when zoomed in.

• Our method produces convincing high frequency details while being faithful to the LR input. The effect of the example image can be clearly seen in the output image.

Figure 2: Sample images and their corresponding masks, each one manually generated.

4.2 Comparison of Results

In this section we showcase the performance of the algorithm variants "our global", "our local" (PatchMatch based), and "our local manual" on a variety of textures and natural images. We also compare against leading single image super-resolution methods such as ScSR [Yang et al. 2008] and SRCNN [Dong et al. 2014], as well as deep learning based style transfer methods including [Gatys et al. 2015a] and CNNMRF [Li and Wand 2016].

4.2.1 Test Data

We collect a variety of images from the Internet, including natural and man-made textures, regular textures, black and white patterns, text images, simple natural scenes consisting of 2 or 3 clearly distinguishable segments, and face images. These test images are collected specifically to test the texture transfer aspect of the algorithms. As a result, we do not evaluate the performance of single image super-resolution in its traditional sense, namely, by measuring PSNR and SSIM.

4.2.2 Black and White Patterns

The simplest test images are texts and black and white patterns. As shown in Figure 4, traditional SR algorithms do a decent job of sharpening strong edges, with SRCNN producing slightly fewer ringing artifacts than ScSR. As expected, the example based methods produce interesting hallucinated patterns based on the example image. CNNMRF yields a considerable amount of artifacts due to averaging patches in neural space. Gatys and our global introduce a bias in background intensity but are capable of keeping the edges crisp and sharp. Much fine detail and pattern is hallucinated for the bottom example.

4.2.3 Textures

For homogeneous textures, most SISR methods simply cannot insert meaningful high frequency content besides edges. On the other hand, we see that the Gram matrix constraint from [Gatys et al. 2015a; Gatys et al. 2015b] works extremely well, because it coerces image statistics across spatial frequencies in neural space and ensures that the output image matches these statistics. However, it is less effective for non-homogeneous image content such as edges and salient structures, or any type of image phenomenon that is spatially unexchangeable. Finally, CNNMRF works reasonably well but still falls short in terms of realism. This is because linear blending of neural patches inevitably reduces high frequencies. Another artifact of this method is that the blending process can produce neural patches from the null space of natural image patches, introducing colored halos and tiny rainbows when zoomed in.

The main benefits of the "our global" method are (1) better faithfulness to the input LR image, and (2) fewer color artifacts. The Gatys transfer baseline operates in RGB color space, hence any correlated color patterns from the style image will remain in the output image. However, the style image might not represent the correct color correlation observed in the input image, e.g., blue vs yellow flowers against a background of green grass. Our global transfer method operates in grayscale, relaxing the correlation among color channels and allowing better sharing of image statistics. This relaxation helps bring out a more realistic output image, as shown in Figures 5, 6, and 7.

Comparisons on regular textures are shown in Figure 5. Our global produces better details and color faithfulness, whereas traditional SISR methods do not appear too different from bicubic interpolation. Figure 6 shows results on numerous stochastic homogeneous textures. Example based methods exhibit strong influence from the example images and can produce an output image visually different from the input, such as the fur image (third row). However, better details can be consistently observed throughout the examples. Gatys can be seen to produce a typically flat appearance in color (e.g., rock, first row); this is because of the color processing constraint.

Going beyond homogeneous textures, we test these algorithms on simple natural images in Figure 7. Realistic textures and details can be reasonably well hallucinated by our global, especially the roots in the soil (first row) and the patterns on the butterfly wings (bottom row). The pipes (second row) are synthesized well locally; however, the output image becomes too 'busy' when viewed globally. It is worth pointing out that CNNMRF essentially produces a painting for the forest image (third row); this is a clear example of the disadvantages of averaging/blending patches.

[Each of Figures 4-8 shows columns: example, bicubic x3, ScSR, SRCNN, CNNMRF, Gatys, our global (our local in Figure 8), ground truth.]
Figure 4: Example comparisons on a Chinese text image (top) and a black and white pattern image (bottom). Example based methods can hallucinate edges in interesting ways, but also produce biases in background intensity, copied from the example image. Other artifacts are also present. Best viewed electronically and zoomed in.
Figure 5: Example comparisons on regular textures. Best viewed electronically and zoomed in.
Figure 6: Example comparisons on various types of textures. Best viewed electronically and zoomed in.
Figure 7: Example comparisons on simple natural images. Best viewed electronically and zoomed in.
Figure 8: Example comparisons on moderately complex natural images. CNNMRF, Gatys and "our local" consistently synthesize more high frequencies appropriate to the scene. CNNMRF and Gatys suffer from color artifacts due to mismatched colors between the example and the input image. CNNMRF also produces a significant amount of color artifacts when viewed more closely, especially in smooth regions and near image borders. Gram matrix based methods such as Gatys and "our local" outperform other methods in terms of hallucinating image details, but also produce more artifacts in a few test cases. Best viewed electronically and zoomed in.
[Figure 9 panels: bicubic x3, SRCNN, our local PatchMatch, our local manual.]
Figure 9: Example comparisons on natural scenes with manually supplied masks. Best viewed electronically and zoomed in.
