Super-resolution Using Constrained Deep Texture Synthesis

Libin Sun*                          James Hays†
Brown University                    Georgia Institute of Technology

arXiv:1701.07604v1 [cs.CV] 26 Jan 2017

Abstract

Hallucinating high frequency image details in single image super-resolution is a challenging task. Traditional super-resolution methods tend to produce oversmoothed output images due to the ambiguity in mapping between low and high resolution patches. We build on recent success in deep learning based texture synthesis and show that this rich feature space can facilitate successful transfer and synthesis of high frequency image details to improve the visual quality of super-resolution results on a wide variety of natural textures and images.

Keywords: detail synthesis, texture transfer, image synthesis, super-resolution

1 Introduction

Single image super-resolution (SISR) is a challenging problem due to its ill-posed nature: there exist many high resolution images (output) that could downsample to the same low resolution input image. Given moderate scaling factors, high contrast edges might warrant some extent of certainty in the high resolution output image, but smooth regions are impossible to recover unambiguously. As a result, most methods aim to intelligently hallucinate image details and textures while being faithful to the low resolution image [Freeman et al. 2002; Sun and Tappen 2010; HaCohen et al. 2010; Sun and Hays 2012]. While recent state-of-the-art methods [Yang and Yang 2013; Timofte et al. 2014; Dong et al. 2014; Wang et al. 2015] are capable of delivering impressive performance in terms of PSNR/SSIM metrics, the improvement in visual quality compared to earlier successful methods such as [Yang et al. 2008] is not as apparent. In particular, the amount of image textural detail is still lacking in these leading methods. We build on traditional and recent deep learning based texture synthesis approaches to show that reliable texture transfer can be achieved in the context of single image super-resolution and hallucination.

To achieve sharpness in the upsampled image, successful methods usually learn a statistical mapping between low resolution (LR) and high resolution (HR) image patches. The mapping itself can be non-parametric [Freeman et al. 2002; Huang et al. 2015], sparse coding [Yang et al. 2008], regression functions [Kim and Kwon 2010; Yang and Yang 2013], random forests [Schulter et al. 2015], or convolutional neural networks [Dong et al. 2014; Wang et al. 2015; Johnson et al. 2016]. There are pros and cons to both parametric and non-parametric representations. Parametric methods typically offer much faster performance at test time and produce higher PSNR/SSIM scores. But no matter how carefully one engineers the loss function during training, the learned mapping will suffer from the inherent ambiguity in low to high resolution patch mapping (many-to-one), and end up with a conservative mapping to minimize loss (typically MMSE). This regression-towards-the-mean problem suppresses high frequency details in the HR output. Non-parametric methods are bound to the available example patch pairs in the training process, and are hence unable to synthesize new image content beyond simple blending of patches. As a result, more artifacts can be found in the output image due to misalignment of image content in overlapping patches. However, non-parametric methods tend to be more aggressive in inserting image textures and details [HaCohen et al. 2010; Sun and Hays 2012].
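The regression-towards-the-mean effect described above can be made concrete with a toy computation. The sketch below is illustrative only and not from the paper (NumPy; the "HR candidates" are synthetic signals): an MMSE-style estimator that averages equally plausible HR explanations of one LR observation loses most of their high frequency energy.

```python
import numpy as np

# Many plausible HR signals consistent with one LR observation: they share
# the same low-frequency content but differ in high-frequency phase.
rng = np.random.default_rng(0)
t = np.arange(64)
low = np.sin(2 * np.pi * 2 * t / 64)                  # LR-visible structure
candidates = np.stack([
    low + 0.5 * np.sin(2 * np.pi * 24 * t / 64 + p)   # plausible HR detail
    for p in rng.uniform(0.0, 2.0 * np.pi, size=50)
])

# An MMSE regressor outputs the conditional mean over plausible HR signals.
mmse = candidates.mean(axis=0)

def hf_energy(x):
    """Energy in the upper half of the spectrum."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    return spec[len(spec) // 2:].sum()

# Averaging the random-phase details cancels them: the mean retains far
# less high-frequency energy than any individual plausible HR signal.
print(hf_energy(mmse) < hf_energy(candidates[0]))     # True
```

Each candidate is a legitimate sharp explanation of the same smooth observation; their pointwise mean is not, which is the "conservative mapping" the text refers to.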
Being able to model and represent natural image content is often a required first step towards recovering and hallucinating image details. Natural image models and priors have come a long way, from simple edge representations to more complex patch based models. Image restoration applications such as super-resolution, deblurring, and denoising share a similar philosophy in their respective frameworks to address the ill-posed nature of these tasks. A common strategy is to introduce image priors as a constraint in conjunction with the image formation model. Natural image content spans a broad range of spatial frequencies, and it is typically easy to constrain the restoration process to reliably recover information in the low frequency bands. These typically include smoothly varying regions without large gradients (edges, sky). In fact, a Gaussian or Laplacian prior would suit well for most image restoration tasks. This family of image priors has been shown to work in a variety of settings, in [Fergus et al. 2006; Levin and Weiss 2007; Levin et al. 2009; Cho and Lee 2009; Xu and Jia 2010], to name a few. More advanced prior models have also been developed, such as FRAME [Zhu et al. 1998], the Fields of Experts model [Roth and Black 2009], and the GMM model [Zoran and Weiss 2011]. It is known that the filters learned in these higher order models are essentially tuned high-pass filters [Weiss and Freeman 2007]. As a result, no matter how these priors are formulated, they work under the same principle of penalizing high frequency image content, imposing the constraint that "images should be smooth" unless required by the image reconstruction term. When these priors are applied universally to every pixel location in the image, they are bound to yield over-smoothed output. But smoothness is just another form of blur, which is exactly what we are trying to avoid in the solution space of super-resolution.

More recently, deep learning based approaches have been adopted with great success in many image restoration and synthesis tasks. The key is to use well-established deep networks as an extremely expressive feature space to achieve high quality results. In particular, a large body of work on image and texture synthesis has emerged and offers promising directions for single image super-resolution. By constraining the Gram matrix at different layers in a large pre-trained network, Gatys et al. showed that it is possible to synthesize a wide variety of natural image textures with almost photo-realistic quality [Gatys et al. 2015b]. Augmenting the same constraint with another image similarity term, they showed that artistic styles can be transferred [Gatys et al. 2015a; Gatys et al. 2016] from paintings to photos in the same efficient framework. Recent work [Sajjadi et al. 2016; Johnson et al. 2016] shows that by training to minimize perceptual loss in the feature space, superior visual quality can be achieved for SISR. However, their success at synthesizing natural textures is still limited, as shown in their examples.

In this work, we build on the same approach from [Gatys et al. 2015a] and adapt it to handle SISR. We focus on the synthesis and transfer aspects of natural image textures, and show that high frequency details can be reliably transferred and hallucinated from example images to render convincing HR output.

*e-mail: [email protected]
†e-mail: [email protected]

2 Related Work

2.1 Single Image Super-resolution (SISR)

Single image super-resolution is a longstanding challenge in computer vision and image processing due to its extremely ill-posed nature. However, it has attracted much attention in recent research due to new possibilities introduced by big data and deep learning. Unlike traditional multi-frame SR, it is impossible to unambiguously restore high frequencies in a SISR framework.
As a result, existing methods hallucinate plausible image content by relying on carefully engineered constraints and optimization procedures.

Over the past decade, SISR methods have evolved from interpolation based and edge oriented methods to learning based approaches. Such methods learn a statistical model that maps low resolution (LR) patches to high resolution (HR) patches [Yang et al. 2008; Kim and Kwon 2010; Yang and Yang 2013; Timofte et al. 2013; Timofte et al. 2014; Schulter et al. 2015], with deep-learning frameworks being the state of the art [Dong et al. 2014; Wang et al. 2015]. While these methods perform well in terms of PSNR/SSIM, high frequency details such as textures are still challenging to hallucinate because of the ambiguous mapping between LR and HR image patches. In this respect, non-parametric patch-based methods have shown promising results [Freeman et al. 2002; Sun et al. 2010; HaCohen et al. 2010; Sun and Hays 2012; Huang et al. 2015]. These methods introduce explicit spatial [Freeman et al. 2002] and contextual [Sun et al. 2010; HaCohen et al. 2010; Sun and Hays 2012] constraints to insert appropriate image details using external example images. On the other hand, internal image statistics based methods have also shown great success [Freedman and Fattal 2011; Glasner et al. 2009; Yang et al. 2013; Huang et al. 2015]. These methods directly exploit self-similarity within and across spatial scales to achieve high quality results.

More recently, new SISR approaches have emerged with an emphasis on synthesizing image details via deep networks to achieve better visual quality. Johnson et al. [Johnson et al. 2016] show that the style transfer framework of [Gatys et al. 2015a] can be made real-time, and show that networks trained on a perceptual loss in the feature space can produce superior super-resolution results. Sajjadi et al. [Sajjadi et al. 2016] consider the combination of several loss functions for training deep networks and compare their visual quality for SISR.

2.2 Texture and Image Synthesis

In texture synthesis, the goal is to create an output image that matches the textural appearance of an input texture, minimizing perceptual differences. Early attempts took a parametric approach [Heeger and Bergen 1995; Portilla and Simoncelli 2000] by matching statistical characteristics in a steerable pyramid. Non-parametric methods [Bonet 1997; Efros and Leung 1999; Efros and Freeman 2001; Kwatra et al. 2003; Wei and Levoy 2000; Kwatra et al. 2005] completely sidestep statistical representations of texture, and synthesize textures by sampling pixels or patches in a nearest neighbor fashion. More recently, Gatys et al. [Gatys et al. 2015b] proposed Gram matrix based constraints in the rich and complex feature space of the well-known VGG network [Simonyan and Zisserman 2014], and show impressive synthesized results on a diverse set of textures and images. This deep learning based approach shares many connections with earlier parametric models such as [Heeger and Bergen 1995; Portilla and Simoncelli 2000], but relies on orders of magnitude more parameters, and is hence capable of a more expressive representation of textures.

Synthesizing an entire natural image from scratch is an extremely difficult task. Yet, recent advances in deep learning have shown promising success. Goodfellow et al. [Goodfellow et al. 2014] introduced the Generative Adversarial Network (GAN), pairing a discriminative and a generative network to train deep generative models capable of synthesizing realistic images. Follow-up works [Denton et al. 2015; Radford et al. 2016; Nguyen et al. 2016] extended the GAN framework to improve the quality and resolution of generated images. However, the focus of this line of work has been to generate realistic images consistent with semantic labels such as object and image classes, in which low and mid level image features typically play a more crucial role, whereas the emphasis on high resolution image details and textures is not the primary goal.

2.3 Image Style and Detail Transfer

Many works exist in the domain of style and detail transfer between images. [Johnson et al. 2010] enhance the realism of computer generated scenes by transferring color and texture details from real photographs. [Shih et al. 2013] consider the problem of hallucinating the time of day for a single photo by learning local affine transforms in a database of time-lapse videos. [Laffont et al. 2014] utilize crowdsourcing to establish an annotated webcam database to facilitate transferring high level transient attributes among different scenes. Style transfer for specific image types such as portraits is also explored by [Shih et al.], in which multi-scale local transforms in a Laplacian pyramid are used to transfer contrast and color styling from exemplar professional portraits.

More recently, [Gatys et al. 2015a] propose a style transfer system using the 19-layer VGG network [Simonyan and Zisserman 2014]. The key constraint is to match the Gram matrix of numerous feature layers between the output image and a style image, while high level features of the output are matched to those of a content image. In this way, textures of the style image are transferred to the output image as if painted over the content image, similar to Image Quilting [Efros and Freeman 2001]. Drawing inspiration from texture synthesis methods, [Li and Wand 2016] propose to combine an MRF with a CNN for image synthesis. This CNNMRF model adds additional layers in the network to enable resampling of 'neural patches': each local window of the output image should be similar, in a nearest neighbor sense, to some patch of the style image in feature space. This has the benefit of more coherent details, should the style image be sufficiently representative of the content image. However, this copy-paste resampling mechanism is unable to synthesize new content. In addition, this method is prone to producing 'washed out' artifacts due to the blending/averaging of neural patches. This is a common problem for patch-based synthesis methods [Efros and Freeman 2001; Freeman et al. 2002; Kwatra et al. 2005]. Other interesting deep learning based applications such as view synthesis [Zhou et al. 2016] and generative visual manipulation [Zhu et al. 2016] have also been proposed. These methods allow us to better understand how to manipulate and transfer image details without sacrificing visual quality.

3 Method

Our method is based on [Gatys et al. 2015a; Gatys et al. 2015b], which encode the feature correlations of an image in the VGG network via the Gram matrix. The VGG network is a 19-layer CNN that rivals human performance on the task of object recognition. The network consists of 16 convolutional layers, 5 pooling layers, and a series of fully connected layers for softmax classification.
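The Gram-matrix statistic at the heart of this framework can be sketched in a few lines. The following is an illustrative NumPy stand-in (random arrays in place of real VGG activations), following the definition G^l_ij = Σ_k X^l_ik X^l_jk used in Section 3:

```python
import numpy as np

def gram(features):
    """Gram matrix of one layer's feature maps.
    features: (N_l, H, W) activations of the N_l filters in layer l.
    Returns G in R^{N_l x N_l} with G_ij = sum_k F_ik F_jk over the
    M_l = H*W spatial positions k."""
    n_l = features.shape[0]
    f = features.reshape(n_l, -1)   # vectorize each feature map
    return f @ f.T                  # inner products between filter pairs

# Stand-in for a VGG activation: 64 maps of size 32x32 (not real features).
rng = np.random.default_rng(1)
x_l = rng.standard_normal((64, 32, 32))
g = gram(x_l)
print(g.shape)                      # (64, 64)

# The Gram matrix is invariant to spatial shuffling: it records which
# filters co-activate but discards where, which is exactly why it captures
# texture while losing spatial layout (the gap CNNMRF tries to fill).
perm = rng.permutation(32 * 32)
shuffled = x_l.reshape(64, -1)[:, perm].reshape(64, 32, 32)
assert np.allclose(gram(shuffled), g)
```

This spatial invariance motivates both the global adaptation below and the masked, localized variant of Section 3.2.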
A latent image x is to be estimated given constraints such as content similarity and style similarity. We assume a style or example image s is available for the transfer of appropriate textures from s to x, and that x should stay similar to a content image c in terms of mid to high level image content. The feature space representations of these images within the network are X, S, and C, respectively. At each layer l, a non-linear filter bank of N_l filters is convolved with the previous layer's feature map to produce an encoding in the current layer, which can be stored in a feature matrix X^l in R^{N_l x M_l}, where M_l is the number of elements in the feature map (height times width). We use X^l_ij to denote the activation of the i-th filter at position j in layer l generated by image x.

[Figure 1 panels: bicubic x3, ScSR, SRCNN, ground truth, CNNMRF, Gatys transfer, our global transfer, example image; two example rows in each of (a) and (b).]
Figure 1: A sample comparison of various algorithms applied to upsampling texture images by a factor of x3. Two example images are provided in both (a) and (b) for the example-based approaches. It can be seen that the example image has a significant impact on the appearance of the hallucinated details in the output images, indicating the effectiveness of the texture transfer process.

In [Gatys et al. 2015a], the goal is to solve for an image x that is similar to a content image c but takes on the style or textures of s. Specifically, the following objective function is minimized via gradient descent to solve for x:

    x = argmin_x ( α E_content(c, x) + β E_style(s, x) )    (1)

where E_content is defined as:

    E_content(c, x) = (1/2) Σ_l Σ_{i,j} ( C^l_ij − X^l_ij )²    (2)

The content similarity term is simply an L2 loss on the difference between the feature map of the latent image in layer l and the corresponding feature map of the content image.

The definition of E_style is based on the L2 loss between the Gram matrices of the latent image and the style image in a set of chosen layers. The Gram matrix encodes the correlations between the filter responses via the inner product of vectorized feature maps. Given a feature map X^l for image x in layer l, the Gram matrix G(X^l) in R^{N_l x N_l} has entries G^l_ij = Σ_k X^l_ik X^l_jk, where i, j index through pairs of feature maps, and k indexes through positions in each vectorized feature map. The style similarity component of the objective function is then defined as:

    E_style(s, x) = Σ_l [ w_l / (4 N_l² M_l²) ] Σ_{i,j} ( G(S^l)_ij − G(X^l)_ij )²    (3)

where w_l is a relative weight given to a particular layer l. The derivatives of the above energy terms can be found in [Gatys et al. 2015a]. To achieve the best effect, the energy components are typically enforced over a set of layers in the network. For example, the content layer can be the single conv4_2 layer, while the style layers can be the larger set {conv1_1, conv2_1, conv3_1, conv4_1, conv5_1} to allow consistent texture appearance across all spatial frequencies.

This feature space constraint has been shown to excel at representing natural image textures for texture synthesis, style transfer, and super-resolution. We introduce a few adaptations for the task of single image super-resolution and examine its effectiveness in terms of transferring and synthesizing natural textures.

3.1 Basic Adaptation to SR

The objective function in Equation 1 consists of a content similarity term and a style term. The content term is analogous to the faithfulness term in SISR frameworks. The style term can be seen as a natural image prior derived from a single example image, which is assumed to represent the desired image statistics. A first step in our experiments is to replace the content similarity term E_content with a faithfulness term E_faithfulness = |G ∗ x ↓_f − c|², where f is the downsampling factor, G a Gaussian lowpass filter, and c the low resolution input image that we would like to upsample. The variables associated with the downsampling process are assumed known a priori (non-blind SR). In the subsequent discussion, we refer to this basic adaptation as "our global", since the Gram matrix constraint is applied globally to the whole image. Formally, our global method solves the following objective via gradient descent:

    x = argmin_x ( α E_faithfulness(c, x) + β E_style(s, x) )    (4)

We further make the following changes to the original setup:

• All processing is done in grayscale. The original work of [Gatys et al. 2015a] computes the feature maps using RGB images. However, this requires strong similarity among color channel correlations between the example and input image, which is hard to achieve. For transferring artistic styles this is not a problem. We drop the color information to allow better sharing of image statistics between the image pair.

• We use the layers {conv1_1, pool1, pool2, pool3, pool4, pool5} to capture the statistics of the example image for better visual quality, as done in [Gatys et al. 2015b].

We show that the above setup, while simple and basic, is capable of transferring texture details reliably for a wide variety of textures (see Fig. 1 and Fig. 6), even when the textures are structured and regular (see Fig. 5). However, for general natural scenes, this adaptation falls short and produces painterly artifacts or inappropriate image details in smooth image regions, because the global image statistics of the two images no longer match each other.

3.2 Local Texture Transfer via Masked Gram Matrices

Natural images are complex in nature, usually consisting of a large number of segments and parts, some of which might contain homogeneous and stochastic textures. Clearly, globally matching image statistics for such complex scenes cannot be expected to yield good results. However, with carefully chosen local correspondences, we can selectively transfer image details by pairing image parts of the example s with corresponding parts of the output x. To achieve this, we introduce two sets of binary masks {m_x^k} and {m_s^k}, k = 1..K, to mark the corresponding pairs of common texture content, and aggregate the style similarity term over the masks (see Eq. (5)):

    E_stylelocal(s, x) = Σ_k E_style(s ⊗ m_s^k, x ⊗ m_x^k)
                       = Σ_k Σ_l [ w_l / (4 N_l² |R_x^l(m_x^k)|²) ] Σ_{i,j} ( G(S^l ⊗ R_s^l(m_s^k))_ij − G(X^l ⊗ R_x^l(m_x^k))_ij )²    (5)

In this setup, R_x^l is an image resizing operator that resamples an image (a binary mask in this case) to the resolution of feature map x^l using nearest neighbor interpolation. The normalization constant also reflects that we are aggregating image statistics over a subset of pixels in the images. The parameter β from Eq. 1 is divided by the number of masks K to ensure the same relative weight between E_faithfulness and E_stylelocal. Note that these binary masks are not necessarily exclusive; pixels can be explained by multiple masks if need be.

The sparse correspondences are non-trivial to obtain. We examine two cases for the correspondence via masks: manual masks, and automatic masks via the PatchMatch [Barnes et al. 2009] algorithm.

Manual Masks. For moderately simple scenes with large areas of homogeneous textures such as grass, trees, sky, etc., we manually generate 2 to 3 masks per image at the full resolution to test the local texture transfer. We refer to this setup as "our local manual". A visualization of the images and masks can be found in Figure 2.

PatchMatch Masks. To automatically generate the masks, we apply the PatchMatch algorithm to the LR input image c and an LR version of the style image s obtained by applying the same downsampling process used to generate c. Both images are grayscale. Once the nearest-neighbor field (NNF) is computed at the lower resolution, we divide the output image into cells, and pool and dilate the interpolated offsets at the full resolution to form the mask pairs. Each m_x^k contains a square cell of 1's, and its corresponding mask m_s^k will be the union of numerous binary patches.

Figure 3: Visualization of the masks automatically generated using the PatchMatch algorithm. PatchMatch is applied to the low resolution grayscale input and example images to compute a dense correspondence. The HR output image is divided into cells, and all correspondences contained in an input cell are aggregated to form the example image mask.
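A one-layer sketch of the masked style energy in Eq. (5), again in NumPy with random stand-ins for VGG features. The nearest-neighbor mask resampler is a simplified stand-in for the operator R^l, and the PatchMatch/pooling machinery that produces the masks is omitted:

```python
import numpy as np

def resize_mask_nn(mask, h, w):
    """Stand-in for R^l: nearest-neighbor resample of a binary mask
    to the (h, w) resolution of a feature map."""
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    return mask[rows][:, cols]

def masked_gram(features, mask):
    """Gram matrix with positions outside the mask zeroed out.
    Returns (G, |R^l(m)|): the masked Gram and the mask cardinality."""
    n_l, h, w = features.shape
    m = resize_mask_nn(mask, h, w)
    f = (features * m).reshape(n_l, -1)
    return f @ f.T, m.sum()

def e_style_local(s_feats, x_feats, masks_s, masks_x, w_l=1.0):
    """Single-layer version of Eq. (5): sum over the K mask pairs of the
    squared masked-Gram difference, normalized by 4 N_l^2 |R^l(m_x^k)|^2.
    Masks are assumed non-empty at feature resolution."""
    n_l = x_feats.shape[0]
    total = 0.0
    for m_s, m_x in zip(masks_s, masks_x):
        g_s, _ = masked_gram(s_feats, m_s)
        g_x, card = masked_gram(x_feats, m_x)
        total += w_l * np.sum((g_s - g_x) ** 2) / (4.0 * n_l**2 * card**2)
    return total

# Identical features under identical masks give zero local style energy.
rng = np.random.default_rng(2)
feats = rng.standard_normal((8, 6, 6))
mask = np.ones((12, 12))
print(e_style_local(feats, feats, [mask], [mask]))   # 0.0
```

Restricting each Gram matrix to a masked subset of positions is what lets texture statistics be matched region by region instead of over the whole image.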
We refer to this variation as "our local". A sample visualization is given in Figure 3.

4 Experimental Results

4.1 Baseline Methods

For comparison, we first describe several baseline methods from the recent literature on super-resolution and texture transfer. These baseline methods are representative of state-of-the-art performance in their respective tasks, and form the basis of comparison for Section 4.2.

ScSR [Yang et al. 2008; Yang et al. 2010] is one of the most widely used methods for comparison in the recent SISR literature. It is a sparse coding based approach, using a dictionary of 1024 atoms learned over a training set of 91 natural images. Sparse coding is a well studied framework for image reconstruction and restoration, in which the output signal is assumed to be a sparse linear activation of atoms from a learned dictionary. We use the Matlab implementation provided by the authors [1] as a baseline method for comparison.

SRCNN [Dong et al. 2014] is a CNN based SISR method that produces state-of-the-art performance in PSNR/SSIM measures among recent methods. It combines insights from sparse coding approaches with findings in deep learning. A 3-layer CNN architecture is proposed as an end-to-end system. We can view this representation as a giant non-linear regression system in neural space, mapping LR to HR image patches. For subsequent comparisons, we use the version of SRCNN learned from 5 million 33x33 subimages randomly sampled from ImageNet. The Matlab code package can be found on the authors' website [2].

Gatys [Gatys et al. 2015a; Gatys et al. 2015b] first considered reformulating the texture synthesis problem within a CNN framework. In both works, the VGG network is used for feature representation and for modeling image space, and the correlation of feature maps at each layer is the key component in encoding textures and structures across spatial frequencies. The Gram matrix representation is compact and extremely effective at synthesizing a wide variety of textures [Gatys et al. 2015b]. We use a Lasagne and Theano based implementation of [Gatys et al. 2015a] as a baseline method for comparison [3].

CNNMRF [Li and Wand 2016] addresses the loss of spatial information due to the Gram matrix representation by introducing an MRF style layer on top of the VGG hidden layers to constrain local similarity of neural patches, where each local window in the output image feature map is constrained to be similar to its nearest neighbor in the corresponding layer of the style image feature maps. We use the torch based implementation from the authors [4].

[1] We use the Matlab ScSR code package from http://www.ifp.illinois.edu/~jyang29/codes/ScSR.rar
[2] We use the SRCNN code package from http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html
[3] Our implementation is adapted from the art style transfer recipe from Lasagne: https://github.com/Lasagne/Recipes/tree/master/examples/styletransfer
[4] Chuan Li's CNNMRF implementation is available at: https://github.com/chuanli11/CNNMRF

To adapt the code from Gatys et al. and CNNMRF to our experiments, we upsample the LR input image bicubically to serve as the content image. All other processing remains identical to the respective implementations.

We show a sample comparison of these methods in Figure 1, where a low resolution texture image is upsampled by a factor of 3. For the example based methods [Gatys et al. 2015a; Li and Wand 2016] and ours, we provide two example images to test each algorithm's ability to transfer textures. Some initial observations can be made:

• ScSR [Yang et al. 2008] and SRCNN [Dong et al. 2014] produce nearly identical results qualitatively, even though their model complexities are orders of magnitude apart. This represents half a decade of progress in the SISR literature.

• CNNMRF [Li and Wand 2016] produces painterly artifacts due to averaging in neural space. The highest frequencies among different color channels can be misaligned and appear as colored halos when zoomed in.

• Our method produces convincing high frequency details while being faithful to the LR input. The effect of the example image can be clearly seen in the output image.

Figure 2: Sample images and their corresponding masks, each one manually generated.

4.2 Comparison of Results

In this section we showcase the performance of the algorithm variants "our global", "our local" (PatchMatch based), and "our local manual" on a variety of textures and natural images. We also compare against leading single image super-resolution methods such as ScSR [Yang et al. 2008] and SRCNN [Dong et al. 2014], as well as deep learning based style transfer methods including [Gatys et al. 2015a] and CNNMRF [Li and Wand 2016].

4.2.1 Test Data

We collect a variety of images from the Internet, including natural and man-made textures, regular textures, black and white patterns, text images, simple natural scenes consisting of 2 or 3 clearly distinguishable segments, and face images. These test images are collected specifically to test the texture transfer aspect of the algorithms. As a result, we do not evaluate the performance of single image super-resolution in its traditional sense, namely, by measuring PSNR and SSIM.

4.2.2 Black and White Patterns

The simplest test images are texts and black and white patterns. As shown in Figure 4, traditional SR algorithms do a decent job of sharpening strong edges, with SRCNN producing slightly fewer ringing artifacts than ScSR. As expected, the example based methods produce interesting hallucinated patterns based on the example image. CNNMRF yields a considerable amount of artifacts due to averaging patches in neural space. Gatys and our global introduce a bias in background intensity but are capable of keeping the edges crisp and sharp. Much fine detail and pattern is hallucinated for the bottom example.

4.2.3 Textures

For homogeneous textures, most SISR methods simply cannot insert meaningful high frequency content besides edges. On the other hand, we see that the Gram matrix constraint from [Gatys et al. 2015a; Gatys et al. 2015b] works extremely well, because it coerces image statistics across spatial frequencies in neural space and ensures that the output image matches these statistics. However, it is less effective for non-homogeneous image content such as edges and salient structures, or any type of image phenomenon that is spatially unexchangeable. Finally, CNNMRF works reasonably well but still falls short in terms of realism. This is because linear blending of neural patches inevitably reduces high frequencies. Another artifact of this method is that the blending process can produce neural patches from the null space of natural image patches, introducing colored halos and tiny rainbows when zoomed in.

The main benefits of the "our global" method are (1) better faithfulness to the input LR image, and (2) fewer color artifacts. The Gatys transfer baseline operates in RGB color space, hence any correlated color patterns from the style image will remain in the output image. However, the style image might not represent the correct color correlation observed in the input image, e.g., blue vs yellow flowers against a background of green grass. Our global transfer method operates in grayscale, relaxing the correlation among color channels and allowing better sharing of image statistics. This relaxation helps bring out a more realistic output image, as shown in Figures 5, 6, and 7.

Comparisons on regular textures are shown in Figure 5. Our global produces better details and color faithfulness, whereas traditional SISR methods do not appear too different from bicubic interpolation. Figure 6 shows results on numerous stochastic homogeneous textures. Example based methods exhibit strong influence from the example images and can produce an output image visually different from the input, such as the fur image (third row). However, better details can be consistently observed throughout the examples. Gatys can be seen to produce a typically flat appearance in color (e.g., rock, first row); this is because of the color processing constraint.

Going beyond homogeneous textures, we test these algorithms on simple natural images in Figure 7. Realistic textures and details can be reasonably well hallucinated by our global, especially the roots in the soil (first row) and the patterns on the butterfly wings (bottom row). The pipes (second row) are synthesized well locally; however, the output image becomes too 'busy' when viewed globally. It is worth pointing out that CNNMRF essentially produces a painting for the forest image (third row); this is a clear example of the disadvantages of averaging/blending patches.

[Each of Figures 4-8 shows columns: example, bicubic x3, ScSR, SRCNN, CNNMRF, Gatys, our global (our local in Figure 8), ground truth.]
Figure 4: Example comparisons on a Chinese text image (top) and a black and white pattern image (bottom). Example based methods can hallucinate edges in interesting ways, but also produce biases in background intensity, copied from the example image. Other artifacts are also present. Best viewed electronically and zoomed in.
Figure 5: Example comparisons on regular textures. Best viewed electronically and zoomed in.
Figure 6: Example comparisons on various types of textures. Best viewed electronically and zoomed in.
Figure 7: Example comparisons on simple natural images. Best viewed electronically and zoomed in.
Figure 8: Example comparisons on moderately complex natural images. CNNMRF, Gatys and "our local" consistently synthesize more high frequencies appropriate to the scene. CNNMRF and Gatys suffer from color artifacts due to mismatched colors between the example and the input image. CNNMRF also produces a significant amount of color artifacts when viewed more closely, especially in smooth regions and near image borders. Gram matrix based methods such as Gatys and "our local" outperform other methods in terms of hallucinating image details, but also produce more artifacts in a few test cases. Best viewed electronically and zoomed in.
[Figure 9 panels: bicubic x3, SRCNN, our local PatchMatch, our local manual.]
Figure 9: Example comparisons on natural scenes with manually supplied masks. Best viewed electronically and zoomed in.
