Stable and Controllable Neural Texture Synthesis and Style Transfer Using Histogram Losses

Eric Risser¹, Pierre Wilmot¹, Connelly Barnes¹,²
¹Artomatix, ²University of Virginia

Figure 1: Our style transfer and texture synthesis results. The input styles are shown in (a), and style transfer results are in (b, c). Note that the angular shapes of the Picasso painting are successfully transferred on the top row, and that the more subtle brush strokes are transferred on the bottom row. The original content images are inset in the upper right corner. Unless otherwise noted, our algorithm is always run with default parameters (we do not manually tune parameters). Input textures are shown in (d) and texture synthesis results are in (e). For the texture synthesis, note that the algorithm synthesizes creative new patterns and connectivities in the output.

Abstract

Recently, methods have been proposed that perform texture synthesis and style transfer by using convolutional neural networks (e.g. Gatys et al. [2015; 2016]). These methods are exciting because they can in some cases create results with state-of-the-art quality. However, in this paper, we show these methods also have limitations in texture quality, stability, requisite parameter tuning, and lack of user controls. This paper presents a multiscale synthesis pipeline based on convolutional neural networks that ameliorates these issues. We first give a mathematical explanation of the source of instabilities in many previous approaches. We then improve these instabilities by using histogram losses to synthesize textures that better statistically match the exemplar. We also show how to integrate localized style losses in our multiscale framework. These losses can improve the quality of large features, improve the separation of content and style, and offer artistic controls such as paint by numbers. We demonstrate that our approach offers improved quality, convergence in fewer iterations, and more stability over the optimization.

Keywords: style transfer, texture synthesis, neural networks

Concepts: Computing methodologies -> Image manipulation; Computational photography

1 Introduction

In recent years, deep convolutional neural networks have demonstrated dramatic improvements in performance for computer vision tasks such as object classification, detection, and segmentation [Krizhevsky et al. 2012; He et al. 2016]. Because of the success of these models, there has also been much interest in adapting these architectures for synthesis tasks in graphics and computational photography. For instance, in computational photography, deep architectures have been used for many tasks, including editing of photos [Tsai et al. 2016; Yan et al. 2016; Chang et al. 2016] and objects [Zhu et al. 2016], image style transfer [Gatys et al. 2016], texture synthesis [Gatys et al. 2015], new view synthesis [Kalantari et al. 2016], and image inpainting [Pathak et al. 2016; Yang et al. 2016].

In this paper, we specifically focus on the use of convolutional neural networks (CNNs) for style transfer and texture synthesis. Recently, Gatys et al. [2015; 2016] proposed parametric synthesis models for these problems, which utilize CNNs. For some inputs, particularly for style transfer, these models can produce quite successful, state-of-the-art results (see Figures 9-12). However, as we investigated these methods in depth, we found that they are subject to a number of limitations: limitations in stability, ghosting artifacts, the need for per-image parameter tuning, and challenges in reproducing large-scale features. Furthermore, these methods do not incorporate artistic controls such as painting by numbers [Hertzmann et al. 2001; Ritter et al. 2006; Lukáč et al. 2013; Lukáč et al. 2015].
Figure 2: Instabilities in texture synthesis of Gatys et al. [2015]. We show the input texture (a). A larger texture (b) is synthesized and shows a significant amount of instability, where the brightness and contrast vary significantly throughout the image. By hand-tuning parameters, we can find parameter settings that produce pleasing results for this example (c). However, we still observe artifacts such as ghosting due to a smaller degree of instability.

Our first contribution in this paper is a demonstration of how and why such instabilities occur in the neural network synthesis approaches (see Section 4.2). Examples of instabilities are shown in Figure 2. In neural network synthesis methods such as Gatys et al. [2015; 2016], we have found that carefully tuning parameters on a per-image basis is often necessary to obtain good quality results, that the optimization is often unstable over its iterations, and that the synthesis process becomes more unstable as the size of the output increases.

We next demonstrate in Section 4.3 that such instabilities can be addressed by the use of novel histogram loss terms. These loss terms match not only the mean but also the statistical distribution of activations within CNN layers. We also show how to effectively minimize such histogram losses. Our histogram losses improve quality, reduce ghosting, and accelerate convergence. We initially formulate the histogram losses for texture synthesis and then in Section 4.4 discuss how to extend them for style transfer.

We next demonstrate in Section 5 how quality and control can be further improved by using localized losses in a multiscale synthesis framework. In Section 5.2, we demonstrate how localized style losses can be used to incorporate artistic controls such as paint by numbers, as well as improve the separation of content and style and better reproduce large features. In Section 6, we explain how we automatically select parameters, which removes the requirement for a human to manually tune parameters.

These contributions together allow us to achieve state-of-the-art quality for neural style transfer and for parametric neural texture synthesis (except on regular textures, where Berger et al. [2016] have recently demonstrated a loss that improves regularity, which we do not currently include).

2 Related work

Parametric texture synthesis. Some early methods for texture synthesis explored parametric models. Heeger and Bergen [1995] used histogram matching combined with Laplacian and steerable pyramids to synthesize textures. We are inspired by their use of histogram matching. Portilla and Simoncelli [2000] investigated the integration of many wavelet statistics over different locations, orientations, and scales into a sophisticated parametric texture synthesis method. These included cross-correlations between pairs of filter responses.

Neural texture synthesis and style transfer. In this paper, for short, we use "neural" to refer to convolutional neural networks. Recently, Gatys et al. [2015] showed that texture synthesis can be performed by using ImageNet-pretrained convolutional neural networks such as VGG [Simonyan and Zisserman 2014]. Specifically, Gatys et al. [2015] impose losses on co-occurrence statistics for pairs of features. These statistics are computed via Gram matrices, which measure inner products between all pairs of feature maps within the same layers of the CNN. The texture synthesis results of Gatys et al. [2015] typically improve upon those of Portilla and Simoncelli [2000]. Gatys et al. [2016] later extended this approach to style transfer, by incorporating within CNN layers both a Frobenius norm "content loss" to a content exemplar image, and a Gram matrix "style loss" to a style exemplar image. We build upon this framework, and offer a brief review of how it works in the next section. Concurrently to our research, Berger et al. [2016] observed that in the approach of Gatys et al., texture regularity may be lost during synthesis, and proposed a loss that improves regularity based on co-occurrence statistics between translations of the feature maps. Recently, Aittala et al. [2016] used neural networks to extract SVBRDF material models from a single photo of a texture. Their method focuses on the more specific problem of recovering an SVBRDF model from a head-lit flash image. However, they do observe that certain instabilities such as non-stationary textures can easily result if sufficiently informative statistics are not used. We see this as connected with our observations about instabilities and how to repair them.

Feedforward neural texture synthesis. Recently, a few papers [Ulyanov et al. 2016a; Ulyanov et al. 2016b; Johnson et al. 2016] have investigated the training of feedforward synthesis models, which can be pre-trained on a given exemplar texture or style, and then used to quickly synthesize a result using fixed network weights. The feedforward strategy is faster at run-time and uses less memory. However, feedforward methods must be trained specifically on a given style or texture, making the approach impractical for applications where such a pre-training would take too long (pre-training times of 2 to 4 hours are reported in these papers).

Non-parametric texture synthesis. Non-parametric texture synthesis methods work by copying neighborhoods or patches from an exemplar texture to a synthesized image according to a local similarity term [Efros and Leung 1999; Wei and Levoy 2000; Lefebvre and Hoppe 2005; Lefebvre and Hoppe 2006; Kwatra et al. 2003; Kwatra et al. 2005; Barnes et al. 2009]. This approach has also been used to transfer style [Efros and Freeman 2001; Hertzmann et al. 2001; Barnes et al. 2015]. Some papers have recently combined parametric neural network models with non-parametric patch-based models [Chen and Schmidt 2016; Li and Wand 2016].

3 A brief introduction to neural texture synthesis and style transfer

In this section, we briefly introduce the texture synthesis and style transfer methods of Gatys et al. [2015; 2016]. For a more thorough introduction to these techniques, please refer to those papers. We show some example results from these methods later in the paper: neural texture synthesis is shown in Figure 8, and style transfer results are shown in Figures 9-12.

For texture synthesis [Gatys et al. 2015], we are given an input source texture S, and wish to synthesize an output texture O. We pass S and O through a CNN such as VGG [Simonyan and Zisserman 2014]. This results in feature maps for the activations of the first L convolutional layers, which we denote as S_1 ... S_L and O_1 ... O_L. Then we minimize a loss L_gram over the layers, which preserves some properties of the input texture by means of a Gram matrix:

    L_gram = \sum_{l=1}^{L} \frac{\alpha_l}{|S_l|^2} \| G(S_l) - G(O_l) \|_F^2    (1)

Here the \alpha_l are user parameters that weight terms in the loss, |·| is the number of elements in a tensor, \|·\|_F is the Frobenius norm, and the Gram matrix G(F) is defined over any feature map F as an N_l x N_l matrix of inner products between pairs of features:

    G_{ij}(F) = \sum_k F_{ik} F_{jk}    (2)

Here F_{ij} refers to feature i's pixel j within the feature map. The synthesized output image O is initialized with white noise and is then optimized by applying gradient descent to equation (1). Specifically, the gradient of equation (1) with respect to the output image O is computed via backpropagation.

Figure 3: Different statistics that can be used for neural network texture synthesis. See the body of Section 4.1 for discussion.

Style transfer [Gatys et al. 2016] works similarly, but we are given a content image C and style image S, and would like to synthesize a stylized output image O. We pass all three images through a CNN such as VGG, which gives activations for the first L convolutional layers: C_1 ... C_L, S_1 ... S_L, O_1 ... O_L.
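As a concrete illustration, the Gram matrix of equation (2) and the loss of equation (1) can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the feature maps below are random stand-ins for VGG activations, with each layer stored as an (N_l x n) array of N_l features over n spatial positions.

```python
import numpy as np

def gram_matrix(F):
    """Gram matrix of equation (2): F has shape (N_l, n) with N_l features
    and n spatial positions; entry (i, j) = sum_k F[i, k] * F[j, k]."""
    return F @ F.T

def gram_loss(S_layers, O_layers, alphas):
    """Equation (1): weighted squared Frobenius distance between source and
    output Gram matrices, summed over layers, normalized by |S_l|^2."""
    total = 0.0
    for S_l, O_l, a_l in zip(S_layers, O_layers, alphas):
        diff = gram_matrix(S_l) - gram_matrix(O_l)
        total += a_l / (S_l.size ** 2) * np.sum(diff ** 2)
    return total

rng = np.random.default_rng(0)
S = [rng.random((8, 100)) for _ in range(3)]  # stand-in "source" activations
O = [rng.random((8, 100)) for _ in range(3)]  # stand-in "output" activations
print(gram_loss(S, O, [1.0, 1.0, 1.0]) > 0)   # True: different feature maps
print(gram_loss(S, S, [1.0, 1.0, 1.0]))       # 0.0: identical feature maps
```

In the actual method, O would be optimized by backpropagating this loss through the CNN to the output image, rather than evaluated on fixed arrays as here.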
Then the total style transfer loss combines the losses for the style image (L_gram) and the content image:

    L_transfer = L_gram + L_content    (3)

The content loss is a feature distance between content and output, which aims to make the output and content look similar:

    L_content = \sum_{l=1}^{L} \frac{\beta_l}{|C_l|} \| C_l - O_l \|_F^2    (4)

Again, the \beta_l are user weight parameters, and the output image O is initialized with white noise and optimized using gradient descent.

Figure 4: At left, an example input image, which has a uniform distribution of intensities with a mean of \mu_1 = 1/\sqrt{2} \approx 0.707 and a standard deviation of \sigma_1 = 0. At right, an example output image, which has a non-uniform distribution with a mean of \mu_2 = 1/2 and a standard deviation of \sigma_2 = 1/2. If interpreted as the activations of a feature map with one feature, these two distributions have equivalent non-central second moments of 1/2, and equal Gram matrices.

4 Our basic method using histogram losses

In this section, we present our baseline texture synthesis and style transfer method, which incorporates histogram losses. We first briefly discuss in Section 4.1 some statistics that can be useful for parametric neural network synthesis. We then demonstrate in Section 4.2 how and why instabilities occur in the previous approach of Gatys et al. [2015; 2016]. We next explain in Section 4.3 how we address these instabilities using histogram losses, and how to effectively minimize histogram losses.

4.1 Prelude: useful statistics for parametric synthesis

If one revisits earlier parametric texture synthesis research such as Portilla and Simoncelli [2000], one will quickly note that many different statistics can be used for texture synthesis. We investigated the effect of matching several such statistics for neural networks in Figure 3. We show the results of matching only the mean activations of each feature (with an L2 loss), matching Gram matrices using equation (1), the full histogram matching that we develop later in Section 4.3, and the combination of Gram matrices and our full histogram matching. Results with Gram matrix statistics tend to be generally better than results using mean activations. Results with histogram statistics are often more stable in terms of image intensity, but can break some small structures. The best results are obtained using our full method (Gram + histogram). Of course, additional statistics could also be investigated, such as the translational co-occurrence statistics of Berger and Memisevic [2016], but this is beyond the scope of our paper.

We find it interesting that although Gram matrices are widely used for synthesis [Gatys et al. 2015; Gatys et al. 2016; Selim et al. 2016; Berger and Memisevic 2016; Johnson et al. 2016], their results are often unstable. We investigate the causes of this instability in the next section.

4.2 The problem: instabilities

The instabilities are shown in Figure 2. The cause of these is quite simple to understand. To simplify and illustrate the problem, imagine that our input image is grey, as shown in Figure 4 on the left. Our desired output from texture synthesis is thus also a grey image, with a mean value of 1/\sqrt{2} (about 70.7% grey). The problem is that there are many distributions that will result in an equivalent Gram matrix, such as the output image on the right of Figure 4, where half of the image is 0 (black) and the other half is 1 (white). Of course, in reality, the Gram matrices do not match image intensities, but rather feature activations, i.e. feature maps after applying the activation functions. However, the same argument applies: if we interpret the images of Figure 4 as feature map activations, then activation maps with quite different means and variances can still have the same Gram matrix.

It is slightly technical to understand this issue more formally. This is because the Gram matrix is statistically related to neither the mean nor the covariance matrix, but instead to the matrix of non-central second moments. We now explain this formally. Let us consider the case of a feature activation map F with m features. In the rest of this section, for brevity, we will refer to "feature map activations" simply as "features," so "feature" always refers to the result of applying the activation function. We can summarize the statistics of the features in the feature map F by using an m-dimensional random variable X to model the probability distribution of a given m-tuple of features. We can relate the random vector of features X and the feature map F: if we normalize the Gram matrix G(F) by the number of samples n, we obtain a sample estimator for the second non-central mixed moments E[XX^T]. Consequently, in the following discussion, we will sometimes informally refer to the (normalized) "Gram matrix" and E[XX^T] interchangeably, even though one is actually a sampled estimator for the other. For the following argument we thus set (1/n) G(F) = E[XX^T]. Define the mean feature \mu = E[X]. By a general property of covariance matrices, we have that \Sigma(X) = E[XX^T] - \mu \mu^T, where \Sigma indicates a covariance matrix. After rearranging, we obtain:

    E[XX^T] = \Sigma(X) + \mu \mu^T    (5)

For simplicity, let us now consider the case where we have only one feature, m = 1. By substituting into equation (5), we obtain:

    (1/n) G(F) = E[X^2] = \sigma^2 + \mu^2 = \| (\sigma, \mu) \|^2    (6)

Here \sigma is the standard deviation of our feature X. Suppose we have a feature map F_1 for the input source image and a feature map F_2 for the synthesized output, and that these have respective feature distributions X_1, X_2, means \mu_1, \mu_2, and standard deviations \sigma_1, \sigma_2. Thus, from equation (6), we know that the maps will have the same Gram matrix if this condition holds:

    \| (\sigma_1, \mu_1) \| = \| (\sigma_2, \mu_2) \|    (7)

We can easily use this to generate an infinite number of 1D feature maps with different variances but equal Gram matrix. Clearly this is bad for synthesis. Specifically, this means that even if we hold the Gram matrix constant, the variance \sigma_2^2 of the synthesized texture map can be freely changed (with corresponding changes to the mean \mu_2 based on equation (7)), or conversely, the mean \mu_2 of the synthesized texture map can freely change (with corresponding changes to the variance \sigma_2^2). This property leads to the instabilities shown in Figure 2. For simplicity, we assume that the CNN is flexible enough to generate any distribution of output image features that we request. Suppose we wish to generate an output texture with a different variance (e.g. \sigma_2 >> \sigma_1) but equal Gram matrix. Then we can simply solve equation (6) for \mu_2, and obtain \mu_2 = \sqrt{\sigma_1^2 + \mu_1^2 - \sigma_2^2}. In fact, this is how we generated the different distributions with equal Gram matrices in Figure 4: the left distribution X_1 has \mu_1 = 1/\sqrt{2} and \sigma_1 = 0, and we set a larger standard deviation \sigma_2 = 1/2 for the right distribution, so we obtain \mu_2 = 1/2.

In the multidimensional case m > 1, if there is no correlation between features, then we simply have m separate cases of the previous 1D scenario. Thus, while maintaining the same Gram matrix, we can clearly change all of the variances however we like, as long as we make corresponding changes to the means. This can clearly lead to instabilities in variance or mean.

However, in the multidimensional scenario, there are typically correlations between features. Consider the scenario where we are given an input feature map F_1, and thus the input feature random vector X_1 and its mean \mu and covariance matrix \Sigma(X_1). Now, we wish to explore the set of output feature random vectors X_2 with equal Gram matrix but different variance. In computer graphics, color adjustment models based on affine operations are often explored for finding correspondences [HaCohen et al. 2011] and adjusting colors [Siddiqui and Bouman 2008], including for texture synthesis [Darabi et al. 2012; Diamanti et al. 2015]. Due to these affine models, we thought it appropriate to investigate whether synthesized textures will be stable if we hold their Gram matrix constant but otherwise allow features to undergo affine changes. We thus explored applying an affine transformation to our random vector of input feature activations X_1 to obtain a transformed random vector of output feature activations X_2 = A X_1 + b, where A is an m x m matrix, and b is an m-vector. Then, using equation (5), we can set the Gram matrices of X_1 and X_2 equal to obtain:

    E[X_2 X_2^T] = A \Sigma(X_1) A^T + (A\mu + b)(A\mu + b)^T = E[X_1 X_1^T] = \Sigma(X_1) + \mu \mu^T    (8)

We decided to constrain the variances of the output random feature activation vector X_2 along the main diagonal of its covariance matrix, so that the variances are equal to a set of "target" output image feature activation variances. We can then solve for the remaining unknown variables in the transformation matrix A and vector b. Unfortunately, closed-form solutions of the resulting quadratic equations appear to generate multi-page-long formulas, even for the two-dimensional case. However, intuitively, there are more unknowns than equations, so it should often be possible to generate an output feature distribution X_2 with different variances than the input texture's feature distribution X_1, but with the same Gram matrix. Specifically, there are m(m+1) unknowns for A and b, whereas there are only m(m+3)/2 constraints due to equation (8) (m(m+1)/2 constraints for the upper half of the symmetric matrix, plus m constraints for the known output feature variances). To verify the intuition that it is possible to construct multidimensional distributions that are related by affine transformations and have different variances but equal Gram matrices, we ran extensive numerical experiments solving equation (8) in dimensions m = 1 ... 16, 32, 64. We found that in every case there was a solution. Specifically, we generated large numbers of mean and covariance matrices for the feature distributions of the input texture, and output feature variances, with each sampled from the uniform distribution U(0, 1).

We conclude that there are many ways to change the variance of an input texture without changing its Gram matrix. This leads to the instabilities shown in Figure 2. We also note that recent papers on neural texture synthesis (e.g. [Gatys et al. 2015; Ulyanov et al. 2016a; Berger and Memisevic 2016]) usually demonstrate outputs at the same resolution as or only slightly larger than the input. This is a departure from papers in computer graphics, which traditionally synthesized output texture at a significantly larger size than the input exemplar [Efros and Leung 1999; Wei and Levoy 2000; Lefebvre and Hoppe 2005; Lefebvre and Hoppe 2006; Kwatra et al. 2003; Kwatra et al. 2005]. For smaller texture sizes, the instability artifact is more subtle and can be partially mitigated by manually tweaking learning rates, gradient sizes, and which layers in the network contribute to the optimization process (but it can still be observed in results, e.g. [Gatys et al. 2015; Berger and Memisevic 2016]). The instability artifact grows as the output texture is enlarged.

4.3 Solution: histogram losses

As we have just discussed, previous neural texture synthesis models [Gatys et al. 2015; Gatys et al. 2016; Berger and Memisevic 2016] typically use Gram matrices to guide the synthesis process. This results in instabilities, due to not providing guarantees that the mean or variance of the texture is preserved. One improvement could be to explicitly preserve statistical moments of various orders in the texture's activations. However, we took a further step and decided to preserve the entire histogram of the feature activations. More specifically, we augment the synthesis loss with m additional histogram losses, one for each feature in each feature map. These can be viewed as matching the marginal distributions of each feature's activations. We also incorporate a total variation loss [Johnson et al. 2016], which slightly improves smoothness in the output image. Thus, our combined loss for texture synthesis is:

    L^{ours}_{texture} = L_gram + L_histogram + L_tv    (9)

However, it is slightly subtle to develop a suitable histogram loss. Suppose we take the naive approach and directly place an L2 loss between histograms of the input source texture S and the output image O. Then this loss has zero gradient almost everywhere, so it does not contribute to the optimization process. Instead, we propose a loss based on histogram matching.
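A standard way to implement such a remapping is rank (quantile) matching, as in ordinary histogram matching. Below is a minimal NumPy sketch of the idea, not the authors' code: it remaps one whole activation array at a time (the paper matches each feature's marginal histogram separately), and it evaluates a Frobenius-norm penalty with the remapped activations treated as a constant target.

```python
import numpy as np

def match_histogram(output, source):
    """Quantile-based histogram matching: replace each output value by the
    source value of the same rank, so the remapped array has the source's
    empirical distribution."""
    o = output.ravel()
    ranks = np.argsort(np.argsort(o))              # rank of each output value
    src_sorted = np.sort(source.ravel())
    # map ranks into source positions in case the two arrays differ in size
    idx = np.floor(ranks * (src_sorted.size / o.size)).astype(int)
    return src_sorted[idx].reshape(output.shape)

def histogram_loss(O, S, gamma=1.0):
    """Frobenius-norm penalty between activations O and their histogram-
    remapped version; the remap is treated as constant under the gradient."""
    return gamma * np.sum((O - match_histogram(O, S)) ** 2)

rng = np.random.default_rng(1)
S = rng.normal(0.0, 1.0, (16, 64))   # stand-in source-texture activations
O = rng.normal(2.0, 0.5, (16, 64))   # stand-in synthesized activations
R = match_histogram(O, S)
print(np.allclose(np.sort(R.ravel()), np.sort(S.ravel())))  # True: same histogram
print(histogram_loss(S, S))                                  # 0.0: already matched
```

Because the remapping preserves the ranks of the output values while adopting the source's value distribution, the penalty pulls each activation toward its matched quantile, which is the behavior the next paragraphs develop into the full loss.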
First, we transform the synthesized layerwise feature activations so that their histograms match the corresponding histograms of the input source texture S. This matching must be performed once for each histogram loss encountered during backpropagation. We then add a loss between the original activations and the activations after histogram matching.

We simply use an ordinary histogram matching technique [Wikipedia 2017] to remap the synthesized output activations to match the activations of the input source texture S. Let O_{ij} be the output activations for convolutional layer i, feature j, and O'_{ij} be the remapped activations. We compute the normalized histogram for the output activations O_{ij} and match it to the normalized histogram for the activations of the input source texture S, thus obtaining the remapped activations O'_{ij}. We repeat this for each feature in the feature map.

Once our output activations have been updated to mimic the input source texture, we find the Frobenius norm distance between the original activations and the remapped ones. Let O_l be the activation map of layer l and R(O_l) be the histogram-remapped activation map. Then we have:

    L_histogram = \sum_{l=1}^{L} \gamma_l \| O_l - R(O_l) \|_F^2    (10)

Here \gamma_l is a user weight parameter that controls the strength of the loss.

Figure 5: Instability and ghosting: in addition to instability problems, the baseline synthesis method [Gatys et al. 2015] using Gram matrices tends to interpolate sharp transitions in colors. As we discuss in Section 7, our implementation uses the VGG-19 network [Simonyan and Zisserman 2014]. If we add a histogram loss (equation (10)) at a later convolutional layer (rectified linear unit or "relu" 4_1 in VGG-19), this solves the instability issue, but the large overlapping receptive fields tend to blend features and create gradients where none exist in the input. If we also add a histogram loss to the first layer (relu1_1), this ameliorates the problem. Panels: (a) texture, (b) no histogram loss, (c) histogram loss at relu4_1, (d) histogram loss at relu4_1 and relu1_1.

To compute the gradient of this loss for backpropagation, we observe that the histogram remapping function R(O_l) has zero gradient almost everywhere, and therefore can be effectively treated as a constant for the gradient operator. Therefore, the gradient of equation (10) can be computed simply by realizing R(O_l) into a temporary array O'_l and then computing the Frobenius norm loss between O_l and O'_l.

4.4 Extension to style transfer

We have found that style transfer also suffers from the same instability artifacts we have shown in texture synthesis. Introducing the histogram loss can offer the same benefits for this problem as well. Our style transfer strategy therefore follows Gatys et al. [2016]: we include both a per-pixel content loss and a histogram loss in the parametric texture synthesis equation. After including these, the overall style transfer loss becomes:

    L^{ours}_{transfer} = L_gram + L_histogram + L_content + L_tv    (11)

5 Localized losses for control and stability

One concern with the parametric neural texture synthesis and style transfer approaches is that they do not include the manual and automatic control maps that were previously used in non-parametric approaches [Hertzmann et al. 2001; Barnes et al. 2009; Diamanti et al. 2015; Fišer et al. 2016]. We first introduce our multiresolution (pyramidal) synthesis approach. Then we introduce a localized style loss for adding artistic user controls.

5.1 Multiresolution synthesis

The problem definition for style transfer can be thought of as a broadening of the texture synthesis problem. Texture synthesis is the problem of statistically resynthesizing an input texture. Style transfer is similar: one statistically resynthesizes an input style exemplar S with the constraint that we also do not want the synthesized image O to deviate too much from a content image C.

For both texture synthesis and style transfer, we have found results are generally improved by a coarse-to-fine synthesis using image pyramids [Burt and Adelson 1983]. We use a ratio of two between successive image widths in the pyramid. We show in Figure 6 a comparison of pyramid and non-pyramid results. We use pyramids for all results in this paper unless otherwise indicated.
Figure 6: Comparison of pyramid and non-pyramid results. Style images are shown at left, and content images are shown inset. First row: pyramids blend coarse-scale style features in with content features better. Second row: pyramids transfer coarse-scale features better and reduce CNN noise artifacts. Third row: a zoom-in from the second row, showing noise artifacts (at left) and better transfer of coarse-scale features (at right). Panels: (a) no pyramid, (b) pyramid.

5.2 Localized style loss for artistic controls

Our approach for localizing loss functions is to make the observation that many images or styles are actually not a single texture but a collection of completely different textures laid out in a scene. As such, it does not make sense to combine multiple textures into a single parametric model. Instead, we separate them out into multiple models. This approach has been known as "painting by numbers" or "painting by texture" in the literature [Hertzmann et al. 2001; Ritter et al. 2006; Lukáč et al. 2013; Lukáč et al. 2015]. This way of framing the problem has advantages for both texture synthesis and style transfer.

One way to do this would be to use multiple CNNs for different regions, and blend the regions together. However, that approach would be less efficient, and might introduce problems for the blending in transition regions. Instead, we perform painting-by-numbers synthesis in a single CNN.

As an input, the artist paints an "indexed mask," where indices are painted on both the source texture (or style image) and the output image. An example of these masks is shown in Figure 7. We assume there are M indices.

Our localized style loss algorithm then proceeds by first collecting the pixels associated with each of the M indexed regions within the texture or style exemplar S. For each region, we build a different Gram matrix and list of histograms (one histogram per feature, as in Section 4.3). We track the indices also on the synthesized output O, tracking them as necessary through an image pyramid for coarse-to-fine synthesis. During synthesis, we modify our previous losses (equations (9) and (11)) to be spatially varying. Specifically, we impose spatially varying Gram and histogram losses, where the style exemplar Gram matrices and exemplar histograms vary spatially based on the output index for the current pixel. For the histogram matching, we simply perform the matching separately within each of the M regions defined by the indexed masks. Because the receptive fields for adjacent pixels overlap, backpropagation automatically performs blending between adjacent regions.

For style transfer it is important to note that often both the style and content images contain sets of textures that are semantically similar and should be transferred to each other. An example of this is shown in Figure 7. Using this approach we can transfer higher-level style features such as eyes and lips, rather than just lower-order style features such as color and brush strokes. In Figure 7 we also show a comparison of the results of our method against Selim et al. [2016], which is a specialized method designed only for faces. For our method, we had to manually paint masks, unlike Selim et al. [2016]; however, our method can apply to many domains and is not specific to faces.

6 Automatic tuning of parameters

To obtain the best results, previous methods such as Gatys et al. [2016] would often manually tune parameters on a per-image basis. With our method, of course, the same tuning of parameters could be performed manually. However, this manual tuning process can be quite tedious. Thus, for all of our results, our method automatically determines the parameters. To ensure a fair comparison with previous approaches such as Gatys et al. [2016], we also automatically determine their parameters in the figures throughout our paper, by selecting the default parameter values (we use Johnson's code [2015] for the comparisons with Gatys et al. [2016]). We also show in a supplemental PDF a comparison of our automatic method with hand-tuned parameters for Gatys et al. [2016]. We now describe our automatic parameter tuning process.

The parameters refer to the coefficients \alpha_l in the Gram loss of equation (1), \beta_l in the content loss of equation (4), \gamma_l in the histogram loss of equation (10), and a fourth parameter we call \omega that is multiplied against the total variation loss [Johnson et al. 2016].

Our automatic tuning process is inspired by approaches such as batch normalization [Ioffe and Szegedy 2015], which tune hyperparameters during the training process so as to prevent extreme values for gradients. We thus dynamically adjust the parameters \alpha_l, \beta_l, \gamma_l, \omega during the optimization process. Presently, our dynamic tuning is carried out with the aid of gradient information. We acknowledge that the optimization would likely be more mathematically well-founded if we instead dynamically tuned based on non-gradient information, such as the magnitude of the losses or statistics within the losses; nevertheless, this is what we currently do. During backpropagation, we encounter different loss terms L_i, each of which has an associated parameter c_i that needs to be determined (c_i is one of the parameters \alpha_l, \beta_l, \gamma_l, \omega). We first calculate the backpropagated gradient g_i from the current loss term as if c_i were 1. However, if the magnitude of g_i exceeds a constant magnitude threshold T_i, then we normalize the gradient g_i so its length is equal to T_i. We use magnitude thresholds of 1 for all parameters except for the coefficient \alpha_l of the Gram loss, which has a magnitude threshold of 100.
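The per-term normalization just described can be sketched as follows; this is a minimal illustration, not the authors' code, with a random array standing in for a backpropagated loss gradient.

```python
import numpy as np

# Magnitude thresholds T_i from Section 6: 1 for all loss terms except the
# Gram loss, whose threshold is 100.
THRESHOLDS = {"gram": 100.0, "content": 1.0, "histogram": 1.0, "tv": 1.0}

def tune_gradient(g, loss_name):
    """Effective gradient for one loss term: evaluate the gradient as if its
    coefficient c_i were 1, then rescale so its magnitude does not exceed
    the threshold T_i (the applied scale is the implied coefficient)."""
    T = THRESHOLDS[loss_name]
    norm = np.linalg.norm(g)
    if norm > T:
        g = g * (T / norm)   # implied coefficient c_i = T / norm
    return g

rng = np.random.default_rng(2)
g_big = rng.normal(size=1000) * 5.0        # large stand-in gradient
g_small = rng.normal(size=1000) * 1e-3     # small stand-in gradient
print(round(np.linalg.norm(tune_gradient(g_big, "content")), 6))   # 1.0
print(np.allclose(tune_gradient(g_small, "content"), g_small))     # True
```

This behaves like per-loss gradient norm clipping: large gradients are normalized to the threshold, while small ones pass through unchanged.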
Originalimagesare ontheleft,synthesisresultsontheright,correspondingmasksaboveeachimage.(2)Oneexampleofportraitstyletransferusingpaintingby numbers. (3)Asecondexampleofportraitstyletransfer. WeshowstyletransferresultsforourmethodascomparedtoSelimetal.[2016]. Shownarethecontent(a)andstyleimage(b).Forourmethod,theseneedtobeaugmentedwithindexedmasks,asdescribedinSection5.2. In(c)and(d)aretheresultsofSelimetal.[2016]andourmethod.Notethatourmethodpreservesfine-scaleartistictexturebetter.However, ourmethodalsotransfersabitmoreoftheperson’s“identity,”primarilyduetohairandeyecolorchanges.Nevertheless,ourmethodisnot specializedforfaces,sothisisalreadyaninterestingresult. unit)1 1,relu2 1,relu3 1andrelu4 1fortheGramlossesintex- Graphics(TOG)35,4,65. turesynthesisandstyletransfer.Thehistogramlossesarecomputed onlyatlayersrelu4 1andrelu1 1(otherlayersareignored:thisis BARNES, C., SHECHTMAN, E., FINKELSTEIN, A., AND GOLD- equivalenttofixingtheirweightsaszerointheloss). Contentloss MAN, D. 2009. Patchmatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on iscomputedonlyatrelu4 1whendoingstyletransfer. Thetotal Graphics-TOG28,3,24. variationlosssmoothsoutnoisethatresultsfromtheoptimization processandisthusperformedonlyonthefirstconvolutionallayer. BARNES,C.,ZHANG,F.-L.,LOU,L.,WU,X.,ANDHU,S.-M. 2015. Patchtable: efficientpatchqueriesforlargedatasetsand As noted in Section 5.1, we synthesize our images in a multi- applications. ACMTransactionsonGraphics(TOG)34,4,97. resolution process. We convert our input images into a pyramid. Duringsynthesis, westartatthebottomofthepyramid, initialize BERGER, G., AND MEMISEVIC, R. 2016. Incorporating towhitenoise,andaftereachlevelisfinishedsynthesizingweuse long-range consistency in cnn-based texture generation. arXiv bi-linearinterpolationtoupsampletothenextlevel. preprintarXiv:1606.01286. Ratherthanaddingapaddinginournetwork,weinsteadoptforcir- BURT, P., AND ADELSON, E. 1983. 
Thelaplacianpyramidasa cularconvolution. Thisproducestexturesthattilewiththemselves compactimagecode.IEEETransactionsoncommunications31, anddoesnotseemtocauseanystrangeeffectsduringstyletransfer. 4,532–540. 8 Results and discussion CHANG,H.,YU,F.,WANG,J.,ASHLEY,D.,ANDFINKELSTEIN, A.2016.Automatictriageforaphotoseries.ACMTransactions onGraphics(TOG). ResultsforourtexturesynthesismethodareshowninFigure8.Our styletransferresultsareshowninFigure9,Figure10,Figure11and CHEN, T. Q., AND SCHMIDT, M. 2016. Fast patch-based style Figure12.OursupplementalHTMLincludesmanymoreresults. transferofarbitrarystyle. arXivpreprintarXiv:1612.04337. Wenowdiscusssomeadvantagesofourresult.Ourhistogramloss DARABI,S.,SHECHTMAN,E.,BARNES,C.,GOLDMAN,D.B., addressesinstabilitiesbyensuringthatthefullstatisticaldistribu- AND SEN, P. 2012. Image melding: Combining inconsistent tion of the features is preserved. In addition to improving image imagesusingpatch-basedsynthesis. ACMTrans.Graph.31,4, quality, our full loss (including the histogram loss) also requires 82–1. feweriterationstoconverge. Weuseameanof700iterationsfor ourresults,whichwefindconsistentlygivegoodquality,and1000 DIAMANTI, O., BARNES, C., PARIS, S., SHECHTMAN, E., AND iterationsfortheresultsofGatysetal[2016],whichwefindisstill SORKINE-HORNUNG,O.2015.Synthesisofcompleximageap- pearancefromlimitedexemplars. ACMTransactionsonGraph- sometimesunstableatthatpoint. Wenotethatmethodsbasedon ics(TOG)34,2,22. GrammatricessuchasGatysetal[Gatysetal.2016]canbecome unstableovertheiterationcount. Weinterpretthisasbeingcaused EFROS, A. A., AND FREEMAN, W. T. 2001. Image quilting by the mean and variance being free to drift, as we discussed in for texture synthesis and transfer. In Proceedings of the 28th Section4.2.Byaddinghistogramstoourlossfunction,theresultis annualconferenceonComputergraphicsandinteractivetech- morestableandconvergesbetter,bothspatiallyandoveriterations. niques,ACM,341–346. Running times for our method are as follows. 
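To make the statistic behind the histogram loss concrete, the sketch below (our own illustration in NumPy, not the paper's implementation) shows classic CDF-based histogram matching of one feature channel to an exemplar channel; the paper's histogram loss penalizes the distance between an activation map and such a histogram-remapped version of it. The function name `match_histogram` is ours.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap the values of `source` so its empirical distribution matches
    that of `reference` (classic CDF matching, applied per feature channel)."""
    shape = source.shape
    s = source.ravel()
    r = reference.ravel()
    # Unique values, each source element's index into them, and their counts.
    s_values, s_idx, s_counts = np.unique(s, return_inverse=True, return_counts=True)
    r_values, r_counts = np.unique(r, return_counts=True)
    # Empirical CDFs of source and reference.
    s_cdf = np.cumsum(s_counts).astype(np.float64) / s.size
    r_cdf = np.cumsum(r_counts).astype(np.float64) / r.size
    # Map each source quantile onto the reference value at the same quantile.
    matched = np.interp(s_cdf, r_cdf, r_values)
    return matched[s_idx].reshape(shape)
```

After remapping, the output keeps the spatial arrangement of `source` but carries the value distribution (and hence the mean and variance, which would otherwise be free to drift) of `reference`.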
Running times for our method are as follows. We used a machine with four physical cores (Intel Core i5-6600K) at 3.5 GHz, 64 GB of RAM, and an Nvidia GeForce GTX 1070 GPU with 8 GB of GPU RAM, running Arch Linux. For a single iteration on the CPU, our method takes 7 minutes and 8 seconds, whereas Gatys et al. [2016] takes 15 minutes and 35 seconds. This equates to our method requiring only 45.7% of the running time of the original Gatys method. Our approach used three pyramid levels and histogram loss at relu4_1 and relu1_1. These metrics were measured over 50 iterations synthesizing a 512x512 output. We currently have most, but not all, of our algorithm implemented on the GPU. Because not all of it is implemented on the GPU, a speed comparison with our all-GPU implementation of Gatys et al. [2016] is not meaningful; therefore, we ran both approaches using the CPU only.

9 Conclusion

The key insight of this paper is that the loss function introduced by Gatys et al. [2015] and carried forward by follow-up papers can be improved in stability and quality by imposing histogram losses, which better constrain the dispersion of the texture statistics. We also show improvements by automating the parameter tuning, and in artistic controls. This paper improves on these aspects for both texture synthesis and style transfer applications.

Figure 8: Results of our method for texture synthesis. For each group: (a) input, (b) Gatys et al. 2015, (c) ours. Our Gatys results were generated from Johnson's code [Johnson 2015].

Figure 9: Results of our method for style transfer (left: Gatys et al. 2016; right: ours). Bonsai image from BonsaiExperience.com.

References

AITTALA, M., AILA, T., AND LEHTINEN, J. 2016. Reflectance modeling by neural texture synthesis. ACM Transactions on Graphics (TOG) 35, 4, 65.

BARNES, C., SHECHTMAN, E., FINKELSTEIN, A., AND GOLDMAN, D. 2009. PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (TOG) 28, 3, 24.

BARNES, C., ZHANG, F.-L., LOU, L., WU, X., AND HU, S.-M. 2015. PatchTable: efficient patch queries for large datasets and applications. ACM Transactions on Graphics (TOG) 34, 4, 97.

BERGER, G., AND MEMISEVIC, R. 2016. Incorporating long-range consistency in CNN-based texture generation. arXiv preprint arXiv:1606.01286.

BURT, P., AND ADELSON, E. 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications 31, 4, 532–540.

CHANG, H., YU, F., WANG, J., ASHLEY, D., AND FINKELSTEIN, A. 2016. Automatic triage for a photo series. ACM Transactions on Graphics (TOG).

CHEN, T. Q., AND SCHMIDT, M. 2016. Fast patch-based style transfer of arbitrary style. arXiv preprint arXiv:1612.04337.

DARABI, S., SHECHTMAN, E., BARNES, C., GOLDMAN, D. B., AND SEN, P. 2012. Image melding: combining inconsistent images using patch-based synthesis. ACM Trans. Graph. 31, 4, 82.

DIAMANTI, O., BARNES, C., PARIS, S., SHECHTMAN, E., AND SORKINE-HORNUNG, O. 2015. Synthesis of complex image appearance from limited exemplars. ACM Transactions on Graphics (TOG) 34, 2, 22.

EFROS, A. A., AND FREEMAN, W. T. 2001. Image quilting for texture synthesis and transfer. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, ACM, 341–346.

EFROS, A. A., AND LEUNG, T. K. 1999. Texture synthesis by non-parametric sampling. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, vol. 2, IEEE, 1033–1038.

FIŠER, J., JAMRIŠKA, O., LUKÁČ, M., SHECHTMAN, E., ASENTE, P., LU, J., AND SÝKORA, D. 2016. StyLit: illumination-guided example-based stylization of 3D renderings. ACM Transactions on Graphics (TOG) 35, 4, 92.

GATYS, L., ECKER, A. S., AND BETHGE, M. 2015. Texture synthesis using convolutional neural networks. In Advances in Neural Information Processing Systems, 262–270.

GATYS, L. A., ECKER, A. S., AND BETHGE, M. 2016. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2414–2423.

HACOHEN, Y., SHECHTMAN, E., GOLDMAN, D. B., AND LISCHINSKI, D. 2011. Non-rigid dense correspondence with applications for image enhancement. ACM Transactions on Graphics (TOG) 30, 4, 70.

HE, K., ZHANG, X., REN, S., AND SUN, J. 2016. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–778.

HEEGER, D. J., AND BERGEN, J. R. 1995. Pyramid-based texture analysis/synthesis. In Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, ACM, 229–238.
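The gradient-based automatic parameter tuning described earlier amounts to a per-loss-term clamp on gradient magnitude. The sketch below (our own minimal illustration in NumPy; the function name `tune_gradient` and the dictionary of thresholds are ours) shows the rule: compute each term's gradient as if its coefficient were 1, then rescale it whenever its norm exceeds that term's threshold.

```python
import numpy as np

def tune_gradient(grad, threshold):
    """If the gradient's norm exceeds `threshold`, rescale it so its
    length equals `threshold`; otherwise return it unchanged."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        return grad * (threshold / norm)
    return grad

# Magnitude thresholds T_i stated in the text: 1 for all loss terms
# except the Gram-loss coefficient, which uses 100.
THRESHOLDS = {"gram": 100.0, "content": 1.0, "histogram": 1.0, "tv": 1.0}
```

The effect is analogous to gradient clipping: no single loss term can dominate a backpropagation step, which is what lets the method run with default parameters rather than per-image manual tuning.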
