Unsupervised Image-to-Image Translation with Generative Adversarial Networks

Hao Dong (Imperial College London), Paarth Neekhara (Indian Institute of Technology), Chao Wu (Imperial College London), Yike Guo (Imperial College London)

Abstract

It is useful to automatically transform an image from its original form into some synthetic form (style, partial contents, etc.) while keeping the original structure or semantics. We define this requirement as the "image-to-image translation" problem, and propose a general approach to achieve it based on deep convolutional and conditional generative adversarial networks (GANs), which have seen phenomenal success in learning to map noise inputs to images since 2014. In this work, we develop a two-step (unsupervised) learning method to translate images between different domains using unlabeled images, without specifying any correspondence between them, thereby avoiding the cost of acquiring labeled data. Compared with prior works, we demonstrate the generality of our model: a variety of translations can be conducted by a single type of model. Such capability is desirable in applications like bidirectional translation.

Figure 1. Example results of unsupervised image-to-image translation. The odd columns are input images; the even columns are the corresponding synthesized output images.

1. Introduction

In this work we are interested in the problem of translating images from one domain to images of other domains, e.g. the face of one person to another, an aerial image to a map, an edge image to a photo, or a black-and-white image to a color image. Such translation is useful for creating synthetic images for various purposes. For example, by translating real images to synthetic counterparts, we can easily create balanced datasets for image classification tasks. We might also use this method to modify the context of photos (such as weather and background).

The community has put a lot of effort in this direction, e.g. face swapping [9], changing the time of day of an outdoor image [19], changing the weather of an outdoor image [10], and composing photos from sketches [3], but each of these tasks has been tackled with separate, special-purpose machinery. It would be preferable to have a universal model that achieves all of these functions.

Fortunately, based on Generative Adversarial Networks (GANs) [5], a recent study [7] developed a common framework suitable for different problems: the generator in [7] is an encoder-decoder network which tries to synthesize a fake image conditioned on a given image to fool the discriminator, while the discriminator tries to identify the fake image by comparing it with the corresponding target image. However, this approach is supervised and therefore needs large amounts of labeled data for training, which is not always available in real cases.

Recently, another work [20] demonstrated an unsupervised approach to image-to-image translation, where the training images do not need to come in input-target pairs. It employed a Domain Transfer Network to learn a generative network that maps images from the source domain to images in the target domain. It also adopted a function mapping both domains to a common metric that is invariant under G. The method is general in its scope and does not rely on a predefined family of perceptual losses; therefore, it can also be used for unsupervised domain adaptation [2, 4].

In this work we propose a two-step learning method that uses a conditional GAN to learn the global features of different domains and to synthesize plausible images from random noise and a given class/domain label. After that, a deep convolutional encoder is used to learn a mapping from an image to its latent representation (noise). We demonstrate that, using the trained deep convolutional encoder and conditional generator, images of different domains can be translated bidirectionally. Besides unsupervised learning, our method also proposes a new way of training an image encoder. To encourage the community to train on different datasets, we will make the code publicly available at https://github.com/coming-soon.

One significant feature of our model is that it has a preliminary capability of the "one learning algorithm". According to the one learning algorithm hypothesis, popularized by Jeff Hawkins [6], a universal learning machine is a powerful and general model for intelligent agents like humans, in which the same network structure can serve different learning purposes. Such capability is a desirable extension of deep learning algorithms, so that a universal algorithm, along with its deployment, can adapt to the various tasks at hand. Compared to prior work on image translation, our method has such a universal quality, supporting different learning scenarios.

2. Background

In this section, we discuss related work and the techniques used in our work. A generative adversarial network (GAN) consists of a generator G and a discriminator D [5]. The discriminator is trained to maximize the probability of assigning the correct label to both training samples and samples produced by the generator, while the generator is trained to minimize the probability that its samples are identified as fake, so as to generate plausible samples conditioned on the random noise vector z. Following this, the deep convolutional GAN [17] learns to synthesize images from unlabeled data; an image can then be represented by a fixed number of latent variables, each reflecting a certain feature [17], e.g. controlling the window size of a bathroom, the smile of a face, and so on. The loss functions of D and G are shown below, where P(s | X) indicates the score of an image X being real and X_fake = G(z):

L_D = \log P(s \mid D(X_{real})) + \log(1 - P(s \mid D(X_{fake})))
L_G = \log P(s \mid D(X_{fake}))                                        (1)
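To make Eq. (1) concrete, here is a minimal NumPy sketch that evaluates both objectives from discriminator outputs; the arrays `p_real` and `p_fake` are hypothetical stand-ins for P(s | D(X_real)) and P(s | D(X_fake)), since the networks themselves are not shown.

```python
import numpy as np

def gan_objectives(p_real, p_fake, eps=1e-8):
    """Eq. (1): log-likelihood objectives for the discriminator D and generator G.

    p_real -- P(s | D(X_real)), D's probability that real images are real
    p_fake -- P(s | D(X_fake)), D's probability that generated images are real
    Both arrays are hypothetical discriminator outputs for one batch.
    """
    # D is trained to maximize L_D; G is trained to maximize L_G.
    l_d = np.mean(np.log(p_real + eps) + np.log(1.0 - p_fake + eps))
    l_g = np.mean(np.log(p_fake + eps))
    return l_d, l_g

# Toy batch: D is fairly confident on real images, less sure about fakes.
l_d, l_g = gan_objectives(np.array([0.9, 0.8]), np.array([0.3, 0.4]))
```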
Following this, a mass of studies showed that images can be synthesized based on conditions such as sentences [18], class labels [13], image inpainting [16], and image manipulation by constraints [21]. To synthesize an image conditioned on a given class label c, the auxiliary classifier GAN [15] demonstrates how to synthesize high-quality images using a new structure and loss function. The generator G takes the class label c and the noise vector z as input, and a deconvolutional network [17] then performs image generation conditioned on both. The discriminator is a convolutional network that also outputs the class label for the training data; besides being required to produce plausible images, the generator is trained to maximize the log-likelihood of the synthesized image belonging to the correct class [14]. The loss functions are as below, where X_fake = G(z, c):

L_D = \log P(s \mid D(X_{real})) + \log(1 - P(s \mid D(X_{fake}))) + \log P(c \mid D(X_{real}))
L_G = \log P(s \mid D(X_{fake})) + \log P(c \mid D(X_{fake}))                                        (2)
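A sketch of Eq. (2) in the same style, assuming the discriminator exposes two heads: a real/fake probability and a class posterior. The four input arrays are hypothetical per-batch outputs; the classification term log P(c | D(·)) is what distinguishes this AC-GAN objective from Eq. (1).

```python
import numpy as np

def acgan_objectives(p_real_src, p_fake_src, p_cls_real, p_cls_fake, eps=1e-8):
    """Eq. (2): AC-GAN objectives, where X_fake = G(z, c).

    p_real_src, p_fake_src -- P(s | D(.)), real/fake probabilities for real and generated images
    p_cls_real, p_cls_fake -- P(c | D(.)), probability assigned to the correct class label c
    """
    l_d = np.mean(np.log(p_real_src + eps)
                  + np.log(1.0 - p_fake_src + eps)
                  + np.log(p_cls_real + eps))
    l_g = np.mean(np.log(p_fake_src + eps) + np.log(p_cls_fake + eps))
    return l_d, l_g

l_d, l_g = acgan_objectives(np.array([0.9]), np.array([0.3]),
                            np.array([0.8]), np.array([0.6]))
```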
To synthesize an image from another image, the input image needs to be projected into a latent representation. In [21], the authors introduced a method to train a feed-forward network that projects an image to latent variables. The loss consists of two parts: the mean-square error (MSE) between input and output images for pixel-level reconstruction, and a loss over hidden-layer outputs for feature-level reconstruction. Compared with these previous works, we introduce a new method to train an image encoder specifically for the generator.

The "one learning algorithm" is a concept in machine learning and neuroscience that has been studied extensively by Jeff Hawkins [6]. It is based on a memory-prediction framework, which in turn builds on work on the columnar organization of the cerebral cortex showing that all regions of the cortex perform the same operation.

3. Method

Our image translation model consists of a deep convolutional encoder and decoder. It takes images from the source domain as input and outputs images of the target domain(s), conditioned on a given class/domain label. The training process has two steps: the first step trains the image generator G for all domains, and the second step trains the image encoder E, likewise for all domains. During translation, the desired class label is applied to an input image from the source domain, which passes through the trained network to produce the output image in the target domain.

Figure 2. Network architectures of two-step learning. This example is for two classes/domains, but our method can be extended to multiple domains.

3.1. Learning shared features

After the first step, the generator is able to generate corresponding images for different domains by keeping the latent variables fixed and changing the class label. In the second step, we learn a mapping from an image to its global latent variable representation z.

In the first step, as the left-hand side of Figure 2 shows, we use the auxiliary classifier GAN [15] to learn the global shared features of the images sampled from different domains. The shared features are represented as a latent variable (noise) vector z. In order to limit the variables to a specific range, we define z ~ U(-1, 1).

The idea is based on the fact that class-independent information contains the global structure of the synthesized image [15]. Therefore, if the domains are semantically the same or similar (like different people's faces, or different weather conditions of an environment), we expect the common features across the domains (such as smile, face pose, and background) to be captured by the same latent representation. We leverage this property to translate the global structure from one domain to the other.

Conditional GANs usually use an embedding layer [18]; we found similar performance between an embedding layer and a one-hot input, although in our experiments the embedding layer contributed less to performance. We believe this is due to the limited number of domains sampled in our experiments. With a growing number of domains (along with relationships among them, as in word2vec [12]), such an embedding layer may contribute more than the one-hot format.

3.2. Learning the image encoder

To utilize both the unsupervised feature learning capability of GANs and the representational capability of a convolutional encoder, we separate the learning process into two steps.

Instead of applying the trained generator G on top of the image encoder E [21] and expecting the output image to look the same as the input image, we introduce a new method that applies E after the G obtained from the first step, as the middle of Figure 2 shows. We then expect the network to output the same variables as the input variables; the mean-square error (MSE) between the input latent variables (noise) and the output variables is used as the loss. We found this method not only reconstructs detailed features but also speeds up training.

The reason this method works well is that, since the trained generator can synthesize images from arbitrary noise, we can draw an effectively unlimited number of synthesized images from the trained generator instead of the finite set of real images in the dataset, giving us far more training data. This is crucial because, to train an image encoder with a level of representational capacity similar to the generator's, the training datasets should be of a similar scale.

Besides, there are two key points about training: firstly, instead of passing images of just one class, we pass images of all classes, specifying the class along with each image; this yields an image encoder that works for all classes of images, enabling bidirectional translation. Secondly, as the noise z has a uniform distribution, the outputs of E are scaled into [-1, 1] by a tanh activation function.
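The step-2 objective can be sketched as follows. `G` and `E` are hypothetical stand-ins for the frozen conditional generator from step 1 and the encoder being trained (the real networks are deep convolutional models); only the sampling and loss computation reflect the procedure described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(z, c):
    """Stand-in for the frozen conditional generator from step 1."""
    return rng.standard_normal((z.shape[0], 64, 64, 3))

def E(x):
    """Stand-in for the encoder being trained; tanh keeps outputs in [-1, 1]."""
    return np.tanh(rng.standard_normal((x.shape[0], 100)))

batch_size, z_dim, n_domains = 64, 100, 2
z = rng.uniform(-1.0, 1.0, size=(batch_size, z_dim))   # z ~ U(-1, 1), as in Section 3.1
c = rng.integers(0, n_domains, size=batch_size)        # labels drawn from all classes/domains

x_fake = G(z, c)                            # unlimited synthetic training images from G
z_hat = E(x_fake)                           # encoder's reconstruction of the latent vector
encoder_loss = np.mean((z_hat - z) ** 2)    # MSE between input noise and E(G(z, c))
```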
3.3. Translation

After finishing the training of the generator and the image encoder, as the right of Figure 2 shows, given an image and a desired class/domain label, the network first maps the image to the global latent representation z and then synthesizes the image of the target domain using the conditional generator.
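Translation itself then reduces to composing the two trained networks. The sketch below again uses hypothetical stand-ins for E and G; in the actual model, `target_label` is the class/domain condition fed to the conditional generator.

```python
import numpy as np

rng = np.random.default_rng(0)

def E(x):
    """Stand-in for the trained image encoder."""
    return np.tanh(rng.standard_normal((x.shape[0], 100)))

def G(z, c):
    """Stand-in for the trained conditional generator."""
    return rng.standard_normal((z.shape[0], 64, 64, 3))

def translate(x_source, target_label):
    """Section 3.3: map an image to the target domain via the shared latent space."""
    z = E(x_source)              # project the source image to its latent representation
    return G(z, target_label)    # re-synthesize it under the target class/domain label

x_male = rng.standard_normal((1, 64, 64, 3))                # toy stand-in for a source image
x_female = translate(x_male, target_label=np.array([1]))    # e.g. male -> female
```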
4. Experiments

In this section, we present results on the celebA face dataset for gender transformation and on presidential debate videos for face swapping. The celebA face dataset [11] contains 84,434 male images and 118,165 female images with a variety of backgrounds and faces. For the second dataset, we detected and extracted face images of Obama and Hillary from two short presidential debate videos; there are 8,452 images of Obama and 5,065 images of Hillary, with a variety of facial expressions.

Dataset                        Task
celebA                         gender transformation
presidential debate videos     face swapping
Table 1. Datasets and tasks.

We used the same GAN and image encoder architecture for both datasets. The image size was 64 x 64, with 3 color channels. We set the number of latent variables z to 100 and embedded the class label into a vector of 5 values. As in the auxiliary classifier GAN [15], we used a learning rate of 0.0002 and the Adam optimizer [8] with momentum of 0.5 for both steps 1 and 2. During training, we used a batch size of 64 and trained for 100 epochs in step 1 and 20,000 steps in step 2. Besides, we randomly flipped, zoomed in, cropped, and rotated the training images for data augmentation in step 1. The implementation was built with the TensorLayer library (https://github.com/zsdonghao/tensorlayer) on top of TensorFlow [1].

4.1. Gender transformation

Figure 3. Example results of our method on gender transformation. The odd columns are input images; the even columns are the corresponding synthesized output images.

As Figure 3 illustrates, the proposed method not only learned the characteristics and expressions of human faces, but also learned to reconstruct the background to some degree. On the one hand, the translation achieved the desired performance, with the synthetic images having similar image quality to the input images. On the other hand, the network can successfully translate face images both from female to male and from male to female, i.e. our method is bidirectional.

4.2. Face swapping

Figure 4. Example results of our method on face swapping.

Figure 4 shows that the proposed method learns expressions and face orientation very efficiently, so it does not violate the context. Such a property is very useful if we want to swap the faces in a video.

5. Conclusion

In this work, we proposed a simple and effective two-step learning method for universal unsupervised image-to-image translation. As a next step, we aim to further scale up the network to higher resolutions and to try more datasets. In the future, it might be interesting to combine symbolic features (like edge detection) with the original data, so that the performance of translation can be further improved. We will also try to make the method more general, to give it better characteristics of the one learning algorithm, and apply it to various other applications. We are considering proposing a paradigm of "modular learning" which conducts continuous recursive self-improvement with regard to the utility function (reward system); we can view this as second- (and higher-) order optimization.

References

[1] A. Agarwal, P. Barham, E. Brevdo, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. 2015.
[2] M. Chen, Z. Xu, K. Weinberger, and F. Sha. Marginalized denoising autoencoders for domain adaptation. arXiv preprint arXiv:1206.4683, 2012.
[3] T. Chen, M.-M. Cheng, P. Tan, A. Shamir, and S.-M. Hu. Sketch2Photo: internet image montage. ACM Transactions on Graphics (TOG), 28(5):124, 2009.
[4] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. arXiv preprint arXiv:1505.07818, 2015.
[5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. NIPS, 2014.
[6] J. Hawkins and S. Blakeslee. On Intelligence. Macmillan, 2007.
[7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.
[8] D. Kingma and J. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2014.
[9] I. Korshunova, W. Shi, J. Dambre, and L. Theis. Fast face-swap using convolutional neural networks. arXiv preprint arXiv:1611.09577, 2016.
[10] P.-Y. Laffont, Z. Ren, X. Tao, C. Qian, and J. Hays. Transient attributes for high-level understanding and editing of outdoor scenes. ACM Transactions on Graphics (TOG), 33(4):149, 2014.
[11] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), 2015.
[12] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.
[13] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[14] A. Odena. Semi-supervised learning with generative adversarial networks. arXiv preprint arXiv:1606.01583, 2016.
[15] A. Odena, C. Olah, and J. Shlens. Conditional image synthesis with auxiliary classifier GANs. 2016.
[16] D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros. Context encoders: Feature learning by inpainting. CVPR, 2016.
[17] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[18] S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee. Generative adversarial text to image synthesis. ICML, 2016.
[19] Y. Shih, S. Paris, F. Durand, and W. T. Freeman. Data-driven hallucination of different times of day from a single outdoor photo. ACM Transactions on Graphics (TOG), 32(6):200, 2013.
[20] Y. Taigman, A. Polyak, and L. Wolf. Unsupervised cross-domain image generation. arXiv preprint arXiv:1611.02200, 2016.
[21] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, and A. A. Efros. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision, pages 597-613. Springer, 2016.
