Wasserstein Generative Adversarial Network Based De-Blurring Using Perceptual Similarity

Minsoo Hong and Yoonsik Choe *

Department of Electrical & Electronic Engineering, Yonsei University, Seoul 03722, Korea; [email protected]
* Correspondence: [email protected]; Tel.: +82-2-2123-2774

Received: 31 May 2019; Accepted: 4 June 2019; Published: 8 June 2019

Appl. Sci. 2019, 9, 2358; doi:10.3390/app9112358

Abstract: The de-blurring of blurred images is one of the most important image processing methods, and it can be used as a pre-processing step in many multimedia and computer vision applications. Recently, de-blurring methods have been performed by neural network methods, such as the generative adversarial network (GAN), which is a powerful generative network. Among many different types of GAN, the proposed method is performed using the Wasserstein generative adversarial network with gradient penalty (WGANGP). Since edge information is the most important factor in an image, the style loss function is applied to represent the perceptual information of the edge in order to preserve small edge information and capture its perceptual similarity. As a result, the proposed method improves the similarity between sharp and blurred images by minimizing the Wasserstein distance, and it captures the perceptual similarity well using the style loss function, considering the correlation of features in the convolutional neural network (CNN). To confirm the performance of the proposed method, three experiments are conducted using two datasets: the GOPRO Large and Kohler datasets. The optimal solutions are found by changing the parameter values experimentally. Consequently, the experiments show that the proposed method achieves a structural similarity (SSIM) of 0.98 and outperforms other de-blurring methods on both datasets.
Keywords: deblurring; generative adversarial network; perceptual similarity; style information; Wasserstein distance

1. Introduction

De-blurring is one of the steadily studied techniques in image processing fields and aims to improve the sharpness of an image by eliminating blur noise. In Figure 1, the de-blurring method is used to transform the blurred image into the sharp image. To remove blur noise, many de-blurring techniques have been researched, e.g., the blind deconvolution algorithm, and the bilateral and Wiener filters, among others [1,2].

Figure 1. Blurred and sharp image of GOPRO dataset.

In recent years, deep learning methods have been adapted to image processing fields, showing better performance than the previous methods. Likewise, many researchers have applied deep learning methods, e.g., the convolutional neural network (CNN) and generative adversarial network (GAN), in order to resolve blur noise [3–5]. Especially, since GAN shows high quality in generating texture information of images, this paper adapts the GAN-based approach to eliminate blur noise.

Typical GAN-based de-blurring methods [6] are processed by minimizing the difference of pixel values, which improves the peak signal to noise ratio (PSNR). However, this cannot reflect the similarity to target images. Therefore, the proposed method uses the conditional GAN [7], which minimizes the difference of image distributions. In detail, the proposed method adapts the Wasserstein generative adversarial network with gradient penalty (WGANGP), which is based on conditional GAN. The Wasserstein generative adversarial network (WGAN) was proposed to stabilize the training of GAN by minimizing the Wasserstein distance of joint distributions instead of whole probability distributions [8].
Additionally, to improve the performance of WGAN, the gradient penalty term was proposed [9], replacing the earlier weight clipping method that limits the weights to a fixed range; applying this term to WGAN can prevent the gradient vanishing and exploding problem.

By applying the content loss function, i.e., the procedure of extracting feature maps in a CNN, to WGANGP, an improved de-blurring method was proposed to capture the perceptual information of images [10]. The content loss function is one of many perceptual loss functions and improves the similarity between the blurred and sharp images. Applying the content loss function to WGANGP can capture perceptual information, such as color and spatial structure. However, it does not preserve the detailed edge information of the image at the same time.

To preserve more edge information of the sharp image, another perceptual loss function, called the style loss function [11], is introduced. The style loss function extracts multiple feature maps in a CNN and calculates a covariance matrix to capture more perceptual information and detailed edges. As a result, the style loss function captures higher similarity than the content loss function, as shown with the similarity maps in Figure 2. The figure shows that the boundary information between object and background is preserved better in (b) than in (a).

Figure 2. Similarity maps of (a) the content loss function and (b) the style loss function.

In this paper, the content loss function is replaced by the style loss function to preserve edge information and perceptual similarity. Substituting the style loss function into WGANGP can preserve detailed edges, since it can capture a large perceptual style and high-level features. The objective function of the network is proposed by combining the loss function of WGANGP and the style loss function. Therefore, the network generates the de-blurred image while capturing the similarity of perceptual style and preserving detailed edge information.

In this work, it is experimentally shown that WGANGP with the style loss function can reconstruct an edge-preserved de-blurred image and achieves high performance in the structural similarity measure (SSIM). Also, comparison experiments with different parameter values and different locations of the feature maps are performed to find the optimal solution. In Section 2, GAN, WGANGP and perceptual loss functions are introduced, and the problem definition to solve the blur noise is described. In Section 3, the perceptual objective function and the detailed architecture of the network are explained. In Section 4, the de-blurred image is reconstructed using the proposed method, and the parameters and values to find the optimal solution are analyzed, followed by the conclusion in Section 5.

2. Preliminaries

In this section, basic deep learning methods and the perceptual loss function for de-blurring are described. Section 2.1 introduces what GAN is and explains the objective function of GAN. It also describes the limitations of GAN and the ways to solve them. In Section 2.2, the Wasserstein GAN with a gradient penalty term is introduced for stabilizing the training process of GAN, followed by the description of perceptual loss functions and how to calculate the perceptual loss in Section 2.3. Finally, Section 2.4 defines the issues in solving blur noise with the combination of WGAN-GP and the perceptual loss function.

2.1. Generative Adversarial Network
The generative adversarial network (GAN) was proposed in 2014 by Goodfellow [6]. GAN is used for image generation tasks and is learned by a competition between two neural network models, called the generator, G, and the discriminator, D. The goal of G is to generate images that D cannot distinguish from real images, and the goal of D is to differentiate real images and generated images. Therefore, the goal of GAN can be expressed as:

\min_G \max_D \; \mathbb{E}_{x \sim P_r}[\log(D(x))] + \mathbb{E}_{z \sim P_z}[\log(1 - D(G(z)))],  (1)

where x is real data and z is random Gaussian noise; P_r and P_z are the distributions of real images and noise inputs, respectively. The base architecture of GAN is shown in Figure 3.

Figure 3. Architecture of GAN.
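For a concrete anchor to Equation (1), the following is a minimal PyTorch-style sketch (an illustration, not the authors' code) of the two alternating losses that implement the min-max game; D is assumed to output probabilities in (0, 1):

    import torch

    def gan_losses(D, G, real, z):
        # Minimal sketch of Equation (1); D outputs probabilities in (0, 1).
        fake = G(z)
        # D maximizes log D(x) + log(1 - D(G(z))): minimize the negation.
        d_loss = -(torch.log(D(real)).mean()
                   + torch.log(1.0 - D(fake.detach())).mean())
        # G minimizes log(1 - D(G(z))).
        g_loss = torch.log(1.0 - D(fake)).mean()
        return d_loss, g_loss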
The objective function of GAN is a type of min-max problem, and it is hard to achieve the Nash equilibrium. For this reason, GAN has an insecure training process and some problems, such as the gradient vanishing or exploding problem and mode collapse.

Figure 4. Distribution of discriminator (blue), generator (green) and real data (black) according to the learning process.

Figure 4 shows the optimal solution for GAN. The distribution of D should be a flat form, and the distribution of G should be the same as the real data, resulting in generating the best data. When D is perfect, GAN is guaranteed with D(x) = 1 and D(G(z)) = 0. Then, the objective function falls to zero and the loss updating gradient becomes zero, presenting the gradient vanishing problem. When D does not operate properly, inaccurate feedback is fed into G, and the objective function cannot represent the reality. Also, if G does not learn the distribution of the entire training data and only a fraction of the training data is learned, the mode collapse problem occurs. In this case, G generates a limited diversity of samples or even the same sample, regardless of the input.

To avoid the above problems and make the training process stable, the Wasserstein GAN was proposed [8]. It has a new objective function derived from the Wasserstein distance, which is a measure of the distance between two joint probability distributions. A detailed description of WGAN is given in Section 2.2.

2.2. Wasserstein GAN with Gradient Penalty

The Wasserstein GAN (WGAN) was proposed by Arjovsky in 2017 [8]; it uses the Wasserstein distance to measure the distance between two joint probability distributions. The Wasserstein distance is expressed as Equation (2):

W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \mathbb{E}_{(x,y) \sim \gamma}[\, \|x - y\| \,],  (2)

where \Pi(P_r, P_g) denotes the set of all joint distributions \gamma(x, y), and \gamma(x, y) indicates how much mass must be moved to transform the distribution P_r into the distribution P_g.
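For intuition, the one-dimensional case of Equation (2) can be evaluated numerically with SciPy; the sketch below (an illustration beyond the paper) compares samples from two Gaussians whose means differ by 0.5, for which the Wasserstein distance approaches 0.5:

    import numpy as np
    from scipy.stats import wasserstein_distance

    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, 10000)  # samples from P_r
    fake = rng.normal(0.5, 1.0, 10000)  # samples from P_g

    # Empirical earth mover's distance; prints a value close to 0.5.
    print(wasserstein_distance(real, fake))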
Minimizing the objective function of GAN is equal to minimizing JS divergence and JS divergence, determined that two functTiohne Wof aGssAerNst eiisn eGqAuaNl (tWo GmAinNim) iwziansg p rJSo pdoisveedr gbeyn Acer joanvdsk yJS i nd 2iv0e1r7g [e8n]c, ew, hdiecthe rumseisn eWda tshseart sttweino probabilitydistributionsP , P arecompletelydifferentwhenmeasuredindifferentareas. Inother pdrisotbaanbciel ittoy mdiesatrsiubruet tiohne sd iPsrt,aPng cae rbee ctwomeepnle ttweloy jodiinffte prernotb awbhileitny mdiesatrsiubruetdio inns .d Wiffaesrseenrts taeriena ds.i sItna nocthe eisr words,theylookharshlydi(cid:2928)ffe(cid:2917)rentintwoprobabilitydistributions. InGAN,thisreasoncancausethe wexoprrdess,s ethde ays leoqouka htiaorns h2l: y different in two probability distributions. In GAN, this reason can cause discriminatortofailtolearn. Therefore,theWassersteindistance,whichisflexibleandfocuseson the discriminator to fail to learn. Therefore, the Wasserstein distance, which is flexible and focuses convergence,isappliedtotrainWth(cid:3435)ePp(cid:2928),rPo(cid:2917)c(cid:3439)e=ssofGinAf N.𝐸((cid:3051),(cid:3052))~(cid:2963)[||𝑥−𝑦||] , (2) on convergence, is applied to train the proc(cid:2963)e∈s(cid:2952)s( (cid:3017)o(cid:3293)f,(cid:3017) G(cid:3282))AN. ThereasonwhyWassersteindistanceisaweakmetricisshowninFigure5. The reason why Wasserstein distance is a weak metric is shown in Figure 5. where Π(cid:3435)𝑃,𝑃(cid:3439) denotes the set of all joint distributions γ(x,y), and γ(x,y) represents the distance (cid:3045) (cid:3034) for transforming the distribution P into the distribution P. (cid:2928) (cid:2917) The Wasserstein distance is a weaker metric than the others, such as total variance (TV), Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JS). Minimizing the objective function of GAN𝑋 (i𝑤s )eq=ua(0l ,t o𝑍 (cid:2869)m(𝑤in)im) izing J𝑑S( 𝑋di,v𝑌e)r g≥en𝜃ce and J𝑌S (d𝑤iv)e=rg(e0n,c e𝑍, (cid:2870)d(𝑤et)e)rm ined that two probability distributions P,P are completely different when measured in different areas. In other (cid:2928) (cid:2917) words, they look harshly different in two probability distributions. In GAN, this reason can cause the discriminator to fail to learn. There0fo re, the Wasserstein𝜃 distance, which is flexible and focuses on convergence, is applied to train the process of GAN. Figure5. Exampleoftwoprobabilitydistributions. The reason why WasseFrisgtuerien 5d. iEsxtaanmcpel eis o af twweoa pkr ombaebtriilict yis d sishtoriwbunt iionn sF.i gure 5. WhereXandYarerandomvariablesmappedas X ∼ P0, Y ∼ Pθ,respectively,andd(X,Y)is distancebetweenXandY.Here,d(X,Y)iscalculatedasfollows: 𝑋(𝑤) = (0d, (𝑍X(cid:2869),(Y𝑤))=) (|θ−0𝑑|2(+𝑋,(cid:12)(cid:12)(cid:12)𝑌Z)1 (w≥)𝜃− Z2(w)(cid:12)(cid:12)(cid:12)𝑌)12(𝑤≥)|θ=|.(0, 𝑍(cid:2870)(𝑤)) (3) Theexpectedvalueofd(X,Y)isequaltoorgreaterthanθwithanyjointprobabilitydistributionγ: Eγ[0d (X,Y)] ≥ Eγ[|θ|] =𝜃| θ|. (4) Figure 5. Example of two probability distributions. Appl. Sci. 2019, 9, x FOR PEER REVIEW 5 of 29 where X and Y are random variables mapped as 𝑋 ~ 𝑃 ,𝑌 ~ 𝑃 , respectively, and 𝑑(𝑋,𝑌) is distance (cid:2868) (cid:3087) between X and Y. Here, 𝑑(𝑋,𝑌) is calculated as follows: (cid:2869) 𝑑(𝑋,𝑌)=(|𝜃−0|(cid:2870)+|𝑍 (𝑤)−𝑍 (𝑤)|)(cid:2870) ≥|𝜃|. (3) (cid:2869) (cid:2870) The expected value of 𝑑(𝑋,𝑌) is equal to or greater than 𝜃 with any joint probability distribution γ: Appl.Sci.2019,9,2358 5of27 E(cid:2963)[𝑑(𝑋,𝑌)] ≥ E(cid:2963)[|𝜃|]=|𝜃|. 
When Z_1 is equal to Z_2, the expected value of d(X, Y) becomes |\theta|. Then, the desired conclusion is achieved as the following Equation (5) and Figure 6:

W(P_0, P_\theta) = |\theta|.  (5)

Figure 6. Graph of Equation (5).

As a result, the objective function of WGAN can be expressed by the Kantorovich-Rubinstein duality [12]:

\min_G \max_{D \in \mathcal{L}} \; \mathbb{E}_{x \sim P_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim P_g}[D(\tilde{x})],  (6)

where \mathcal{L} is the set of 1-Lipschitz functions, and the gradient penalty term [9] added to the discriminator loss is:

\lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[ \left( \| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1 \right)^2 \right].  (7)

In the original WGAN, the weights of the discriminator are clipped to a range [-c, c], where c is the threshold; the gradient penalty term replaces this clipping by directly penalizing gradients whose norm deviates from 1. Applying the gradient penalty term to WGAN can prevent the gradient vanishing and exploding problem.
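As a hedged sketch of how the penalty in Equation (7) is commonly computed (on random interpolates between real and generated samples, following the WGAN-GP recipe [9]; this is not the authors' published code):

    import torch

    def gradient_penalty(D, real, fake, lam=10.0):
        # Interpolate with epsilon ~ U[0, 1], as in Equation (7).
        eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
        x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
        grads = torch.autograd.grad(outputs=D(x_hat).sum(), inputs=x_hat,
                                    create_graph=True)[0]
        # Penalize deviation of the per-sample gradient norm from 1.
        return lam * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()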
2.3. Perceptual Loss Function

The perceptual loss function is used in image transfer tasks by extracting representations of a feature map. There are two types of perceptual losses, which are the content loss function and the style loss function. The basic merit of the perceptual loss function is the ability to extract feature responses in layers of a CNN. As the network goes deeper, the input image is changed to representations of features, not of pixel values. That is because features of the high layer have large receptive fields, and they represent the actual content of the image and its spatial structure. Therefore, the high layer of the network can capture high-level contents and perceptual information, as in Figure 7, and the feature response of high layers is called the content representation.

Figure 7. Feature responses of layers in CNN. First row is the original image, conv_2 layer and conv_4 layer. Second row is the conv_6 layer, conv_9 layer and conv_12 layer.

To minimize the difference of content representation between the input and target images, the content loss function was proposed in [10], and it is represented as follows:

\mathcal{L}_{content}^{\phi,j}(x, \hat{x}) = \frac{1}{C_j H_j W_j} \| \phi_j(\hat{x}) - \phi_j(x) \|_2^2,  (8)

where x, \hat{x}, \phi_j and C_j \times H_j \times W_j are the input image, target image, feature map of the j-th layer and size of the feature map, respectively. Usually, the feature map is extracted from networks pre-trained on the ImageNet dataset, such as VGGNet16 or VGGNet19. The content loss function is calculated by the mean square error.

In the de-blurring task, using the content loss function for a network encourages the output image to be perceptually similar to the sharp image, since it can capture the content of sharpness information. To capture more perceptual similarity and texture information, the style loss function was proposed in [11]. The style loss function is similar to the content loss function in that it uses feature responses in layers of the network. However, the style loss function consists of the correlations between different feature responses, and it extracts multiple feature maps of the network. By considering the correlation of features, the style loss function obtains multi-scale representations. The correlation of features is given by a Gram matrix, which is a covariance matrix that represents the distribution of an image, and the Gram matrix is the inner product between different vectorized feature maps. The Gram matrix for the style loss is expressed as follows:

G_j^{\phi}(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c} \, \phi_j(x)_{h,w,c'},  (9)

where G_j^{\phi}(x), \phi_j(x)_{h,w,c} and \phi_j(x)_{h,w,c'}, and C_j \times H_j \times W_j are the Gram matrix of the feature map in the j-th layer, different feature maps with different channels, and the size of the feature map, respectively.

For the efficient calculation of the Gram matrix, feature maps are reshaped by changing the size of C_j \times H_j \times W_j to C_j \times H_j W_j. After reshaping, the style loss function is calculated using the difference between the Gram matrices of the target and the output images. It is minimized by using the Frobenius norm, and the formula of the style loss function is as follows:

\mathcal{L}_{style}^{\phi,j} = \frac{1}{C_j H_j W_j} \| G_j^{\phi}(\hat{x}) - G_j^{\phi}(x) \|_F^2.  (10)

As a result, applying the style loss function to WGAN can preserve more detailed edge information and capture the similarity of perceptual styles.
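To make Equations (9) and (10) concrete, the sketch below (assuming PyTorch tensors in batch x channel x height x width layout) computes the Gram matrix by reshaping each feature map to C_j x H_jW_j and taking inner products:

    import torch

    def gram_matrix(feat):
        # Equation (9): reshape C x H x W to C x HW, then inner products.
        b, c, h, w = feat.size()
        f = feat.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def style_loss(feat_out, feat_target):
        # Equation (10): squared Frobenius norm of the Gram difference.
        diff = gram_matrix(feat_out) - gram_matrix(feat_target)
        return (diff ** 2).sum(dim=(1, 2)).mean()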
However, the style loss function consists of the correlations thede-blurredimage,thatissimilartothesharpimage,withtheuseoftheblurimage. Blurredand between different feature responses, and it extracts multiple feature maps of network. By sharpimagesarefedtothenetwork,withthenetworklearningtheperceptualcontentandsimilarity considering the correlation of features, the style loss function obtains multi-scale representations. ofthesharpimage. Correlation of features is given by a Gram matrix, which is a covariance matrix that represents the In Figure 8, trained network withWGAN-GP and content loss function generates de-blurred distribution of an image, and the Gram matrix is the inner product between different vectored image,whichhasperceptualcontentinformation. ItresultsingoodperformanceinPSNRandoutputs feature maps. The Gram matrix for style loss is expressed as follows: de-blurredimagebyeliminatingblurnoise. However,aproblemexistswhenthenetworkreconstructs (cid:3009) (cid:3024) 1 G∅ (𝑥) = (cid:3533)(cid:3533)∅(𝑥) ∅(𝑥) (9) (cid:2920) (cid:3030),(cid:3030)(cid:4593) 𝐶𝐻𝑊 (cid:2920) (cid:3035),(cid:3050),(cid:3030) (cid:2920) (cid:3035),(cid:3050),(cid:3030)(cid:4593) (cid:3037) (cid:3037) (cid:3037) (cid:3035)(cid:2880)(cid:2869)(cid:3050)(cid:2880)(cid:2869) where G∅ (𝑥), ∅(𝑥) and ∅(𝑥) , and C ×𝐻 ×𝑊 are the Gram matrix of feature map in j-th (cid:2920) (cid:2920) (cid:3035),(cid:3050),(cid:3030) (cid:2920) (cid:3035),(cid:3050),(cid:3030)(cid:4593) (cid:2920) (cid:3037) (cid:3037) layer, different feature maps with different channels, and the size of feature map, respectively. For the efficient calculation of the Gram matrix, feature maps are reshaped by changing the size of C ×𝐻 ×𝑊 to C ×𝐻𝑊. After reshaping, the style loss function is calculated using difference (cid:2920) (cid:3037) (cid:3037) (cid:2920) (cid:3037) (cid:3037) between the Gram matrices of the target and the output images. It is minimized by using the Frobenius norm, and the formula of the style loss function is as follows: 1 ℒ∅,(cid:3037) = ||G∅ (𝑥(cid:3548))−G∅(𝑥)||(cid:2870). (10) (cid:2929)(cid:2930)(cid:2935)(cid:2922)(cid:2915) 𝐶𝐻𝑊 (cid:2920) (cid:2920) (cid:2890) (cid:3037) (cid:3037) (cid:3037) Appl. Sci. 2019, 9, x FOR PEER REVIEW 7 of 29 As a result, applying the syle loss function to WGAN can preserve more detailed edge information and capture the similarity of perceptual styles. 2.4. Problem Definition Figures 8 and 9 show experimental results of WGAN-GP with the content loss function. These experiments are performed with the GOPRO dataset. The goal of this network is to reconstruct the de-blurred image, that is similar to the sharp image, with the use of the blur image. Blurred and sharp images are fed to the network, with the network learning the perceptual content and similarity of the sharp image. Appl.SInci .F20ig19u,r9e,2 835, 8trained network with WGAN-GP and content loss function generates de-blu7rorfe2d7 image, which has perceptual content information. It results in good performance in PSNR and oimutapguet.s Adse-sbhluowrrendi nimexatgeen bdyin eglitmheinsautibn-gp abrltuor fngoeisnee.r Hatoedweimvearg, eas pinroFbilgeumr eex9i,sbtslo wckheenff etchtes noectcwurorink reconstructs image. As shown in extending the sub-part of generated images in figure 9, block objects and backgrounds. This is because the content loss function uses only one feature map in effects occur in objects and backgrounds. 
This is because the content loss function uses only one CNN.Thatis,usingasinglefeaturemaptocaptureperceptualinformationisnotenoughtorepresent feature map in CNN. That is, using a single feature map to capture perceptual information is not micro-edgesinsmallobjects,suchasaleaf,branch,etc. enough to represent micro-edges in small objects, such as a leaf, branch, etc. (a) (b) (c) FFiigguurree 88.. EExxaammppllee iimmaaggeess ooff WWGGAANN--GGPP wwiitthh ccoonntteenntt lloossss ffuunnccttiioonn:: ((aa)) SShhaarrpp iimmaaggee.. ((bb)) BBlluurrrreedd Appl. Siimmci.a a2gg0ee1 9((,cc 9)) , RRx eeFccOooRnn ssPttErruuEccRtt RiimmEVaaIggEeeW.. 8 of 29 FFiigguurree 99.. EExxtteennddiinngg ppaarrtt ooff eexxaammppllee iimmaaggeess.. IInn tthhiiss rereggaardrd,t, otoim ipmropvroevthe ethshea rsphnarepssnoesfst hoef otuhtep uotuitmpaugt ei,mthaegceo, nttheen tcloonstsefnutn lcotisosn fiusnrcetpiolanc eids rbeyptlhaecesdty lbeyl otshsef usntyclteio lnoisns ofrudnecrtitoone xitnr aocrtdmeur lttoip leexfteraatcut rmemulatpipslien fCeaNtuNr.eT mhearpefso rine, aCdNapNt.i nTghtehreesfotyrlee, alodsaspftuinncgt itohnei nsctyrelea sleossfse afutunrceticoonm ipnlcerxeitayseasn dfetahteurseiz ecoomftphleexrietcye patnivde tfiheel ds.izMe ooref otvheer ,rietcceopntsividee rfsietlhde. Mcoorrreeloavtieorn, ibt ectowneseidnedrsiff theree cnotrfreealtautiroenm baeptwsoeefnla dyeifrfse.rent feature maps of layers. The next section explains the addition of the style loss function to WGAN-GP and analyzes how it is calculated. Finally, the de-blurring network that can generate the realistic de-blurred image and show a high similarity between output and sharp image is proposed. 3. Proposed Algorithm This section describes the proposed method and detailed composition to reconstruct de-blurred image. First, perceptual style by extracting multiple feature map in high layer is described. Then, the total loss function which combines WGAN-GP and the style loss function follows. Next, the way to minimize the total loss function is introduced. Finally, the architecture of the de-blurring network is depicted. 3.1. Multi-Scale Representation of Perceptual Style To capture the perceptual style of the sharp image, style loss function is used in the proposed method. As mentioned in Section 2, applying the style loss function can increase the similarity between the generated and sharp images by minimizing the difference of distributions. It is similar to the content loss function, but the style loss function extracts multiple feature maps in CNN to increase feature complexity and receptive field. Usually, feature maps of layers in VGG16 are used to capture similarity and perceptual style. VGG16 [13] is a type of CNN, which has deep convolutional layers and is trained on ImageNet. The architecture of VGG 16 is shown in Figure 10. Appl.Sci.2019,9,2358 8of27 ThenextsectionexplainstheadditionofthestylelossfunctiontoWGAN-GPandanalyzeshowit iscalculated. Finally,thede-blurringnetworkthatcangeneratetherealisticde-blurredimageand showahighsimilaritybetweenoutputandsharpimageisproposed. 3. ProposedAlgorithm Thissectiondescribestheproposedmethodanddetailedcompositiontoreconstructde-blurred image. First, perceptualstylebyextractingmultiplefeaturemapinhighlayerisdescribed. Then, thetotallossfunctionwhichcombinesWGAN-GPandthestylelossfunctionfollows. Next,theway tominimizethetotallossfunctionisintroduced. Finally,thearchitectureofthede-blurringnetwork isdepicted. 3.1. 
Figure 10. Architecture of VGG16 network.

VGG16 applies a 3 × 3 filter convolution repeatedly to capture the representation of the image. The feature map of the high layer then includes high-level features that obtain a high-complexity representation with a large receptive field.

When the network goes deeper, the input image is transformed to representation values, and the size of the feature maps is reduced as the depth is increased. To take a multi-scale representation, five feature maps, which are in different shapes, are extracted. Also, feature maps of the high layer, which contain high non-linearity, show better performance than feature maps of the low layer. Feature maps of the low layer are not sufficient to express the perceptual style of an image because they contain low-level features.

The reconstructed image using low-level features is shown in Figure 11. The low-level feature cannot capture texture, color and style because the feature is too simple and takes a small receptive field. The reconstructed image has black space because it does not fully express the style and texture of the image. As a result, to reconstruct the de-blurred image and preserve the content with perceptual style, this paper proposes extracting five feature maps in high layers, which have enough perceptual style and edge information.

Figure 11. Generated image using low-level feature.
3.2. Total Loss Function of Network

The objective of the proposed method is to generate the de-blurred image from the blurred image, I_B, by extracting the perceptual style information from the sharp image, I_S. The network learns the similarity of the sharp image, and it reconstructs the de-blurred image from the blurred image by using micro-edge information and perceptual similarity. The similarity can be improved by minimizing the difference of distribution between I_S and I_B. As a result, the well-trained generator, a component of the network, generates an output image that includes edge information.

To train the network, the discriminator and generator are alternately learned from blurred and sharp images. The learning direction of the discriminator in the WGAN-GP is decided to find the optimal solution by minimizing the Wasserstein distance between the joint distributions of the generated and sharp images. The loss function of the discriminator is as follows:

\mathcal{L}_D = \sum_{n=1}^{N} \left[ L(I_S) - L(G_\theta(I_B)) + \lambda \left( \| \nabla L(\epsilon I_S + (1 - \epsilon) G_\theta(I_B)) \|_2 - 1 \right)^2 \right],  (11)

where L is a differentiable function, which is 1-Lipschitz only if it has gradients with norm at most 1 everywhere. G and \epsilon are the generator and a random number drawn from U[0, 1], respectively, and \lambda is the gradient penalty coefficient, set to 10. The gradient penalty term penalizes the network if the gradient norm moves away from its target norm value 1.
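Assembling Equation (11) in code could look like the sketch below; note that it follows the common WGAN-GP convention of minimizing L(fake) - L(real) plus the penalty, and it reuses the gradient_penalty helper sketched in Section 2.2, so the sign convention is an implementation assumption rather than a transcription of Equation (11):

    import torch

    def critic_loss(crit, G, sharp, blurred, lam=10.0):
        # Critic step around Equation (11): Wasserstein terms plus the
        # gradient penalty (gradient_penalty is sketched in Section 2.2).
        fake = G(blurred).detach()
        wasserstein = crit(fake).mean() - crit(sharp).mean()
        return wasserstein + gradient_penalty(crit, sharp, fake, lam)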
The discriminator is trained first, along with the training of the generator, to generate the de-blurred image. To make the generator reconstruct images with perceptual style, the total objective function is obtained by the combination of the generator loss function and the style loss, as shown in Equation (12):

\mathcal{L} = \mathcal{L}_G + \lambda \cdot \mathcal{L}_{style},  (12)

where \mathcal{L}_G is the generator loss function of WGAN-GP, and \mathcal{L}_{style} is the style loss function to capture the perceptual style from the sharp image. Also, \lambda is a constant weight parameter that determines how much perceptual style should be adapted. In more detail, \mathcal{L}_G is calculated by the WGAN-GP method:

\mathcal{L}_G = \sum_{n=1}^{N} -L(G_\theta(I_B)),  (13)

where L is the critic function as mentioned above and G is the generator. Moreover, \theta is the parameter of the network which minimizes the loss function:

\mathcal{L}_{style} = \sum_{m=1}^{M} \frac{1}{C_j H_j W_j} \| G_j^{\phi}(G_\theta(I_B)) - G_j^{\phi}(I_S) \|_F^2,  (14)

where M is the number of feature maps extracted from VGG16. The proposed method uses five feature maps of high layers in VGG16. Using fewer feature maps cannot express the perceptual style, and if more feature maps are used, more calculation and a lot of training time are needed. In summary, a novel combined loss function consisting of WGAN-GP and the style loss function is proposed to reconstruct an edge-preserved de-blurred image with perceptual style.

3.3. Network Architecture

The architecture of the generator is shown in Figure 12, where the generator is a CNN based on a residual network. It is composed of three convolutional layers, nine residual blocks (Resblock) [14] and two transposed convolutional layers. First, to encode the characteristics of images, a convolutional layer, instance normalization layer [15] and ReLU activation layer [16] are placed at the front of the network. The size of the image is decreased and the depth of the feature map is increased. After that, nine residual blocks are connected behind the convolutional layers to increase the feature complexity. Each Resblock consists of a convolutional layer with dropout regularization [17], an instance normalization layer and a ReLU activation layer.

Figure 12. Architecture of generator in WGAN-GP.

In the back of the network, the transposed convolutional layers are attached to reshape feature maps to generate output images by up-sampling, and a Tanh activation is applied in the last convolutional layer.
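A hypothetical rendering of this generator is sketched below; the channel widths (64, 128, 256) and the nine Resblocks follow Figure 12, while the dropout rate and padding choices are assumptions:

    import torch.nn as nn

    class ResBlock(nn.Module):
        # Conv + instance norm + ReLU with dropout [17] and a skip connection.
        def __init__(self, ch=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(ch, ch, 3, padding=1),
                nn.InstanceNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Dropout(0.5),  # dropout rate is an assumption
            )

        def forward(self, x):
            return x + self.body(x)

    def make_generator():
        # Encoder (7x7 conv, then two strided 3x3 convs), 9 ResBlocks,
        # and a decoder of two transposed convs ending in Tanh (Figure 12).
        layers = [nn.Conv2d(3, 64, 7, padding=3), nn.InstanceNorm2d(64), nn.ReLU(True),
                  nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.InstanceNorm2d(128), nn.ReLU(True),
                  nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.InstanceNorm2d(256), nn.ReLU(True)]
        layers += [ResBlock(256) for _ in range(9)]
        layers += [nn.ConvTranspose2d(256, 128, 3, stride=2, padding=1, output_padding=1),
                   nn.InstanceNorm2d(128), nn.ReLU(True),
                   nn.ConvTranspose2d(128, 64, 3, stride=2, padding=1, output_padding=1),
                   nn.InstanceNorm2d(64), nn.ReLU(True),
                   nn.Conv2d(64, 3, 7, padding=3), nn.Tanh()]
        return nn.Sequential(*layers)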
The architecture of the discriminator is shown in Figure 13, having the same architecture as PatchGAN [18,19]. PatchGAN was proposed to classify whether each N × N patch in an image is real or fake.

Figure 13. Architecture of discriminator in WGAN-GP.

The discriminator consists of five convolutional layers and three instance normalization layers. Unlike the generator, a LeakyReLU [20] activation layer is applied to the convolutional layers.

Figure 14 shows the architecture of the entire network. The network is based on the conditional GAN, and blurred and sharp images are the input for the network. The generator produces the estimate of the sharp image.
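Matching the discriminator description above (five convolutional layers, three instance normalizations, LeakyReLU, PatchGAN-style output), a hypothetical critic sketch follows; kernel sizes, strides and widths beyond the counts stated above are assumptions:

    import torch.nn as nn

    def make_discriminator():
        # Five convs, three instance norms, LeakyReLU activations; the
        # output is a map of patch scores (no sigmoid, since the WGAN
        # critic is unbounded), in the spirit of PatchGAN [18,19].
        return nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 256, 4, stride=2, padding=1),
            nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, True),
            nn.Conv2d(256, 512, 4, stride=1, padding=1),
            nn.InstanceNorm2d(512), nn.LeakyReLU(0.2, True),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),
        )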
