applied sciences

Article

Wasserstein Generative Adversarial Network Based De-Blurring Using Perceptual Similarity
Minsoo Hong and Yoonsik Choe *
Department of Electrical & Electronic Engineering, Yonsei University, Seoul 03722, Korea; hms9110@naver.com
* Correspondence: yschoe@yonsei.ac.kr; Tel.: +82-2-2123-2774
Received: 31 May 2019; Accepted: 4 June 2019; Published: 8 June 2019
Abstract: The de-blurring of blurred images is one of the most important image processing methods and it can be used for the preprocessing step in many multimedia and computer vision applications. Recently, de-blurring methods have been performed by neural network methods, such as the generative adversarial network (GAN), which is a powerful generative network. Among many different types of GAN, the proposed method is performed using the Wasserstein generative adversarial network with gradient penalty (WGAN-GP). Since edge information is the most important factor in an image, the style loss function is applied to represent the perceptual information of the edges in order to preserve small edge information and capture its perceptual similarity. As a result, the proposed method improves the similarity between sharp and blurred images by minimizing the Wasserstein distance, and it captures the perceptual similarity well using the style loss function, considering the correlation of features in the convolutional neural network (CNN). To confirm the performance of the proposed method, three experiments are conducted using two datasets: the GOPRO Large and Kohler datasets. The optimal solutions are found by changing the parameter values experimentally. Consequently, the experiments depict that the proposed method achieves 0.98 higher performance in structural similarity (SSIM) and outperforms other de-blurring methods in the case of both datasets.
Keywords: deblurring; generative adversarial network; perceptual similarity; style information; Wasserstein distance
1. Introduction
De-blurring is one of the steadily studied techniques in image processing fields and aims to improve the sharpness of an image by eliminating blur noise. In Figure 1, the de-blurring method is used to transform the blurred image to the sharp image. To remove blur noise, many de-blurring techniques were researched, e.g., the blind deconvolution algorithm, bilateral and Wiener filter among others [1,2].
Figure 1. Blurred and sharp image of GOPRO dataset.
Appl. Sci. 2019, 9, 2358; doi:10.3390/app9112358 www.mdpi.com/journal/applsci
In recent years, deep learning methods have been adapted to image processing fields, showing better performance than the previous methods. Likewise, many researchers have applied deep learning methods, e.g., the convolutional neural network (CNN) and generative adversarial network (GAN), in order to resolve blur noise [3–5]. In particular, since GAN shows high quality in generating texture information of images, this paper adapts the GAN-based approach to eliminate blur noise.
Typical GAN-based de-blurring methods [6] are processed by minimizing the difference of pixel values, which improves the peak signal-to-noise ratio (PSNR). However, this cannot reflect the similarity to target images. Therefore, the proposed method uses the conditional GAN [7], which minimizes the difference of image distributions. In detail, the proposed method adapts the Wasserstein generative adversarial network with gradient penalty (WGAN-GP), which is based on conditional GAN. The Wasserstein generative adversarial network (WGAN) was proposed to stabilize the training of GAN by minimizing the Wasserstein distance of joint distributions instead of whole probability distributions [8]. Additionally, to improve the performance of WGAN, the gradient penalty term was proposed [9] to replace the weight clipping method and limit the gradient weight within a range; applying this term to WGAN can prevent the gradient vanishing and exploding problem.
By applying the content loss function, i.e., the procedure of extracting feature maps in CNN, to WGAN-GP, an improved de-blurring method was proposed to capture the perceptual information of images [10]. The content loss function is one of many perceptual loss functions and improves the similarity between the blurred and sharp images. Applying the content loss function to WGAN-GP can capture perceptual information, such as color and spatial structure. However, it does not preserve the detailed edge information of the image at the same time.
To preserve more edge information of the sharp image, another perceptual loss function called the style loss function [11] is introduced. The style loss function extracts multiple feature maps in CNN and computes a covariance matrix to capture more perceptual information and detailed edges. As a result, the style loss function captures higher similarity than the content loss function, shown with the similarity map in Figure 2. The figure shows that the boundary information between object and background is preserved better in (b) than in (a).
Figure 2. Similarity maps of (a) the content loss function and (b) the style loss function.
In this paper, the content loss function is replaced by the style loss function to preserve edge information and perceptual similarity. Substituting the style loss function into WGAN-GP can preserve the detailed edge since it can capture a large perceptual style and high-level features. The objective function of the network is proposed by combining the loss function of WGAN-GP and the style loss function. Therefore, the network generates the de-blurred image while capturing the similarity of perceptual style and preserving detailed edge information.
In this work, it is experimentally shown that WGAN-GP with the style loss function can reconstruct an edge-preserved de-blurred image and achieves high performance in the structural similarity measure (SSIM). Also, a comparison experiment with different parameter values and different locations of the feature map is performed to find the optimal solution. In Section 2, GAN, WGAN-GP, and perceptual loss functions are introduced and the problem definition to solve the blur noise is described. In Section 3, the perceptual objective function and the detailed architecture of the network are explained. In Section 4, the de-blurred image is reconstructed using the proposed method and the parameters and values used to find the optimal solution are analyzed, followed by the conclusion in Section 5.
2. Preliminaries
In this section, basic deep learning methods and the perceptual loss function for de-blurring are described. Section 2.1 introduces what GAN is and explains the objective function of GAN. It also describes the limitations of GAN and the ways to solve them. In Section 2.2, the Wasserstein GAN with a gradient penalty term is introduced for stabilizing the training process of GAN, followed by the descriptions of perceptual loss functions and how to calculate the perceptual loss in Section 2.3. Finally, Section 2.4 defines the issues in solving blur noise with the combination of WGAN-GP and the perceptual loss function.
2.1. Generative Adversarial Network
The generative adversarial network (GAN) was proposed in 2014 by Goodfellow [6]. GAN is used for image generation tasks and is learned by a competition between two neural network models. The two neural network models are called the generator, G, and the discriminator, D. The goal of G is to generate images that D cannot distinguish from real images, and the goal of D is to differentiate real images and generated images. Therefore, the goal of GAN can be expressed as:
min_G max_D E_{x∼P_r}[log(D(x))] + E_{z∼P_z}[log(1 − D(G(z)))], (1)

where x is real data and z is random Gaussian noise. P_r and P_z are the distributions of real images and generated images, respectively, and the base architecture of GAN is shown in Figure 3.
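Equation (1) can be checked numerically with a Monte-Carlo estimate; the following NumPy sketch (the sample discriminator outputs are illustrative assumptions, not the paper's network) shows that a maximally confused discriminator yields the value 2·log(0.5), while a confident one pushes the value toward zero:

```python
import numpy as np

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of Equation (1):
    E_x[log D(x)] + E_z[log(1 - D(G(z)))],
    given discriminator outputs on real and generated samples."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# Toy discriminator outputs in (0, 1): a maximally confused D outputs
# 0.5 everywhere; a confident D scores real near 1 and fake near 0.
confused = gan_value(np.full(4, 0.5), np.full(4, 0.5))    # = 2 * log(0.5)
confident = gan_value(np.full(4, 0.99), np.full(4, 0.01)) # close to 0
```

The discriminator maximizes this value while the generator minimizes it, which is why the game settles at D ≡ 0.5 when G matches the real distribution.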
Figure 3. Architecture of GAN.
The objective function of GAN is a type of min-max problem, and it is hard to achieve Nash equilibrium. For this reason, GAN has an insecure training process and some problems, such as the gradient vanishing or exploding problem and mode collapse.
Figure 4 shows the optimal solution for GAN. The distribution of D should be a flat form, and the distribution of G should be the same as the real data, resulting in generating the best data. When D is perfect, GAN is guaranteed with D(x) = 1 and D(G(z)) = 0. Then, the objective function falls to zero and the loss-updating gradient becomes zero, presenting the gradient vanishing problem. When D does not operate properly, inaccurate feedback is fed into G, and the objective function cannot represent the reality. Also, if G does not learn the distribution of the entire training data and only a fraction of the training data is learned, the mode collapse problem occurs. In this case, G generates a limited diversity of samples or even the same sample, regardless of the input.
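The vanishing gradient can be made concrete with a sigmoid-parameterized discriminator, D(G(z)) = σ(a): the derivative of the generator's loss term log(1 − σ(a)) with respect to the logit a is −σ(a), which shrinks to zero exactly when D confidently rejects the generated sample. A minimal sketch (the sigmoid parameterization is an illustrative assumption):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def generator_loss_grad(a):
    """d/da log(1 - sigmoid(a)) = -sigmoid(a):
    the gradient that G receives through the discriminator logit a."""
    return -sigmoid(a)

# D(G(z)) = 0.5 (confused D): gradient is -0.5, G still learns.
grad_confused = generator_loss_grad(0.0)
# D(G(z)) ~ 0 (perfect D rejects the fake): gradient ~ 0, G stops learning.
grad_saturated = generator_loss_grad(-10.0)
```

This is precisely the saturation described above: the better D becomes, the weaker the training signal that reaches G.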
Figure 4. Distribution of discriminator (blue), generator (green) and real data (black) according to the learning process.
To avoid the above problems and make a stable training process, the Wasserstein GAN was proposed [8]. It has a new objective function derived from the Wasserstein distance, which is a measure of the distance between two joint probability distributions. A detailed description of WGAN is given in Section 2.2.

2.2. Wasserstein GAN with Gradient Penalty

The Wasserstein GAN (WGAN) was proposed by Arjovsky in 2017 [8], which uses the Wasserstein distance to measure the distance between two joint probability distributions. The Wasserstein distance is expressed as Equation (2):

W(P_r, P_g) = inf_{γ∈Π(P_r,P_g)} E_{(x,y)∼γ}[||x − y||], (2)
wwmhheeaersreue ΠrΠe (cid:3435)Po𝑃rf, ,P𝑃thg(cid:3439)e dddeeinsnotoattenescset ht hebees teswtetoe foenfa l altljwlo jioon itnjodt iidnsttirs itpbrruibotuibotanibosinlγist( yxγ ,(dyxi),sy,t)ar,in baduntγdio( xnγ,s(y.x ),Ayre) pdrreeetpsaerienlestdesn thtdsee tsdhciersi tpdatniisocteann focoerf
(cid:3045) (cid:3034)
tWfroarGn tsArfaoNnrs mwfoiinrlmlg bitenh gem dtehinsett irdoiibnsuetrdtiib oiunnt 2iPo.r2ni. nPto itnhteod tihsetr dibisuttriiobnutPiog.n P.
(cid:2928) (cid:2917)
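For one-dimensional distributions, the infimum in Equation (2) has a simple realization: matching the i-th order statistics of two equally sized sample sets is the optimal coupling γ. A NumPy sketch under that assumption (the sample data are illustrative):

```python
import numpy as np

def wasserstein_1d(xs, ys):
    """W1 distance between two 1-D empirical distributions with equally
    many samples: sort both and average |x_(i) - y_(i)|. The sorted
    pairing realizes the optimal coupling gamma of Equation (2)."""
    xs, ys = np.sort(xs), np.sort(ys)
    return np.mean(np.abs(xs - ys))

# Shifting a distribution by a constant c moves it by exactly W1 = |c|,
# no matter how the two supports overlap.
xs = np.random.default_rng(0).normal(size=1000)
w_shift = wasserstein_1d(xs, xs + 3.0)  # = 3.0 up to floating-point error
```

This "how far must mass move" behavior is what makes the metric informative even for non-overlapping distributions, which is the point of the comparison with JS divergence below.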
The Wasserstein distance is a weaker metric than the others, such as total variation (TV), Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JS). Minimizing the objective function of GAN is equal to minimizing the JS divergence, and the JS divergence judges that two probability distributions P_r, P_g are completely different when they are measured in different areas. In other words, the two probability distributions look harshly different. In GAN, this can cause the discriminator to fail to learn. Therefore, the Wasserstein distance, which is flexible and focuses on convergence, is applied to the training process of GAN.

The reason why the Wasserstein distance is a weak metric is shown in Figure 5, where X(w) = (0, Z_1(w)) and Y(w) = (θ, Z_2(w)) are two distributions separated by distance θ along the first axis.

Figure 5. Example of two probability distributions.
where X and Y are random variables mapped as X ∼ P_0 and Y ∼ P_θ, respectively, and d(X, Y) is the distance between X and Y. Here, d(X, Y) is calculated as follows:

d(X, Y) = (|θ − 0|^2 + |Z_1(w) − Z_2(w)|^2)^(1/2) ≥ |θ|. (3)

The expected value of d(X, Y) is equal to or greater than |θ| for any joint probability distribution γ:

E_γ[d(X, Y)] ≥ E_γ[|θ|] = |θ|. (4)
When Z_1 is equal to Z_2, the expected value of d(X, Y) becomes |θ|. Then, the desired conclusion is achieved as the following Equation (5) and Figure 6.
W(P_0, P_θ) = |θ| (5)
Figure 6. Graph of Equation (5).
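The toy example of Equations (3)–(5) can be verified numerically: coupling X = (0, Z(w)) and Y = (θ, Z(w)) through the same Z makes d(X, Y) = θ for every sample, so E_γ[d(X, Y)] = |θ|. A short sketch (the uniform Z and the value of θ are illustrative assumptions):

```python
import numpy as np

theta = 0.25
z = np.random.default_rng(1).uniform(size=1000)

# Couple X and Y through the SAME Z: this gamma puts all mass on pairs
# ((0, z), (theta, z)), so d(X, Y) = sqrt(theta^2 + 0) = theta everywhere.
X = np.stack([np.zeros_like(z), z], axis=1)
Y = np.stack([np.full_like(z, theta), z], axis=1)
d = np.linalg.norm(X - Y, axis=1)

w_estimate = d.mean()  # E_gamma[d(X, Y)] = |theta|, matching Equation (5)
```

Any coupling with Z_1 ≠ Z_2 only increases the average distance, which is why the infimum in Equation (2) is attained by this same-Z pairing.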
As a result, the objective function of WGAN can be expressed by the Kantorovich-Rubinstein duality [12]:
min_G max_{D∈L} E_{x∼P_r}[D(x)] − E_{x̃∼P_g}[D(G(x̃))], (6)

where L is the set of 1-Lipschitz functions. To enforce this Lipschitz constraint, WGAN-GP adds a gradient penalty term to the objective:

λ E_{x̃∼P_x̃}[(||∇_x̃ D(x̃)||_2 − 1)^2]. (7)
The gradient penalty term limits the gradient weight to the range [−c, c], where c is the threshold. Applying the gradient penalty term to WGAN can prevent the gradient vanishing and exploding problem.
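For a critic whose input gradient is known in closed form, Equation (7) can be evaluated by hand: a linear critic D(x) = w·x has ∇_x D(x) = w everywhere, so the penalty reduces to λ(||w||_2 − 1)^2. The sketch below uses that analytic stand-in instead of the automatic differentiation a real network would need (λ = 10 and the weight vectors are illustrative assumptions):

```python
import numpy as np

def gradient_penalty(grad_d, lam=10.0):
    """lambda * (||grad_x D(x)||_2 - 1)^2, Equation (7), for one sample."""
    return lam * (np.linalg.norm(grad_d) - 1.0) ** 2

# Linear critic D(x) = w . x: grad_x D(x) = w for every x, so the penalty
# depends only on ||w||. A 1-Lipschitz critic (||w|| = 1) pays nothing;
# steeper critics are pushed back toward gradient norm 1.
p_unit = gradient_penalty(np.array([0.6, 0.8]))   # ||w|| = 1 -> penalty ~ 0
p_steep = gradient_penalty(np.array([3.0, 4.0]))  # ||w|| = 5 -> 10 * 16 = 160
```

In the full WGAN-GP algorithm, x̃ is sampled on straight lines between pairs of real and generated images and the gradient is obtained by differentiating D; the analytic critic above only isolates the penalty term itself.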
2.3. Perceptual Loss Function
The perceptual loss function is used in image transfer tasks by extracting representations of a feature map. There are two types of perceptual losses, which are the content loss function and the style loss function. The basic merit of the perceptual loss function is the ability to extract feature responses in layers of a CNN. As the network goes deeper, the input image is changed to be representations of features, not of pixel values. That is because features of the high layer have large receptive fields, representing the actual content of the image and its spatial structure. Therefore, the high layer of the network can capture high-level contents and perceptual information, as in Figure 7, and the feature response of high layers is called the content representation.
To minimize the difference of content representation between the input and target images, the content loss function was proposed in [10], and it is represented as follows:

L_content^{φ,j}(x, x̂) = (1 / (C_j H_j W_j)) ||φ_j(x̂) − φ_j(x)||_2^2, (8)

where x, x̂, φ_j and C_j × H_j × W_j are the input image, target image, feature map of the j-th layer and size of the feature map, respectively. Usually, the feature map is extracted from networks pre-trained on ImageNet datasets, such as VGGNet16 or VGGNet19. The content loss function is calculated by mean square error.
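Equation (8) is a size-normalized squared error between two feature maps; a NumPy sketch (the random arrays stand in for VGG activations, which is an assumption for illustration):

```python
import numpy as np

def content_loss(feat_target, feat_input):
    """Equation (8): squared L2 difference of j-th layer feature maps,
    normalized by the map size C_j * H_j * W_j."""
    assert feat_target.shape == feat_input.shape
    c, h, w = feat_target.shape
    return np.sum((feat_target - feat_input) ** 2) / (c * h * w)

rng = np.random.default_rng(2)
phi_sharp = rng.normal(size=(64, 32, 32))  # stand-in for phi_j(x)
phi_deblurred = phi_sharp + 0.1            # stand-in for phi_j(x_hat)
loss = content_loss(phi_sharp, phi_deblurred)  # ~ 0.1^2 = 0.01
```

Because the loss is measured in feature space rather than pixel space, a uniform offset of 0.1 in every feature yields exactly the mean squared offset, independent of the map size.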
In the de-blurring task, using the content loss function for a network encourages the output image to be perceptually similar to the sharp image, since it can capture the content of sharpness information. To capture more perceptual similarity and texture information, the style loss function was proposed in [11]. The style loss function is similar to the content loss function, which uses feature responses in layers of the network. However, the style loss function consists of the correlations between different feature responses, and it extracts multiple feature maps of the network. By considering the correlation of features, the style loss function obtains multi-scale representations. The correlation of features is given by a Gram matrix, which is a covariance matrix that represents the distribution of an image, and the Gram matrix is the inner product between different vectored feature maps. The Gram matrix for style loss is expressed as follows:
G_j^φ(x)_{c,c'} = (1 / (C_j H_j W_j)) Σ_{h=1}^{H_j} Σ_{w=1}^{W_j} φ_j(x)_{h,w,c} φ_j(x)_{h,w,c'}, (9)

where G_j^φ(x), φ_j(x)_{h,w,c} and φ_j(x)_{h,w,c'}, and C_j × H_j × W_j are the Gram matrix of the feature map in the j-th layer, different feature maps with different channels, and the size of the feature map, respectively.
Figure 7. Feature responses of layers in CNN. First row is the original image, conv_2 layer and conv_4 layer. Second row is conv_6 layer, conv_9 layer and conv_12 layer.
For the efficient calculation of the Gram matrix, feature maps are reshaped by changing the size of C_j × H_j × W_j to C_j × H_jW_j. After reshaping, the style loss function is calculated using the difference between the Gram matrices of the target and the output images. It is minimized by using the Frobenius norm, and the formula of the style loss function is as follows:

L_style^{φ,j}(x, x̂) = (1 / (C_j H_j W_j)) ||G_j^φ(x̂) − G_j^φ(x)||_F^2. (10)
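After the reshape, Equation (9) collapses to a single matrix product, G = F Fᵀ / (C_j H_j W_j) with F of size C_j × H_jW_j, and Equation (10) is a normalized squared Frobenius norm of the Gram difference. A NumPy sketch of both equations (the random feature map stands in for real VGG activations):

```python
import numpy as np

def gram_matrix(feat):
    """Equation (9) after reshaping C x H x W -> C x HW:
    G = F F^T / (C * H * W), a C x C channel-correlation matrix."""
    c, h, w = feat.shape
    F = feat.reshape(c, h * w)
    return F @ F.T / (c * h * w)

def style_loss(feat_target, feat_input):
    """Equation (10): squared Frobenius norm of the Gram difference,
    normalized by the feature map size."""
    c, h, w = feat_target.shape
    diff = gram_matrix(feat_target) - gram_matrix(feat_input)
    return np.sum(diff ** 2) / (c * h * w)

rng = np.random.default_rng(3)
phi = rng.normal(size=(8, 4, 4))
G = gram_matrix(phi)

# Entry (0, 1) of G matches the explicit double sum of Equation (9).
g01 = sum(phi[0, h, w] * phi[1, h, w]
          for h in range(4) for w in range(4)) / phi.size
```

Because the Gram matrix discards spatial positions and keeps only channel co-activations, matching it enforces texture and edge statistics rather than pixel alignment, which is the preservation behavior discussed above.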
As a result, applying the style loss function to WGAN can preserve more detailed edge information and capture the similarity of perceptual styles.
2.4. Problem Definition
Figures 8 and 9 show experimental results of WGAN-GP with the content loss function. These experiments are performed with the GOPRO dataset. The goal of this network is to reconstruct the de-blurred image, which is similar to the sharp image, with the use of the blur image. Blurred and sharp images are fed to the network, with the network learning the perceptual content and similarity of the sharp image.

In Figure 8, the network trained with WGAN-GP and the content loss function generates a de-blurred image, which has perceptual content information. It results in good performance in PSNR and outputs a de-blurred image by eliminating blur noise. However, a problem exists when the network reconstructs
G^{\phi}_j(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c}\, \phi_j(x)_{h,w,c'},   (9)
where G^φ_j(x), φ_j(x)_{h,w,c} and φ_j(x)_{h,w,c'}, and C_j × H_j × W_j are the Gram matrix of the feature map in the j-th layer, feature maps of different channels, and the size of the feature map, respectively.

For the efficient calculation of the Gram matrix, feature maps are reshaped by changing the size from C_j × H_j × W_j to C_j × H_j W_j. After reshaping, the style loss function is calculated using the difference between the Gram matrices of the target and the output images. It is minimized by using the Frobenius norm, and the formula of the style loss function is as follows:
\mathcal{L}^{\phi,j}_{style} = \frac{1}{C_j H_j W_j} \left\| G^{\phi}_j(\hat{x}) - G^{\phi}_j(x) \right\|_F^2.   (10)
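The reshape-then-inner-product computation of Equations (9) and (10) can be sketched as follows; `feat` again stands in for one VGG feature map φ_j(x), and both normalizations follow the equations as written above.

```python
import numpy as np

def gram_matrix(feat):
    # Reshape C x H x W to C x HW, then take inner products between
    # channel rows (Equation (9)), normalized by C * H * W.
    C, H, W = feat.shape
    F = feat.reshape(C, H * W)
    return (F @ F.T) / (C * H * W)

def style_loss(feat_hat, feat):
    # Squared Frobenius distance between the two Gram matrices,
    # normalized again by C * H * W as in Equation (10).
    C, H, W = feat.shape
    diff = gram_matrix(feat_hat) - gram_matrix(feat)
    return np.sum(diff ** 2) / (C * H * W)
```

The Gram matrix is C × C regardless of spatial size, which is why it captures texture statistics rather than spatial layout: two images with shifted content but the same channel correlations yield the same Gram matrix.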
Appl. Sci. 2019, 9, x FOR PEER REVIEW 7 of 29
As a result, applying the style loss function to WGAN can preserve more detailed edge information and capture the similarity of perceptual styles.
2.4. Problem Definition
Figures 8 and 9 show experimental results of WGAN-GP with the content loss function. These experiments are performed with the GOPRO dataset. The goal of this network is to reconstruct a de-blurred image that is similar to the sharp image, with the use of the blurred image. Blurred and sharp images are fed to the network, with the network learning the perceptual content and similarity of the sharp image.
In Figure 8, the trained network with WGAN-GP and the content loss function generates a de-blurred image, which has perceptual content information. It results in good performance in PSNR and outputs a de-blurred image by eliminating blur noise. However, a problem exists when the network reconstructs the image. As shown in the extended sub-parts of the generated images in Figure 9, block effects occur in objects and backgrounds. This is because the content loss function uses only one feature map in the CNN. That is, using a single feature map to capture perceptual information is not enough to represent micro-edges in small objects, such as a leaf, branch, etc.
Figure 8. Example images of WGAN-GP with content loss function: (a) Sharp image. (b) Blurred image. (c) Reconstructed image.
Figure 9. Extending part of example images.
In this regard, to improve the sharpness of the output image, the content loss function is replaced by the style loss function in order to extract multiple feature maps in the CNN. Therefore, adapting the style loss function increases the feature complexity and the size of the receptive field. Moreover, it considers the correlation between different feature maps of layers.
The next section explains the addition of the style loss function to WGAN-GP and analyzes how it is calculated. Finally, the de-blurring network, which can generate a realistic de-blurred image and show high similarity between the output and sharp images, is proposed.
3. Proposed Algorithm
This section describes the proposed method and its detailed composition to reconstruct the de-blurred image. First, capturing perceptual style by extracting multiple feature maps in high layers is described. Then, the total loss function, which combines WGAN-GP and the style loss function, follows. Next, the way to minimize the total loss function is introduced. Finally, the architecture of the de-blurring network is depicted.
3.1. Multi-Scale Representation of Perceptual Style
To capture the perceptual style of the sharp image, the style loss function is used in the proposed method. As mentioned in Section 2, applying the style loss function can increase the similarity between the generated and sharp images by minimizing the difference of distributions. It is similar to the content loss function, but the style loss function extracts multiple feature maps in the CNN to increase feature complexity and the receptive field.

Usually, feature maps of layers in VGG16 are used to capture similarity and perceptual style. VGG16 [13] is a type of CNN, which has deep convolutional layers and is trained on ImageNet. The architecture of VGG16 is shown in Figure 10.
Figure 10. Architecture of VGG16 network (feature map sizes from the 224 × 224 × 3 input: 224 × 224 × 64, 112 × 112 × 128, 56 × 56 × 256, 28 × 28 × 512, 14 × 14 × 512, 7 × 7 × 512).
VGG16 applies 3 × 3 filter convolutions repeatedly to capture the representation of the image. The feature map of a high layer then includes high-level features that obtain a high-complexity representation with a large receptive field.

When the network goes deeper, the input image is transformed to representation values, and the size of the feature maps is reduced as the depth is increased. To take a multi-scale representation, five feature maps, which are in different shapes, are extracted. Also, feature maps of the high layers, which contain high non-linearity, show better performance than feature maps of the low layers. Feature maps of the low layers are not sufficient to express the perceptual style of an image because they contain low-level features.

The reconstructed image using low-level features is shown in Figure 11. A low-level feature cannot capture texture, color and style because the feature is too simple and takes a small receptive field. The reconstructed image has black space because it does not fully express the style and texture of the image. As a result, to reconstruct the de-blurred image and preserve the content with perceptual style, this paper proposes extracting five feature maps in high layers, which have enough perceptual style and edge information.
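The multi-scale shapes involved can be reproduced from Figure 10 by noting that each 2 × 2 max-pool halves the spatial size. Which exact layers the method taps is stated only as "five feature maps of high layers", so the sketch below merely illustrates the scale progression of the five VGG16 block outputs, not the authors' specific selection.

```python
def vgg16_block_shapes(size=224):
    # Channel widths of the five VGG16 convolutional blocks.
    channels = [64, 128, 256, 512, 512]
    shapes = []
    for c in channels:
        shapes.append((size, size, c))  # block output before pooling
        size //= 2                      # each 2x2 max-pool halves H and W
    return shapes
```

For a 224 × 224 input this yields (224, 224, 64) through (14, 14, 512), matching the sizes annotated in Figure 10 (a final pool gives 7 × 7 × 512).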
Figure 11. Generated image using low-level feature.
3.2. Total Loss Function of Network
The objective of the proposed method is to generate the de-blurred image from the blurred image, I_B, by extracting the perceptual style information from the sharp image, I_S. The network learns the similarity of the sharp image, and it reconstructs the de-blurred image from the blurred image by using micro-edge information and perceptual similarity. The similarity can be improved by minimizing the difference of distribution between I_S and I_B. As a result, the well-trained generator, a component of the network, generates an output image that includes edge information.
To train the network, a discriminator and generator are alternately learned from blurred and sharp images. The learning direction of the discriminator in the WGAN-GP is decided to find the optimal solution by minimizing the Wasserstein distance between the joint distributions of the generated and sharp images. The loss function of the discriminator is as follows:
\mathcal{L}_D = \sum_{n=1}^{N} \left[ L(I_S) - L(G_\theta(I_B)) + \lambda \left( \left\| \nabla L(\epsilon I_S + (1-\epsilon) G_\theta(I_B)) \right\|_2 - 1 \right)^2 \right],   (11)
where L is a differentiable function, which is 1-Lipschitz only if it has gradients with norm at most 1 everywhere. G and ε are the generator and a random number drawn from U[0, 1], respectively, and λ is the gradient penalty coefficient, set to 10. The gradient penalty term penalizes the network if the gradient norm moves away from its target norm value of 1.
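A real critic needs automatic differentiation to evaluate the gradient in Equation (11); as a hedged toy illustration only, the sketch below uses a linear critic L(x) = w · x, whose gradient is exactly w, so the penalty term can be checked by hand. Everything here (the critic, `w`, the function name) is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, 4.0])  # toy linear critic L(x) = w . x, so grad = w, ||w|| = 5

def gradient_penalty(x_sharp, x_fake, lam=10.0):
    eps = rng.uniform()                            # epsilon ~ U[0, 1]
    x_hat = eps * x_sharp + (1.0 - eps) * x_fake   # random interpolate of Eq. (11)
    grad = w              # gradient of the linear critic is constant, so x_hat
                          # does not affect it here (it would for a real critic)
    return lam * (np.linalg.norm(grad) - 1.0) ** 2
```

Because the toy critic's gradient norm is 5 everywhere, the penalty is 10 · (5 − 1)² = 160 regardless of the random interpolation point, illustrating how Equation (11) punishes any deviation of the norm from 1.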
The discriminator is trained first, along with the training of the generator, to generate the de-blurred image. To make the generator reconstruct images with perceptual style, the total objective function is obtained by the combination of the generator loss function and the style loss, as shown in Equation (12):
\mathcal{L} = \mathcal{L}_G + \lambda \cdot \mathcal{L}_{style},   (12)
where L_G is the generator loss function of WGAN-GP, and L_style is the style loss function to capture the perceptual style from the sharp image. Also, λ is a constant weight parameter that determines how much perceptual style should be adapted. In more detail, L_G is calculated by the WGAN-GP method:
\mathcal{L}_G = -\sum_{n=1}^{N} L(G_\theta(I_B)),   (13)
where L is the critic function as mentioned above and G is the generator. Moreover, θ is the parameter of the network which minimizes the loss function:
\mathcal{L}_{style} = \sum_{m=1}^{M} \frac{1}{C_j H_j W_j} \left\| G^{\phi}_j(G_\theta(I_B)) - G^{\phi}_j(I_S) \right\|_F^2,   (14)
where M is the number of feature maps extracted from VGG16. The proposed method uses five feature maps of high layers in VGG16. Using fewer feature maps cannot express the perceptual style, and if more feature maps are used, more calculation and a lot of training time are needed. In summary, a novel combined loss function consisting of WGAN-GP and the style loss function is proposed to reconstruct an edge-preserved de-blurred image with perceptual style.
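Equations (12)-(14) combine into one scalar objective. The sketch below assumes the per-map style losses of Equation (14) are already computed (e.g., with a `style_loss` routine per tapped layer); the weight `lam` is illustrative, since this section does not restate the paper's value of λ.

```python
import numpy as np

def total_loss(critic_scores_fake, per_map_style_losses, lam=1.0):
    # Equation (13): generator loss is the negated sum of critic scores
    # on generated images.
    L_G = -np.sum(critic_scores_fake)
    # Equation (14): style loss summed over the M extracted feature maps
    # (each entry is assumed already normalized by C_j * H_j * W_j).
    L_style = np.sum(per_map_style_losses)
    return L_G + lam * L_style  # Equation (12)
```

With λ = 0.1, critic scores [1, 2] and per-map style losses [10, 20], the total is −3 + 0.1 · 30 = 0, showing how λ trades the adversarial term against the perceptual-style term.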
3.3. Network Architecture
The architecture of the generator is shown in Figure 12, where the generator is a CNN based on a residual network. It is composed of three convolutional layers, nine residual blocks (ResBlock) [14] and two transposed convolutional layers. First, to encode the characteristics of images, a convolutional layer, instance normalization layer [15] and ReLU activation layer [16] are designed in front of the network. The size of the image is decreased and the depth of the feature map is increased. After that, nine residual blocks are connected behind the convolutional layers to increase feature complexity. Each ResBlock consists of a convolutional layer with dropout regularization [17], an instance normalization layer and a ReLU activation layer.
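The encoder-ResBlock-decoder shape flow can be traced with the standard convolution size formulas. The strides, paddings and the 256 × 256 input resolution below are assumptions (stride-2 down/up-sampling, as is typical for this ResNet-style generator); the section itself does not list them.

```python
def conv_out(size, k, s, p):
    # Spatial output size of a convolution: floor((size + 2p - k) / s) + 1.
    return (size + 2 * p - k) // s + 1

def tconv_out(size, k, s, p, op):
    # Spatial output size of a transposed convolution.
    return (size - 1) * s - 2 * p + k + op

size = 256                           # illustrative input resolution
size = conv_out(size, 7, 1, 3)       # 7x7 conv (64d):        256 -> 256
size = conv_out(size, 3, 2, 1)       # 3x3 stride-2 (128d):   256 -> 128
size = conv_out(size, 3, 2, 1)       # 3x3 stride-2 (256d):   128 -> 64
bottleneck = size                    # nine ResBlocks keep 64 x 64
size = tconv_out(size, 3, 2, 1, 1)   # transposed conv (128d): 64 -> 128
size = tconv_out(size, 3, 2, 1, 1)   # transposed conv (64d): 128 -> 256
```

The two stride-2 convolutions quarter the spatial size while the channel depth grows to 256, and the two transposed convolutions restore the original resolution, matching the 64d-128d-256d-128d-64d progression in Figure 12.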
Figure 12. Architecture of generator in WGAN-GP (64d-128d-256d-128d-64d; 7 × 7 and 3 × 3 convolutional layers, nine ResBlocks, transposed convolutional layers, instance normalization, ReLU and Tanh activations).
In the back of the network, the transposed convolutional layer is attached to reshape feature maps to generate output images by up-sampling, and Tanh activation is applied in the last convolutional layer.

The architecture of the discriminator is shown in Figure 13, having the same architecture as PatchGAN [18,19]. PatchGAN was proposed to classify whether each N × N patch in an image is real or fake. The discriminator consists of five convolutional layers and three instance normalization layers. Unlike the generator, a LeakyReLU [20] activation layer is applied to the convolutional layers.
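The N × N patch that each discriminator output classifies is the stack's receptive field, which can be computed layer by layer. The strides below, (2, 2, 2, 1, 1) for the five 4 × 4 layers, are an assumption matching the common "70 × 70 PatchGAN" configuration; this section does not list the actual strides.

```python
def receptive_field(layers):
    # Walk back-to-front: r_in = r_out * stride + (kernel - stride).
    r = 1
    for k, s in reversed(layers):
        r = r * s + (k - s)
    return r

# Five 4x4 conv layers with assumed strides 2, 2, 2, 1, 1.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
```

Under these assumed strides each output unit sees a 70 × 70 input patch, so the discriminator's real/fake decision is local rather than whole-image.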
Figure 14 shows the architecture of the entire network. The network is based on conditional GAN, and blurred and sharp images are the input for the network. The generator produces the estimate of
Figure 13. Architecture of discriminator in WGAN-GP (64d-128d-256d-512d; 4 × 4 convolutional layers, instance normalization, LeakyReLU activation).