
Semantic Adversarial Deep Learning PDF

23 Pages·2017·2.64 MB·English


Semantic Adversarial Deep Learning

Tommaso Dreossi¹, Somesh Jha², and Sanjit A. Seshia¹

¹ University of California at Berkeley, Berkeley
² University of Wisconsin, Madison
[email protected], [email protected], [email protected]

Abstract. Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences.

However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system have a crucial role to play in this line of research. We present preliminary research results that support this claim.

1 Introduction

Machine learning (ML) algorithms, fueled by massive amounts of data, are increasingly being utilized in several domains, including healthcare, finance, and transportation.
Models produced by ML algorithms, especially deep neural networks (DNNs), are being deployed in domains where trustworthiness is a big concern, such as automotive systems [36], finance [26], health care [2], computer vision [29], speech recognition [18], natural language processing [39], and cyber-security [9,43]. Of particular concern is the use of ML (including deep learning) in cyber-physical systems (CPS) [30], where the presence of an adversary can cause serious consequences. For example, much of the technology behind autonomous and driver-less vehicle development is "powered" by machine learning [5,15,4]. DNNs have also been used in airborne collision avoidance systems for unmanned aircraft (ACAS Xu) [23]. However, in designing and deploying these algorithms in critical cyber-physical systems, the presence of an active adversary is often ignored.

Adversarial machine learning (AML) is a field concerned with the analysis of ML algorithms under adversarial attacks, and the use of such analysis in making ML algorithms robust to attacks. It is part of the broader agenda for safe and verified ML-based systems [40,42]. In this paper, we first give a brief survey of the field of AML, with a particular focus on deep learning. We focus mainly on attacks on outputs or models that are produced by ML algorithms that occur after training, or "external attacks", which are especially relevant to cyber-physical systems (e.g., for a driverless car the ML algorithm used for navigation has already been trained by the manufacturer once the "car is on the road"). These attacks are more realistic and are distinct from other types of attacks on ML models, such as attacks that poison the training data (see the paper [19] for a survey of such attacks). We survey attacks caused by adversarial examples, which are inputs crafted by adding small, often imperceptible, perturbations to force a trained ML model to misclassify.

We contend that the work on adversarial ML, while important and useful, is not enough.
In particular, we advocate for the increased use of semantics in adversarial analysis and design of ML algorithms. Semantic adversarial learning explores a space of semantic modifications to the data, uses system-level semantic specifications in the analysis, utilizes semantic adversarial examples in training, and produces not just output labels but also additional semantic information. Focusing on deep learning, we explore these ideas and provide initial experimental data to support them.

Roadmap. Section 2 provides the relevant background. A brief survey of adversarial analysis is given in Section 3. Our proposal for semantic adversarial learning is given in Section 4.

2 Background

Background on Machine Learning. Next we describe some general concepts in machine learning (ML). We will consider the supervised learning setting. Consider a sample space $Z$ of the form $X \times Y$, and an ordered training set $S = ((x_i, y_i))_{i=1}^{m}$ ($x_i$ is the data and $y_i$ is the corresponding label). Let $H$ be a hypothesis space (e.g., weights corresponding to a logistic-regression model). There is a loss function $\ell : H \times Z \mapsto \mathbb{R}$ so that given a hypothesis $w \in H$ and a sample $(x, y) \in Z$, we obtain a loss $\ell(w, (x, y))$. We consider the case where we want to minimize the loss over the training set $S$,

$$L_S(w) = \frac{1}{m} \sum_{i=1}^{m} \ell(w, (x_i, y_i)) + \lambda R(w).$$

In the equation given above, $\lambda > 0$ and the term $R(w)$ is called the regularizer and enforces "simplicity" in $w$. Since $S$ is fixed, we sometimes denote $\ell_i(w) = \ell(w, (x_i, y_i))$ as a function only of $w$. We wish to find a $w$ that minimizes $L_S(w)$, or we wish to solve the following optimization problem:

$$\min_{w \in H} L_S(w)$$

Example: We will consider the example of logistic regression.
In this case $X = \mathbb{R}^n$, $Y = \{+1, -1\}$, $H = \mathbb{R}^n$, and the loss function $\ell(w, (x, y))$ is as follows ($\cdot$ represents the dot product of two vectors):

$$\log\left(1 + e^{-y (w^T \cdot x)}\right)$$

If we use the $L_2$ regularizer (i.e. $R(w) = \|w\|_2$), then $L_S(w)$ becomes:

$$\frac{1}{m} \sum_{i=1}^{m} \log\left(1 + e^{-y_i (w^T \cdot x_i)}\right) + \lambda \|w\|_2$$

Stochastic Gradient Descent. Stochastic Gradient Descent (SGD) is a popular method for solving optimization tasks (such as the optimization problem $\min_{w \in H} L_S(w)$ we considered before). In a nutshell, SGD performs a series of updates where each update is a gradient descent update with respect to a small set of points sampled from the training set. Specifically, suppose that we perform SGD $T$ times. There are two typical forms of SGD: in the first form, which we call Sample-SGD, we uniformly and randomly sample $i_t \sim [m]$ at time $t$, and perform a gradient descent based on the $i_t$-th sample $(x_{i_t}, y_{i_t})$:

$$w_{t+1} = G_{\ell_t, \eta_t}(w_t) = w_t - \eta_t\, \ell'_{i_t}(w_t) \qquad (1)$$

where $w_t$ is the hypothesis at time $t$, $\eta_t$ is a parameter called the learning rate, and $\ell'_{i_t}(w_t)$ denotes the derivative of $\ell_{i_t}(w)$ evaluated at $w_t$. We will denote $G_{\ell_t, \eta_t}$ as $G_t$. In the second form, which we call Perm-SGD, we first perform a random permutation of $S$, and then apply Equation 1 $T$ times by cycling through $S$ according to the order of the permutation. The process of SGD can be summarized as a diagram:

$$w_0 \xrightarrow{G_1} w_1 \xrightarrow{G_2} \cdots \xrightarrow{G_t} w_t \xrightarrow{G_{t+1}} \cdots \xrightarrow{G_T} w_T$$

Classifiers. The output of the learning algorithm gives us a classifier, which is a function from $\mathbb{R}^n$ to $C$, where $\mathbb{R}$ denotes the set of reals and $C$ is the set of class labels. To emphasize that a classifier depends on a hypothesis $w \in H$, which is the output of the learning algorithm described earlier, we will write it as $F_w$ (if $w$ is clear from the context, we will sometimes simply write $F$). For example, after training in the case of logistic regression we obtain a function from $\mathbb{R}^n$ to $\{-1, +1\}$.
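As a minimal sketch of the ingredients just described, the following pure-Python code evaluates the logistic loss, performs the Sample-SGD update of Equation 1, and turns the learned hypothesis into a classifier with values in {-1, +1}. The function names, the toy training set, the constant learning rate, and the zero initial hypothesis are illustrative assumptions, not part of the paper; the regularizer is left out because Equation 1 uses only the per-sample loss.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_loss(w, x, y):
    # ell(w, (x, y)) = log(1 + exp(-y * w.x)), with y in {+1, -1}.
    margin = y * dot(w, x)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def logistic_grad(w, x, y):
    # Gradient of the loss with respect to w: -y * sigmoid(-y * w.x) * x.
    coeff = -y * sigmoid(-y * dot(w, x))
    return [coeff * xi for xi in x]

def sample_sgd(S, T, eta, seed=0):
    # Sample-SGD: at each step draw i_t uniformly from [m], then apply Equation 1.
    rng = random.Random(seed)
    w = [0.0] * len(S[0][0])  # assumed initial hypothesis w_0 = 0
    for _ in range(T):
        x, y = S[rng.randrange(len(S))]
        g = logistic_grad(w, x, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

def classify(w, x):
    # F_w(x): the label in {-1, +1} with the larger probability.
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical, linearly separable toy training set.
S = [([1.0, 2.0], 1), ([2.0, 1.5], 1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
w = sample_sgd(S, T=200, eta=0.5)
```

Perm-SGD would differ only in how the index is chosen: shuffle S once and cycle through it instead of calling `rng.randrange` at every step.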
Some classifiers $F_w(\mathbf{x})$ (vectors will be denoted in boldface) are of the form $\arg\max_l s(F_w)(\mathbf{x})[l]$ (i.e., the classifier $F_w$ outputs the label with the maximum probability according to the "softmax layer"). For example, in several deep-neural network (DNN) architectures the last layer is the softmax layer. Recall that the softmax function maps a vector $x \in \mathbb{R}^k$ to a probability distribution over $\{1, \cdots, k\} = [k]$ such that the probability of $j \in [k]$ is

$$\frac{e^{x[j]}}{\sum_{r=1}^{k} e^{x[r]}}$$

The $r$-th component of a vector $x$ is denoted by $x[r]$. We are assuming that the reader is familiar with the basics of deep-neural networks (DNNs). Readers not familiar with DNNs can refer to the excellent book by Goodfellow, Bengio, and Courville [16]. Throughout the paper, we refer to the function $s(F_w)$ as the softmax layer corresponding to the classifier $F_w$. In the case of logistic regression, $s(F_w)(x)$ is the following tuple (the first element is the probability of $-1$ and the second one is the probability of $+1$):

$$\left\langle \frac{1}{1 + e^{w^T \cdot x}},\ \frac{1}{1 + e^{-w^T \cdot x}} \right\rangle$$

Formally, let $c = |C|$ and $F_w$ be a classifier; we let $s(F_w)$ be the function that maps $\mathbb{R}^n$ to $\mathbb{R}^c_+$ such that $\|s(F_w)(x)\|_1 = 1$ for any $x$ (i.e., $s(F_w)$ computes a probability vector). We denote by $s(F_w)(x)[l]$ the probability of $s(F_w)(x)$ at label $l$.

Background on Logic. Temporal logics are commonly used for specifying desired and undesired properties of systems. For cyber-physical systems, it is common to use temporal logics that can specify properties of real-valued signals over real time, such as signal temporal logic (STL) [31] or metric temporal logic (MTL) [28].

A signal is a function $s : D \to S$, with $D \subseteq \mathbb{R}_{\geq 0}$ an interval and either $S \subseteq \mathbb{B}$ or $S \subseteq \mathbb{R}$, where $\mathbb{B} = \{\top, \bot\}$ and $\mathbb{R}$ is the set of reals. Signals defined on $\mathbb{B}$ are called Boolean, while those on $\mathbb{R}$ are said to be real-valued. A trace $w = \{s_1, \ldots, s_n\}$ is a finite set of real-valued signals defined over the same interval $D$. We use variables $x_i$ to denote the value of a real-valued signal at a particular time instant.
Let $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$ be a finite set of predicates $\sigma_i : \mathbb{R}^n \to \mathbb{B}$, with $\sigma_i \equiv p_i(x_1, \ldots, x_n) \lhd 0$, $\lhd \in \{<, \leq\}$, and $p_i : \mathbb{R}^n \to \mathbb{R}$ a function in the variables $x_1, \ldots, x_n$. An STL formula is defined by the following grammar:

$$\varphi := \sigma \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi\, U_I\, \varphi \qquad (2)$$

where $\sigma \in \Sigma$ is a predicate and $I \subset \mathbb{R}_{\geq 0}$ is a closed non-singular interval. Other common temporal operators can be defined as syntactic abbreviations in the usual way, for instance $\varphi_1 \vee \varphi_2 := \neg(\neg\varphi_1 \wedge \neg\varphi_2)$, $F_I\, \varphi := \top\, U_I\, \varphi$, or $G_I\, \varphi := \neg F_I\, \neg\varphi$. Given a $t \in \mathbb{R}_{\geq 0}$, the shifted interval of $I$ is defined as $t + I = \{t + t' \mid t' \in I\}$. The qualitative (or Boolean) semantics of STL is given in the usual way:

Definition 1 (Qualitative semantics). Let $w$ be a trace, $t \in \mathbb{R}_{\geq 0}$, and $\varphi$ be an STL formula. The qualitative semantics of $\varphi$ is inductively defined as follows:

$$
\begin{aligned}
w, t \models \sigma \quad &\text{iff} \quad \sigma(w(t)) \text{ is true} \\
w, t \models \neg\varphi \quad &\text{iff} \quad w, t \not\models \varphi \\
w, t \models \varphi_1 \wedge \varphi_2 \quad &\text{iff} \quad w, t \models \varphi_1 \text{ and } w, t \models \varphi_2 \\
w, t \models \varphi_1\, U_I\, \varphi_2 \quad &\text{iff} \quad \exists t' \in t + I \text{ s.t. } w, t' \models \varphi_2 \text{ and } \forall t'' \in [t, t'],\ w, t'' \models \varphi_1
\end{aligned}
\qquad (3)
$$

A trace $w$ satisfies a formula $\varphi$ if and only if $w, 0 \models \varphi$, in short $w \models \varphi$. STL also admits a quantitative or robust semantics, which we omit for brevity. This provides quantitative information on the formula, telling how strongly the specification is satisfied or violated for a given trace.

3 Attacks

There are several types of attacks on ML algorithms. For excellent material on various attacks on ML algorithms we refer the reader to [19,3]. For example, in training time attacks an adversary wishes to poison a data set so that a "bad" hypothesis is learned by an ML algorithm. This attack can be modeled as a game between the algorithm ML and an adversary A as follows:

– ML picks an ordered training set $S = ((x_i, y_i))_{i=1}^{m}$.
– A picks an ordered training set $\hat{S} = ((\hat{x}_i, \hat{y}_i))_{i=1}^{r}$, where $r$ is $\lfloor \epsilon m \rfloor$.
– ML learns on $S \cup \hat{S}$ by essentially minimizing $\min_{w \in H} L_{S \cup \hat{S}}(w)$.

The attacker wants to maximize the above quantity and thus chooses $\hat{S}$ such that $\min_{w \in H} L_{S \cup \hat{S}}(w)$ is maximized. For a recent paper on certified defenses for such attacks we refer the reader to [45]. In model extraction attacks an adversary with black-box access to a classifier, but no prior knowledge of the parameters of an ML algorithm or training data, aims to duplicate the functionality of (i.e., steal) the classifier by querying it on well chosen data points. For an example of model-extraction attacks see [46].

In this paper, we consider test-time attacks. We assume that the classifier $F_w$ has been trained without any interference from the attacker (i.e. no training time attacks). Roughly speaking, an attacker has an image $x$ (e.g. an image of a stop sign) and wants to craft a perturbation $\delta$ so that the label of $x + \delta$ is what the attacker desires (e.g. yield sign). The next sub-section describes test-time attacks in detail. We will sometimes refer to $F_w$ as simply $F$, but the hypothesis $w$ is lurking in the background (i.e., whenever we refer to $w$, it corresponds to the classifier $F$).

3.1 Test-time Attacks

The adversarial goal is to take any input vector $x \in \mathbb{R}^n$ and produce a minimally altered version of $x$, an adversarial sample denoted by $x^\star$, that has the property of being misclassified by a classifier $F : \mathbb{R}^n \to C$. Formally speaking, an adversary wishes to solve the following optimization problem:

$$\min_{\delta \in \mathbb{R}^n} \mu(\delta) \quad \text{such that} \quad F(x + \delta) \in T, \quad \delta \cdot M = 0$$

The various terms in the formulation are as follows: $\mu$ is a metric on $\mathbb{R}^n$, $T \subseteq C$ is a subset of the labels (the reader should think of $T$ as the target labels for the attacker), and $M$ (called the mask) is an $n$-dimensional 0-1 vector. The objective function minimizes the metric $\mu$ on the perturbation $\delta$. Next we describe various constraints in the formulation.

– $F(x + \delta) \in T$
  The set $T$ constrains the perturbed vector $x + \delta$¹ to have the label (according to $F$) in the set $T$. For mis-classification problems (the labels of $x$ and $x + \delta$ are different) we have $T = C - \{F(x)\}$.
  For targeted mis-classification we have $T = \{t\}$ (for $t \in C$), where $t$ is the target that an attacker wants (e.g., the attacker wants $t$ to correspond to a yield sign).
  (Footnote 1: The vectors are added componentwise.)
– $\delta \cdot M = 0$
  The vector $M$ can be considered as a mask (i.e., an attacker can only perturb a dimension $i$ if $M[i] = 0$); that is, reading the product componentwise, if $M[i] = 1$ then $\delta[i]$ is forced to be 0. Essentially the attacker can only perturb dimension $i$ if the $i$-th component of $M$ is 0, which means that $\delta$ lies in a $k$-dimensional space where $k$ is the number of zero entries in $M$. This constraint is important if an attacker wants to target a certain area of the image (e.g., the glasses in a picture of a person) to perturb.
– Convexity
  Notice that even if the metric $\mu$ is convex (e.g., $\mu$ is the $L_2$ norm), because of the constraint involving $F$, the optimization problem is not convex (the constraint $\delta \cdot M = 0$ is convex). In general, solving convex optimization problems is more tractable than non-convex optimization [35].

Note that the constraint $\delta \cdot M = 0$ essentially constrains the vector to be in a lower-dimensional space and does add additional complexity to the optimization problem. Therefore, for the rest of the section we will ignore that constraint and work with the following formulation:

$$\min_{\delta \in \mathbb{R}^n} \mu(\delta) \quad \text{such that} \quad F(x + \delta) \in T$$

FGSM mis-classification attack - This algorithm is also known as the fast gradient sign method (FGSM) [17]. The adversary crafts an adversarial sample $x^\star = x + \delta$ for a given legitimate sample $x$ by computing the following perturbation:

$$\delta = \varepsilon\, \mathrm{sign}(\nabla_x L_F(x)) \qquad (4)$$

The function $L_F(x)$ is a shorthand for $\ell(w, x, l(x))$, where $w$ is the hypothesis corresponding to the classifier $F$, $x$ is the data point and $l(x)$ is the label of $x$ (essentially we evaluate the loss function at the hypothesis corresponding to the classifier). The gradient of the function $L_F$ is computed with respect to $x$ using sample $x$ and label $y = l(x)$ as inputs. Note that $\nabla_x L_F(x)$ is an $n$-dimensional vector and $\mathrm{sign}(\nabla_x L_F(x))$ is an $n$-dimensional vector whose $i$-th element is the sign of $\nabla_x L_F(x)[i]$.
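For the logistic-regression classifier of Section 2 the gradient in Equation 4 has a closed form, $\nabla_x \ell = -y\, \sigma(-y\, w \cdot x)\, w$ with $\sigma(z) = 1/(1 + e^{-z})$, so FGSM can be sketched in a few lines. The weights, the input, and the value of ε below are made-up numbers chosen for illustration; only the perturbation rule itself comes from the text.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    # Numerically stable 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss(w, x, y):
    # Logistic loss log(1 + exp(-y * w.x)).
    margin = y * dot(w, x)
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def grad_x(w, x, y):
    # Gradient of the loss with respect to the INPUT x (not the weights).
    coeff = -y * sigmoid(-y * dot(w, x))
    return [coeff * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, x, y, eps):
    # Equation 4: delta = eps * sign(grad_x L_F(x)).
    return [eps * sign(g) for g in grad_x(w, x, y)]

def classify(w, x):
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical trained weights and a legitimate sample.
w = [1.0, -2.0]
x = [0.5, -0.25]
y = classify(w, x)                      # label l(x) assigned by the classifier
delta = fgsm(w, x, y, eps=0.6)
x_adv = [xi + di for xi, di in zip(x, delta)]
```

Every component of the perturbation has magnitude exactly ε, so ε bounds the change in the max-norm; in this toy setting the chosen ε is large enough to push the sample across the decision boundary.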
The value of the input variation parameter $\varepsilon$ factoring the sign matrix controls the perturbation's amplitude. Increasing its value increases the likelihood of $x^\star$ being misclassified by the classifier $F$ but, on the other hand, makes adversarial samples easier for humans to detect. The key idea is that FGSM takes a step in the direction of the gradient of the loss function and thus tries to maximize it. Recall that SGD takes a step in the direction that is opposite to the gradient of the loss function because it is trying to minimize the loss function.

JSMA targeted mis-classification attack - This algorithm is suitable for targeted mis-classification [38]. We refer to this attack as JSMA throughout the rest of the paper. To craft the perturbation $\delta$, components are sorted by decreasing adversarial saliency value. The adversarial saliency value $S(x, t)[i]$ of component $i$ for an adversarial target class $t$ is defined as:

$$
S(x, t)[i] =
\begin{cases}
0 & \text{if } \dfrac{\partial s(F)[t](x)}{\partial x[i]} < 0 \text{ or } \displaystyle\sum_{j \neq t} \dfrac{\partial s(F)[j](x)}{\partial x[i]} > 0 \\[2ex]
\dfrac{\partial s(F)[t](x)}{\partial x[i]} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial s(F)[j](x)}{\partial x[i]} \right| & \text{otherwise}
\end{cases}
\qquad (5)
$$

where the matrix $J_F = \left[ \dfrac{\partial s(F)[j](x)}{\partial x[i]} \right]_{ij}$ is the Jacobian matrix for the output of the softmax layer $s(F)(x)$. Since $\sum_{k \in C} s(F)[k](x) = 1$, we have the following equation:

$$\frac{\partial s(F)[t](x)}{\partial x[i]} = - \sum_{j \neq t} \frac{\partial s(F)[j](x)}{\partial x[i]}$$

The first case corresponds to the scenario in which changing the $i$-th component of $x$ takes us further away from the target label $t$. Intuitively, $S(x, t)[i]$ indicates how likely changing the $i$-th component of $x$ is to "move towards" the target label $t$. Input components $i$ are added to the perturbation $\delta$ in order of decreasing adversarial saliency value $S(x, t)[i]$ until the resulting adversarial sample $x^\star = x + \delta$ achieves the target label $t$. The perturbation introduced for each selected input component can vary. Greater individual variations tend to reduce the number of components perturbed to achieve misclassification.

CW targeted mis-classification attack.
The CW-attack [6] is widely believed to be one of the most "powerful" attacks. The reason is that CW cast their problem as an unconstrained optimization problem, and then use a state-of-the-art solver (i.e. Adam [25]). In other words, they leverage the advances in optimization for the purposes of generating adversarial examples.

In their paper Carlini and Wagner consider a wide variety of formulations, but we present the one that performs best according to their evaluation. The optimization problem corresponding to CW is as follows:

$$\min_{\delta \in \mathbb{R}^n} \mu(\delta) \quad \text{such that} \quad F(x + \delta) = t$$

CW use an existing solver (Adam [25]) and thus need to make sure that each component of $x + \delta$ is between 0 and 1 (i.e. valid pixel values). Note that the other methods did not face this issue because they control the "internals" of the algorithm (i.e., CW used a solver in a "black box" manner). We introduce a new vector $w$ whose $i$-th component is defined according to the following equation:

$$\delta[i] = \frac{1}{2}\left(\tanh(w[i]) + 1\right) - x[i]$$

Since $-1 \leq \tanh(w[i]) \leq 1$, it follows that $0 \leq x[i] + \delta[i] \leq 1$. In terms of this new variable the optimization problem becomes:

$$\min_{w \in \mathbb{R}^n} \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - x\right) \quad \text{such that} \quad F\left(\tfrac{1}{2}(\tanh(w) + 1)\right) = t$$

Next they approximate the constraint ($F(x) = t$) with the following function:

$$g(x) = \max\left( \max_{i \neq t} Z(F)(x)[i] - Z(F)(x)[t],\ -\kappa \right)$$

In the equation given above $Z(F)$ is the input of the DNN to the softmax layer (i.e. $s(F)(x) = \mathrm{softmax}(Z(F)(x))$) and $\kappa$ is a confidence parameter (higher $\kappa$ encourages the solver to find adversarial examples with higher confidence). The new optimization formulation is as follows:

$$\min_{w \in \mathbb{R}^n} \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - x\right) \quad \text{such that} \quad g\left(\tfrac{1}{2}(\tanh(w) + 1)\right) \leq 0$$

Next we incorporate the constraint into the objective function as follows:

$$\min_{w \in \mathbb{R}^n} \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - x\right) + c\, g\left(\tfrac{1}{2}(\tanh(w) + 1)\right)$$

In the objective given above, the "Lagrangian variable" $c > 0$ is a suitably chosen constant (from the optimization literature we know that there exists $c > 0$ such that the optimal solutions of the last two formulations are the same).

3.2 Adversarial Training

Once an attacker finds an adversarial example, the algorithm can be retrained using this example.
Researchers have found that retraining the model with adversarial examples produces a more robust model. For this section, we will work with attack algorithms that have a target label $t$ (i.e. we are in the targeted mis-classification case, such as JSMA or CW). Let $A(w, x, t)$ be the attack algorithm, where its inputs are as follows: $w \in H$ is the current hypothesis, $x$ is the data point, and $t \in C$ is the target label. The output of $A(w, x, t)$ is a perturbation $\delta$ such that $F(x + \delta) = t$. If the attack algorithm is simply a mis-classification algorithm (e.g. FGSM or Deepfool) we will drop the last parameter $t$.

An adversarial training algorithm $R_A(w, x, t)$ is parameterized by an attack algorithm $A$ and outputs a new hypothesis $w' \in H$. Adversarial training works by taking a data point $x$ and an attack algorithm $A(w, x, t)$ as its input and then retraining the model using a specially designed loss function (essentially one performs a single step of SGD using the new loss function). The question arises: what loss function to use during the training? Different methods use different loss functions.

Next, we discuss some adversarial training algorithms proposed in the literature. At a high level, an important point is that the more sophisticated an adversarial perturbation algorithm is, the harder it is to turn it into adversarial training. The reason is that it is hard to "encode" the adversarial perturbation algorithm as an objective function and optimize it. We will see this below, especially for the virtual adversarial training (VAT) proposed by Miyato et al. [34].

Retraining for FGSM. We discussed the FGSM attack method earlier.
In this case $A = \mathrm{FGSM}$. The loss function used by the retraining algorithm $R_{\mathrm{FGSM}}(w, x, t)$ is as follows:

$$\ell_{\mathrm{FGSM}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda\, \ell(w, x_i + \mathrm{FGSM}(w, x_i), y_i)$$

Recall that $\mathrm{FGSM}(w, x)$ was defined earlier, and $\lambda$ is a regularization parameter. The simplicity of $\mathrm{FGSM}(w, x_i)$ allows taking its gradient, but this objective function requires the label $y_i$ because we are reusing the same loss function $\ell$ used to train the original model. Further, $\mathrm{FGSM}(w, x_i)$ may not be very good because it may not produce a good adversarial perturbation direction. The retraining algorithm is simply as follows: take one step of SGD using the loss function $\ell_{\mathrm{FGSM}}$ at the data point $x_i$.

A caveat is needed for taking the gradient during the SGD step. At iteration $t$ suppose we have model parameters $w_t$, and we need to compute the gradient of the objective. Note that $\mathrm{FGSM}(w, x)$ depends on $w$, so by the chain rule we need to compute $\partial\, \mathrm{FGSM}(w, x) / \partial w \,|_{w = w_t}$. However, this gradient is volatile², and so instead Goodfellow et al. only compute:

$$\left. \frac{\partial\, \ell(w, x_i + \mathrm{FGSM}(w_t, x_i), y_i)}{\partial w} \right|_{w = w_t}$$

Essentially they treat $\mathrm{FGSM}(w_t, x_i)$ as a constant while taking the derivative.

Virtual Adversarial Training (VAT). Miyato et al. [34] observed the drawback of requiring the label $y_i$ above. They propose, instead of reusing $\ell$, to use the following for the regularizer,

$$\Delta(r, x, w) = \mathrm{KL}\left(s(F_w)(x)[y],\ s(F_w)(x + r)[y]\right)$$

for some $r$ such that $\|r\| \leq \delta$. As a result, the label $y_i$ is no longer required.
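The regularizer Δ needs only the model's output distributions at $x$ and at $x + r$, not a label. The sketch below illustrates this under two stated assumptions: a hypothetical linear softmax model with weight matrix `W` stands in for $s(F_w)$, and the KL divergence is taken over the full label distribution. The weights and inputs are invented for illustration.

```python
import math

def softmax(z):
    # Subtract the maximum logit for numerical stability.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    # KL(P, Q) = sum_i P(i) * log(P(i) / Q(i)) over a finite domain.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def probs(W, x):
    # Stand-in for s(F_w)(x): softmax of the linear logits W x (assumed model).
    logits = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]
    return softmax(logits)

def vat_delta(W, x, r):
    # Delta(r, x, w): divergence between predictions at x and at x + r.
    x_shifted = [xi + ri for xi, ri in zip(x, r)]
    return kl(probs(W, x), probs(W, x_shifted))

# Hypothetical three-class model on two-dimensional inputs.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [0.2, -0.1]
at_zero = vat_delta(W, x, [0.0, 0.0])
at_r = vat_delta(W, x, [0.3, 0.0])
```

Here `at_zero` vanishes because the two distributions coincide when $r = 0$, while `at_r` is strictly positive: a nonzero perturbation that changes the prediction is penalized without ever consulting a label.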
The question is: what $r$ to use? Miyato et al. [34] propose that in theory we should use the "best" one as

$$\max_{r : \|r\| \leq \delta} \mathrm{KL}\left(s(F_w)(x)[y],\ s(F_w)(x + r)[y]\right)$$

In the equation given above KL is the Kullback-Leibler divergence. Recall that the KL divergence of two distributions $P$ and $Q$ over the same finite domain $D$ is given by the following equation:

$$\mathrm{KL}(P, Q) = \sum_{i \in D} P(i) \log\left(\frac{P(i)}{Q(i)}\right)$$

This thus gives rise to the following loss function to use during retraining:

$$\ell_{\mathrm{VAT}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda \max_{r : \|r\| \leq \delta} \Delta(r, x_i, w)$$

However, one cannot easily take the gradient of the regularizer. Hence the authors perform an approximation as follows:

1. Take the Taylor expansion of $\Delta(r, x_i, w)$ at $r = 0$, so $\Delta(r, x_i, w) \approx r^T H(x_i, w)\, r$, where $H(x_i, w)$ is the Hessian matrix of $\Delta(r, x_i, w)$ with respect to $r$ at $r = 0$. (The zeroth- and first-order terms vanish because $\Delta$ attains its minimum value 0 at $r = 0$.)
2. Thus $\max_{\|r\| \leq \delta} \Delta(r, x_i, w) \approx \max_{\|r\| \leq \delta} \left( r^T H(x_i, w)\, r \right)$. By the variational characterization of the symmetric matrix ($H(x_i, w)$ is symmetric), $r^* = \delta \bar{v}$ where $\bar{v} = v(x_i, w)$ is the unit eigenvector of $H(x_i, w)$ corresponding to its largest eigenvalue. Note that $r^*$ depends on $x_i$ and $w$. Therefore the loss function becomes:

   $$\ell_{\mathrm{VAT}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda\, \Delta(r^*, x_i, w)$$

3. Now suppose in the process of SGD we are at iteration $t$ with model parameters $w_t$, and we need to compute $\partial \ell_{\mathrm{VAT}} / \partial w \,|_{w = w_t}$. By the chain rule we need to compute $\partial r^* / \partial w \,|_{w = w_t}$. However the authors find that such gradients are volatile, so they instead fix $r^*$ as a constant at the point $w_t$, and compute

   $$\left. \frac{\partial\, \mathrm{KL}\left(s(F_w)(x)[y],\ s(F_w)(x + r)[y]\right)}{\partial w} \right|_{w = w_t}$$

(Footnote 2: In general, second-order derivatives of a classifier corresponding to a DNN vanish at several points because several layers are piece-wise linear.)

3.3 Black Box Attacks

Recall that earlier attacks (e.g. FGSM and JSMA) needed white-box access to the classifier $F$ (essentially because these attacks require first-order information about the classifier). In this section, we present black-box attacks.
In this case, an attacker can only ask for the labels $F(x)$ for certain data points. Our presentation is based on [37], but is more general.

Let $A(w, x, t)$ be the attack algorithm, where its inputs are: $w \in H$ is the current hypothesis, $x$ is the data point, and $t \in C$ is the target label. The output of $A(w, x, t)$ is a perturbation $\delta$ such that $F(x + \delta) = t$. If the attack algorithm is simply a mis-classification algorithm (e.g. FGSM or Deepfool) we will drop the last parameter $t$ (recall that in this case the attack algorithm returns a $\delta$ such that $F(x + \delta) \neq F(x)$). An adversarial training algorithm $R_A(w, x, t)$ is parameterized by an attack algorithm $A$ and outputs a new hypothesis $w' \in H$ (this was discussed in the previous subsection).

Initialization: We pick a substitute classifier $G$ and an initial seed data set $S_0$ and train $G$. For simplicity, we will assume that the sample space $Z = X \times Y$ and the hypothesis space $H$ for $G$ are the same as those of $F$ (the classifier under attack). However, this is not crucial to the algorithm. We will call $G$ the substitute classifier and $F$ the target classifier. Let $S = S_0$ be the initial data set, which will be updated as we iterate.

Iteration: Run the attack algorithm $A(w, x, t)$ on $G$ and obtain a $\delta$. If $F(x + \delta) = t$, then we stop; we are done. If $F(x + \delta) = t'$ with $t' \neq t$, we augment the data set $S$ as follows:

$$S = S \cup \{(x + \delta, t')\}$$

We now retrain $G$ on this new data set, which essentially means running SGD on the new data point $(x + \delta, t')$. Notice that we can also use adversarial training $R_A(w, x, t)$ to update $G$ (to our knowledge this has not been tried out in the literature).

3.4 Defenses

Defenses with formal guarantees against test-time attacks have proven elusive. For example, Carlini and Wagner [7] have a recent paper that breaks ten recent defense proposals. However, defenses that are based on robust-optimization objectives have demonstrated promise [33,27,44]. Several techniques for verifying properties of a DNN (in isolation) have appeared recently (e.g., [24,20,13,14]). Due to space limitations we will not give a detailed account of all these defenses.
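Returning to the substitute-model procedure of Section 3.3, its initialization and iteration steps can be sketched as follows. Everything concrete here is an assumption made for illustration: the hidden target F is a fixed linear rule that the attacker may only query for labels, the substitute G is a logistic-regression model trained by deterministic passes of SGD, and the attack run on G is the untargeted FGSM variant, so the stopping test is $F(x + \delta) \neq F(x)$ rather than $F(x + \delta) = t$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def target_label(x):
    # Hypothetical target classifier F; the attacker sees only its labels.
    return 1 if x[0] - x[1] >= 0 else -1

def sgd_step(w, x, y, eta):
    # One gradient-descent step on the logistic loss at (x, y).
    coeff = -y * sigmoid(-y * dot(w, x))
    return [wi - eta * coeff * xi for wi, xi in zip(w, x)]

def train(w, S, epochs=20, eta=0.5):
    for _ in range(epochs):
        for x, y in S:
            w = sgd_step(w, x, y, eta)
    return w

def fgsm(w, y, eps):
    # For the logistic loss, the sign of the input gradient is sign(-y * w[i]).
    return [eps * sign(-y * wi) for wi in w]

def black_box_attack(x, seed_inputs, eps, max_iters=5):
    S = [(s, target_label(s)) for s in seed_inputs]  # S_0, labeled by querying F
    w = train([0.0] * len(x), S)                     # substitute classifier G
    y = target_label(x)
    for _ in range(max_iters):
        delta = fgsm(w, y, eps)                      # attack is run on G, not F
        x_adv = [xi + di for xi, di in zip(x, delta)]
        label = target_label(x_adv)                  # a single label query to F
        if label != y:
            return x_adv
        S.append((x_adv, label))                     # augment S with (x + delta, t')
        w = sgd_step(w, x_adv, label, eta=0.5)       # retrain G on the new point
    return None

seeds = [[1.0, -0.5], [-0.5, 1.0], [2.0, -1.0], [-1.0, 2.0]]
x0 = [0.3, 0.0]
found = black_box_attack(x0, seeds, eps=0.2)
missed = black_box_attack(x0, seeds, eps=0.05)
```

With the larger budget the perturbation computed on G also fools F, which is the transfer effect the procedure relies on; with the smaller budget no perturbation of that size can cross F's decision boundary, so the loop exhausts its iterations and returns `None`.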
