Semantic Adversarial Deep Learning

Tommaso Dreossi¹, Somesh Jha², and Sanjit A. Seshia¹
¹ University of California at Berkeley, Berkeley
² University of Wisconsin, Madison

Abstract. Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, healthcare, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences.

However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network might lead to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system have a crucial role to play in this line of research. We present preliminary research results that support this claim.

1 Introduction

Machine learning (ML) algorithms, fueled by massive amounts of data, are increasingly being utilized in several domains, including healthcare, finance, and transportation.
Models produced by ML algorithms, especially deep neural networks (DNNs), are being deployed in domains where trustworthiness is a big concern, such as automotive systems [36], finance [26], health care [2], computer vision [29], speech recognition [18], natural language processing [39], and cyber-security [9,43]. Of particular concern is the use of ML (including deep learning) in cyber-physical systems (CPS) [30], where the presence of an adversary can cause serious consequences. For example, much of the technology behind autonomous and driverless vehicle development is "powered" by machine learning [5,15,4]. DNNs have also been used in airborne collision avoidance systems for unmanned aircraft (ACAS Xu) [23]. However, in designing and deploying these algorithms in critical cyber-physical systems, the presence of an active adversary is often ignored.

Adversarial machine learning (AML) is a field concerned with the analysis of ML algorithms under adversarial attacks, and the use of such analysis in making ML algorithms robust to attacks. It is part of the broader agenda for safe and verified ML-based systems [40,42]. In this paper, we first give a brief survey of the field of AML, with a particular focus on deep learning. We focus mainly on attacks on outputs or models produced by ML algorithms that occur after training, or "external attacks", which are especially relevant to cyber-physical systems (e.g., for a driverless car, the ML algorithm used for navigation has already been trained by the manufacturer once the "car is on the road"). These attacks are more realistic and are distinct from other types of attacks on ML models, such as attacks that poison the training data (see the paper [19] for a survey of such attacks). We survey attacks caused by adversarial examples, which are inputs crafted by adding small, often imperceptible, perturbations to force a trained ML model to misclassify.

We contend that the work on adversarial ML, while important and useful, is not enough.
In particular, we advocate for the increased use of semantics in adversarial analysis and design of ML algorithms. Semantic adversarial learning explores a space of semantic modifications to the data, uses system-level semantic specifications in the analysis, utilizes semantic adversarial examples in training, and produces not just output labels but also additional semantic information. Focusing on deep learning, we explore these ideas and provide initial experimental data to support them.

Roadmap. Section 2 provides the relevant background. A brief survey of adversarial analysis is given in Section 3. Our proposal for semantic adversarial learning is given in Section 4.

2 Background

Background on Machine Learning. Next we describe some general concepts in machine learning (ML). We will consider the supervised learning setting. Consider a sample space $Z$ of the form $X \times Y$, and an ordered training set $S = ((x_i, y_i))_{i=1}^{m}$ ($x_i$ is the data and $y_i$ is the corresponding label). Let $H$ be a hypothesis space (e.g., weights corresponding to a logistic-regression model). There is a loss function $\ell : H \times Z \mapsto \mathbb{R}$ so that given a hypothesis $w \in H$ and a sample $(x, y) \in Z$, we obtain a loss $\ell(w, (x, y))$. We consider the case where we want to minimize the loss over the training set $S$,

$$L_S(w) = \frac{1}{m} \sum_{i=1}^{m} \ell(w, (x_i, y_i)) + \lambda R(w).$$

In the equation given above, $\lambda > 0$ and the term $R(w)$ is called the regularizer and enforces "simplicity" in $w$. Since $S$ is fixed, we sometimes denote $\ell_i(w) = \ell(w, (x_i, y_i))$ as a function only of $w$. We wish to find a $w$ that minimizes $L_S(w)$, or we wish to solve the following optimization problem:

$$\min_{w \in H} L_S(w)$$

Example: We will consider the example of logistic regression.
In this case $X = \mathbb{R}^n$, $Y = \{+1, -1\}$, $H = \mathbb{R}^n$, and the loss function $\ell(w, (x, y))$ is as follows ($\cdot$ represents the dot product of two vectors):

$$\log\left(1 + e^{-y (w^T \cdot x)}\right)$$

If we use the $L_2$ regularizer (i.e., $R(w) = \|w\|_2$), then $L_S(w)$ becomes:

$$\frac{1}{m} \sum_{i=1}^{m} \log\left(1 + e^{-y_i (w^T \cdot x_i)}\right) + \lambda \|w\|_2$$

Stochastic Gradient Descent. Stochastic Gradient Descent (SGD) is a popular method for solving optimization tasks (such as the optimization problem $\min_{w \in H} L_S(w)$ we considered before). In a nutshell, SGD performs a series of updates where each update is a gradient descent update with respect to a small set of points sampled from the training set. Specifically, suppose that we perform SGD $T$ times. There are two typical forms of SGD: in the first form, which we call Sample-SGD, we uniformly and randomly sample $i_t \sim [m]$ at time $t$, and perform a gradient descent based on the $i_t$-th sample $(x_{i_t}, y_{i_t})$:

$$w_{t+1} = G_{\ell_t, \eta_t}(w_t) = w_t - \eta_t \, \ell'_{i_t}(w_t) \qquad (1)$$

where $w_t$ is the hypothesis at time $t$, $\eta_t$ is a parameter called the learning rate, and $\ell'_{i_t}(w_t)$ denotes the derivative of $\ell_{i_t}(w)$ evaluated at $w_t$. We will denote $G_{\ell_t, \eta_t}$ as $G_t$. In the second form, which we call Perm-SGD, we first perform a random permutation of $S$, and then apply Equation 1 $T$ times by cycling through $S$ according to the order of the permutation. The process of SGD can be summarized as a diagram:

$$w_0 \xrightarrow{G_1} w_1 \xrightarrow{G_2} \cdots \xrightarrow{G_t} w_t \xrightarrow{G_{t+1}} \cdots \xrightarrow{G_T} w_T$$

Classifiers. The output of the learning algorithm gives us a classifier, which is a function from $\mathbb{R}^n$ to $C$, where $\mathbb{R}$ denotes the set of reals and $C$ is the set of class labels. To emphasize that a classifier depends on a hypothesis $w \in H$, which is the output of the learning algorithm described earlier, we will write it as $F_w$ (if $w$ is clear from the context, we will sometimes simply write $F$). For example, after training in the case of logistic regression we obtain a function from $\mathbb{R}^n$ to $\{-1, +1\}$.
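To make the Sample-SGD update of Equation 1 concrete for logistic regression, the following is a minimal sketch in plain Python. It is an illustration under stated assumptions rather than the authors' implementation: the regularizer is dropped (that is, $\lambda = 0$), the learning rate is held constant, and the helper names (`logistic_loss`, `logistic_grad`, `sample_sgd`, `classify`) and the toy data are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def logistic_loss(w, x, y):
    """Per-sample loss log(1 + exp(-y * (w . x))) for a label y in {+1, -1}."""
    m = y * dot(w, x)
    # Evaluate log(1 + exp(-m)) without overflow when |m| is large.
    return math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))


def logistic_grad(w, x, y):
    """Gradient of the per-sample loss with respect to the hypothesis w."""
    m = y * dot(w, x)
    # s = 1 / (1 + exp(m)), evaluated in a numerically stable way.
    s = math.exp(-m) / (1.0 + math.exp(-m)) if m > 0 else 1.0 / (1.0 + math.exp(m))
    return [-y * s * xi for xi in x]


def sample_sgd(S, T, eta, seed=0):
    """Sample-SGD: T updates w <- w - eta * grad, each on a uniformly drawn sample.

    Assumptions for this sketch: lambda = 0 (no regularizer), constant eta.
    """
    rng = random.Random(seed)
    w = [0.0] * len(S[0][0])
    for _ in range(T):
        x, y = S[rng.randrange(len(S))]  # i_t drawn uniformly from [m]
        g = logistic_grad(w, x, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


def classify(w, x):
    """The classifier F_w for logistic regression: the sign of w . x."""
    return 1 if dot(w, x) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical, linearly separable toy data.
    S = [([1.0, 2.0], 1), ([2.0, 1.5], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
    w = sample_sgd(S, T=200, eta=0.5)
    print(w, [classify(w, x) for x, _ in S])
```

Perm-SGD would differ only in how the index is chosen: shuffle $S$ once and cycle through it instead of drawing a fresh index at every step.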
Some classifiers $F_w(\mathbf{x})$ (vectors will be denoted in boldface) are of the form $\arg\max_l s(F_w)(x)[l]$ (i.e., the classifier $F_w$ outputs the label with the maximum probability according to the "softmax layer"). For example, in several deep neural network (DNN) architectures the last layer is the softmax layer. Recall that the softmax function maps $\mathbb{R}^k$ to a probability distribution over $\{1, \cdots, k\} = [k]$ such that the probability of $j \in [k]$ for a vector $x \in \mathbb{R}^k$ is

$$\frac{e^{x[j]}}{\sum_{r=1}^{k} e^{x[r]}}$$

The $r$-th component of a vector $x$ is denoted by $x[r]$. We assume that the reader is familiar with the basics of deep neural networks (DNNs). Readers not familiar with DNNs can refer to the excellent book by Goodfellow, Bengio, and Courville [16]. Throughout the paper, we refer to the function $s(F_w)$ as the softmax layer corresponding to the classifier $F_w$. In the case of logistic regression, $s(F_w)(x)$ is the following tuple (the first element is the probability of $-1$ and the second one is the probability of $+1$):

$$\left\langle \frac{1}{1 + e^{w^T \cdot x}}, \; \frac{1}{1 + e^{-w^T \cdot x}} \right\rangle$$

Formally, let $c = |C|$ and let $F_w$ be a classifier. We let $s(F_w)$ be the function that maps $\mathbb{R}^n$ to $\mathbb{R}_+^c$ such that $\|s(F_w)(x)\|_1 = 1$ for any $x$ (i.e., $s(F_w)$ computes a probability vector). We denote by $s(F_w)(x)[l]$ the probability that $s(F_w)(x)$ assigns to label $l$.

Background on Logic. Temporal logics are commonly used for specifying desired and undesired properties of systems. For cyber-physical systems, it is common to use temporal logics that can specify properties of real-valued signals over real time, such as signal temporal logic (STL) [31] or metric temporal logic (MTL) [28].

A signal is a function $s : D \to S$, with $D \subseteq \mathbb{R}_{\geq 0}$ an interval and either $S \subseteq \mathbb{B}$ or $S \subseteq \mathbb{R}$, where $\mathbb{B} = \{\top, \bot\}$ and $\mathbb{R}$ is the set of reals. Signals defined on $\mathbb{B}$ are called booleans, while those on $\mathbb{R}$ are said to be real-valued. A trace $w = \{s_1, \ldots, s_n\}$ is a finite set of real-valued signals defined over the same interval $D$. We use variables $x_i$ to denote the value of a real-valued signal at a particular time instant.
Let $\Sigma = \{\sigma_1, \ldots, \sigma_k\}$ be a finite set of predicates $\sigma_i : \mathbb{R}^n \to \mathbb{B}$, with $\sigma_i \equiv p_i(x_1, \ldots, x_n) \lhd 0$, $\lhd \in \{<, \leq\}$, and $p_i : \mathbb{R}^n \to \mathbb{R}$ a function in the variables $x_1, \ldots, x_n$. An STL formula is defined by the following grammar:

$$\varphi := \sigma \mid \neg\varphi \mid \varphi \wedge \varphi \mid \varphi \, U_I \, \varphi \qquad (2)$$

where $\sigma \in \Sigma$ is a predicate and $I \subset \mathbb{R}_{\geq 0}$ is a closed non-singular interval. Other common temporal operators can be defined as syntactic abbreviations in the usual way, for instance $\varphi_1 \vee \varphi_2 := \neg(\neg\varphi_1 \wedge \neg\varphi_2)$, $F_I \varphi := \top \, U_I \, \varphi$, or $G_I \varphi := \neg F_I \neg\varphi$. Given a $t \in \mathbb{R}_{\geq 0}$, a shifted interval $I$ is defined as $t + I = \{t + t' \mid t' \in I\}$. The qualitative (or Boolean) semantics of STL is given in the usual way:

Definition 1 (Qualitative semantics). Let $w$ be a trace, $t \in \mathbb{R}_{\geq 0}$, and $\varphi$ be an STL formula. The qualitative semantics of $\varphi$ is inductively defined as follows:

$$\begin{aligned}
w, t \models \sigma &\text{ iff } \sigma(w(t)) \text{ is true} \\
w, t \models \neg\varphi &\text{ iff } w, t \not\models \varphi \\
w, t \models \varphi_1 \wedge \varphi_2 &\text{ iff } w, t \models \varphi_1 \text{ and } w, t \models \varphi_2 \\
w, t \models \varphi_1 \, U_I \, \varphi_2 &\text{ iff } \exists t' \in t + I \text{ s.t. } w, t' \models \varphi_2 \text{ and } \forall t'' \in [t, t'], \; w, t'' \models \varphi_1
\end{aligned} \qquad (3)$$

A trace $w$ satisfies a formula $\varphi$ if and only if $w, 0 \models \varphi$, in short $w \models \varphi$. STL also admits a quantitative or robust semantics, which we omit for brevity. This provides quantitative information on the formula, telling how strongly the specification is satisfied or violated for a given trace.

3 Attacks

There are several types of attacks on ML algorithms. For excellent material on various attacks on ML algorithms we refer the reader to [19,3]. For example, in training time attacks an adversary wishes to poison a data set so that a "bad" hypothesis is learned by an ML algorithm. This attack can be modeled as a game between the algorithm ML and an adversary A as follows:

– ML picks an ordered training set $S = ((x_i, y_i))_{i=1}^{m}$.
– A picks an ordered training set $\hat{S} = ((\hat{x}_i, \hat{y}_i))_{i=1}^{r}$, where $r$ is $\lfloor \epsilon m \rfloor$.
– ML learns on $S \cup \hat{S}$ by essentially minimizing $\min_{w \in H} L_{S \cup \hat{S}}(w)$.

The attacker wants to maximize the above quantity and thus chooses $\hat{S}$ such that $\min_{w \in H} L_{S \cup \hat{S}}(w)$ is maximized. For a recent paper on certified defenses for such attacks we refer the reader to [45]. In model extraction attacks an adversary with black-box access to a classifier, but no prior knowledge of the parameters of an ML algorithm or training data, aims to duplicate the functionality of (i.e., steal) the classifier by querying it on well-chosen data points. For an example of model-extraction attacks, see [46].

In this paper, we consider test-time attacks. We assume that the classifier $F_w$ has been trained without any interference from the attacker (i.e., no training time attacks). Roughly speaking, an attacker has an image $x$ (e.g., an image of a stop sign) and wants to craft a perturbation $\delta$ so that the label of $x + \delta$ is what the attacker desires (e.g., a yield sign). The next sub-section describes test-time attacks in detail. We will sometimes refer to $F_w$ as simply $F$, but the hypothesis $w$ is lurking in the background (i.e., whenever we refer to $w$, it corresponds to the classifier $F$).

3.1 Test-time Attacks

The adversarial goal is to take any input vector $x \in \mathbb{R}^n$ and produce a minimally altered version of $x$, an adversarial sample denoted by $x^\star$, that has the property of being misclassified by a classifier $F : \mathbb{R}^n \to C$. Formally speaking, an adversary wishes to solve the following optimization problem:

$$\begin{aligned}
\min_{\delta \in \mathbb{R}^n} \quad & \mu(\delta) \\
\text{such that} \quad & F(x + \delta) \in T \\
& \delta \cdot M = 0
\end{aligned}$$

The various terms in the formulation are as follows: $\mu$ is a metric on $\mathbb{R}^n$, $T \subseteq C$ is a subset of the labels (the reader should think of $T$ as the target labels for the attacker), and $M$ (called the mask) is an $n$-dimensional 0-1 vector. The objective function minimizes the metric $\mu$ on the perturbation $\delta$. Next we describe the various constraints in the formulation.

– $F(x + \delta) \in T$
The set $T$ constrains the perturbed vector $x + \delta$¹ to have its label (according to $F$) in the set $T$. For mis-classification problems (the labels of $x$ and $x + \delta$ are different) we have $T = C - \{F(x)\}$.
For targeted mis-classification we have $T = \{t\}$ (for $t \in C$), where $t$ is the target that an attacker wants (e.g., the attacker wants $t$ to correspond to a yield sign).

¹ The vectors are added componentwise.

– $\delta \cdot M = 0$
The vector $M$ can be considered as a mask: if $M[i] = 1$ then $\delta[i]$ is forced to be 0, so the attacker can only perturb dimension $i$ if the $i$-th component of $M$ is 0. This means that $\delta$ lies in a $k$-dimensional space, where $k$ is the number of entries the mask leaves free (the zero entries of $M$). This constraint is important if an attacker wants to target a certain area of the image to perturb (e.g., the glasses in a picture of a person).

– Convexity
Notice that even if the metric $\mu$ is convex (e.g., $\mu$ is the $L_2$ norm), because of the constraint involving $F$, the optimization problem is not convex (the constraint $\delta \cdot M = 0$ is convex). In general, solving convex optimization problems is more tractable than non-convex optimization [35].

Note that the constraint $\delta \cdot M = 0$ essentially constrains the vector to be in a lower-dimensional space and does add additional complexity to the optimization problem. Therefore, for the rest of the section we will ignore that constraint and work with the following formulation:

$$\begin{aligned}
\min_{\delta \in \mathbb{R}^n} \quad & \mu(\delta) \\
\text{such that} \quad & F(x + \delta) \in T
\end{aligned}$$

FGSM mis-classification attack. This algorithm is also known as the fast gradient sign method (FGSM) [17]. The adversary crafts an adversarial sample $x^\star = x + \delta$ for a given legitimate sample $x$ by computing the following perturbation:

$$\delta = \varepsilon \, \text{sign}(\nabla_x L_F(x)) \qquad (4)$$

The function $L_F(x)$ is a shorthand for $\ell(w, x, l(x))$, where $w$ is the hypothesis corresponding to the classifier $F$, $x$ is the data point, and $l(x)$ is the label of $x$ (essentially we evaluate the loss function at the hypothesis corresponding to the classifier). The gradient of the function $L_F$ is computed with respect to $x$ using sample $x$ and label $y = l(x)$ as inputs. Note that $\nabla_x L_F(x)$ is an $n$-dimensional vector and $\text{sign}(\nabla_x L_F(x))$ is an $n$-dimensional vector whose $i$-th element is the sign of $\nabla_x L_F(x)[i]$. The value of the input variation parameter $\varepsilon$ that scales the sign vector controls the perturbation's amplitude. Increasing its value increases the likelihood of $x^\star$ being misclassified by the classifier $F$ but, on the other hand, makes adversarial samples easier for humans to detect. The key idea is that FGSM takes a step in the direction of the gradient of the loss function and thus tries to maximize it. Recall that SGD takes a step in the direction that is opposite to the gradient of the loss function because it is trying to minimize the loss function.

JSMA targeted mis-classification attack. This algorithm is suitable for targeted mis-classification [38]. We refer to this attack as JSMA throughout the rest of the paper. To craft the perturbation $\delta$, components are sorted by decreasing adversarial saliency value. The adversarial saliency value $S(x, t)[i]$ of component $i$ for an adversarial target class $t$ is defined as:

$$S(x, t)[i] = \begin{cases}
0 & \text{if } \dfrac{\partial s(F)[t](x)}{\partial x[i]} < 0 \text{ or } \displaystyle\sum_{j \neq t} \dfrac{\partial s(F)[j](x)}{\partial x[i]} > 0 \\[2ex]
\dfrac{\partial s(F)[t](x)}{\partial x[i]} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial s(F)[j](x)}{\partial x[i]} \right| & \text{otherwise}
\end{cases} \qquad (5)$$

where the matrix $J_F = \left[ \frac{\partial s(F)[j](x)}{\partial x[i]} \right]_{ij}$ is the Jacobian matrix for the output of the softmax layer $s(F)(x)$. Since $\sum_{k \in C} s(F)[k](x) = 1$, we have the following equation:

$$\frac{\partial s(F)[t](x)}{\partial x[i]} = - \sum_{j \neq t} \frac{\partial s(F)[j](x)}{\partial x[i]}$$

The first case corresponds to the scenario where changing the $i$-th component of $x$ takes us further away from the target label $t$. Intuitively, $S(x, t)[i]$ indicates how likely changing the $i$-th component of $x$ is to "move towards" the target label $t$. Input components $i$ are added to the perturbation $\delta$ in order of decreasing adversarial saliency value $S(x, t)[i]$ until the resulting adversarial sample $x^\star = x + \delta$ achieves the target label $t$. The perturbation introduced for each selected input component can vary. Greater individual variations tend to reduce the number of components perturbed to achieve misclassification.

CW targeted mis-classification attack.
The CW attack [6] is widely believed to be one of the most "powerful" attacks. The reason is that CW cast their problem as an unconstrained optimization problem, and then use a state-of-the-art solver (i.e., Adam [25]). In other words, they leverage the advances in optimization for the purposes of generating adversarial examples.

In their paper, Carlini and Wagner consider a wide variety of formulations, but we present the one that performs best according to their evaluation. The optimization problem corresponding to CW is as follows:

$$\begin{aligned}
\min_{\delta \in \mathbb{R}^n} \quad & \mu(\delta) \\
\text{such that} \quad & F(x + \delta) = t
\end{aligned}$$

CW use an existing solver (Adam [25]) and thus need to make sure that each component of $x + \delta$ is between 0 and 1 (i.e., valid pixel values). Note that the other methods did not face this issue because they control the "internals" of the algorithm (i.e., CW used a solver in a "black box" manner). We introduce a new vector $w$ (an optimization variable, not to be confused with the hypothesis $w$) whose $i$-th component is defined according to the following equation:

$$\delta[i] = \frac{1}{2}\left(\tanh(w[i]) + 1\right) - x[i]$$

Since $-1 \leq \tanh(w[i]) \leq 1$, it follows that $0 \leq x[i] + \delta[i] \leq 1$. In terms of this new variable the optimization problem becomes:

$$\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - x\right) \\
\text{such that} \quad & F\left(\tfrac{1}{2}(\tanh(w) + 1)\right) = t
\end{aligned}$$

Next they approximate the constraint ($F(x) = t$) with the following function:

$$g(x) = \max\left( \max_{i \neq t} Z(F)(x)[i] - Z(F)(x)[t], \; -\kappa \right)$$

In the equation given above, $Z(F)$ is the input of the DNN to the softmax layer (i.e., $s(F)(x) = \text{softmax}(Z(F)(x))$) and $\kappa$ is a confidence parameter (higher $\kappa$ encourages the solver to find adversarial examples with higher confidence). The new optimization formulation is as follows:

$$\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - x\right) \\
\text{such that} \quad & g\left(\tfrac{1}{2}(\tanh(w) + 1)\right) \leq 0
\end{aligned}$$

Next we incorporate the constraint into the objective function as follows:

$$\min_{w \in \mathbb{R}^n} \; \mu\left(\tfrac{1}{2}(\tanh(w) + 1) - x\right) + c \, g\left(\tfrac{1}{2}(\tanh(w) + 1)\right)$$

In the objective given above, the "Lagrangian variable" $c > 0$ is a suitably chosen constant (from the optimization literature we know that there exists $c > 0$ such that the optimal solutions of the last two formulations are the same).

3.2 Adversarial Training

Once an attacker finds an adversarial example, the algorithm can be retrained using this example.
Researchers have found that retraining the model with adversarial examples produces a more robust model. For this section, we will work with attack algorithms that have a target label $t$ (i.e., we are in the targeted mis-classification case, such as JSMA or CW). Let $A(w, x, t)$ be the attack algorithm, where its inputs are as follows: $w \in H$ is the current hypothesis, $x$ is the data point, and $t \in C$ is the target label. The output of $A(w, x, t)$ is a perturbation $\delta$ such that $F(x + \delta) = t$. If the attack algorithm is simply a mis-classification algorithm (e.g., FGSM or DeepFool) we will drop the last parameter $t$.

An adversarial training algorithm $R_A(w, x, t)$ is parameterized by an attack algorithm $A$ and outputs a new hypothesis $w' \in H$. Adversarial training works by taking a data point $x$ and an attack algorithm $A(w, x, t)$ as its input and then retraining the model using a specially designed loss function (essentially one performs a single step of SGD using the new loss function). The question arises: what loss function should be used during the training? Different methods use different loss functions.

Next, we discuss some adversarial training algorithms proposed in the literature. At a high level, an important point is that the more sophisticated an adversarial perturbation algorithm is, the harder it is to turn it into adversarial training. The reason is that it is hard to "encode" the adversarial perturbation algorithm as an objective function and optimize it. We will see this below, especially for the virtual adversarial training (VAT) proposed by Miyato et al. [34].

Retraining for FGSM. We discussed the FGSM attack method earlier.
In this case $A = \text{FGSM}$. The loss function used by the retraining algorithm $R_{\text{FGSM}}(w, x, t)$ is as follows:

$$\ell_{\text{FGSM}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda \, \ell(w, x_i + \text{FGSM}(w, x_i), y_i)$$

Recall that $\text{FGSM}(w, x)$ was defined earlier, and $\lambda$ is a regularization parameter. The simplicity of $\text{FGSM}(w, x_i)$ allows taking its gradient, but this objective function requires the label $y_i$ because we are reusing the same loss function $\ell$ used to train the original model. Further, $\text{FGSM}(w, x_i)$ may not be very good because it may not produce a good adversarial perturbation direction. The retraining algorithm is simply as follows: take one step of SGD using the loss function $\ell_{\text{FGSM}}$ at the data point $x_i$.

A caveat is needed for taking the gradient during the SGD step. At iteration $t$ suppose we have model parameters $w_t$, and we need to compute the gradient of the objective. Note that $\text{FGSM}(w, x)$ depends on $w$, so by the chain rule we need to compute $\partial \text{FGSM}(w, x) / \partial w \,|_{w = w_t}$. However, this gradient is volatile², and so instead Goodfellow et al. only compute:

$$\left. \frac{\partial \ell(w, x_i + \text{FGSM}(w_t, x_i), y_i)}{\partial w} \right|_{w = w_t}$$

Essentially they treat $\text{FGSM}(w_t, x_i)$ as a constant while taking the derivative.

Virtual Adversarial Training (VAT). Miyato et al. [34] observed the drawback of requiring the label $y_i$ above. They propose, instead of reusing $\ell$, to use the following for the regularizer:

$$\Delta(r, x, w) = \text{KL}\left(s(F_w)(x)[y], \; s(F_w)(x + r)[y]\right)$$

for some $r$ such that $\|r\| \leq \delta$. As a result, the label $y_i$ is no longer required.
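As a small illustration of why this regularizer needs no label, the sketch below evaluates $\Delta(r, x, w)$ for the logistic-regression classifier from Section 2, using the two-element softmax tuple given there and the standard definition of the Kullback-Leibler divergence over a finite domain. The function names (`softmax_pair`, `kl`, `vat_regularizer`) and the numeric values are hypothetical and chosen only for illustration; this is not the implementation of Miyato et al.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax_pair(w, x):
    """s(F_w)(x) for logistic regression: (P[label = -1], P[label = +1])."""
    z = dot(w, x)
    # Numerically stable evaluation of 1 / (1 + exp(-z)).
    if z >= 0:
        p_pos = 1.0 / (1.0 + math.exp(-z))
    else:
        p_pos = math.exp(z) / (1.0 + math.exp(z))
    return (1.0 - p_pos, p_pos)


def kl(p, q):
    """KL(P, Q) = sum_i P(i) * log(P(i) / Q(i)); terms with P(i) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def vat_regularizer(r, x, w):
    """Delta(r, x, w): divergence between the predictions at x and at x + r.

    Only the model output is used; no label for x is required.
    """
    x_shifted = [xi + ri for xi, ri in zip(x, r)]
    return kl(softmax_pair(w, x), softmax_pair(w, x_shifted))


if __name__ == "__main__":
    w = [2.0, -1.0]   # hypothetical trained weights
    x = [0.5, 0.2]    # hypothetical data point
    print(vat_regularizer([0.0, 0.0], x, w))    # no perturbation
    print(vat_regularizer([-0.5, 0.5], x, w))   # a perturbation that changes the prediction
```

The regularizer is zero when $r = 0$ and grows as the perturbation moves the predicted distribution away from the prediction at $x$.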
The question is: what $r$ to use? Miyato et al. [34] propose that in theory we should use the "best" one as

$$\max_{r : \|r\| \leq \delta} \text{KL}\left(s(F_w)(x)[y], \; s(F_w)(x + r)[y]\right)$$

In the equation given above, KL is the Kullback-Leibler divergence. Recall that the KL divergence of two distributions $P$ and $Q$ over the same finite domain $D$ is given by the following equation:

$$\text{KL}(P, Q) = \sum_{i \in D} P(i) \log\left(\frac{P(i)}{Q(i)}\right)$$

This thus gives rise to the following loss function to use during retraining:

$$\ell_{\text{VAT}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda \max_{r : \|r\| \leq \delta} \Delta(r, x_i, w)$$

However, one cannot easily take the gradient of the regularizer. Hence the authors perform an approximation as follows:

1. Take the Taylor expansion of $\Delta(r, x_i, w)$ at $r = 0$, so $\Delta(r, x_i, w) \approx r^T H(x_i, w) \, r$, where $H(x_i, w)$ is the Hessian matrix of $\Delta(r, x_i, w)$ with respect to $r$ at $r = 0$.
2. Thus $\max_{\|r\| \leq \delta} \Delta(r, x_i, w) \approx \max_{\|r\| \leq \delta} \left( r^T H(x_i, w) \, r \right)$. By the variational characterization of the symmetric matrix ($H(x_i, w)$ is symmetric), $r^* = \delta \bar{v}$, where $\bar{v} = \bar{v}(x_i, w)$ is the unit eigenvector of $H(x_i, w)$ corresponding to its largest eigenvalue. Note that $r^*$ depends on $x_i$ and $w$. Therefore the loss function becomes:

$$\ell_{\text{VAT}}(w, x_i, y_i) = \ell(w, x_i, y_i) + \lambda \, \Delta(r^*, x_i, w)$$

3. Now suppose in the process of SGD we are at iteration $t$ with model parameters $w_t$, and we need to compute $\partial \ell_{\text{VAT}} / \partial w \,|_{w = w_t}$. By the chain rule we need to compute $\partial r^* / \partial w \,|_{w = w_t}$. However, the authors find that such gradients are volatile, so they instead fix $r^*$ as a constant at the point $w_t$, and compute

$$\left. \frac{\partial \, \text{KL}\left(s(F_w)(x)[y], \; s(F_w)(x + r^*)[y]\right)}{\partial w} \right|_{w = w_t}$$

² In general, second-order derivatives of a classifier corresponding to a DNN vanish at several points because several layers are piece-wise linear.

3.3 Black Box Attacks

Recall that earlier attacks (e.g., FGSM and JSMA) needed white-box access to the classifier $F$ (essentially because these attacks require first-order information about the classifier). In this section, we present black-box attacks.
In this case, an attacker can only ask for the labels $F(x)$ for certain data points. Our presentation is based on [37], but is more general.

Let $A(w, x, t)$ be the attack algorithm, where its inputs are: $w \in H$ is the current hypothesis, $x$ is the data point, and $t \in C$ is the target label. The output of $A(w, x, t)$ is a perturbation $\delta$ such that $F(x + \delta) = t$. If the attack algorithm is simply a mis-classification algorithm (e.g., FGSM or DeepFool) we will drop the last parameter $t$ (recall that in this case the attack algorithm returns a $\delta$ such that $F(x + \delta) \neq F(x)$). An adversarial training algorithm $R_A(w, x, t)$ is parameterized by an attack algorithm $A$ and outputs a new hypothesis $w' \in H$ (this was discussed in the previous subsection).

Initialization: We pick a substitute classifier $G$ and an initial seed data set $S_0$ and train $G$. For simplicity, we will assume that the sample space $Z = X \times Y$ and the hypothesis space $H$ for $G$ are the same as those of $F$ (the classifier under attack). However, this is not crucial to the algorithm. We will call $G$ the substitute classifier and $F$ the target classifier. Let $S = S_0$ be the initial data set, which will be updated as we iterate.

Iteration: Run the attack algorithm $A(w, x, t)$ on $G$ and obtain a $\delta$. If $F(x + \delta) = t$, then stop; we are done. If $F(x + \delta) = t'$ with $t' \neq t$, we augment the data set $S$ as follows:

$$S = S \cup \{(x + \delta, t')\}$$

We now retrain $G$ on this new data set, which essentially means running SGD on the new data point $(x + \delta, t')$. Notice that we can also use adversarial training $R_A(w, x, t)$ to update $G$ (to our knowledge this has not been tried out in the literature).

3.4 Defenses

Defenses with formal guarantees against test-time attacks have proven elusive. For example, Carlini and Wagner [7] have a recent paper that breaks ten recent defense proposals. However, defenses that are based on robust-optimization objectives have demonstrated promise [33,27,44]. Several techniques for verifying properties of a DNN (in isolation) have appeared recently (e.g., [24,20,13,14]). Due to space limitations we will not give a detailed account of all these defenses.