Workshop track - ICLR 2017

A THEORETICAL FRAMEWORK FOR ROBUSTNESS OF (DEEP) CLASSIFIERS AGAINST ADVERSARIAL EXAMPLES

Beilun Wang, Ji Gao, Yanjun Qi
Department of Computer Science, University of Virginia
Charlottesville, VA 22901, USA
{bw4mw,jg6yd,yanjun}@virginia.edu

ABSTRACT

Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. Using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool a classifier (f_1) and adds its oracle (f_2, like human eyes) to the analysis. By investigating the topological relationship between the two (pseudo)metric spaces corresponding to predictor f_1 and oracle f_2, we develop necessary and sufficient conditions that can determine whether f_1 is always robust (strong-robust) against adversarial examples according to f_2. Interestingly, our theorems indicate that just one unnecessary feature can make f_1 not strong-robust, and that the right feature representation learning is the key to obtaining a classifier that is both accurate and strong-robust.

1 INTRODUCTION

Deep Neural Networks (DNNs) can efficiently learn highly accurate models and have been demonstrated to perform exceptionally well (Krizhevsky et al., 2012; Hannun et al., 2014). However, recent studies show that intelligent attackers can force many machine learning models, including DNNs, to misclassify examples by adding small and hardly visible modifications to a regular test sample.

The maliciously generated inputs are called "adversarial examples" (Goodfellow et al., 2014; Szegedy et al., 2013) and are commonly crafted by carefully searching for small perturbations through an optimization procedure. Several recent studies proposed algorithms for solving such optimization to fool DNN classifiers. Szegedy et al. (2013) first observed that convolutional DNNs are vulnerable to small artificial perturbations. They use box-constrained Limited-memory BFGS (L-BFGS) to create adversarial examples and find that adversarial perturbations generated from one DNN network can also force other networks to produce wrong outputs. Goodfellow et al. (2014) then argue that the primary cause of such vulnerabilities may be the linear nature of DNNs, and propose the fast gradient sign method for generating adversarial examples quickly. Subsequent papers (Fawzi et al., 2015; Papernot et al., 2015a; Nguyen et al., 2015) have explored other ways to generate adversarial examples for DNNs (details in Section 2.1). The goal of this paper is to analyze the robustness of machine learning models in the face of adversarial examples.

In response to progress in generating adversarial examples, researchers have attempted to design strategies for making machine-learning systems robust to various kinds of noise, in the worst case adversarial examples. For instance, denoising NN architectures (Vincent et al., 2008; Gu & Rigazio, 2014; Jin et al., 2015) can discover more robust features by using a noise-corrupted version of inputs as training samples. A modified distillation strategy (Papernot et al., 2015b) has been proposed to improve the robustness of DNNs against adversarial examples, though it has recently been shown to be unsuccessful (Carlini & Wagner, 2016a). The most generally successful strategy to date is adversarial training (Goodfellow et al., 2014; Szegedy et al., 2013), which injects adversarial examples into training to improve the generalization of DNN models. More recent techniques incorporate a smoothness penalty (Miyato et al., 2016; Zheng et al., 2016) or a layer-wise penalty (Carlini & Wagner, 2016b) as a regularization term in the loss function to promote the smoothness of the DNN model distributions.

Table 1: A list of important notations used in the paper.

f_1          A learned machine learning classifier, f_1 = c_1 ∘ g_1.
f_2          The oracle for the same task (see Definition (2.1)), f_2 = c_2 ∘ g_2.
g_i          Part of f_i including the operations that progressively transform the input into a new form of learned representations in X_i.
c_i          Part of f_i including simple decision functions (like linear ones) for classifying.
X            Input space (e.g., {0,1,2,...,255}^(32×32×3) for CIFAR-10 data (Krizhevsky & Hinton, 2009)).
Y            Output space (e.g., {1,2,3,...,10} for CIFAR-10 data (Krizhevsky & Hinton, 2009)).
X_1          Feature space defined by the feature extraction module g_1 of predictor f_1.
X_2          Feature space defined by the feature extraction module g_2 of oracle f_2.
d_1(·,·)     The metric function for measuring sample distances in feature space X_1 with respect to predictor f_1.
d_2(·,·)     The metric function for measuring sample distances in feature space X_2 with respect to oracle f_2.
d'_1(·,·)    The pseudometric function with respect to predictor f_1, d'_1(x,x') = d_1(g_1(x), g_1(x')).
d'_2(·,·)    The pseudometric function with respect to oracle f_2, d'_2(x,x') = d_2(g_2(x), g_2(x')).
a.e.         Almost everywhere (Folland, 2013); defined by Definition (9.2) in Section 9.1.
ε, δ_1, δ_2, δ, η    Small positive constants.

Recent studies (reviewed by (Papernot et al., 2016b)) are mostly empirical and provide little understanding of why an adversary can fool machine learning models with adversarial examples. Several important questions have not been answered yet:

• What makes a classifier always robust to adversarial examples?
• Which parts of a classifier influence its robustness against adversarial examples more, compared with the rest?
• What is the relationship between a classifier's generalization accuracy and its robustness against adversarial examples?
• Why are (many) DNN classifiers not robust against adversarial examples? How can they be improved?

This paper tries to answer the above questions and makes the following contributions:

• Section 2 points out that previous definitions of adversarial examples for a classifier (f_1) have overlooked the importance of an oracle function (f_2) of the same task.
• Section 3 formally defines when a classifier f_1 is always robust ("strong-robust") against adversarial examples. It proves four theorems about sufficient and necessary conditions that make f_1 always robust against adversarial examples according to f_2. Our theorems lead to a number of interesting insights, such as that the feature representation learning controls whether a DNN is strong-robust or not.
• Section 12 is dedicated to providing practical and theoretically grounded directions for understanding and hardening DNN models against adversarial examples.

Table 1 provides a list of important notations we use in the paper.

2 DEFINE ADVERSARIAL EXAMPLES

This section provides a general definition of adversarial examples by including the notion of an oracle. For a particular classification task, a learned classifier is represented as f_1: X → Y, where X represents the input sample space and Y is the output space representing a categorical set.

2.1 PREVIOUS FORMULATIONS

Various definitions of "adversarial examples" exist in the recent literature, with most following Eq. (2.1). See more detailed reviews in Section 8.

Figure 1: Example of a machine-learning classifier (predictor) and a human annotator (oracle) for classifying images of hand-written "0". Both include two steps: feature extraction and classification. The upper half is about the learned machine classifier f_1 and the lower half is about the oracle f_2. f_1 transforms samples from the original space X to an embedded metric space (X_1, d_1) using its feature extraction step.
Here, d_1 is the similarity measure on the feature space X_1. Classification models like DNNs cover the feature extraction step inside the model, though many other models, like decision trees, need hand-crafted or domain-specific feature extraction. Then f_1 can use a linear function to decide the classification prediction ŷ ∈ Y. Similarly, the human oracle f_2 transforms data samples from the original space X into an embedded metric space (X_2, d_2) by its feature extraction. Here, d_2 is the corresponding similarity measure. The oracle then gets the classification result y ∈ Y using the feature representation of samples in (X_2, d_2).

The basic idea is to generate a misclassified sample x' by "slightly" perturbing a correctly classified sample x, with an adversarial perturbation ∆(x, x'). Formally, when given x ∈ X:

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.1)
      ∆(x, x') < ε

Here x, x' ∈ X. ∆(x, x') represents the difference between x and x', which depends on the specific data type that x and x' belong to¹. Table 2 summarizes different choices of f_1 and ∆(x, x') used in the recent literature, in which norm functions on the original space X are mostly used to calculate ∆(x, x'). Multiple algorithms have been implemented to solve Eq. (2.1) as a constrained optimization (summarized by the last column of Table 2). More details are included for three such studies in Section 8.2.

When searching for adversarial examples, one important property has not been fully captured by Eq. (2.1). That is, an adversarial example has been modified very slightly from its seed, and these modifications can be so subtle that, for example in image classification, a human observer does not even notice the modification at all. We define the role of the "human observer" more formally as follows:

Definition 2.1. An "Oracle" represents a decision process generating ground truth labels for a task of interest. Each oracle is task-specific, with finite knowledge and noise-free².

¹ For example, in the case of strings, ∆(x, x') represents the difference between two strings.
² We leave all detailed analysis of when an oracle contains noise as future work.

Figure 2: An example showing that f_1 with one unnecessary feature (according to f_2) is prone to adversarial examples. The red circle denotes an adversarial example (e.g., generated by some attack similar to JSMA (Papernot et al., 2015a); details in Section 8.2). Each adversarial example is very close to its seed sample in the oracle feature space (according to d_2), but it is comparatively far from its seed sample in the feature space (according to d_1) of the trained classifier and is on the other side of the decision boundary of f_1. Essentially, "adversarial examples" can easily be found for all seed samples in this figure; we only draw cases for two seeds. Besides, for each seed sample we can generate a series of "adversarial examples" (by varying the attacking power) once the attacking line crosses the decision boundary of f_1; we only show one such adversarial example per seed sample.
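Before listing the specific formulations (Table 2 below), here is a minimal sketch of one widely used way to search for x' in Eq. (2.1): a single fast-gradient-sign step in the spirit of Goodfellow et al. (2014), which moves x along the sign of the loss gradient under an ℓ_∞ budget ε. This is an illustrative sketch only, not the exact implementation used in any of the cited studies; the model interface and the value of eps are assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.03):
    """One-step sketch of solving Eq. (2.1) under an l_inf budget eps.

    `model` is any differentiable classifier f_1 (assumed to be a
    torch.nn.Module returning logits); `x` is a batch of inputs and `y`
    their currently predicted labels.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # Loss(f_1(x'), f_1(x))
    loss.backward()
    # Move each coordinate by +/- eps in the direction that increases the loss,
    # so Delta(x, x') = ||x - x'||_inf <= eps while hopefully f_1(x') != f_1(x).
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
    return x_adv
```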
Table 2: Summary of the previous studies defining adversarial examples.

Previous studies | f_1 | ∆(x, x') | Formulation of f_1(x) ≠ f_1(x')
(Goodfellow et al., 2014) | Convolutional neural networks | ℓ_∞ | argmax_x' Loss(f_1(x'), f_1(x))
(Szegedy et al., 2013) | Convolutional neural networks | ℓ_2 | argmin_x' Loss(f_1(x'), l), subject to: l ≠ f_1(x')
(Biggio et al., 2013) | Support vector machine (SVM) | ℓ_2 | argmin_x' Loss(f_1(x'), −1), subject to: f_1(x) = 1
(Kantchelian et al., 2015) | Decision tree and Random forest | ℓ_2, ℓ_1, ℓ_∞ | argmin_x' Loss(f_1(x'), −1), subject to: f_1(x) = 1
(Papernot et al., 2016a) | Convolutional neural networks | ℓ_0 | argmax_x' Loss(f_1(x'), f_1(x))
(Grosse et al., 2016) | Convolutional neural networks | ℓ_0 | argmax_x' Loss(f_1(x'), f_1(x))
(Xu et al., 2016) | Random forest and SVM | ℓ_1, ℓ_∞ | argmin_x' Loss(f_1(x'), −1), subject to: f_1(x) = 1

The goal of machine learning is to train a learning-based predictor function f_1: X → Y to approximate an oracle classifier f_2: X → Y for the same classification task. For example, in image classification tasks, the oracle f_2 is often a group of human annotators. Adding the notion of the oracle, we revise Eq. (2.1) into:

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.2)
      ∆_2(x, x') < ε
      f_2(x) = f_2(x')

2.2 MEASURING SAMPLE DIFFERENCE IN WHICH SPACE? MODELING & DECOMPOSING f_2

∆_2(x, x') < ε reflects that adversarial examples add "small modifications" that are almost imperceptible to the oracle of the task. Clearly, calculating ∆_2(x, x') needs to accord with the oracle f_2. For most classification tasks, an oracle does not measure sample difference in the original input space X. We want to emphasize that sample difference is with regard to the classification purpose. For instance, when labeling images for hand-written digit recognition, human annotators do not need to consider the background pixels to decide whether an image is a "0" or not.

As illustrated in Figure 1, we denote the feature space an oracle uses to consider difference among samples for the purpose of classification decisions as X_2. The sample difference uses a distance function d_2 in this space. An oracle function f_2: X → Y can be decomposed as f_2 = c_2 ∘ g_2, where g_2: X → X_2 represents the operations for feature extraction from X to X_2 and c_2: X_2 → Y denotes the simple operation of classification in X_2. Essentially, g_2 includes the operations that (progressively) transform input representations into an informative form of representations X_2, while c_2 applies relatively simple functions (like linear ones) in X_2 for the purpose of classification. d_2 is the metric function (details in Section 3) an oracle uses to measure the similarity among samples (by relying on representations learned in the space X_2). We illustrate this modeling and decomposition in Figure 1. In Section 3, our theoretical analysis uses (X_2, d_2) to bring forth the fundamental causes of adversarial examples and leads to a set of novel insights for understanding such examples. To the best of the authors' knowledge, the theoretical analysis made by this paper has not been uncovered by the literature.

Modeling the oracle f_2: One may argue that it is hard to model f_2 and (X_2, d_2) for real applications, since if such oracles could be modeled easily, a machine-learning based f_1 would seem unnecessary. In Section 8.3, we provide examples of modeling oracles for real applications. For many security-sensitive applications about machines, oracles f_2 do exist³. For artificial intelligence tasks like image classification, humans are f_2.
As illustrated by cognitive neuroscience papers (DiCarlo & Cox, 2007; DiCarlo et al., 2012), human brains perform visual object recognition using the ventral visual stream, and this stream is considered to be a progressive series of visual re-representations, from V1 to V2 to V4 to IT cortex (DiCarlo & Cox, 2007). Experimental results support that the human visual system makes its classification decision at the final IT cortex layer. This process is captured exactly by our decomposition f_2 = c_2 ∘ g_2.

³ Oracles f_2 do exist in many security-sensitive applications about machines, but machine-learning classifiers f_1 are popularly used due to speed or efficiency.

2.3 REVISED FORMULATION

Now we use the decomposition of f_2 to rewrite ∆_2(x, x') as d_2(g_2(x), g_2(x')) in Eq. (2.2) and obtain our proposed general definition of adversarial examples:

Definition 2.2. Adversarial example: Suppose we have two functions f_1 and f_2. f_1: X → Y is the classification function learned from a training set and f_2: X → Y is the classification function of the oracle that generates ground-truth labels for the same task. Given a sample x ∈ X, an adversarial example is an x' ∈ X such that (x, x') satisfies Eq. (2.3).

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.3)
      d_2(g_2(x), g_2(x')) < δ_2
      f_2(x) = f_2(x')

Most previous studies (Table 2) have made an important and implicit assumption about f_2 (through using ∆(x, x') < ε): f_2 is almost everywhere (a.e.) continuous. We explain the a.e. continuity assumption and its implications in Section 9. Basically, when f_2 is assumed continuous a.e.,

P(f_2(x) = f_2(x') | d_2(g_2(x), g_2(x')) < δ_2) = 1

Therefore, when f_2 is continuous a.e., Eq. (2.3) can be simplified into the following Eq. (2.4):

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.4)
      d_2(g_2(x), g_2(x')) < δ_2

3 DEFINE STRONG-ROBUSTNESS

With a more accurate definition of "adversarial examples", we now aim to answer the first central question: "What makes a classifier always robust against adversarial examples?". Section 3.2 defines the concept "strong-robust", describing a classifier that is always robust against adversarial examples. Section 3.3 and Section 3.4 present sufficient and necessary conditions for "strong-robustness". Section 4 then provides a set of theoretical insights to understand "strong-robustness".

3.1 MODELING AND DECOMPOSING f_1

As shown in Figure 1, we decompose f_1 in a similar way as the decomposition of f_2. This is to answer another key question: "which parts of a learned classifier influence its robustness against adversarial examples more, compared with the rest?". A machine-learning classifier f_1 = c_1 ∘ g_1, where g_1: X → X_1 represents the feature extraction operations and c_1: X_1 → Y performs a simple operation (e.g., linear) of classification. Section 8.4 provides multiple examples of decomposing state-of-the-art f_1⁴. d_1 denotes the distance function f_1 uses to measure difference among samples in X_1.
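As a concrete (hypothetical) illustration of the decomposition f_1 = c_1 ∘ g_1, the sketch below writes a small convolutional network explicitly as a feature extractor g_1 followed by a linear head c_1. The layer choices and sizes are arbitrary assumptions, made only to make the two roles explicit.

```python
import torch.nn as nn

class DecomposedClassifier(nn.Module):
    """A classifier written explicitly as f_1 = c_1 o g_1 (sizes are arbitrary)."""

    def __init__(self, num_classes=10):
        super().__init__()
        # g_1: progressively transforms raw inputs x in X into features in X_1
        self.g1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # c_1: a simple (linear) decision function on the feature space X_1
        self.c1 = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.c1(self.g1(x))   # f_1(x) = c_1(g_1(x))
```

Under this view, d_1 can informally be thought of as a distance computed on the activations returned by g1, which is the quantity the theorems below reason about.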
Almost all popular machine learning classifiers satisfy the a.e. continuity assumption. It means:

P(f_1(x) = f_1(x') | d_1(g_1(x), g_1(x')) < δ_1) = 1

When f_1 is not continuous a.e., it is not robust to any type of noise; see Section 9 for detailed discussions. For the rare cases in which f_1 is not continuous a.e., Section 11 discusses the "boundary points" of f_1⁵. Roughly speaking, when f_1 is not continuous a.e.⁶,

P(f_1(x) ≠ f_1(x') | d_1(g_1(x), g_1(x')) < δ_1) > 0

Therefore, the following probability of "boundary-points-based adversarial examples" might not be 0 for such cases⁷:

P(f_1(x) ≠ f_1(x') | f_2(x) = f_2(x'), d_1(g_1(x), g_1(x')) < δ_1, d_2(g_2(x), g_2(x')) < δ_2)        (3.1)

The value of this probability is critical for our analysis in Theorem (3.3) and in Theorem (3.5).

3.2 {δ_2, η}-STRONG-ROBUST AGAINST ADVERSARIAL EXAMPLES

We then apply reverse thinking to Definition (2.2) and derive the following definition of strong-robustness for a machine learning classifier against adversarial examples:

Definition 3.1. {δ_2, η}-strong-robustness of a machine-learning classifier: A machine-learning classifier f_1(·) is {δ_2, η}-strong-robust against adversarial examples if, ∀ x, x' ∈ X a.e., (x, x') satisfies Eq. (3.2):

∀ x, x' ∈ X,
P(f_1(x) = f_1(x') | f_2(x) = f_2(x'), d_2(g_2(x), g_2(x')) < δ_2) > 1 − η        (3.2)

When f_2 is continuous a.e., Eq. (3.2) simplifies into Eq. (3.3):

∀ x, x' ∈ X,
P(f_1(x) = f_1(x') | d_2(g_2(x), g_2(x')) < δ_2) > 1 − η        (3.3)

Eq. (3.2) defines "{δ_2, η}-strong-robustness" as a claim that holds with high probability. To simplify notation, in the rest of this paper we use "strong-robust" to represent "{δ_2, η}-strong-robust". Also, in the rest of this paper we propose and prove theorems and corollaries using the more general form, Eq. (3.2). For all cases, if f_2 is continuous a.e., all proofs and equations can be simplified by using only the term d_2(g_2(x), g_2(x')) < δ_2 (i.e., removing the term f_2(x) = f_2(x')) according to Eq. (3.3). The "strong-robustness" definition leads to four important theorems in the next two subsections.

⁴ Notice that g_1 may also include implicit feature selection steps like ℓ_1 regularization.
⁵ Boundary points are those points satisfying f_1(x) ≠ f_1(x') and d_1(g_1(x), g_1(x')) < δ_1.
⁶ When f_1 is continuous a.e., P(f_1(x) ≠ f_1(x') | d_1(g_1(x), g_1(x')) < δ_1) = 0.
⁷ "Boundary-points-based adversarial examples" only attack seed samples that are boundary points of f_1.

3.3 TOPOLOGICAL EQUIVALENCE OF TWO METRIC SPACES (X_1, d_1) AND (X_2, d_2) IS SUFFICIENT IN DETERMINING STRONG-ROBUSTNESS

In the appendix, Section 10.1 briefly introduces the concept of a metric space and the definition of topological equivalence between two metric spaces. As shown in Figure 1, f_1 defines a metric space (X_1, d_1) on X_1 with the metric function d_1. Similarly, f_2 defines a metric space (X_2, d_2) on X_2 with the metric function d_2.

If topological equivalence (Eq. (10.1)) exists between (X_1, d_1) and (X_2, d_2), it means that for all pairs of samples from X we have the following relationship:

∀ x, x' ∈ X,
d_1(g_1(x), g_1(x')) < δ_1  ⇔  d_2(g_2(x), g_2(x')) < δ_2        (3.4)

When f_1 is continuous a.e., this gives us the following important theorem, indicating that the topological equivalence between (X_1, d_1) and (X_2, d_2) is a sufficient condition for determining whether or not f_1 is strong-robust against adversarial examples:

Theorem 3.2. When f_1 is continuous a.e., if (X_1, d_1) and (X_2, d_2) are topologically equivalent, then the learned classifier f_1(·) is strong-robust to adversarial examples.

Proof. See its proof in Section 10.3.4.

This theorem can actually guarantee that:

∀ x, x' ∈ X,
P(f_1(x) = f_1(x') | f_2(x) = f_2(x'), d_2(g_2(x), g_2(x')) < δ_2) = 1        (3.5)

Clearly, Eq. (3.5) is a special (stronger) case of the "strong-robustness" defined by Eq. (3.2).
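Eq. (3.4) suggests a simple, if crude, empirical probe of how far a trained f_1 is from topological equivalence with an oracle: sample pairs that an assumed oracle metric d_2 considers near and check how often the learned feature metric d_1 also considers them near, and vice versa. The sketch below assumes we have callable stand-ins for g_1, g_2, d_1 and d_2, which is rarely exactly true in practice (Section 5.1 discusses approximating d_2 with a norm on X); it is meant only as an illustration of the two directions of Eq. (3.4).

```python
def neighborhood_agreement(pairs, g1, g2, d1, d2, delta1, delta2):
    """Estimate how often d_2-closeness and d_1-closeness coincide (Eq. (3.4)).

    `pairs` is an iterable of (x, x') sample pairs; g1/g2 map inputs to the two
    feature spaces and d1/d2 are the corresponding distance functions, all
    assumed to be available as plain Python callables for illustration.
    """
    close_in_d2 = also_close_in_d1 = 0
    close_in_d1 = also_close_in_d2 = 0
    for x, x_prime in pairs:
        near1 = d1(g1(x), g1(x_prime)) < delta1
        near2 = d2(g2(x), g2(x_prime)) < delta2
        close_in_d2 += near2
        also_close_in_d1 += near2 and near1   # d_2-near => d_1-near (finer-topology direction)
        close_in_d1 += near1
        also_close_in_d2 += near1 and near2   # d_1-near => d_2-near (other direction)
    return (also_close_in_d1 / max(close_in_d2, 1),
            also_close_in_d2 / max(close_in_d1, 1))
```

If the first ratio stays near 1 for small δ_2, the behavior matches the finer-topology condition of Section 3.4; Theorem 3.2 requires both ratios to be near 1.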
For more general cases, in which f_1 might not be continuous a.e., we need to consider the probability of boundary-point attacks (Eq. (3.1)). We therefore obtain a more general theorem as follows:

Theorem 3.3. If (X_1, d_1) and (X_2, d_2) are topologically equivalent and P(f_1(x) ≠ f_1(x') | f_2(x) = f_2(x'), d_1(g_1(x), g_1(x')) < δ_1, d_2(g_2(x), g_2(x')) < δ_2) < η, then the learned classifier f_1(·) is strong-robust to adversarial examples.

Proof. See its proof in Section 10.3.3.

3.4 FINER TOPOLOGY OF (X, d'_1) THAN (X, d'_2) IS SUFFICIENT AND NECESSARY IN DETERMINING STRONG-ROBUSTNESS

Now we extend the discussion from two metric spaces to two pseudometric spaces. This extension finds the sufficient and necessary condition that determines the strong-robustness of f_1. The two related pseudometrics are d'_1 (for f_1) and d'_2 (for f_2), both defined directly on X. Appendix Section 10.2 includes detailed descriptions of pseudometrics, pseudometric spaces, topology, and the finer-topology relationship between two pseudometric spaces.

Essentially, the statement that the topology in the pseudometric space (X, d'_1) is a finer topology than the topology in the pseudometric space (X, d'_2) means:

∀ x, x' ∈ X,  d'_2(x, x') < δ_2  ⇒  d'_1(x, x') < δ_1        (3.6)

Because d'_1(x, x') = d_1(g_1(x), g_1(x')) and d'_2(x, x') = d_2(g_2(x), g_2(x')), the above equation is equivalent to:

∀ x, x' ∈ X,
d_2(g_2(x), g_2(x')) < δ_2  ⇒  d_1(g_1(x), g_1(x')) < δ_1        (3.7)

Using Eq. (3.7) and the continuity a.e. assumption, we can derive the following theorem about the sufficient and necessary condition for f_1 being strong-robust:

Theorem 3.4. When f_1 is continuous a.e., f_1 is strong-robust against adversarial examples if and only if the topology in (X, d'_1) is a finer topology than the topology in (X, d'_2).

Proof. See its proof in appendix Section 10.3.1.

Table 3: Summary of theoretical conclusions that we can derive. Here X_1 = R^(n_1) and X_2 = R^(n_2). The strong-robustness is determined by the feature extraction function g_1. The accuracy is determined by both the classification function c_1 and the feature extraction function g_1.

Cases (d_1 & d_2 are norms) | Strong-robust? | Can be accurate? | Based on | Illustration
(I) X_1 \ (X_1 ∩ X_2) ≠ ∅, X_2 ⊄ X_1 | Not strong-robust | may not be accurate | Theorem (3.4) | Figure 2
(II) n_1 > n_2, X_2 ⊊ X_1 | Not strong-robust | may be accurate | Corollary (4.1) | Figure 2
(III) n_1 = n_2, X_1 = X_2 | Strong-robust | may be accurate | Corollary (4.2) | Figure 4
(IV) n_1 < n_2, X_1 ⊂ X_2 | Strong-robust | may not be accurate | Theorem (3.4) | Figure 5

Actually, the above theorem can guarantee that, when f_1 is continuous a.e.:

∀ x, x' ∈ X,  P(f_1(x) = f_1(x') | d_2(g_2(x), g_2(x')) < δ_2) = 1        (3.8)

Eq. (3.8) clearly is a special (stronger) case of the strong-robustness defined by Eq. (3.2).

When f_1 is not continuous a.e., we need to consider the probability of the boundary-points-based adversarial examples (Eq. (3.1)). For such a case, we obtain a sufficient condition⁸ for strong-robustness:

Theorem 3.5. When f_1 is not continuous a.e., if the topology in (X, d'_1) is a finer topology than the topology in (X, d'_2) and P(f_1(x) ≠ f_1(x') | f_2(x) = f_2(x'), d_1(g_1(x), g_1(x')) < δ_1, d_2(g_2(x), g_2(x')) < δ_2) < η, then f_1 is strong-robust against adversarial examples.

When f_1 is not continuous a.e., its strong-robustness is significantly influenced by its boundary points and therefore relates to the c_1 function. Section 11.2 provides some discussion, and we omit covering such cases in the rest of this paper.

⁸ When f_1 is not continuous a.e., it is difficult to find the necessary and sufficient condition for the strong-robustness of f_1. We leave this to future research.
4 TOWARDS PRINCIPLED UNDERSTANDING

The four theorems proposed above lead to a set of key insights about why and how an adversary can fool a machine-learning classifier using adversarial examples. One of the most valuable insights is: the feature learning step decides whether a predictor is strong-robust or not in an adversarial test setting. All the discussions in this section assume f_1 is continuous a.e..

4.1 UNNECESSARY FEATURES RUIN STRONG-ROBUSTNESS

Theorem (3.2) and Theorem (3.4) indicate that when f_1 is continuous a.e., the two feature spaces (X_1, d_1) and (X_2, d_2), or the functions g_1 and g_2, determine the strong-robustness of f_1. Based on Theorem (3.4), we can derive a corollary as follows (proof in Section 10.3.1):

Corollary 4.1. When f_1 is continuous a.e., if X_1 = R^(n_1), X_2 = R^(n_2), n_1 > n_2, X_2 ⊊ X_1, and d_1, d_2 are norm functions, then f_1(·) is not strong-robust against adversarial examples.

This corollary shows that if unnecessary features (with regard to X_2) are selected in the feature selection step, then no matter how accurately the model is trained, it is not strong-robust to adversarial examples. Figure 2 shows a situation in which the oracle for the current task only needs one feature to classify samples correctly. A machine learning classifier extracts two features, one used by the oracle and the other an extra, unnecessary feature⁹. In X_1, f_1 (actually c_1) successfully classifies all the test inputs. However, it is very easy to find adversarial examples satisfying Eq. (2.4) by adding a small perturbation along only the unnecessary feature dimension. In Figure 2, red circles show a few such adversarial examples. The adversarial examples are very close to their seed samples in the oracle space, but they are predicted into a different class by f_1.

For many security-sensitive applications, previous studies using state-of-the-art learning-based classifiers normally assume that adding more features is always helpful. Our corollary indicates that this assumption is wrong and can leave such classifiers vulnerable to adversarial examples (Xu et al., 2016).

As another example, multiple DNN studies about adversarial examples claim that adversarial examples are transferable among different DNN models. This can be explained by Figure 2 (when X_1 is a much higher-dimensional space). Since different DNN models learn over-complete feature spaces {X_1}, there is a high chance that these different X_1 involve a similar set of unnecessary features (e.g., the different learned features are correlated with one another). Therefore the adversarial examples are generated along similar gradient directions. That is why many such samples can evade multiple DNN models.

⁹ The two features of X_1 actually positively correlate in Figure 2. However, the oracle does not need to use the second feature to make the classification decision.

4.2 FEATURE SPACE MORE IMPORTANT THAN NORM

Using Theorem (3.3), we obtain another corollary as follows (proof in Section 10.3.1):

Corollary 4.2. When f_1 is continuous a.e., if d_1 and d_2 are norms and X_1 = X_2 = R^n, then f_1(·) is strong-robust to adversarial examples.

This corollary shows that if a learned classifier and its oracle share the same derived feature space (X_1 = X_2), the learned classifier is strong-robust when the two metrics are both norm functions (even if not the same norm). We can summarize this corollary as "the norm doesn't matter".

Many interesting phenomena can be explained by Corollary (4.2). For instance, for a norm-regularized classifier, this corollary answers the important question of whether a different norm function will influence its robustness against adversarial examples. The corollary indicates that changing to a different norm function may not improve the robustness of the model under adversarial perturbation.
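Before summarizing, here is a tiny numerical illustration (with made-up numbers) of the Figure 2 / Corollary (4.1) situation: the oracle only looks at feature x[0], while the trained classifier also weights an unnecessary feature x[1]. Perturbing x[1] alone leaves the oracle's view of the sample unchanged but can flip f_1's prediction. The weights and values below are purely hypothetical.

```python
import numpy as np

# Hypothetical linear classifier c_1 on X_1 = R^2: it uses both features,
# even though the oracle's label only depends on the first one (X_2 = R^1).
w = np.array([1.0, 1.0])                         # weights learned by f_1
oracle_label = lambda x: int(x[0] > 0)           # f_2: ignores x[1]
f1_label = lambda x: int(w @ x > 0)              # f_1: uses x[0] and x[1]

x = np.array([0.2, 0.0])                         # a correctly classified seed sample
x_adv = x + np.array([0.0, -0.5])                # perturb only the unnecessary feature

print(oracle_label(x), oracle_label(x_adv))      # 1 1 -> d_2(g_2(x), g_2(x')) = 0
print(f1_label(x), f1_label(x_adv))              # 1 0 -> f_1(x) != f_1(x')
```

No matter how accurate c_1 is on clean data, an attacker who can move along the unnecessary dimension finds such a pair for every seed, which is exactly the Case (II) situation in Table 3.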
Summarizing Theorem (3.2), Theorem (3.4), Corollary (4.2) and Corollary (4.1), the robustness of a learned classifier is decided by two factors: (1) the difference between the two derived feature spaces; and (2) the difference between the metric functions. The two corollaries show that the difference between the feature spaces is more important than the difference between the two metric functions.

4.3 ROBUSTNESS AND GENERALIZATION

In Table 3, we provide four situations in which the proposed theorems can be used to determine whether a classifier f_1 is strong-robust against adversarial examples or not.

• Case (I): If f_1 uses some unnecessary features, it will not be strong-robust to adversarial examples. It may not be an accurate predictor if f_1 misses some necessary features used by f_2.
• Case (II): If f_1 uses some unnecessary features, it will not be strong-robust to adversarial examples. It may be an accurate predictor if f_1 uses all the features used by f_2.
• Case (III): If f_1 and f_2 use the same set of features and nothing else, f_1 is strong-robust and may be accurate.
• Case (IV): If f_1 misses some necessary features and does not extract unnecessary features, f_1 is strong-robust (even though its accuracy may not be good).

Table 3 provides a much better understanding of the relationship between robustness and accuracy. Two interesting cases from Table 3 are worth emphasizing again: (1) if f_1 misses features used by f_2 and does not include unnecessary features (according to X_2), f_1 is strong-robust (even though it may not be accurate); (2) if f_1 extracts some extra unnecessary features, it will not be strong-robust (though it may be a very accurate predictor).

We want to emphasize that "f_1 is strong-robust" does not mean it is a good classifier. For example, a trivial strong-robust model is f_1(x) ≡ 1, ∀ x ∈ X. However, it is a useless model since it has no prediction power. In an adversarial setting, we should aim for a classifier that is both strong-robust and precise. A better feature learning function g_1 is exactly the solution that may achieve both goals.

Table 3 indicates that c_1 and c_2 do not influence the strong-robustness of f_1 when f_1 is continuous a.e.¹⁰. Figure 4 and Figure 5 further show two concrete example cases in which f_1 is strong-robust according to f_2; however, in both figures, f_1 is not accurate according to f_2.

¹⁰ When f_1 is not continuous a.e., c_1 matters for the strong-robustness. See Section 11 for details.

Table 4: Connecting to relevant DNN hardening solutions. The experimental results comparing different hardening solutions are shown in Figure 9, Figure 10, Table 10 and Table 11.

Method | x' | Loss L_{f_1}(x, x') | On layer
Stability training (Zheng et al., 2016) | random perturbation | KL(f_1(x), f_1(x')) | Classification layer
(Miyato et al., 2016) | adversarial perturbation | KL(f_1(x), f_1(x')) | Classification layer
Adversarial training (Goodfellow et al., 2014) | adversarial perturbation | L(f_1(x'), f_2(x)) | Loss function
Large adversarial training (Kurakin et al., 2016) | adversarial perturbation | L(f_1(x'), f_2(x)) | Loss function
(Lee et al., 2015) | adversarial perturbation | ‖g_1(x) − g_1(x')‖_2 | Layer before classification layer
Siamese training | random perturbation | ‖g_1(x) − g_1(x')‖_2 | Layer before classification layer

5 TOWARDS PRINCIPLED SOLUTIONS FOR DNNS

Our theoretical analysis uncovers fundamental properties that explain adversarial examples. In this section, we apply them to analyze DNN classifiers. More specifically, (i) we find that DNNs are not strong-robust against adversarial examples; and (ii) we connect to possible hardening solutions and introduce a principled understanding of these solutions.

5.1 ARE STATE-OF-THE-ART DNNS STRONG-ROBUST?

For a DNN, it is difficult to derive a precise analytic form of d_1 (or d'_1).
But we can observe some properties of d_1 through experimental results. Table 5, Table 6, Table 7 and Table 8 show properties of d_1 (and d'_1) resulting from testing experiments on four state-of-the-art DNN networks (details in Section 12.1). All four tables indicate that the accuracy of DNN models in the adversarial setting is quite bad. The performance on randomly perturbed inputs is much better than the performance on maliciously perturbed adversarial examples.

The phenomenon we observed can be explained by Figure 3. Comparing the second and third columns in the four tables, we can conclude that d_1 (and d'_1) along a random direction is larger than d_1 (and d'_1) along the adversarial direction. This indicates that a round sphere in (X_1, d_1) (and in (X, d'_1)) corresponds to a very thin high-dimensional ellipsoid in (X, ‖·‖), as illustrated by the left half of Figure 3. Figure 3(I) shows a sphere in (X, d'_1) and Figure 3(III) shows a sphere in (X_1, d_1); they correspond to the very thin high-dimensional ellipsoid in (X, ‖·‖) in Figure 3(V). The norm function ‖·‖ is defined on the space X and is application-dependent; all four tables use ‖·‖ = ‖·‖_∞. In contrast, for human oracles, a sphere in (X, d'_2) (shown in Figure 3(II)) or in (X_2, d_2) (shown in Figure 3(IV)) corresponds to an ellipsoid in (X, ‖·‖) without very thin directions (shown in Figure 3(VI)). When attackers try to minimize the perturbation size using the approximated distance function d_2 = ‖·‖, the thin direction of the ellipsoid in Figure 3(V) is exactly the adversarial direction.

5.2 TOWARDS PRINCIPLED SOLUTIONS

Our theorems suggest a list of possible solutions that may improve the robustness of DNN classifiers against adversarial samples, including the following.

By learning a better g_1: Methods like DNNs directly learn the feature extraction function g_1. Table 4 summarizes multiple hardening solutions (Zheng et al., 2016; Miyato et al., 2016; Lee et al., 2015) in the DNN literature. They mostly aim to learn a better g_1 by minimizing different loss functions L_{f_1}(x, x') so that when d_2(g_2(x), g_2(x')) < ε (approximated by (X, ‖·‖)), this loss L_{f_1}(x, x') is small. Two major variations exist among related methods: the choice of L_{f_1}(x, x') and the way pairs (x, x') are generated. For instance, to reach strong-robustness we can force the training to learn a g_1 that makes (X, d'_1) a finer topology than (X, d'_2). Section 12.4 explores this option ("Siamese training" in Table 4) through a Siamese architecture. Experimentally, Section 12.5 compares adversarial training, stability training and Siamese training on two state-of-the-art DNN image-classification tasks.
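As one concrete instance of the "learning a better g_1" strategy, the sketch below shows a Siamese-style training step in the spirit of the "Siamese training" row of Table 4: it penalizes ‖g_1(x) − g_1(x')‖_2 for randomly perturbed pairs in addition to the usual classification loss. This is an illustrative sketch only; the weighting term lam, the noise scale sigma, and the model interface (the g1/c1 attributes from the earlier decomposition sketch) are assumptions, not the exact setup evaluated in Section 12.

```python
import torch
import torch.nn.functional as F

def siamese_training_step(model, x, y, optimizer, lam=1.0, sigma=0.05):
    """One training step: classification loss + || g_1(x) - g_1(x') ||_2 penalty.

    `model` is assumed to expose the decomposition of Section 3.1 as
    `model.g1` (feature extractor) and `model.c1` (linear head).
    """
    x_prime = x + sigma * torch.randn_like(x)     # randomly perturbed twin of x
    feats, feats_prime = model.g1(x), model.g1(x_prime)

    cls_loss = F.cross_entropy(model.c1(feats), y)
    # Pull the twins together in X_1 so that small oracle-perceived changes
    # stay small under d_1 (pushing toward the finer-topology condition).
    siamese_loss = (feats - feats_prime).norm(p=2, dim=1).mean()

    loss = cls_loss + lam * siamese_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```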