Workshop track - ICLR 2017

A THEORETICAL FRAMEWORK FOR ROBUSTNESS OF (DEEP) CLASSIFIERS AGAINST ADVERSARIAL EXAMPLES

Beilun Wang, Ji Gao, Yanjun Qi
Department of Computer Science, University of Virginia
Charlottesville, VA 22901, USA
{bw4mw,jg6yd,yanjun}@virginia.edu

ABSTRACT

Most machine learning classifiers, including deep neural networks, are vulnerable to adversarial examples. Such inputs are typically generated by adding small but purposeful modifications that lead to incorrect outputs while remaining imperceptible to human eyes. The goal of this paper is not to introduce a single method, but to make theoretical steps towards fully understanding adversarial examples. Using concepts from topology, our theoretical analysis brings forth the key reasons why an adversarial example can fool a classifier (f_1) and adds its oracle (f_2, like human eyes) to the analysis. By investigating the topological relationship between the two (pseudo)metric spaces corresponding to predictor f_1 and oracle f_2, we develop necessary and sufficient conditions that can determine whether f_1 is always robust (strong-robust) against adversarial examples according to f_2. Interestingly, our theorems indicate that just one unnecessary feature can make f_1 not strong-robust, and that the right feature representation learning is the key to obtaining a classifier that is both accurate and strong-robust.

1 INTRODUCTION

Deep Neural Networks (DNNs) can efficiently learn highly accurate models and have been demonstrated to perform exceptionally well (Krizhevsky et al., 2012; Hannun et al., 2014). However, recent studies show that intelligent attackers can force many machine learning models, including DNNs, to misclassify examples by adding small and hardly visible modifications to a regular test sample.

The maliciously generated inputs are called "adversarial examples" (Goodfellow et al., 2014; Szegedy et al., 2013) and are commonly crafted by carefully searching for small perturbations through an optimization procedure. Several recent studies proposed algorithms for solving such optimization to fool DNN classifiers. Szegedy et al. (2013) first observed that convolutional DNNs are vulnerable to small artificial perturbations. They use box-constrained Limited-memory BFGS (L-BFGS) to create adversarial examples and find that adversarial perturbations generated from one DNN network can also force other networks to produce wrong outputs. Goodfellow et al. (2014) then argue that the primary cause of such vulnerabilities may be the linear nature of DNNs, and propose the fast gradient sign method for generating adversarial examples quickly. Subsequent papers (Fawzi et al., 2015; Papernot et al., 2015a; Nguyen et al., 2015) have explored other ways to generate adversarial examples for DNNs (details in Section 2.1). The goal of this paper is to analyze the robustness of machine learning models in the face of adversarial examples.

In response to progress in generating adversarial examples, researchers have attempted to design strategies for making machine-learning systems robust to various kinds of noise, in the worst case adversarial examples. For instance, denoising NN architectures (Vincent et al., 2008; Gu & Rigazio, 2014; Jin et al., 2015) can discover more robust features by using a noise-corrupted version of inputs as training samples. A modified distillation strategy (Papernot et al., 2015b) has been proposed to improve the robustness of DNNs against adversarial examples, though it has recently been shown to be unsuccessful (Carlini & Wagner, 2016a). The most generally successful strategy to date is adversarial training (Goodfellow et al., 2014; Szegedy et al., 2013), which injects adversarial examples into training to improve the generalization of DNN models. More recent techniques incorporate a smoothness penalty (Miyato et al., 2016; Zheng et al., 2016) or a layer-wise penalty (Carlini & Wagner, 2016b) as a regularization term in the loss function to promote the smoothness of the DNN model distributions.

Table 1: A list of important notations used in the paper.

f_1          A learned machine learning classifier, f_1 = c_1 ∘ g_1.
f_2          The oracle for the same task (see Definition (2.1)), f_2 = c_2 ∘ g_2.
g_i          Part of f_i including the operations that progressively transform the input into a new form of learned representations in X_i.
c_i          Part of f_i including simple decision functions (like linear ones) for classifying.
X            Input space (e.g., {0,1,2,...,255}^(32×32×3) for CIFAR-10 data (Krizhevsky & Hinton, 2009)).
Y            Output space (e.g., {1,2,3,...,10} for CIFAR-10 data (Krizhevsky & Hinton, 2009)).
X_1          Feature space defined by the feature extraction module g_1 of predictor f_1.
X_2          Feature space defined by the feature extraction module g_2 of oracle f_2.
d_1(·,·)     The metric function for measuring sample distances in feature space X_1 with respect to predictor f_1.
d_2(·,·)     The metric function for measuring sample distances in feature space X_2 with respect to oracle f_2.
d'_1(·,·)    The pseudometric function with respect to predictor f_1, d'_1(x,x') = d_1(g_1(x), g_1(x')).
d'_2(·,·)    The pseudometric function with respect to oracle f_2, d'_2(x,x') = d_2(g_2(x), g_2(x')).
a.e.         Almost everywhere (Folland, 2013); defined by Definition (9.2) in Section 9.1.
ε, δ_1, δ_2, δ, η    Small positive constants.

Recent studies (reviewed by (Papernot et al., 2016b)) are mostly empirical and provide little understanding of why an adversary can fool machine learning models with adversarial examples. Several important questions have not been answered yet:

• What makes a classifier always robust to adversarial examples?
• Which parts of a classifier influence its robustness against adversarial examples more, compared with the rest?
• What is the relationship between a classifier's generalization accuracy and its robustness against adversarial examples?
• Why are (many) DNN classifiers not robust against adversarial examples? How can they be improved?

This paper tries to answer the above questions and makes the following contributions:

• Section 2 points out that previous definitions of adversarial examples for a classifier (f_1) have overlooked the importance of an oracle function (f_2) of the same task.
• Section 3 formally defines when a classifier f_1 is always robust ("strong-robust") against adversarial examples. It proves four theorems about sufficient and necessary conditions that make f_1 always robust against adversarial examples according to f_2. Our theorems lead to a number of interesting insights, such as that the feature representation learning controls whether a DNN is strong-robust or not.
• Section 12 is dedicated to providing practical and theoretically grounded directions for understanding and hardening DNN models against adversarial examples.

Table 1 provides a list of important notations we use in the paper.

2 DEFINE ADVERSARIAL EXAMPLES

This section provides a general definition of adversarial examples by including the notion of an oracle. For a particular classification task, a learned classifier is represented as f_1: X → Y, where X represents the input sample space and Y is the output space representing a categorical set.

2.1 PREVIOUS FORMULATIONS

Various definitions of "adversarial examples" exist in the recent literature, with most following Eq. (2.1). See more detailed reviews in Section 8.

Figure 1: Example of a machine-learning classifier (predictor) and a human annotator (oracle) for classifying images of hand-written "0". Both include two steps: feature extraction and classification. The upper half is about the learned machine classifier f_1 and the lower half is about the oracle f_2. f_1 transforms samples from the original space X to an embedded metric space (X_1, d_1) using its feature extraction step.
Here, d_1 is the similarity measure on the feature space X_1. Classification models like DNNs cover the feature extraction step inside the model, though many other models, like decision trees, need hand-crafted or domain-specific feature extraction. Then f_1 can use a linear function to decide the classification prediction ŷ ∈ Y. Similarly, the human oracle f_2 transforms data samples from the original space X into an embedded metric space (X_2, d_2) by its feature extraction. Here, d_2 is the corresponding similarity measure. The oracle then gets the classification result y ∈ Y using the feature representation of samples in (X_2, d_2).

The basic idea is to generate a misclassified sample x' by "slightly" perturbing a correctly classified sample x, with an adversarial perturbation ∆(x, x'). Formally, when given x ∈ X:

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.1)
      ∆(x, x') < ε

Here x, x' ∈ X. ∆(x, x') represents the difference between x and x', which depends on the specific data type that x and x' belong to¹. Table 2 summarizes different choices of f_1 and ∆(x, x') used in the recent literature, in which norm functions on the original space X are mostly used to calculate ∆(x, x'). Multiple algorithms have been implemented to solve Eq. (2.1) as a constrained optimization (summarized by the last column of Table 2). More details are included for three such studies in Section 8.2.

When searching for adversarial examples, one important property has not been fully captured by Eq. (2.1). That is, an adversarial example has been modified very slightly from its seed, and these modifications can be so subtle that, for example in image classification, a human observer does not even notice the modification at all. We define the role of the "human observer" more formally as follows:

Definition 2.1. An "Oracle" represents a decision process generating ground truth labels for a task of interest. Each oracle is task-specific, with finite knowledge and noise-free².

¹ For example, in the case of strings, ∆(x, x') represents the difference between two strings.
² We leave all detailed analysis of when an oracle contains noise as future work.

Figure 2: An example showing that f_1 with one unnecessary feature (according to f_2) is prone to adversarial examples. The red circle denotes an adversarial example (e.g., generated by some attack similar to JSMA (Papernot et al., 2015a); details in Section 8.2). Each adversarial example is very close to its seed sample in the oracle feature space (according to d_2), but it is comparatively far from its seed sample in the feature space (according to d_1) of the trained classifier and is on the other side of the decision boundary of f_1. Essentially, "adversarial examples" can easily be found for all seed samples in this figure; we only draw cases for two seeds. Besides, for each seed sample we can generate a series of "adversarial examples" (by varying the attacking power) once the attacking line crosses the decision boundary of f_1; we only show one such adversarial example per seed sample.
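Before listing the specific formulations (Table 2 below), here is a minimal sketch of one widely used way to search for x' in Eq. (2.1): a single fast-gradient-sign step in the spirit of Goodfellow et al. (2014), which moves x along the sign of the loss gradient under an ℓ_∞ budget ε. This is an illustrative sketch only, not the exact implementation used in any of the cited studies; the model interface and the value of eps are assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.03):
    """One-step sketch of solving Eq. (2.1) under an l_inf budget eps.

    `model` is any differentiable classifier f_1 (assumed to be a
    torch.nn.Module returning logits); `x` is a batch of inputs and `y`
    their currently predicted labels.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # Loss(f_1(x'), f_1(x))
    loss.backward()
    # Move each coordinate by +/- eps in the direction that increases the loss,
    # so Delta(x, x') = ||x - x'||_inf <= eps while hopefully f_1(x') != f_1(x).
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
    return x_adv
```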
Table 2: Summary of the previous studies defining adversarial examples.

Previous studies | f_1 | ∆(x, x') | Formulation of f_1(x) ≠ f_1(x')
(Goodfellow et al., 2014) | Convolutional neural networks | ℓ_∞ | argmax_x' Loss(f_1(x'), f_1(x))
(Szegedy et al., 2013) | Convolutional neural networks | ℓ_2 | argmin_x' Loss(f_1(x'), l), subject to: l ≠ f_1(x')
(Biggio et al., 2013) | Support vector machine (SVM) | ℓ_2 | argmin_x' Loss(f_1(x'), −1), subject to: f_1(x) = 1
(Kantchelian et al., 2015) | Decision tree and Random forest | ℓ_2, ℓ_1, ℓ_∞ | argmin_x' Loss(f_1(x'), −1), subject to: f_1(x) = 1
(Papernot et al., 2016a) | Convolutional neural networks | ℓ_0 | argmax_x' Loss(f_1(x'), f_1(x))
(Grosse et al., 2016) | Convolutional neural networks | ℓ_0 | argmax_x' Loss(f_1(x'), f_1(x))
(Xu et al., 2016) | Random forest and SVM | ℓ_1, ℓ_∞ | argmin_x' Loss(f_1(x'), −1), subject to: f_1(x) = 1

The goal of machine learning is to train a learning-based predictor function f_1: X → Y to approximate an oracle classifier f_2: X → Y for the same classification task. For example, in image classification tasks, the oracle f_2 is often a group of human annotators. Adding the notion of the oracle, we revise Eq. (2.1) into:

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.2)
      ∆_2(x, x') < ε
      f_2(x) = f_2(x')

2.2 MEASURING SAMPLE DIFFERENCE IN WHICH SPACE? MODELING & DECOMPOSING f_2

∆_2(x, x') < ε reflects that adversarial examples add "small modifications" that are almost imperceptible to the oracle of the task. Clearly, calculating ∆_2(x, x') needs to accord with the oracle f_2. For most classification tasks, an oracle does not measure sample difference in the original input space X. We want to emphasize that sample difference is with regard to the classification purpose. For instance, when labeling images for hand-written digit recognition, human annotators do not need to consider the background pixels to decide whether an image is a "0" or not.

As illustrated in Figure 1, we denote the feature space an oracle uses to consider difference among samples for the purpose of classification decisions as X_2. The sample difference uses a distance function d_2 in this space. An oracle function f_2: X → Y can be decomposed as f_2 = c_2 ∘ g_2, where g_2: X → X_2 represents the operations for feature extraction from X to X_2 and c_2: X_2 → Y denotes the simple operation of classification in X_2. Essentially, g_2 includes the operations that (progressively) transform input representations into an informative form of representations X_2, while c_2 applies relatively simple functions (like linear ones) in X_2 for the purpose of classification. d_2 is the metric function (details in Section 3) an oracle uses to measure the similarity among samples (by relying on representations learned in the space X_2). We illustrate this modeling and decomposition in Figure 1. In Section 3, our theoretical analysis uses (X_2, d_2) to bring forth the fundamental causes of adversarial examples and leads to a set of novel insights for understanding such examples. To the best of the authors' knowledge, the theoretical analysis made by this paper has not been uncovered by the literature.

Modeling the oracle f_2: One may argue that it is hard to model f_2 and (X_2, d_2) for real applications, since if such oracles could be modeled easily, a machine-learning based f_1 would seem unnecessary. In Section 8.3, we provide examples of modeling oracles for real applications. For many security-sensitive applications about machines, oracles f_2 do exist³. For artificial intelligence tasks like image classification, humans are f_2.
As illustrated by cognitive neuroscience papers (DiCarlo & Cox, 2007; DiCarlo et al., 2012), human brains perform visual object recognition using the ventral visual stream, and this stream is considered to be a progressive series of visual re-representations, from V1 to V2 to V4 to IT cortex (DiCarlo & Cox, 2007). Experimental results support that the human visual system makes its classification decision at the final IT cortex layer. This process is captured exactly by our decomposition f_2 = c_2 ∘ g_2.

³ Oracles f_2 do exist in many security-sensitive applications about machines, but machine-learning classifiers f_1 are popularly used due to speed or efficiency.

2.3 REVISED FORMULATION

Now we use the decomposition of f_2 to rewrite ∆_2(x, x') as d_2(g_2(x), g_2(x')) in Eq. (2.2) and obtain our proposed general definition of adversarial examples:

Definition 2.2. Adversarial example: Suppose we have two functions f_1 and f_2. f_1: X → Y is the classification function learned from a training set and f_2: X → Y is the classification function of the oracle that generates ground-truth labels for the same task. Given a sample x ∈ X, an adversarial example is an x' ∈ X such that (x, x') satisfies Eq. (2.3).

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.3)
      d_2(g_2(x), g_2(x')) < δ_2
      f_2(x) = f_2(x')

Most previous studies (Table 2) have made an important and implicit assumption about f_2 (through using ∆(x, x') < ε): f_2 is almost everywhere (a.e.) continuous. We explain the a.e. continuity assumption and its implications in Section 9. Basically, when f_2 is assumed continuous a.e.,

P(f_2(x) = f_2(x') | d_2(g_2(x), g_2(x')) < δ_2) = 1

Therefore, when f_2 is continuous a.e., Eq. (2.3) can be simplified into the following Eq. (2.4):

Find x'
s.t.  f_1(x) ≠ f_1(x')                    (2.4)
      d_2(g_2(x), g_2(x')) < δ_2

3 DEFINE STRONG-ROBUSTNESS

With a more accurate definition of "adversarial examples", we now aim to answer the first central question: "What makes a classifier always robust against adversarial examples?". Section 3.2 defines the concept "strong-robust", describing a classifier that is always robust against adversarial examples. Section 3.3 and Section 3.4 present sufficient and necessary conditions for "strong-robustness". Section 4 then provides a set of theoretical insights to understand "strong-robustness".

3.1 MODELING AND DECOMPOSING f_1

As shown in Figure 1, we decompose f_1 in a similar way as the decomposition of f_2. This is to answer another key question: "which parts of a learned classifier influence its robustness against adversarial examples more, compared with the rest?". A machine-learning classifier f_1 = c_1 ∘ g_1, where g_1: X → X_1 represents the feature extraction operations and c_1: X_1 → Y performs a simple operation (e.g., linear) of classification. Section 8.4 provides multiple examples of decomposing state-of-the-art f_1⁴. d_1 denotes the distance function f_1 uses to measure difference among samples in X_1.
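As a concrete (hypothetical) illustration of the decomposition f_1 = c_1 ∘ g_1, the sketch below writes a small convolutional network explicitly as a feature extractor g_1 followed by a linear head c_1. The layer choices and sizes are arbitrary assumptions, made only to make the two roles explicit.

```python
import torch.nn as nn

class DecomposedClassifier(nn.Module):
    """A classifier written explicitly as f_1 = c_1 o g_1 (sizes are arbitrary)."""

    def __init__(self, num_classes=10):
        super().__init__()
        # g_1: progressively transforms raw inputs x in X into features in X_1
        self.g1 = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # c_1: a simple (linear) decision function on the feature space X_1
        self.c1 = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.c1(self.g1(x))   # f_1(x) = c_1(g_1(x))
```

Under this view, d_1 can informally be thought of as a distance computed on the activations returned by g1, which is the quantity the theorems below reason about.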
Almost all popular machine learning classifiers satisfy the a.e. continuity assumption. It means:

P(f_1(x) = f_1(x') | d_1(g_1(x), g_1(x')) < δ_1) = 1

When f_1 is not continuous a.e., it is not robust to any type of noise; see Section 9 for detailed discussions. For the rare cases in which f_1 is not continuous a.e., Section 11 discusses the "boundary points" of f_1⁵. Roughly speaking, when f_1 is not continuous a.e.⁶,

P(f_1(x) ≠ f_1(x') | d_1(g_1(x), g_1(x')) < δ_1) > 0

Therefore, the following probability of "boundary-points-based adversarial examples" might not be 0 for such cases⁷:

P(f_1(x) ≠ f_1(x') | f_2(x) = f_2(x'), d_1(g_1(x), g_1(x')) < δ_1, d_2(g_2(x), g_2(x')) < δ_2)        (3.1)

The value of this probability is critical for our analysis in Theorem (3.3) and in Theorem (3.5).

3.2 {δ_2, η}-STRONG-ROBUST AGAINST ADVERSARIAL EXAMPLES

We then apply reverse thinking to Definition (2.2) and derive the following definition of strong-robustness for a machine learning classifier against adversarial examples:

Definition 3.1. {δ_2, η}-strong-robustness of a machine-learning classifier: A machine-learning classifier f_1(·) is {δ_2, η}-strong-robust against adversarial examples if, ∀ x, x' ∈ X a.e., (x, x') satisfies Eq. (3.2):

∀ x, x' ∈ X,
P(f_1(x) = f_1(x') | f_2(x) = f_2(x'), d_2(g_2(x), g_2(x')) < δ_2) > 1 − η        (3.2)

When f_2 is continuous a.e., Eq. (3.2) simplifies into Eq. (3.3):

∀ x, x' ∈ X,
P(f_1(x) = f_1(x') | d_2(g_2(x), g_2(x')) < δ_2) > 1 − η        (3.3)

Eq. (3.2) defines "{δ_2, η}-strong-robustness" as a claim that holds with high probability. To simplify notation, in the rest of this paper we use "strong-robust" to represent "{δ_2, η}-strong-robust". Also, in the rest of this paper we propose and prove theorems and corollaries using the more general form, Eq. (3.2). For all cases, if f_2 is continuous a.e., all proofs and equations can be simplified by using only the term d_2(g_2(x), g_2(x')) < δ_2 (i.e., removing the term f_2(x) = f_2(x')) according to Eq. (3.3). The "strong-robustness" definition leads to four important theorems in the next two subsections.

⁴ Notice that g_1 may also include implicit feature selection steps like ℓ_1 regularization.
⁵ Boundary points are those points satisfying f_1(x) ≠ f_1(x') and d_1(g_1(x), g_1(x')) < δ_1.
⁶ When f_1 is continuous a.e., P(f_1(x) ≠ f_1(x') | d_1(g_1(x), g_1(x')) < δ_1) = 0.
⁷ "Boundary-points-based adversarial examples" only attack seed samples that are boundary points of f_1.

3.3 TOPOLOGICAL EQUIVALENCE OF TWO METRIC SPACES (X_1, d_1) AND (X_2, d_2) IS SUFFICIENT IN DETERMINING STRONG-ROBUSTNESS

In the appendix, Section 10.1 briefly introduces the concept of a metric space and the definition of topological equivalence between two metric spaces. As shown in Figure 1, f_1 defines a metric space (X_1, d_1) on X_1 with the metric function d_1. Similarly, f_2 defines a metric space (X_2, d_2) on X_2 with the metric function d_2.

If topological equivalence (Eq. (10.1)) exists between (X_1, d_1) and (X_2, d_2), it means that for all pairs of samples from X we have the following relationship:

∀ x, x' ∈ X,
d_1(g_1(x), g_1(x')) < δ_1  ⇔  d_2(g_2(x), g_2(x')) < δ_2        (3.4)

When f_1 is continuous a.e., this gives us the following important theorem, indicating that the topological equivalence between (X_1, d_1) and (X_2, d_2) is a sufficient condition for determining whether or not f_1 is strong-robust against adversarial examples:

Theorem 3.2. When f_1 is continuous a.e., if (X_1, d_1) and (X_2, d_2) are topologically equivalent, then the learned classifier f_1(·) is strong-robust to adversarial examples.

Proof. See its proof in Section 10.3.4.

This theorem can actually guarantee that:

∀ x, x' ∈ X,
P(f_1(x) = f_1(x') | f_2(x) = f_2(x'), d_2(g_2(x), g_2(x')) < δ_2) = 1        (3.5)

Clearly, Eq. (3.5) is a special (stronger) case of the "strong-robustness" defined by Eq. (3.2).
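Eq. (3.4) suggests a simple, if crude, empirical probe of how far a trained f_1 is from topological equivalence with an oracle: sample pairs that an assumed oracle metric d_2 considers near and check how often the learned feature metric d_1 also considers them near, and vice versa. The sketch below assumes we have callable stand-ins for g_1, g_2, d_1 and d_2, which is rarely exactly true in practice (Section 5.1 discusses approximating d_2 with a norm on X); it is meant only as an illustration of the two directions of Eq. (3.4).

```python
def neighborhood_agreement(pairs, g1, g2, d1, d2, delta1, delta2):
    """Estimate how often d_2-closeness and d_1-closeness coincide (Eq. (3.4)).

    `pairs` is an iterable of (x, x') sample pairs; g1/g2 map inputs to the two
    feature spaces and d1/d2 are the corresponding distance functions, all
    assumed to be available as plain Python callables for illustration.
    """
    close_in_d2 = also_close_in_d1 = 0
    close_in_d1 = also_close_in_d2 = 0
    for x, x_prime in pairs:
        near1 = d1(g1(x), g1(x_prime)) < delta1
        near2 = d2(g2(x), g2(x_prime)) < delta2
        close_in_d2 += near2
        also_close_in_d1 += near2 and near1   # d_2-near => d_1-near (finer-topology direction)
        close_in_d1 += near1
        also_close_in_d2 += near1 and near2   # d_1-near => d_2-near (other direction)
    return (also_close_in_d1 / max(close_in_d2, 1),
            also_close_in_d2 / max(close_in_d1, 1))
```

If the first ratio stays near 1 for small δ_2, the behavior matches the finer-topology condition of Section 3.4; Theorem 3.2 requires both ratios to be near 1.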
For more general cases, in which f_1 might not be continuous a.e., we need to consider the probability of boundary-point attacks (Eq. (3.1)). We therefore obtain a more general theorem as follows:

Theorem 3.3. If (X_1, d_1) and (X_2, d_2) are topologically equivalent and P(f_1(x) ≠ f_1(x') | f_2(x) = f_2(x'), d_1(g_1(x), g_1(x')) < δ_1, d_2(g_2(x), g_2(x')) < δ_2) < η, then the learned classifier f_1(·) is strong-robust to adversarial examples.

Proof. See its proof in Section 10.3.3.

3.4 FINER TOPOLOGY OF (X, d'_1) THAN (X, d'_2) IS SUFFICIENT AND NECESSARY IN DETERMINING STRONG-ROBUSTNESS

Now we extend the discussion from two metric spaces to two pseudometric spaces. This extension finds the sufficient and necessary condition that determines the strong-robustness of f_1. The two related pseudometrics are d'_1 (for f_1) and d'_2 (for f_2), both defined directly on X. Appendix Section 10.2 includes detailed descriptions of pseudometrics, pseudometric spaces, topology, and the finer-topology relationship between two pseudometric spaces.

Essentially, the statement that the topology in the pseudometric space (X, d'_1) is a finer topology than the topology in the pseudometric space (X, d'_2) means:

∀ x, x' ∈ X,  d'_2(x, x') < δ_2  ⇒  d'_1(x, x') < δ_1        (3.6)

Because d'_1(x, x') = d_1(g_1(x), g_1(x')) and d'_2(x, x') = d_2(g_2(x), g_2(x')), the above equation is equivalent to:

∀ x, x' ∈ X,
d_2(g_2(x), g_2(x')) < δ_2  ⇒  d_1(g_1(x), g_1(x')) < δ_1        (3.7)

Using Eq. (3.7) and the continuity a.e. assumption, we can derive the following theorem about the sufficient and necessary condition for f_1 being strong-robust:

Theorem 3.4. When f_1 is continuous a.e., f_1 is strong-robust against adversarial examples if and only if the topology in (X, d'_1) is a finer topology than the topology in (X, d'_2).

Proof. See its proof in appendix Section 10.3.1.

Table 3: Summary of theoretical conclusions that we can derive. Here X_1 = R^(n_1) and X_2 = R^(n_2). The strong-robustness is determined by the feature extraction function g_1. The accuracy is determined by both the classification function c_1 and the feature extraction function g_1.

Cases (d_1 & d_2 are norms) | Strong-robust? | Can be accurate? | Based on | Illustration
(I) X_1 \ (X_1 ∩ X_2) ≠ ∅, X_2 ⊄ X_1 | Not strong-robust | may not be accurate | Theorem (3.4) | Figure 2
(II) n_1 > n_2, X_2 ⊊ X_1 | Not strong-robust | may be accurate | Corollary (4.1) | Figure 2
(III) n_1 = n_2, X_1 = X_2 | Strong-robust | may be accurate | Corollary (4.2) | Figure 4
(IV) n_1 < n_2, X_1 ⊂ X_2 | Strong-robust | may not be accurate | Theorem (3.4) | Figure 5

Actually, the above theorem can guarantee that, when f_1 is continuous a.e.:

∀ x, x' ∈ X,  P(f_1(x) = f_1(x') | d_2(g_2(x), g_2(x')) < δ_2) = 1        (3.8)

Eq. (3.8) clearly is a special (stronger) case of the strong-robustness defined by Eq. (3.2).

When f_1 is not continuous a.e., we need to consider the probability of the boundary-points-based adversarial examples (Eq. (3.1)). For such a case, we obtain a sufficient condition⁸ for strong-robustness:

Theorem 3.5. When f_1 is not continuous a.e., if the topology in (X, d'_1) is a finer topology than the topology in (X, d'_2) and P(f_1(x) ≠ f_1(x') | f_2(x) = f_2(x'), d_1(g_1(x), g_1(x')) < δ_1, d_2(g_2(x), g_2(x')) < δ_2) < η, then f_1 is strong-robust against adversarial examples.

When f_1 is not continuous a.e., its strong-robustness is significantly influenced by its boundary points and therefore relates to the c_1 function. Section 11.2 provides some discussion, and we omit covering such cases in the rest of this paper.

⁸ When f_1 is not continuous a.e., it is difficult to find the necessary and sufficient condition for the strong-robustness of f_1. We leave this to future research.
4 TOWARDS PRINCIPLED UNDERSTANDING

The four theorems proposed above lead to a set of key insights about why and how an adversary can fool a machine-learning classifier using adversarial examples. One of the most valuable insights is: the feature learning step decides whether a predictor is strong-robust or not in an adversarial test setting. All the discussions in this section assume f_1 is continuous a.e..

4.1 UNNECESSARY FEATURES RUIN STRONG-ROBUSTNESS

Theorem (3.2) and Theorem (3.4) indicate that when f_1 is continuous a.e., the two feature spaces (X_1, d_1) and (X_2, d_2), or the functions g_1 and g_2, determine the strong-robustness of f_1. Based on Theorem (3.4), we can derive a corollary as follows (proof in Section 10.3.1):

Corollary 4.1. When f_1 is continuous a.e., if X_1 = R^(n_1), X_2 = R^(n_2), n_1 > n_2, X_2 ⊊ X_1, and d_1, d_2 are norm functions, then f_1(·) is not strong-robust against adversarial examples.

This corollary shows that if unnecessary features (with regard to X_2) are selected in the feature selection step, then no matter how accurately the model is trained, it is not strong-robust to adversarial examples. Figure 2 shows a situation in which the oracle for the current task only needs one feature to classify samples correctly. A machine learning classifier extracts two features, one used by the oracle and the other an extra, unnecessary feature⁹. In X_1, f_1 (actually c_1) successfully classifies all the test inputs. However, it is very easy to find adversarial examples satisfying Eq. (2.4) by adding a small perturbation along only the unnecessary feature dimension. In Figure 2, red circles show a few such adversarial examples. The adversarial examples are very close to their seed samples in the oracle space, but they are predicted into a different class by f_1.

For many security-sensitive applications, previous studies using state-of-the-art learning-based classifiers normally assume that adding more features is always helpful. Our corollary indicates that this assumption is wrong and can leave such classifiers vulnerable to adversarial examples (Xu et al., 2016).

As another example, multiple DNN studies about adversarial examples claim that adversarial examples are transferable among different DNN models. This can be explained by Figure 2 (when X_1 is a much higher-dimensional space). Since different DNN models learn over-complete feature spaces {X_1}, there is a high chance that these different X_1 involve a similar set of unnecessary features (e.g., the different learned features are correlated with one another). Therefore the adversarial examples are generated along similar gradient directions. That is why many such samples can evade multiple DNN models.

⁹ The two features of X_1 actually positively correlate in Figure 2. However, the oracle does not need to use the second feature to make the classification decision.

4.2 FEATURE SPACE MORE IMPORTANT THAN NORM

Using Theorem (3.3), we obtain another corollary as follows (proof in Section 10.3.1):

Corollary 4.2. When f_1 is continuous a.e., if d_1 and d_2 are norms and X_1 = X_2 = R^n, then f_1(·) is strong-robust to adversarial examples.

This corollary shows that if a learned classifier and its oracle share the same derived feature space (X_1 = X_2), the learned classifier is strong-robust when the two metrics are both norm functions (even if not the same norm). We can summarize this corollary as "the norm doesn't matter".

Many interesting phenomena can be explained by Corollary (4.2). For instance, for a norm-regularized classifier, this corollary answers the important question of whether a different norm function will influence its robustness against adversarial examples. The corollary indicates that changing to a different norm function may not improve the robustness of the model under adversarial perturbation.
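Before summarizing, here is a tiny numerical illustration (with made-up numbers) of the Figure 2 / Corollary (4.1) situation: the oracle only looks at feature x[0], while the trained classifier also weights an unnecessary feature x[1]. Perturbing x[1] alone leaves the oracle's view of the sample unchanged but can flip f_1's prediction. The weights and values below are purely hypothetical.

```python
import numpy as np

# Hypothetical linear classifier c_1 on X_1 = R^2: it uses both features,
# even though the oracle's label only depends on the first one (X_2 = R^1).
w = np.array([1.0, 1.0])                         # weights learned by f_1
oracle_label = lambda x: int(x[0] > 0)           # f_2: ignores x[1]
f1_label = lambda x: int(w @ x > 0)              # f_1: uses x[0] and x[1]

x = np.array([0.2, 0.0])                         # a correctly classified seed sample
x_adv = x + np.array([0.0, -0.5])                # perturb only the unnecessary feature

print(oracle_label(x), oracle_label(x_adv))      # 1 1 -> d_2(g_2(x), g_2(x')) = 0
print(f1_label(x), f1_label(x_adv))              # 1 0 -> f_1(x) != f_1(x')
```

No matter how accurate c_1 is on clean data, an attacker who can move along the unnecessary dimension finds such a pair for every seed, which is exactly the Case (II) situation in Table 3.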
Summarizing Theorem (3.2), Theorem (3.4), Corollary (4.2) and Corollary (4.1), the robustness of a learned classifier is decided by two factors: (1) the difference between the two derived feature spaces; and (2) the difference between the metric functions. The two corollaries show that the difference between the feature spaces is more important than the difference between the two metric functions.

4.3 ROBUSTNESS AND GENERALIZATION

In Table 3, we provide four situations in which the proposed theorems can be used to determine whether a classifier f_1 is strong-robust against adversarial examples or not.

• Case (I): If f_1 uses some unnecessary features, it will not be strong-robust to adversarial examples. It may not be an accurate predictor if f_1 misses some necessary features used by f_2.
• Case (II): If f_1 uses some unnecessary features, it will not be strong-robust to adversarial examples. It may be an accurate predictor if f_1 uses all the features used by f_2.
• Case (III): If f_1 and f_2 use the same set of features and nothing else, f_1 is strong-robust and may be accurate.
• Case (IV): If f_1 misses some necessary features and does not extract unnecessary features, f_1 is strong-robust (even though its accuracy may not be good).

Table 3 provides a much better understanding of the relationship between robustness and accuracy. Two interesting cases from Table 3 are worth emphasizing again: (1) if f_1 misses features used by f_2 and does not include unnecessary features (according to X_2), f_1 is strong-robust (even though it may not be accurate); (2) if f_1 extracts some extra unnecessary features, it will not be strong-robust (though it may be a very accurate predictor).

We want to emphasize that "f_1 is strong-robust" does not mean it is a good classifier. For example, a trivial strong-robust model is f_1(x) ≡ 1, ∀ x ∈ X. However, it is a useless model since it has no prediction power. In an adversarial setting, we should aim for a classifier that is both strong-robust and precise. A better feature learning function g_1 is exactly the solution that may achieve both goals.

Table 3 indicates that c_1 and c_2 do not influence the strong-robustness of f_1 when f_1 is continuous a.e.¹⁰. Figure 4 and Figure 5 further show two concrete example cases in which f_1 is strong-robust according to f_2; however, in both figures, f_1 is not accurate according to f_2.

¹⁰ When f_1 is not continuous a.e., c_1 matters for the strong-robustness. See Section 11 for details.

Table 4: Connecting to relevant DNN hardening solutions. The experimental results comparing different hardening solutions are shown in Figure 9, Figure 10, Table 10 and Table 11.

Method | x' | Loss L_{f_1}(x, x') | On layer
Stability training (Zheng et al., 2016) | random perturbation | KL(f_1(x), f_1(x')) | Classification layer
(Miyato et al., 2016) | adversarial perturbation | KL(f_1(x), f_1(x')) | Classification layer
Adversarial training (Goodfellow et al., 2014) | adversarial perturbation | L(f_1(x'), f_2(x)) | Loss function
Large adversarial training (Kurakin et al., 2016) | adversarial perturbation | L(f_1(x'), f_2(x)) | Loss function
(Lee et al., 2015) | adversarial perturbation | ‖g_1(x) − g_1(x')‖_2 | Layer before classification layer
Siamese training | random perturbation | ‖g_1(x) − g_1(x')‖_2 | Layer before classification layer

5 TOWARDS PRINCIPLED SOLUTIONS FOR DNNS

Our theoretical analysis uncovers fundamental properties that explain adversarial examples. In this section, we apply them to analyze DNN classifiers. More specifically, (i) we find that DNNs are not strong-robust against adversarial examples; and (ii) we connect to possible hardening solutions and introduce a principled understanding of these solutions.

5.1 ARE STATE-OF-THE-ART DNNS STRONG-ROBUST?

For a DNN, it is difficult to derive a precise analytic form of d_1 (or d'_1).
But we can observe some properties of d_1 through experimental results. Table 5, Table 6, Table 7 and Table 8 show properties of d_1 (and d'_1) resulting from testing experiments on four state-of-the-art DNN networks (details in Section 12.1). All four tables indicate that the accuracy of DNN models in the adversarial setting is quite bad. The performance on randomly perturbed inputs is much better than the performance on maliciously perturbed adversarial examples.

The phenomenon we observed can be explained by Figure 3. Comparing the second and third columns in the four tables, we can conclude that d_1 (and d'_1) along a random direction is larger than d_1 (and d'_1) along the adversarial direction. This indicates that a round sphere in (X_1, d_1) (and in (X, d'_1)) corresponds to a very thin high-dimensional ellipsoid in (X, ‖·‖), as illustrated by the left half of Figure 3. Figure 3(I) shows a sphere in (X, d'_1) and Figure 3(III) shows a sphere in (X_1, d_1); they correspond to the very thin high-dimensional ellipsoid in (X, ‖·‖) in Figure 3(V). The norm function ‖·‖ is defined on the space X and is application-dependent; all four tables use ‖·‖ = ‖·‖_∞. In contrast, for human oracles, a sphere in (X, d'_2) (shown in Figure 3(II)) or in (X_2, d_2) (shown in Figure 3(IV)) corresponds to an ellipsoid in (X, ‖·‖) without very thin directions (shown in Figure 3(VI)). When attackers try to minimize the perturbation size using the approximated distance function d_2 = ‖·‖, the thin direction of the ellipsoid in Figure 3(V) is exactly the adversarial direction.

5.2 TOWARDS PRINCIPLED SOLUTIONS

Our theorems suggest a list of possible solutions that may improve the robustness of DNN classifiers against adversarial samples, including the following.

By learning a better g_1: Methods like DNNs directly learn the feature extraction function g_1. Table 4 summarizes multiple hardening solutions (Zheng et al., 2016; Miyato et al., 2016; Lee et al., 2015) in the DNN literature. They mostly aim to learn a better g_1 by minimizing different loss functions L_{f_1}(x, x') so that when d_2(g_2(x), g_2(x')) < ε (approximated by (X, ‖·‖)), this loss L_{f_1}(x, x') is small. Two major variations exist among related methods: the choice of L_{f_1}(x, x') and the way pairs (x, x') are generated. For instance, to reach strong-robustness we can force the training to learn a g_1 that makes (X, d'_1) a finer topology than (X, d'_2). Section 12.4 explores this option ("Siamese training" in Table 4) through a Siamese architecture. Experimentally, Section 12.5 compares adversarial training, stability training and Siamese training on two state-of-the-art DNN image-classification tasks.
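As one concrete instance of the "learning a better g_1" strategy, the sketch below shows a Siamese-style training step in the spirit of the "Siamese training" row of Table 4: it penalizes ‖g_1(x) − g_1(x')‖_2 for randomly perturbed pairs in addition to the usual classification loss. This is an illustrative sketch only; the weighting term lam, the noise scale sigma, and the model interface (the g1/c1 attributes from the earlier decomposition sketch) are assumptions, not the exact setup evaluated in Section 12.

```python
import torch
import torch.nn.functional as F

def siamese_training_step(model, x, y, optimizer, lam=1.0, sigma=0.05):
    """One training step: classification loss + || g_1(x) - g_1(x') ||_2 penalty.

    `model` is assumed to expose the decomposition of Section 3.1 as
    `model.g1` (feature extractor) and `model.c1` (linear head).
    """
    x_prime = x + sigma * torch.randn_like(x)     # randomly perturbed twin of x
    feats, feats_prime = model.g1(x), model.g1(x_prime)

    cls_loss = F.cross_entropy(model.c1(feats), y)
    # Pull the twins together in X_1 so that small oracle-perceived changes
    # stay small under d_1 (pushing toward the finer-topology condition).
    siamese_loss = (feats - feats_prime).norm(p=2, dim=1).mean()

    loss = cls_loss + lam * siamese_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```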