ANTIDOTE: Understanding and Defending against Poisoning of Anomaly Detectors

Benjamin I. P. Rubinstein (1), Blaine Nelson (1), Ling Huang (2), Anthony D. Joseph (1,2), Shing-hon Lau (1), Satish Rao (1), Nina Taft (2), J. D. Tygar (1)
(1) Computer Science Division, University of California, Berkeley
(2) Intel Labs Berkeley

ABSTRACT
Statistical machine learning techniques have recently garnered increased popularity as a means to improve network design and security. For intrusion detection, such methods build a model for normal behavior from training data and detect attacks as deviations from that model. This process invites adversaries to manipulate the training data so that the learned model fails to detect subsequent attacks.

We evaluate poisoning techniques and develop a defense, in the context of a particular anomaly detector—namely the PCA-subspace method for detecting anomalies in backbone networks. For three poisoning schemes, we show how attackers can substantially increase their chance of successfully evading detection by only adding moderate amounts of poisoned data. Moreover, such poisoning throws off the balance between false positives and false negatives, thereby dramatically reducing the efficacy of the detector.

To combat these poisoning activities, we propose an antidote based on techniques from robust statistics and present a new robust PCA-based detector. Poisoning has little effect on the robust model, whereas it significantly distorts the model produced by the original PCA method. Our technique substantially reduces the effectiveness of poisoning for a variety of scenarios and indeed maintains a significantly better balance between false positives and false negatives than the original method when under attack.

Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: General—Security and Protection; C.4 [Performance of Systems]: Modeling Techniques; I.2.6 [Artificial Intelligence]: Learning; K.6.5 [Management of Computing and Information Systems]: Security and Protection

General Terms
Measurement, Performance, Security

Keywords
Network Traffic Analysis, Principal Components Analysis, Adversarial Learning, Robust Statistics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMC'09, November 4-6, 2009, Chicago, Illinois, USA.
Copyright 2009 ACM 978-1-60558-770-7/09/11 ...$10.00.

1. INTRODUCTION
Statistical machine learning (SML) techniques are increasingly being used as tools for analyzing and improving network design and performance. They have been applied to a variety of problems such as enterprise network fault diagnosis [1, 5, 14], email spam filtering [24, 27], worm detection [25], and intrusion detection [16, 30, 33], as well as many others. These solutions draw upon a variety of techniques from the SML domain including Singular Value Decomposition, clustering, Bayesian inference, spectral analysis, maximum-margin classification, etc. In many scenarios, these approaches have been demonstrated to perform well.

Many of these SML techniques include a learning phase during which a model is trained using collected data. Such techniques have a serious vulnerability: they are susceptible to adversaries who purposefully inject malicious data during the phases of data collection and model building. The intent of such poisoning is to direct an SML algorithm to learn the wrong model; if adversaries influence detectors to learn the wrong underlying model of normality, then such detectors are unable to properly identify abnormal activities. Poisoning is particularly incentivized when SML techniques are used as defenses against cybercrime threats.

Other than a few efforts [10, 31, 32], this type of vulnerability has not been extensively explored by those who apply SML techniques to networking and systems problems. Applied machine learning researchers have started to address these problems by focusing on adversarial training of specific algorithms [2, 8, 22]. The learning theory community has focused on online learning [4], where data is selected by an adversary with complete knowledge of the learner, and has developed efficient algorithms with strong guarantees.
However, the simplifying assumption of all data being produced by an omniscient adversary does not hold for many practical threat models. Given the increasing popularity of applying SML techniques to networking problems, we believe exploring adversarial learning with realistic threat models is important and timely.

In this paper we study both poisoning strategies and defenses in the context of a particular anomaly detector, namely the PCA-subspace method [16], based on Principal Component Analysis (PCA). This technique has received a large amount of attention, leading to extensions [15, 17, 18], and inspiring related research [3, 12, 20, 28, 33]. We consider an adversary who knows that an ISP is using a PCA-based anomaly detector. The adversary's aim is to evade future detection by poisoning the training data so that the detector learns a distorted set of principal components. Because PCA solely focuses on link traffic covariance, we explore poisoning schemes that add chaff (additional traffic) into the network to increase the variance of network traffic. The end goal of the attacker is to increase the false negative rate of the detector, which corresponds to his evasion success rate. In our abstract [29], we illustrated that simple poisoning strategies can improve an adversary's ability to evade detection. Our first contribution in this paper is a detailed analysis of how adversaries subvert the learning process. We explore a range of poisoning strategies in which the attacker's knowledge about the network traffic state is varied, and in which the attacker's time horizon (length of poisoning episode) is varied. (We use the words "attackers" and "adversaries" interchangeably.) Through theoretical analysis of global poisoning tactics, we uncover some simple and effective poisoning strategies for the adversary. In order to gain insights as to why these attacks work, we illustrate their impact on the normal model built by the PCA detector.

Because the networks that SML techniques are used in are non-stationary, the baseline models must be periodically retrained to capture evolving trends in the underlying data. In previous usage scenarios [16, 30], the PCA detector is retrained regularly (e.g., weekly), meaning that attackers could poison PCA slowly over long periods of time, thus poisoning PCA in a more stealthy fashion. By perturbing the principal components gradually, the attacker decreases the chance that the poisoning activity itself is detected. We design such a poisoning scheme, called a Boiling Frog scheme, and demonstrate that it can boost the false negative rate as high as the non-stealthy strategies, with far less chaff, albeit over a longer period of time.

Our second contribution is to design a robust defense against this type of poisoning. It is known that PCA can be strongly affected by outliers [28]. However, instead of finding the principal components along directions that maximize variance, robust statistics suggests components that maximize more robust measures of dispersion. It is well known that the median is a more robust measure of location than the mean, in that it is far less sensitive to the influence of outliers. This concept can be extended to robust alternatives to variance such as the Median Absolute Deviation (MAD). Over the past two decades a number of robust PCA algorithms have been developed that maximize MAD instead of variance. Recently the PCA-GRID algorithm was proposed as an efficient method for maximizing MAD without under-estimating variance (a flaw identified in previous solutions) [6]. We adapt PCA-GRID for anomaly detection by combining the method with a new robust cutoff threshold. Instead of modeling the squared prediction error as Gaussian (as in the original PCA method), we model the error using a Laplace distribution. The new threshold was motivated by observations of the residuals that show longer tails than exhibited by Gaussian distributions. We call our method, which combines PCA-GRID with a Laplace cutoff threshold, antidote. The key intuition behind this method is to reduce the effect of outliers and help reject poisonous training data.

Our third contribution is to carry out extensive evaluations of both antidote and the original PCA method, in a variety of poisoning situations, and to assess their performance via multiple metrics. To do this, we used traffic matrix data from the Abilene network since many other studies of traffic matrix estimation and anomaly detection have used this data. We show that the original PCA method can be easily compromised by any of our poisoning schemes, with only small amounts of chaff. For moderate amounts of chaff, the PCA detector starts to approach the performance of a random detector. However, antidote is dramatically more robust. It outperforms PCA in that i) it more effectively limits the adversary's ability to increase his evasion success; ii) it can reject a larger portion of contaminated training data; and iii) it provides robust protection across nearly all origin-destination flows through a network. The gains of antidote for these performance measures are large, especially as the amount of poisoning increases. Most importantly, we demonstrate that antidote incurs insignificant shifts in its false negative and false positive performance, compared to PCA, when no poisoning events happen; however, when poisoning does occur, the gains of antidote over PCA are enormous with respect to both of these traditional performance measures. The PCA method was not designed to be robust. Our results indicate that it is possible to take such useful techniques and bolster their performance under difficult circumstances.

Our study sheds light on the general problem of poisoning SML techniques, in terms of the types of poisoning schemes that can be construed, their impact on detection, and strategies for defense.

Related Work. Several earlier studies examined attacks on specific learning systems for related applications. In [26], the authors describe red herring attacks that weaken polymorphic worm detection systems by poisoning the training data used to build signature-based classifiers. In red herring attacks, the adversary forces the learner to make false negative predictions by including spurious features in positive training examples. Subsequent malicious instances evade detection by excluding these features, now included as conjuncts in the conjunction learned by Polygraph. Venkataraman et al. [31] present lower bounds for learning worm signatures based on red herring attacks and reductions to classic results from Query Learning. While the red herring attacks exploit the Polygraph conjunction learner's tendency to overfit, our poisoning attacks exploit PCA's singular focus on link traffic covariance.

Attacks that increase false negative rates by manipulating the test data have also been explored. The polymorphic blending attacks of Fogla and Lee [10] encrypt malicious traffic so that the traffic is indistinguishable from innocuous traffic to an intrusion detection system. By contrast, our variance injection attacks add small amounts of chaff to largely innocuous training traffic to make the traffic appear more like future DoS attacks to be launched post-poisoning. In the email spam filtering domain, Wittel and Wu [32] and Lowd and Meek [22] add good words—tokens the filter associates with non-spam messages—so spam messages can evade detection.

Ringberg et al. [28] performed a study of the sensitivities of the PCA method that illustrates how the PCA method can be sensitive to the number of principal components used to describe the normal subspace. This parameter can limit PCA's effectiveness if not properly configured. They also show that routing outages can pollute the normal subspace; a kind of perturbation to the subspace that is not adversarial. Our work differs in two key ways. First, we demonstrate a different type of sensitivity, namely that of data poisoning. This adversarial perturbation can be stealthy and subtle, and is more challenging to circumvent than observable routing outages. Second, [28] focuses on showing the variability in PCA's performance under certain sensitivities, and not on defenses. In our work, we propose a robust defense against a malicious adversary and demonstrate its effectiveness. It is conceivable that the technique we propose could help limit PCA's sensitivity to routing outages, although such a study is beyond the scope of this paper. A recent study [3] showed that the sensitivities observed in [28] come from PCA's inability to capture temporal correlations. They propose to replace PCA with a Karhunen-Loeve expansion. Our study indicates that it would be important to examine, in future work, the data poisoning robustness of this proposal.
2. BACKGROUND
To uncover anomalies, many network anomography detection techniques mine the network-wide traffic matrix, which describes the traffic volume between all pairs of Points-of-Presence (PoP) in a backbone network and contains the collected traffic volume time series for each origin-destination (OD) flow. In this section, we define traffic matrices, present our notation, and summarize the PCA anomaly detection method of Lakhina et al. [16].

2.1 Traffic Matrices and Volume Anomalies
Network link traffic represents the superposition of OD flows. We consider a network with N links and F OD flows and measure traffic on this network over T time intervals. The relationship between link traffic and OD flow traffic is concisely captured in the routing matrix A. This matrix is an N x F matrix such that A_ij = 1 if OD flow j passes over link i, and is zero otherwise. If X is the T x F traffic matrix (TM) containing the time series of all OD flows, and if Y is the T x N link TM containing the time series of all links, then Y = X A^T. We denote the t-th row of Y as y(t) = Y_{t,.} (the vector of N link traffic measurements at time t), and the original traffic along a source link S by y_S(t). We denote column f of the routing matrix A by A_f.

We consider the problem of detecting OD flow volume anomalies across a top-tier network by observing link traffic volumes. Anomalous flow volumes are unusual traffic load levels in a network caused by anomalies such as Denial of Service (DoS) attacks, Distributed DoS attacks, flash crowds, device failures, misconfigurations, and so on. DoS attacks serve as the canonical example attack in this paper.

2.2 Subspace Method for Anomaly Detection
We briefly summarize the PCA-based anomaly detector introduced by Lakhina et al. [16]. The authors observed that high levels of traffic aggregation on ISP backbone links cause OD flow volume anomalies to often go unnoticed because they are buried within normal traffic patterns. They also observe that although the measured data has high dimensionality N, normal traffic patterns lie in a subspace of low dimension K << N. Inferring this normal traffic subspace using PCA (which finds the principal traffic components) makes it easier to identify volume anomalies in the remaining abnormal subspace. For the Abilene (Internet2 backbone) network, most variance can be captured by the first K = 4 principal components.

PCA is a dimensionality reduction method that chooses K orthogonal principal components to form a K-dimensional subspace capturing maximal variance in the data. Let Y̅ be the centered link traffic matrix, i.e., with each column of Y translated to have zero mean. The k-th principal component is computed as

    v_k = argmax_{||w|| = 1} || Y̅ ( I - Sum_{i=1}^{k-1} v_i v_i^T ) w ||.    (1)

The resulting K-dimensional subspace spanned by the first K principal components V_{1:K} = [v_1, v_2, ..., v_K] is the normal traffic subspace S_n and has a projection matrix P_n = V_{1:K} V_{1:K}^T. The residual (N-K)-dimensional subspace is spanned by the remaining principal components V_{K+1:N} = [v_{K+1}, v_{K+2}, ..., v_N]. This space is the abnormal traffic subspace S_a with a corresponding projection matrix P_a = V_{K+1:N} V_{K+1:N}^T = I - P_n.

Volume anomalies can be detected by decomposing the link traffic into y(t) = y_n(t) + y_a(t), where y_n(t) is the modeled normal traffic and y_a(t) is the residual traffic, corresponding to projecting y(t) onto S_n and S_a, respectively. A volume anomaly at time t typically results in a large change to y_a(t), which can be detected by thresholding the squared prediction error ||y_a(t)||^2 against Q_beta, the Q-statistic at the 1-beta confidence level [13]. That is, the PCA-based detector classifies a link measurement vector as

    c(y(t)) = anomalous if ||y_a(t)||^2 > Q_beta, and innocuous if ||y_a(t)||^2 <= Q_beta.    (2)

While others have explored more efficient distributed variations of this approach [12, 20, 21], we focus on the basic method introduced by Lakhina et al. [16].
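In code, the subspace method just described amounts to a few linear-algebra steps. The following Python sketch is our own illustration, not the authors' implementation: the function and variable names are ours, and the threshold Q_beta is taken as a given constant rather than derived from the Q-statistic. It trains on a T x N link-traffic matrix and classifies a measurement by its squared residual norm.

```python
import numpy as np

def pca_detector(Y, K, Q_beta):
    """Train a PCA-subspace detector on a T x N link-traffic matrix Y.

    Returns a classifier mapping a link measurement y to True (anomalous)
    when the squared residual ||y_a||^2 exceeds the threshold Q_beta.
    """
    mu = Y.mean(axis=0)            # per-link mean traffic
    Yc = Y - mu                    # centered link-traffic matrix
    # Principal components = right singular vectors of the centered matrix.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    V_K = Vt[:K].T                 # N x K basis of the normal subspace S_n
    P_n = V_K @ V_K.T              # projection matrix onto S_n
    N = Y.shape[1]

    def classify(y):
        y_a = (np.eye(N) - P_n) @ (y - mu)   # residual traffic y_a
        return float(y_a @ y_a) > Q_beta

    return classify
```

Computing the components via one SVD of the centered matrix gives the same subspace as the deflation-style variance maximization described above; only the numerical route differs.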
3. POISONING STRATEGIES

3.1 The Threat Model
The adversary's goal is to launch a Denial of Service (DoS) attack on some victim and to have the attack traffic successfully cross an ISP's network without being detected. The DoS traffic thus needs to traverse from an ingress point-of-presence (PoP) node to an egress PoP of the ISP. Before launching a DoS attack, the attacker poisons the detector for a period of time, by injecting additional traffic, chaff, along the OD flow (i.e., from an ingress PoP to an egress PoP) that he eventually intends to attack. This kind of poisoning activity is possible if the adversary gains control over clients of an ingress PoP or if the adversary compromises a router (or set of routers) within the ingress PoP. For a poisoning strategy, the attacker needs to decide how much chaff to add, and when to do so. These choices are guided by the amount of information available to the attacker.

We consider poisoning strategies in which the attacker has increasing amounts of information at his disposal. The weakest attacker is one that knows nothing about the traffic at the ingress PoP, and adds chaff randomly (called an uninformed attack). An intermediate case is when the attacker is partially informed. Here the attacker knows the current volume of traffic on the ingress link(s) on which he intends to inject chaff. Because many networks export SNMP records, an adversary might intercept this information, or possibly monitor it himself (i.e., in the case of a compromised router). We call this type of poisoning a locally-informed attack. Although exported data from routers may be delayed in reaching the adversary, we consider the case of minimal delay in our first study of this topic.

In a third scenario, the attacker is globally-informed because his global view over the network enables him to know the traffic levels on all network links. Moreover, we assume this attacker has knowledge of future traffic link levels. (Recall that in the locally-informed scheme, the attacker only knows the current traffic volume of a link.) Although these attacker capabilities are very unlikely, we include this scenario in our study in order to understand the limits of variance injection poisoning schemes. Also, this scenario serves as a difficult test for our antidote technique.

Poisoning strategies can also vary according to the time horizon over which they are carried out. Most studies on the PCA-subspace method use a one-week training period, so we assume that PCA is retrained each week. Thus the PCs used in any week m are those learned during week m-1 with any detected anomalies removed. Thus for our poisoning attacks, the adversary inserts chaff along the target OD flow throughout the one-week training period. We also consider a long-term attack in which the adversary slowly, but increasingly, poisons the principal components over several weeks, by adding small amounts of chaff in gradually increasing quantities. We call this the Boiling Frog poisoning method after the folk tale that one can boil a frog by slowly increasing the water temperature over time. (1)

We assume the adversary does not have control over existing traffic (i.e., he cannot delay or discard traffic). Similarly, the adversary cannot submit false SNMP reports to PCA. Such approaches are more conspicuous because the inconsistencies in SNMP reporting from neighboring PoPs could expose the compromised router.

This paper focuses on non-distributed poisoning of DoS detectors. Distributed poisoning that aims to evade a DoS detector is also possible; our globally-informed poisoning strategy is an example, as the adversary has control over all network links. We focus on DoS for two reasons. In our first study on this topic, we aim to solve the basic problem first before tackling a distributed version. Second, we point out that results on evasion via non-distributed poisoning are stronger than distributed poisoning results: the DDoS attacker can monitor and influence many more links than the DoS attacker. Hence a DoS poisoning scenario is usually stealthier than a DDoS one.

For each of these scenarios of different information available to the adversary, we now outline specific poisoning schemes. In each scheme, the adversary decides on the quantity of chaff c_t to add to the target flow time series at a time t. Each strategy has an attack parameter theta, which controls the intensity of the attack. For each scenario, we present only one specific poisoning scheme. We have studied others, but those included here are representative.

(1) Note that there is nothing inherent in the choice of a one-week poisoning period. For a general SML algorithm, our strategies would correspond to poisoning over one training period (whatever its length) or multiple training periods.

3.2 Uninformed Chaff Selection
At each time t, the adversary decides whether or not to inject chaff according to a Bernoulli random variable. If he decides to inject chaff, the amount of chaff added is of size theta, i.e., c_t = theta. This method is independent of the network traffic since our attacker is uninformed. We call this the Random scheme.

3.3 Locally-Informed Chaff Selection
The attacker's goal is to increase traffic variance, on which the PCA detector's model is based. In the locally-informed scenario, the attacker knows the volume of traffic on the ingress link he controls, y_S(t). Hence this scheme elects to only add chaff when the existing traffic is already reasonably large. In particular, we add chaff when the traffic volume on the link exceeds a parameter alpha (we typically use the mean). The amount of chaff added is c_t = (max{0, y_S(t) - alpha})^theta. In other words, we take the difference between the link traffic and the parameter alpha and raise it to the power theta. In this scheme (called Add-More-If-Bigger), the further the traffic is from the average load, the larger the amount of chaff inserted.
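The two single-link chaff schemes just described can be sketched in a few lines. This is our own minimal rendering with our own names; in particular, the Bernoulli bias `p` of the Random scheme is our assumption, since the text leaves it unspecified.

```python
import numpy as np

def random_chaff(T, theta, p=0.5, rng=None):
    """Uninformed (Random) scheme: at each time t, toss a Bernoulli(p) coin
    and, on success, add a constant chaff volume c_t = theta."""
    if rng is None:
        rng = np.random.default_rng()
    return np.where(rng.random(T) < p, theta, 0.0)

def add_more_if_bigger(y_S, theta, alpha=None):
    """Locally-informed (Add-More-If-Bigger) scheme: when the observed
    ingress traffic y_S(t) exceeds alpha (by default its mean), add
    c_t = (max{0, y_S(t) - alpha})^theta."""
    y_S = np.asarray(y_S, dtype=float)
    if alpha is None:
        alpha = y_S.mean()
    return np.maximum(0.0, y_S - alpha) ** theta
```

For example, with alpha = 2 and theta = 1 the series [1, 2, 3, 4] receives chaff [0, 0, 1, 2]: only the above-average intervals are inflated, which is what drives up the variance PCA fits to.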
3.4 Globally-Informed Chaff Selection
The globally-informed scheme captures an omnipotent adversary with full knowledge of Y, A, and the future measurements y~_t, and who is capable of injecting chaff into any network flow during training. This latter point is important. In the previous poisoning schemes the adversary can only inject chaff along his compromised link, whereas in this scenario, the adversary can inject chaff on any link. We formalize the problem of selecting a link n to poison, and selecting an amount of chaff C_{tn}, as an optimization problem that the adversary solves to maximally increase his chances of evasion. Although these globally-informed capabilities are unrealistic, we include a globally-informed poisoning strategy in order to understand the limits of variance injection methods.

The PCA Evasion Problem considers an adversary wishing to launch an undetected DoS attack of volume delta along flow f at time t. If the vector of link volumes at future time t is y~_t, where the tilde distinguishes this future measurement from the past training data Y̅, then the vector of anomalous DoS volumes is given by y~'_t = y~_t + delta * A_f. Denote by C the matrix of link traffic injected into the network by the adversary during training. Then the PCA-based anomaly detector is trained on the altered link traffic matrix Y̅ + C to produce the mean traffic vector mu, the top K eigenvectors V_{1:K}, and the squared prediction error threshold Q_beta. The adversary's objective is to enable as large a DoS attack as possible (maximizing delta) by designing C. The PCA Evasion Problem corresponds to solving the following:

    max_{delta in R, C in R^{T x F}}  delta
    s.t.  (mu, V, Q_beta) = PCA(Y + C)
          || V_{K+1:N}^T (y~'_t - mu) ||_2 <= Q_beta
          ||C||_1 <= theta,   C_{tn} >= 0 for all t, n,

where theta is a constant constraining the total chaff. The second constraint guarantees evasion by requiring that the contaminated link volumes at time t are classified innocuous (cf. Eq. 2). The remaining constraints upper-bound the total chaff volume by theta and constrain the chaff to be non-negative.

Unfortunately, this optimization is difficult to solve analytically. Thus we construct a relaxed approximation to obtain a tractable analytic solution. We make a few assumptions and derivations (2), and show that the above objective seeks to maximize the attack direction A_f's projected length in the normal subspace, max_{C in R^{T x F}} || V_{1:K}^T A_f ||_2.

(2) The full proof is omitted due to space constraints.
Next, we restrict our focus to traffic processes that generate spherical k-rank link traffic covariance matrices (3). This property implies that the eigen-spectrum consists of K ones followed by all zeroes. Such an eigen-spectrum allows us to approximate the top eigenvectors V_{1:K} in the objective with the matrix of all eigenvectors weighted by their corresponding eigenvalues, Sigma V. We can thus convert the PCA evasion problem into the following optimization:

    max_{C in R^{T x F}}  || (Y̅ + C) A_f ||_2    (3)
    s.t.  ||C||_1 <= theta,   C_{tn} >= 0 for all t, n.

Solutions to this optimization are obtained by a standard Projection Pursuit method from optimization: iteratively take a step in the direction of the objective's gradient and then project onto the feasible set.

These solutions yield an interesting insight. Recall that our adversary is capable of injecting chaff along any flow. One could imagine that it might be useful to inject chaff along an OD flow whose traffic dominates the choice of principal components (i.e., an elephant flow), and then send the DoS traffic along a different flow (that possibly shares a subset of links with the poisoned OD flow). However, the solutions of Eq. (3) indicate that the best strategy to evade detection is to inject chaff only along the links A_f associated with the target flow f. This follows from the form of the initializer C^(0) ∝ Y̅ A_f A_f^T (obtained from an L2 relaxation) as well as the form of the projection and gradient steps. In particular, all these objects preserve the property that the solution only injects chaff along the target flow. In fact, the only difference between this globally-informed solution and the locally-informed scheme is that the former uses information about the entire traffic matrix Y to determine the chaff allocation along the flow whereas the latter uses only local information.

(3) While the spherical assumption does not hold in practice, the assumption of low-rank traffic matrices is met by published datasets [16].
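The gradient-then-project loop for the relaxed problem of Eq. (3) can be sketched as follows. This is our own minimal implementation, not the authors': the chaff matrix is treated as injected link traffic with the same shape as Y̅, and the projection onto the feasible set is approximated by clipping to non-negativity and rescaling into the L1 budget, which is cruder than an exact projection.

```python
import numpy as np

def globally_informed_chaff(Y_bar, A_f, theta, steps=200, lr=0.1):
    """Approximately solve max_C ||(Y_bar + C) A_f||_2
    subject to ||C||_1 <= theta and C >= 0, by projected gradient ascent."""
    # Initializer proportional to Y_bar A_f A_f^T, clipped into the feasible set.
    C = np.maximum(np.outer(Y_bar @ A_f, A_f), 0.0)
    total = C.sum()
    if total > 0:
        C *= theta / total
    for _ in range(steps):
        r = (Y_bar + C) @ A_f                 # current projected traffic
        norm_r = np.linalg.norm(r)
        if norm_r == 0.0:
            break
        grad = np.outer(r / norm_r, A_f)      # gradient of ||(Y_bar + C) A_f||_2
        C = np.maximum(C + lr * grad, 0.0)    # ascent step + non-negativity
        total = C.sum()                       # total chaff volume (L1 norm)
        if total > theta:
            C *= theta / total                # rescale back into the budget
    return C
```

Note how the sketch exhibits the insight stated above: because both the initializer and the gradient are outer products with A_f, columns for links outside the target flow stay exactly zero throughout.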
3.5 Boiling Frog Attacks
Boiling Frog poisoning can use any of the preceding chaff schemes to select c_t. The duration of poisoning is increased as follows. We initially set the attack parameter theta to a small value and then increase it slowly over time. In the first week of the attack, the target flow is injected with chaff generated using parameter theta_1. At the week's end, PCA is retrained on that week's data. Any anomalies detected by PCA during that week are excluded from future training data. This process continues with theta_t > theta_{t-1} used for week t. Even though PCA is retrained from scratch each week, the training data includes events not caught by the previous detector. Thus, each successive week will contain additional malicious training data, with the process continuing until the week of the DoS attack, when the adversary stops injecting chaff.

4. ANTIDOTE: A ROBUST DEFENSE
For defenses against our attacks on PCA-based anomaly detection we explore techniques from Robust Statistics. Such methods are less sensitive to outliers, and as such are ideal defenses against variance injection schemes that perturb data to increase variance along the target flow. There have been two approaches to making PCA robust: the first computes the principal components as the eigenspectrum of a robust estimate of the covariance matrix [9], while the second approach searches for directions that maximize a robust scale estimate of the data projection. We propose one of the latter methods as a defense against our poisoning. After describing the method, we propose a new threshold statistic that can be used for any PCA-based method including robust PCA. Robust PCA and the new robust Laplace threshold together form a new network-wide traffic anomaly detection method, antidote, that is less sensitive to our poisoning attacks.

4.1 Intuition
Fundamentally, to mitigate the effect of poisoning attacks, we need a learning algorithm that is stable in spite of data contamination; i.e., a small amount of data contamination should not dramatically change the model produced by our algorithm. This concept of stability has been studied in the field of Robust Statistics, in which robust is the formal term used to qualify this notion of stability. In particular, there have been several approaches to developing robust PCA algorithms that construct a low-dimensional subspace that captures most of the data's dispersion (4) and are stable under data contamination [6, 7, 9, 19, 23].

The robust PCA algorithms we considered search for a unit direction v whose projections maximize some univariate dispersion measure S(.); that is,

    v in argmax_{||a||_2 = 1} S(Ya).    (4)

The standard deviation is the dispersion measure used by PCA; i.e., S_SD(r_1, ..., r_n) = ( (n-1)^(-1) Sum_{i=1}^{n} (r_i - r̄)^2 )^(1/2). However, the standard deviation is sensitive to outliers, making PCA non-robust to contamination. Robust PCA algorithms instead use measures of dispersion based on the concept of robust projection pursuit (RPP) estimators [19]. As is shown by Li & Chen, RPP estimators achieve the same breakdown points as their dispersion measure (the breakdown point is the (asymptotic) fraction of the data an adversary must control in order to arbitrarily change an estimator, and as such is a common measure of statistical robustness) as well as being qualitatively robust; i.e., the estimators are stable.

However, unlike the eigenvector solutions that arise in PCA, there is generally no efficiently computable solution for robust dispersion measures and so these must be approximated. Below, we describe the PCA-GRID algorithm, a successful method for approximating robust PCA subspaces developed by Croux et al. [6]. Among the projection pursuit techniques we tried [7, 23], PCA-GRID proved to be most resilient to our poisoning attacks. It is worth emphasizing that the procedure described in the next section is simply a technique for approximating a projection pursuit estimator and does not itself contribute to the algorithm's robustness—that robustness comes from the definition of the projection pursuit estimator in Eq. (4).

First, to better understand the efficacy of a robust PCA algorithm, we demonstrate the effect our poisoning techniques have on the PCA algorithm and contrast them with the effect on the PCA-GRID algorithm. In Figure 1, we see the impact of a globally-informed poisoning attack on both algorithms. Initially, the data was clustered in an ellipse. In the top plot, we see that both algorithms construct reasonable estimates for the center and first principal component for this data. However, in the bottom plot, we see that a large amount of poisoning dramatically perturbs some of the data and as a result the PCA subspace is dramatically shifted toward the target flow's direction (y-axis). Due to this shift, DoS attacks along the target flow will be less detectable. Meanwhile, the subspace of PCA-GRID is noticeably less affected.

Figure 1: Here the data has been projected into the 2D space spanned by the 1st principal component and the direction of the attack flow #118. The effect on the 1st principal components of PCA and PCA-GRID is shown under a globally informed attack (represented by o's). (Top panel: subspaces with no poisoning; bottom panel: subspaces with 35% poisoning. Axes: projection on 1st PC vs. projection onto target flow.)

(4) Dispersion is an alternative term for variation since the latter is often associated with statistical variation. By a dispersion measure we mean a statistic that measures the variability or spread of a variable.

4.2 PCA-GRID
The PCA-GRID algorithm introduced by Croux et al. [6] is a projection pursuit technique as described above. It finds a K-dimensional subspace that approximately maximizes S(.), a robust measure of dispersion, for the data Y as in Eq. (4). The first step is to specify our robust dispersion measure. We use the Median Absolute Deviation (MAD) robust measure of dispersion, over other possible choices for S(.). For scalars r_1, ..., r_n the MAD is defined as

    S_MAD(r_1, ..., r_n) = omega * median{ |r_i - median{r_j}| },

where the coefficient omega = 1.486 ensures asymptotic consistency on normal distributions.

The next step requires choosing an estimate of the data's central location. In PCA, this estimate is simply the mean of the data. However, the mean is not robust, so we center the data using the spatial median instead:

    c^(Y) in argmin_{mu in R^N} Sum_{i=1}^{T} || y_i - mu ||_2,

which involves a convex optimization that is efficiently solved (see e.g., [11]).

Given a dispersion measure and location estimate, PCA-GRID finds a (unit) direction v that is an approximate solution to Eq. (4). The PCA-GRID algorithm uses a grid search for this task. Namely, suppose we want to find the best candidate between some pair of unit vectors a_1 and a_2 (a 2D search space). The search space is the unit circle parameterized by phi as a_phi = cos(phi) a_1 + sin(phi) a_2 with phi in [-pi/2, pi/2].
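The MAD dispersion measure and the 2D circle search just described can be sketched as follows. This is our own rendering, not the authors' code; for brevity it omits the spatial-median centering and the iterative refinement performed by the full algorithm.

```python
import numpy as np

def s_mad(r, omega=1.486):
    """Median Absolute Deviation with consistency coefficient omega = 1.486."""
    r = np.asarray(r, dtype=float)
    return omega * np.median(np.abs(r - np.median(r)))

def circle_search(Y, a1, a2, Q=64):
    """Search the unit circle a_phi = cos(phi) a1 + sin(phi) a2,
    phi in [-pi/2, pi/2], for the direction maximizing S_MAD(Y a_phi).
    Assumes a1 and a2 are orthonormal, so every a_phi is a unit vector."""
    best, best_s = a1, s_mad(Y @ a1)
    for k in range(Q + 1):
        phi = (np.pi / 2.0) * (2.0 * k / Q - 1.0)   # mesh of Q+1 candidates
        a = np.cos(phi) * a1 + np.sin(phi) * a2
        s = s_mad(Y @ a)
        if s > best_s:
            best, best_s = a, s
    return best
```

Because the candidates are scored with the median-based S_MAD rather than the standard deviation, a handful of poisoned rows with extreme projections barely moves the score, which is the source of the robustness discussed above.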
The grid search splits the domain of φ into a mesh of Q + 1 candidates,

    φ_k = (π/2) (2k/Q − 1),   k = 0, ..., Q.

Each candidate vector a_{φ_k} is assessed, and the one that maximizes S(Y a_{φ_k}) is the approximate maximizer â.

To search a more general N-dimensional space, the search iteratively refines its current best candidate â by performing a grid search between â and each of the unit directions e_i. With each iteration, the range of angles considered progressively narrows around â to better explore its neighborhood. This procedure (outlined in Algorithm 1) approximates the direction of maximal dispersion, analogous to an eigenvector in PCA. To find the K-dimensional subspace {v_i | v_i^⊤ v_j = δ_{i,j}} that maximizes the dispersion measure, the Grid-Search is repeated K times. After each repetition, the data is deflated to remove the dispersion captured by the last direction. This process is detailed in Algorithm 2. The grid search is simply a technique for approximating a projection pursuit estimator and does not itself contribute to the algorithm's robustness; that robustness comes from the definition of the projection pursuit estimator in Eq. (4).

Figure 1: Here the data has been projected into the 2D space spanned by the 1st principal component and the direction of the attack flow #118. The effect on the 1st principal components of PCA and PCA-GRID is shown under a globally informed attack (represented by ◦'s).

First, to better understand the efficacy of a robust PCA algorithm, we demonstrate the effect our poisoning techniques have on the PCA algorithm and contrast them with the effect on the PCA-GRID algorithm. In Figure 1, we see the impact of a globally informed poisoning attack on both algorithms.

4.3 Robust Laplace Threshold

In addition to the robust PCA-GRID algorithm, we also use a robust estimate for its residual threshold in place of the Q-statistic described in Section 2.2.
Using the Q-statistic as a threshold was motivated by an assumption of normally distributed residuals [13]. However, we found that the residuals for both the PCA and PCA-GRID subspaces were empirically non-normal, leading us to conclude that the Q-statistic is a poor choice for our detection threshold. Instead, to account for the outliers and heavy-tailed behavior we observed from our method's residuals, we choose our threshold as the 1 − β quantile of a Laplace distribution fit with robust location and scale parameters. Our solution, antidote, is the combination of the PCA-GRID algorithm and the Laplace threshold. The non-normality of the residuals has also been recently pointed out in [3].

Figure 2: Histograms of the residuals for the original PCA algorithm (left) and the PCA-GRID algorithm (right; the largest residual is excluded as an outlier). Red and blue vertical lines demarcate the threshold selected using the Q-statistic and the Laplace threshold, respectively.

Empirically, the Laplace threshold also proved to be better suited than the Q-statistic for thresholding the residuals of our models, as can be seen in Figure 2. As with the previous method described in Section 2.2, we select our threshold Q_{L,β} as the 1 − β quantile of a parametric distribution fit to the residuals in the training data.

Algorithm 1 Grid-Search(Y)
Require: Y is a T × N matrix
 1: Let: v̂ ← e_1
 2: for i = 1 to C do
 3:   for j = 1 to N do
 4:     for k = 0 to Q do
 5:       Let: φ_k = (π / 2^i) (2k/Q − 1)
 6:       Let: a_{φ_k} = cos(φ_k) v̂ + sin(φ_k) e_j
 7:       if S(Y a_{φ_k}) > S(Y v̂) then
 8:         Assign: v̂ ← a_{φ_k}
 9: Return: v̂

Algorithm 2 PCA-GRID(Y, K)
 1: Center Y: Y ← Y − ĉ(Y)
 2: for i = 1 to K do
 3:   v_i ← Grid-Search(Y)
 4:   Y ← projection of Y onto the complement of v_i
 5: end for
 6: Return: subspace centered at ĉ(Y) with principal directions {v_i}_{i=1}^{K}
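A compact sketch helps make Algorithms 1 and 2 concrete. This is an illustrative reimplementation, not the authors' code: S(·) is the MAD from Section 4.2, the angular mesh is taken to narrow as π/2^i per refinement iteration (our reading of Algorithm 1's line 5), and for brevity the data is centered with the coordinate-wise median rather than the spatial median.

```python
import numpy as np

def mad(r):
    """Robust dispersion S(.): Median Absolute Deviation (omega = 1.486)."""
    return 1.486 * np.median(np.abs(r - np.median(r)))

def grid_search(Y, C=3, Q=10, S=mad):
    """Algorithm 1 sketch: unit direction approximately maximizing S(Y v)."""
    _, N = Y.shape
    v = np.zeros(N)
    v[0] = 1.0                                   # start at e_1
    for i in range(1, C + 1):
        for j in range(N):
            e_j = np.zeros(N)
            e_j[j] = 1.0
            for k in range(Q + 1):
                # candidate mesh over an angle range that narrows with i
                phi = (np.pi / 2 ** i) * (2 * k / Q - 1)
                a = np.cos(phi) * v + np.sin(phi) * e_j
                a /= np.linalg.norm(a)           # renormalize (v, e_j need not be orthogonal)
                if S(Y @ a) > S(Y @ v):
                    v = a
    return v

def pca_grid(Y, K):
    """Algorithm 2 sketch: K robust principal directions via deflation."""
    center = np.median(Y, axis=0)                # stand-in for the spatial median
    Y = Y - center
    dirs = []
    for _ in range(K):
        v = grid_search(Y)
        dirs.append(v)
        Y = Y - np.outer(Y @ v, v)               # deflate: project onto complement of v
    return center, np.array(dirs)
```

On data with one dominant robust-variance direction, the returned first direction should align with it, mirroring the eigenvector analogy in the text.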
However, instead of the normal distribution assumed by the Q-statistic, we use the quantiles of a Laplace distribution specified by a location parameter c and a scale parameter b. Critically, though, instead of using the mean and standard deviation, we robustly fit the distribution's parameters. We estimate c and b from the residuals ‖y_a(t)‖^2 using robust consistent estimates of location (median) and scale (MAD):

    ĉ = median { ‖y_a(t)‖^2 }
    b̂ = (1 / (√2 · P^{−1}(0.75))) · median { | ‖y_a(t)‖^2 − ĉ | }

where P^{−1}(q) is the qth quantile of the standard Laplace distribution. The Laplace quantile function has the form P^{−1}_{c,b}(q) = c + b · k(q) for some k(q). Thus, our threshold depends only linearly on the (robust) estimates ĉ and b̂, making the threshold itself robust. This form is also shared by the normal quantiles (differing only in the function k(q)), but because non-robust estimates for c and b are implicitly used by the Q-statistic, it is not robust.

Both the Q-statistic and the Laplace threshold produce a reasonable threshold on the residuals of the PCA algorithm, but only the Laplace threshold produces a reasonable threshold for the residuals of the PCA-GRID algorithm; the Q-statistic vastly underestimates the spread of the residuals. As was consistently seen throughout our experiments, the Laplace threshold proved to be a more reliable threshold than the Q-statistic.

5. METHODOLOGY

5.1 Traffic Data

We use OD flow data collected from the Abilene (Internet2 backbone) network to simulate attacks on PCA-based anomaly detection. Data was collected over an almost continuous 6 month period from March 1, 2004 through September 10, 2004 [33]. Each week of data consists of 2016 measurements across all 144 network OD flows, binned into 5 minute intervals. At the time of collection the network consisted of 12 PoPs and 15 inter-PoP links.
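Returning to the robust Laplace threshold of Section 4.3, the parameter estimates translate directly into code. A minimal sketch, using the closed-form Laplace quantile (for the standard Laplace, P^{−1}(0.75) = ln 2) and assuming the squared residuals are passed in as an array:

```python
import numpy as np

def laplace_threshold(residuals, beta=0.001):
    """1 - beta quantile of a Laplace distribution fit robustly to the
    residuals, following the median/MAD estimates in the text."""
    r = np.asarray(residuals, dtype=float)
    c_hat = np.median(r)                                  # robust location
    p75 = np.log(2.0)                                     # P^{-1}(0.75), standard Laplace
    b_hat = np.median(np.abs(r - c_hat)) / (np.sqrt(2.0) * p75)   # robust scale
    # Laplace quantile for q >= 1/2: c + b * ln(1 / (2 (1 - q)))
    q = 1.0 - beta
    return c_hat + b_hat * np.log(1.0 / (2.0 * (1.0 - q)))
```

Because both estimates are medians, a few extreme residuals (e.g., from poisoning) move the threshold only slightly, which is the point of the construction.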
Further, by choosing a heavy-tailed distribution like the Laplace, the quantiles are more appropriate for the heavy tails we observed, but the robustness of our threshold comes from our parameter estimation.

54 virtual links are present in the data, corresponding to two directions for each inter-PoP link and an ingress and egress link for each PoP.

5.2 Validation

To evaluate the subspace method and antidote in the face of poisoning and DoS attacks, we use two consecutive weeks of data: the first for training and the second for testing. The poisoning occurs throughout the training phase, while the attack occurs during the test week. An alternate method (described later) is needed for the Boiling Frog scheme, where training and poisoning occur over multiple weeks. We measure the success of the poisoning strategies through their impact on a PCA-based detector's false negative rate (FNR). The FNR is the ratio of the number of successful evasions to the total number of attacks (i.e., the attacker's success rate is PCA's FNR). We also use Receiver Operating Characteristic (ROC) curves to visualize a detection method's trade-off between detection rate (TPR) and false positive rate (FPR).

FNRs are computed during the test week, which includes 144 × 2016 samples coming from the different flows and time slots. Because the poisoning is deterministic in Add-More-If-Bigger, this experiment was run once for that scheme. In contrast, for the Random poisoning scheme we ran 20 independent repetitions of the poisoning experiments, because the poisoning is random.

To produce the ROC curves, we use the squared prediction errors (SPEs) produced by the detection methods, which consist of anomalous and normal examples from the test set. By varying the method's threshold from −∞ to ∞, a curve of possible (FPR, TPR) pairs is produced from the set of SPEs; the Q-statistic and Laplace threshold each correspond to one such point in ROC space. We adopt the Area Under Curve (AUC) statistic from Information Retrieval to directly compare ROC curves.
The area under the ROC curve of detector A estimates the conditional probability

    AUC(A) ≈ Pr( SPE_A(y_1) > SPE_A(y_2) ),

given anomalous and normal random link volume vectors y_1 and y_2. The ideal detector has an AUC of 1, while the random predictor achieves an AUC of 0.5.

In order to compute the FNRs and FPRs, we generate synthetic anomalies according to the method of Lakhina et al. [16] and inject them into the Abilene data. While there are disadvantages to this method, such as the conservative assumption that a single volume size is anomalous for all flows, we adopt it for the purposes of relative comparison between PCA and Robust PCA, to measure relative effects of poisoning, and for consistency with prior studies. We use week-long training sets, as such a time scale is sufficiently large to capture weekday and weekend cyclic trends [28], and previous studies operated on this same time scale [16]. There is nothing inherent to our method that limits its use to this time scale; our methods will work as long as the training data is poisoned throughout. Because the data is binned in 5 minute windows (corresponding to the reporting interval of SNMP), a decision about whether or not an attack is present can be made at the end of each 5 minute window; thus attacks can be detected within 5 minutes of their occurrence.

5.3 Single Period & Boiling Frog Poisoning

We evaluate the effectiveness of our attacker strategies using weeks 20 and 21 from the Abilene dataset to simulate the Single-Training Period attacks. The PCA algorithm is trained on the week 20 traffic matrix poisoned by the attacker; we then inject attacks during week 21 to see how often the attacker can evade detection. We select these particular weeks because PCA achieved the lowest FNRs on them during testing.

To test the Boiling Frog attack we simulate traffic matrix data, inspired by methods used in [16].
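The AUC estimate above can be computed by direct pairwise comparison of SPE scores. A sketch (tie handling by half credit is our choice; the text does not specify it):

```python
import numpy as np

def auc_estimate(spe_anomalous, spe_normal):
    """AUC(A) ~ Pr(SPE_A(y1) > SPE_A(y2)) for anomalous y1, normal y2."""
    a = np.asarray(spe_anomalous, dtype=float)
    n = np.asarray(spe_normal, dtype=float)
    gt = (a[:, None] > n[None, :]).sum()   # anomalous scored above normal
    eq = (a[:, None] == n[None, :]).sum()  # ties get half credit
    return (gt + 0.5 * eq) / (a.size * n.size)
```

A detector that always ranks anomalous SPEs above normal ones scores 1, and one whose scores are exchangeable scores 0.5, matching the ideal and random baselines in the text.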
Our simulations present multiple weeks of stationary data to the adversary. While such data is unrealistic in practice, it is an easy case on which PCA should succeed. Anomaly detection under non-stationary conditions is difficult due to the learner's inability to distinguish between benign data drift and adversarial poisoning, so demonstrated flaws of PCA in the stationary case constitute strong results. We decided to validate the Boiling Frog attack on a synthesized multi-week dataset because the 6 month Abilene dataset of [33] proved to be too non-stationary for PCA to operate consistently well from one week to the next. It is unclear whether the non-stationarity observed in this data is prevalent in general or whether it is an artifact of the dataset.

We synthesize a multi-week set of OD flow traffic matrices with stationarity at the inter-week level, using a three step generative procedure to model each OD flow separately. First, the underlying daily cycle of the OD flow f time series is modeled by a sinusoidal approximation.

Starting with the flow traffic matrix X for the test week, we generate a positive example (an anomalous OD flow) by setting flow f's volume at time t, X_{t,f}, to a large value known to correspond to an anomalous flow (replacing the original traffic volume in this time slot). This value is defined [16] to be 1.5 times a cutoff of 8 × 10^7. After multiplying by the routing matrix A, the link volume measurement at time t is anomalous. We repeat this process for each time t (each 5 minute window) in the test week to generate a set of 2016 anomaly samples for the single target flow f.

In order to obtain FPRs, we generate negative examples (benign OD flows) as follows. We fit the data to an EWMA model intended to capture the main trends of the data without much noise. We use this model to select which points in time, in an Abilene flow's time series, to use as negative examples, by comparing the actual data against the EWMA model and checking whether the difference is small (not in the flow's top one percentile) for a particular flow at a particular time.
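The positive-example construction above can be sketched as follows. This is a hypothetical illustration: we assume X is a (time × flow) matrix and A a (links × flows) routing matrix, matching the notation in the text.

```python
import numpy as np

ANOMALY_VOLUME = 1.5 * 8e7   # 1.5 times the 8 x 10^7 cutoff from [16]

def inject_anomaly(X, A, f, t):
    """Overwrite flow f's volume at time t with a known-anomalous value
    and return the poisoned matrix plus the resulting link volumes."""
    X = X.copy()                 # leave the original traffic matrix intact
    X[t, f] = ANOMALY_VOLUME
    link_volumes = A @ X[t]      # link-level measurement for time slot t
    return X, link_volumes
```

Repeating this for every 5 minute slot t yields the 2016 anomaly samples per target flow described in the text.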
If the difference between the actual data and the EWMA model is small for a particular flow at a particular time, we label the element X_{t,f} as "benign." We do this across all flows; when we find time slots where all flows are labeled as benign, we run our detectors and see whether or not they raise an alarm for those time slots.

We simulate a DoS attack along every flow at every time. We average FNRs over all 144 possible anomalous flows and all 2016 anomaly times. When reporting the effect of an attack on traffic volumes, we first average over links within each flow and then over flows. Furthermore, we generally report average volumes relative to the pre-attack average volumes. Thus a single poisoning experiment was based on one week of poisoning.

With our Globally-Informed scheme, a 10% average increase in the mean link rates raises the unpoisoned FNR by a factor of 10 to 38%, and eventually to over 90%. The big difference between the performance of the locally-informed and globally-informed attackers is intuitive to understand. Recall that the globally-informed attacker knows a great deal more (traffic on all links, and future traffic levels) than the locally-informed one.

Then the times at which the flow is experiencing an anomaly are modeled by a Binomial arrival process with inter-arrival times distributed according to the geometric distribution. Finally, Gaussian white noise is added to the base sinusoidal model during times of benign OD flow traffic, and exponential traffic is added to the base model during times of anomalous traffic. We next describe the process of fitting this generative model to the week 20 Abilene data.

In step 1, we capture the underlying cyclic trends via Fourier basis functions. We use sinusoids with periods of 7, 5 and 3 days, and 24, 12, 6, 3 and 1.5 hours, as well as a constant function [16]. For each OD flow, we find the Fourier coefficients from the flow's projection onto this basis. We next remove the portion of the traffic modeled by this Fourier forecaster and model the remaining residual traffic via two processes. One is a noise process modeled by a zero-mean Gaussian, to capture short-term benign traffic variance.
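Step 1 can be sketched as an ordinary least-squares fit on a fixed sinusoidal basis. The periods are those listed above and the 5-minute binning matches the Abilene data; the basis construction is our illustration of the idea, not the authors' exact fitting code.

```python
import numpy as np

def fourier_forecast(x, bin_minutes=5):
    """Fit one OD flow's cyclic trend on sinusoids of periods
    7, 5, 3 days and 24, 12, 6, 3, 1.5 hours, plus a constant."""
    periods_h = [7 * 24, 5 * 24, 3 * 24, 24, 12, 6, 3, 1.5]
    t = np.arange(len(x)) * bin_minutes / 60.0        # time in hours
    cols = [np.ones_like(t)]                          # constant function
    for p in periods_h:
        w = 2 * np.pi / p
        cols += [np.sin(w * t), np.cos(w * t)]
    B = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(B, np.asarray(x, float), rcond=None)
    return B @ coef          # fitted trend; residual traffic is x - trend
```

Subtracting the fitted trend from the observed flow leaves the residual traffic that the two noise processes then model.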
The second process models volume anomalies as being exponentially distributed.

In step 2, we select which of the two noise processes is used at each time interval. After computing our model's residuals (the difference between the observed and predicted traffic), we note the smallest negative residual value −m. We assume that residuals in the interval [−m, m] correspond to benign traffic and that residuals exceeding m correspond to traffic anomalies. We separate benign variation and anomalies in this way since these effects behave quite differently. (This is an approximation, but it works reasonably well for most OD flows.) Negative residual traffic reflects benign variance, and since we assume that benign residuals have a zero-mean distribution, it follows that such residuals should lie within the interval [−m, m]. Upon classifying residual traffic as benign or anomalous, we then model anomaly arrival times as a Bernoulli arrival process.

The locally-informed attacker, in contrast, knows only the traffic status of a single ingress link. We consider the locally-informed adversary to have succeeded quite well with only a small view of the network. An adversary is unlikely to be able to acquire, in practice, the capabilities used in the globally-informed poisoning attack. Moreover, adding 30% chaff in order to obtain a 90% evasion success is dangerous, in that the poisoning activity itself is likely to be detected. Therefore Add-More-If-Bigger presents a nice trade-off, from the adversary's point of view, in terms of poisoning effectiveness and attacker capabilities and risks. We therefore use Add-More-If-Bigger, the locally-informed strategy, for many of the remaining experiments.

We evaluate the PCA detection algorithm on both anomalous and normal data, as described in Section 5.2, producing the Receiver Operating Characteristic (ROC) curves displayed in Fig. 4.
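An ROC curve like those in Fig. 4 can be traced from the SPE scores by sweeping the detection threshold, as Section 5.2 describes. A generic sketch (the function name and interface are ours):

```python
import numpy as np

def roc_points(spe_normal, spe_anomalous):
    """Sweep a threshold over the squared prediction errors (SPEs) to
    trace (FPR, TPR) pairs for one detector."""
    spe_normal = np.asarray(spe_normal, dtype=float)
    spe_anomalous = np.asarray(spe_anomalous, dtype=float)
    thresholds = np.unique(np.concatenate([spe_normal, spe_anomalous]))
    points = [(1.0, 1.0)]                      # threshold = -inf: flag everything
    for th in thresholds:
        fpr = np.mean(spe_normal > th)         # false alarms on benign samples
        tpr = np.mean(spe_anomalous > th)      # detections on anomalous samples
        points.append((fpr, tpr))
    points.append((0.0, 0.0))                  # threshold = +inf: flag nothing
    return points
```

The Q-statistic and Laplace thresholds each pick one fixed point on this curve, which is how they appear as single markers in the figures.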
Under this model, the inter-anomaly arrival times become geometrically distributed. Since we consider only spatial PCA methods, the placement of anomalies is of secondary importance.

For the final step, the parameters of the two residual traffic volume processes and of the inter-anomaly arrival process are inferred from the residual traffic using the Maximum Likelihood estimates of the Gaussian's variance and the exponential and geometric rates, respectively. Positive goodness-of-fit results (Q-Q plots not shown) have been obtained for mouse, medium and elephant flows.

In our simulations, we constrain all link volumes to respect the link capacities in the Abilene network: 10 Gbps for all but one link, which operates at one fourth of this rate. We cap chaff that would cause traffic to exceed the link capacities.

6. POISONING EFFECTIVENESS

6.1 Single Training Period Poisoning

We evaluate the effectiveness of our three data poisoning schemes in Single-Training Period attacks.

We produce a ROC curve (as shown) by first training a PCA model on the unpoisoned data from week 20. We next evaluate the algorithm when trained on data poisoned by Add-More-If-Bigger. To validate PCA-based detection on poisoned training data, we poison exactly one flow at a time, as dictated by the threat model. Thus, for relative chaff volumes ranging from 5% to 50%, Add-More-If-Bigger chaff is added to each flow separately to construct 144 separate training sets and 144 corresponding ROC curves for the given level of poisoning. The poisoned curves in Fig. 4 display the averages of these ROC curves (i.e., the average TPR over the 144 flows for each FPR).

We see that the poisoning scheme can throw off the balance between false positives and false negatives of the PCA detector: the detection and false alarm rates drop together rapidly as the level of chaff is increased. At 10% relative chaff volume, performance degrades significantly from the ideal ROC curve (the lines from (0,0) to (0,1) to (1,1)), and at 20% PCA's mean ROC curve is already close to that of blind randomized prediction (the y = x line with 0.5 AUC).
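Returning to the generative model: steps 2 and 3 above (classify residuals by the magnitude m of the smallest negative residual, then fit each process by maximum likelihood) can be sketched as:

```python
import numpy as np

def split_and_fit(residuals):
    """Classify residuals as benign ([-m, m]) or anomalous (> m), then
    return ML estimates: Gaussian variance for benign residuals,
    exponential rate for anomalous ones, and a geometric rate for the
    inter-anomaly arrival times."""
    r = np.asarray(residuals, dtype=float)
    m = -r.min()                          # smallest negative residual is -m
    anomalous = r > m
    sigma2 = np.mean(r[~anomalous] ** 2)  # zero-mean Gaussian variance (MLE)
    exp_rate = 1.0 / np.mean(r[anomalous]) if anomalous.any() else None
    idx = np.flatnonzero(anomalous)       # anomaly time slots
    geo_rate = 1.0 / np.mean(np.diff(idx)) if idx.size > 1 else None
    return m, sigma2, exp_rate, geo_rate
```

This is a sketch under the zero-mean-benign-residual assumption stated in the text; the paper's per-flow fitting details may differ.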
During the testing week, the attacker launches a DoS attack in each 5 minute time window. The results of these attacks are displayed in Fig. 3. Although our poisoning schemes focus on adding variance, the mean of the OD flow being poisoned increases as well, increasing the means of all links over which the OD flow traverses. The x-axis in Fig. 3 indicates the relative increase in the mean rate. We average over all experiments (i.e., over all OD flows).

As expected, the increase in evasion success is smallest for the uninformed strategy, intermediate for the locally-informed scheme, and largest for the globally-informed poisoning scheme. A locally-informed attacker can use the Add-More-If-Bigger scheme to raise his evasion success to 28% from the baseline FNR of 3.67%, via a 10% average increase in the mean link rates due to chaff. Although 28% may not be viewed as a high likelihood of evasion, the attacker's success rate is nearly 8 times larger than the unpoisoned PCA model's rate.

6.2 Multi-Training Period Poisoning

We now evaluate the effectiveness of the Boiling Frog strategy, which contaminates the training data over multiple training periods. In Fig. 5 we plot the FNRs against the poisoning duration for the PCA detector. We examine four different poisoning schedules with growth rates g of 1.01, 1.02, 1.05 and 1.15, respectively. The goal of the schedule is to increase the attacked links' average traffic by a factor of g from week to week; the attack strength parameter θ (see Sec. 3) is chosen to achieve this goal. We see that the FNR dramatically increases for all four schedules as the poisoning duration increases. With a 15% growth rate the FNR is increased from 3.67% to more than 70% over 3 weeks of poisoning; even with a 5% growth rate the FNR is increased to 50% over 3 weeks. Thus Boiling Frog attacks are effective even when the amount of poisoned data increases rather slowly.
This number represents an average over attacks launched in each 5 minute window, so the attacker could simply retry multiple times.

Figure 3: Evasion success of PCA under Single-Training Period poisoning attacks using 3 chaff methods (uninformed, locally-informed and globally-informed).

Figure 4: ROC curves of PCA under Single-Training Period poisoning attacks (unpoisoned and with 5%, 10%, 20% and 50% chaff; the Q-statistic and Laplace threshold operating points and a random-detector baseline are marked).

Figure 5: Evasion success of PCA under Boiling Frog poisoning attacks (growth rates 1.01, 1.02, 1.05 and 1.15).

Figure 6: Chaff rejection rates of PCA under the poisoning attacks shown in Fig. 5.

Recall that the two methods are retrained every week using the data collected from the previous week. However, the data from the previous week has been filtered by the detector itself: at any time point flagged as anomalous, the training data is thrown out. Fig. 6 shows the proportion of chaff rejected each week by PCA (the chaff rejection rate) for the Boiling Frog strategy. The three slower schedules enjoy a relatively small constant rejection rate close to 5%. The 15% schedule begins with a relatively high rejection rate, but after a month sufficient amounts of poisoned traffic mistrain PCA, after which point the rates drop to the level of the slower schedules.
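The weekly retrain-and-filter dynamic just described can be sketched as a loop. All four callables are hypothetical placeholders for the paper's components; the schedule multiplies the attack intensity by the growth rate g each week.

```python
import numpy as np

def boiling_frog(detect, retrain, weeks, inject_chaff, theta1, g):
    """Multi-week poisoning sketch: each week, chaff of growing intensity
    is injected, samples flagged by the current detector are discarded,
    and the detector is retrained on the surviving (poisoned) data.

    detect(model, X)      -> boolean mask of rows flagged anomalous
    retrain(X)            -> detector model trained on X
    inject_chaff(X, th)   -> X plus chaff of intensity th on the target flow
    """
    model = retrain(weeks[0])                   # initial week is clean
    theta = theta1
    for X_clean in weeks[1:]:
        X = inject_chaff(X_clean, theta)        # poison this week's traffic
        flagged = detect(model, X)              # detector filters its own input
        model = retrain(X[~flagged])            # retrain on what slipped through
        theta *= g                              # escalate: theta_t > theta_{t-1}
    return model
```

With a small g, almost no chaff is flagged each week, so the model drifts toward the poisoned distribution, mirroring the low rejection rates of the slower schedules in Fig. 6.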
We conclude that the Boiling Frog strategy with a moderate growth rate of 2–5% can significantly poison PCA, dramatically increasing its FNR while still going unnoticed by the detector.

By comparing Figs. 3 and 5, we observe that in order to raise the FNR to 50%, an increase in mean traffic of roughly 18% is needed for the Single-Training Period attack, whereas in the Boiling Frog attack the same thing can be