Data Min Knowl Disc (2011) 22:291–335
DOI 10.1007/s10618-010-0197-3

Classifier evaluation and attribute selection against active adversaries

Murat Kantarcıoğlu · Bowei Xi · Chris Clifton

Received: 16 June 2008 / Accepted: 17 July 2010 / Published online: 12 August 2010
© The Author(s) 2010. This article is published with open access at Springerlink.com
Abstract Many data mining applications, such as spam filtering and intrusion detection, are faced with active adversaries. In all these applications, the future data sets and the training data set are no longer from the same population, due to the transformations employed by the adversaries. Hence a main assumption for the existing classification techniques no longer holds and initially successful classifiers degrade easily. This becomes a game between the adversary and the data miner: The adversary modifies its strategy to avoid being detected by the current classifier; the data miner then updates its classifier based on the new threats. In this paper, we investigate the possibility of an equilibrium in this seemingly never ending game, where neither party has an incentive to change. Modifying the classifier causes too many false positives with too little increase in true positives; changes by the adversary decrease the utility of the false negative items that are not detected. We develop a game theoretic framework where equilibrium behavior of adversarial classification applications can be analyzed, and provide solutions for finding an equilibrium point. A classifier's equilibrium performance indicates its eventual success or failure. The data miner could then select attributes based on their equilibrium performance, and construct an effective classifier.
Responsible editor: Johannes Fürnkranz.
M. Kantarcıoğlu (✉)
Computer Science Department, University of Texas at Dallas, Richardson, TX, USA
e-mail: muratk@utdallas.edu

B. Xi
Department of Statistics, Purdue University, West Lafayette, IN, USA
e-mail: xbw@stat.purdue.edu

C. Clifton
Department of Computer Science, Purdue University, West Lafayette, IN, USA
e-mail: clifton@cs.purdue.edu
A case study on online lending data demonstrates how to apply the proposed game theoretic framework to a real application.
Keywords Adversarial classification · Game theory · Attribute selection · Simulated annealing
1 Introduction
Many data mining applications, both current and proposed, are faced with active adversaries. Problems range from the annoyance of spam to the damage of computer hackers to the destruction of terrorists. In all of these cases, statistical classification techniques play an important role in distinguishing the legitimate from the destructive. There has been significant investment in the use of learned classifiers to address these issues, from commercial spam filters to research programs such as those on intrusion detection (Lippmann et al. 2000). These problems pose a significant new challenge not addressed in previous research: The behavior of a class controlled by the adversary may adapt to avoid detection. Traditionally a classifier is constructed from a training data set, and future data sets come from the same population as the training data set. A classifier constructed by the data miner in such a static environment cannot maintain its optimal performance for long, when faced with an active adversary.
One intuitive approach to fight the adversary is to let the classifier adapt to the adversary's actions, either manually or automatically. Such a classifier was proposed in Dalvi et al. (2004). The problem is that this becomes a never-ending game between the classifier and the adversary. Another approach is to minimize the worst case error through a zero-sum game (Lanckriet et al. 2003; El Ghaoui et al. 2003).
Our approach is not to develop a learning strategy for the classifier to stay ahead of the adversary. Instead, we propose an Adversarial Classification Stackelberg Game, a two-player game, to model the sequential moves of the adversary and the data miner. Each player follows their own interest in the proposed game theoretic framework: The adversary tries to maximize its return from the false negative items (those that get through the classifier), and the data miner tries to minimize the misclassification cost.

We then predict the end state of the game: an equilibrium state. When considering the whole strategy space of all the possible transformations and the penalties for transformation, an equilibrium state offers insight into the error rate to be expected from a classifier in the long run. Equilibrium information also offers an alternative to the minimax approach, which could be too pessimistic in some cases.
We examine under which conditions an equilibrium would exist, and provide a stochastic search method and a heuristic method to estimate the classifier performance and the adversary's behavior at such an equilibrium point (e.g., the players' equilibrium strategies). Furthermore, for any given set of attributes, we can obtain equilibrium strategies on the subsets of attributes. Such information is used to select the most effective attributes to build a classifier. When none of the subsets' equilibrium performance is satisfactory, the data miner will have to change the rules of the game, for example, by considering new input attributes or increasing the penalties for existing ones.
Predicting the eventual equilibrium state has two primary uses. First, the data miner can determine if the approach used will have long term value, before making a large investment into deploying the system. Second, this can aid in attribute selection. While it may seem that the best solution is simply to use all available attributes, there is often a cost associated with obtaining the attributes. For example, in spam filtering "white listing" good addresses and "black listing" bad IP addresses are effective. But besides blocking some good traffic, a trade-off in the classifier learning process, creating and maintaining such lists demands effort. Our approach enables the data miner to predict if such expensive attributes will be effective in the long run, or if the long-term benefit does not justify the cost. In Sect. 7 we show experimentally that an attribute that is effective at the beginning may not be effective in the long-term.
Our framework is general enough to be applicable in a variety of adversarial classification scenarios and can accommodate different classification techniques. Spam filtering is one such application where classifier degradation is clearly visible and where we can easily observe the actions taken by the adversary (spammer) and the classifier (spam filter).1
Another example is botnet detection, where several detection algorithms have been proposed in the literature, each monitoring different sets of attributes (Stinson and Mitchell 2008). The equilibrium performance of different defensive algorithms can help the data miner to determine their effectiveness. Furthermore, the data miner can apply the proposed framework to select the most effective attributes from each defensive approach. In return, these attributes can be combined to build a more robust defensive algorithm.
Finally, we apply the proposed framework to model online lending data in Sect. 9. In this case, the adversary's equilibrium transformation can help the lender to identify high risk applications and determine how often additional information needs to be verified to increase the penalty for the adversary.
The paper is organized as follows: In Sect. 2 we present a game theoretic model. In Sect. 3 we propose a stochastic search method to solve for an equilibrium. We demonstrate that penalty costs can affect the equilibrium Bayes error in interesting ways for the Gaussian distribution in Sect. 4. We examine the impact of extreme classification rules (to pass all or to block all objects) on an equilibrium, using Gaussian and Bernoulli random variables respectively, in Sect. 5. In Sect. 6 we provide a computationally efficient heuristic solution for the Bayesian classifier, which allows us to handle high dimensional data. Section 7 presents a simulation study, where we evaluate the equilibrium performance of multiple combinations of Gaussian attributes, demonstrating the effect of different combinations of distributions and penalties without transformation and in equilibrium. Section 8 presents another simulation study with Bernoulli random variables. Section 9 presents a case study of modeling online lending data. We conclude with a discussion of future work. First, we discuss related work below.
1 Please refer to Pu and Webb (2006) for an extensive study of adversarial behavior evolution in spam filtering.
1.1 Related work
Learning in the presence of an adaptive adversary is an issue in many different applications. Problems ranging from intrusion detection (Mahoney and Chan 2002) to fraud detection (Fawcett and Provost 1997) need to be able to cope with adaptive malicious adversaries. As discussed in Dalvi et al. (2004), the challenges created by the malicious adversaries are quite different from those in concept drift (Hulten et al. 2001), because the concept is maliciously changed based on the actions of the classifier. There have been applications of game theory to spam filtering. In Androutsopoulos et al. (2005), the spam filter and spam emails are considered fixed; the game is whether the spammer should send legitimate or spam emails, and the user decides if the spam filter should be trusted or not. In Lowd and Meek (2005a), the adversary tries to reverse engineer the classifier to learn the parameters. In Dalvi et al. (2004), the authors applied game theory to produce a Naïve Bayes classifier that could automatically adapt to the adversary's expected actions. While recognizing the importance of an equilibrium state, they simplified the situation by assuming the adversary bases its strategy on the initial Naïve Bayes classifier rather than their proposed adaptive strategy.
In Lowd and Meek (2005b), the authors studied the impact of attaching words that serve as strong indicators of regular user emails. The study shows that by adding 30-150 good words, a spammer can significantly increase its success rate. In this paper, we propose a game theoretic model where such conclusions could be reached automatically. In addition, robust classification techniques that use a minimax criterion have been proposed in El Ghaoui et al. (2003) and Lanckriet et al. (2003). Compared to those works, we assume that the adversary can modify the entire bad class distribution to avoid being detected. Also our model allows the data miner and the adversary to have different utility functions. There has been other work on how to construct robust classifiers for various tasks. For example, in Globerson and Roweis (2006), the authors constructed robust classifiers in domains such as document classification by not over-weighting any single attribute. Similarly, in Teo et al. (2008), the authors applied domain specific knowledge of invariance transformations to construct a robust supervised learning algorithm. Compared to that work, we incorporate the cost for the adversaries into our model as well as the cost for the data miner.
In online learning, such as Cesa-Bianchi and Lugosi (2006), a strategic game has been used to learn a concept in real time or make a prediction for the near future by seeing instances one at a time. To the best of our knowledge, those works do not deal with situations where an adversary changes the distribution of the underlying concept.

Overall, we take a very different approach from the existing work. By directly investigating an equilibrium state of the game, at which point all parties stick to their current strategies, we aim at providing a guideline for building classifiers that could lead to the data miner's eventual success in the game.
2 A game theoretic model
In this section we present a game theoretic model for adversarial classification applications. Before we discuss the model in detail, we provide a motivating example that explains the basic aspects of our model.
2.1 Motivating example
Consider the case where a classifier is built to detect whether a computer is compromised or not by malware that sends spam e-mails. This malware detection system could be built based on various system wide parameters. For example, looking at the number of e-mails sent by a compromised computer could be one useful measure to detect such malware.

In such a scenario, a simple classifier that raises an alarm if the number of e-mails sent by a computer exceeds a threshold value could be initially very successful. Now an attacker that designs a malware can reduce the number of spam e-mails sent to avoid detection. The attacker can possibly reduce the spam e-mails sent per day to near zero and avoid being detected, but this will not be profitable for the attacker. Afterward, depending on the number of spam e-mails sent per day, the data miner sets a threshold value. To keep the number of false positives at a reasonable level, the threshold value chosen to be the classification rule cannot be too small. In such a game the attacker and the data miner may reach an equilibrium point where both parties have no incentive to change their strategies. The attacker would set the spam e-mails sent per day to a number that is most profitable: Increasing the number causes the computer to be detected, and reducing the number is less profitable and not necessary. When the attacker receives maximum payoff and does not change its strategy, the data miner does not need to re-set the threshold value: If the threshold used by the classifier were further lowered to detect the spamming machine, too many legitimate machines would be misidentified as sources of spam. The important question is whether the classifier's equilibrium performance is satisfactory for the data miner. If not, the data miner would need a new classification approach, such as including additional attributes (e.g. system call sequences executed by the programs) to build a classifier and improve its equilibrium performance.
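The threshold dynamic in this example can be sketched numerically. Every quantity below (the legitimate send-rate distribution, the false-positive budget, the per-mail profit) is a hypothetical assumption chosen for illustration, not data from the paper.

```python
import random

rng = random.Random(0)

# Hypothetical send rates (e-mails/day) for legitimate machines; these
# numbers are illustrative assumptions, not data from the paper.
legit = sorted(rng.gauss(20, 5) for _ in range(10_000))

def miner_best_threshold(max_fp_rate=0.01):
    # Smallest threshold keeping false positives below max_fp_rate:
    # the (1 - max_fp_rate) empirical quantile of legitimate rates.
    return legit[int((1 - max_fp_rate) * len(legit))]

def attacker_best_rate(threshold, profit_per_mail=1.0):
    # Sending above the threshold gets the machine flagged (payoff 0);
    # sending less earns less, so the attacker sits just below the threshold.
    rate = threshold * 0.999
    return rate, rate * profit_per_mail

t = miner_best_threshold()
rate, payoff = attacker_best_rate(t)
# Neither player gains by moving: lowering t blocks legitimate machines,
# while any send rate above t is detected.
```

At this pair of strategies the attacker's payoff is maximal given the rule, and any lower threshold would exceed the data miner's false-positive budget, which is exactly the no-incentive-to-move situation described above.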
Below, we discuss how the example given above could be modeled in our framework to understand the equilibrium performance for a given classifier and the set of attributes used for building such a classifier.
2.2 Adversarial classification Stackelberg game
The adversarial classification scenario is formulated as a two class problem, where class one (π_g) is the "good" class and class two (π_b) is the "bad" class. Assume q attributes are measured from an object coming from either class. We denote the vector of attributes by x = (x_1, x_2, ..., x_q)^T. Furthermore, we assume that the attributes of an object x follow different distributions for different classes. Let f_i(x) be the probability density function of class π_i, i = g or b. The overall population is formed by combining the two classes. Let p_i denote the proportion of class π_i in the overall population; p_g + p_b = 1. The distribution of the attributes x for the overall population can be considered as a mixture of the two distributions, with the density function written as

f(x) = p_g f_g(x) + p_b f_b(x).
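As a concrete and purely illustrative instance of the mixture above, a one-attribute population with Gaussian class densities and assumed priors could be written as:

```python
import math

def normal_pdf(x, mu, sigma):
    # Univariate Gaussian density.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Assumed priors and class densities for a single attribute (illustrative
# choices, not values from the paper).
p_g, p_b = 0.9, 0.1
f_g = lambda x: normal_pdf(x, 0.0, 1.0)   # "good" class density
f_b = lambda x: normal_pdf(x, 3.0, 1.0)   # "bad" class density

# Mixture density of the overall population: f(x) = p_g f_g(x) + p_b f_b(x).
f = lambda x: p_g * f_g(x) + p_b * f_b(x)
```

Because p_g + p_b = 1 and each component integrates to one, f integrates to one as well, so it is itself a valid density for the overall population.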
We assume that the adversary can control the distribution of the "bad" class π_b. In other words, the adversary can modify the distribution by applying a transformation
T to the attributes of an object x that belongs to π_b. Hence f_b(x) is transformed into f_b^T(x). Each such transformation comes with a cost; the transformed object is less likely to benefit the adversary, although more likely to pass the classifier. For example, a spammer could send a legitimate journal call for papers; while this would be hard to detect as spam, it would not result in sales of the spammer's product. When a "bad" object from π_b is misclassified as a "good" object into π_g, it generates profit for the adversary. A transformed object from f_b^T(x) generates less profit than the original one. In all of the simulation studies, we assume that the values of p_g and p_b are not affected by transformation, meaning that the adversary transforms the distribution of π_b, but in a short time period does not significantly increase or decrease the number of "bad" objects. However, for a Bayesian classifier p_b and p_g are just parameters that define the classification regions. They can be transformed by the adversary, and adjusted in the Bayesian classifier by the data miner to optimize the classification rule. Here we examine the case where a rational adversary and a rational data miner play the following game:
1. Given the initial distribution and density f(x), the adversary chooses a transformation T from the set of all feasible transformations S, the strategy space.
2. After observing the transformation T, the data miner creates a classification rule h.

Consider the case where the data miner wants to minimize its misclassification cost. Given transformation T and the associated f_b^T(x), the data miner responds with a classification rule h(x). Let L(h, i) be the region where the objects are classified as π_i based on h(x) for i = g or b. Let the expected cost of misclassification be C(T, h), which is always positive. Define the payoff function of the data miner as

u_g(T, h) = −C(T, h).

In order to maximize its payoff u_g, the data miner needs to minimize the misclassification cost C(T, h).
Note that the adversary only profits from the "bad" objects that are classified as "good". Also note that transformation may change the adversary's profit from an object that successfully passes detection. Define g(T, x) as the profit function for a "bad" object x being classified as a "good" one, after transformation T has been applied. Define the adversary's payoff function of a transformation T given h as the following:

u_b(T, h) = ∫_{L(h, g)} g(T, x) f_b^T(x) dx.
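This payoff can be evaluated numerically for a concrete rule. The sketch below instantiates u_b(T, h) in one dimension under assumptions made up for illustration: the bad class starts as N(3, 1), the transformation T shifts it left, the rule h passes x < threshold as "good", and the per-object profit decays exponentially with the size of the shift (an assumed penalty model).

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    # Univariate Gaussian density.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def adversary_payoff(shift, threshold, base_profit=1.0, penalty=0.2, n=4000):
    # u_b(T, h) for a 1-D example: the transformation shifts the bad
    # density N(3, 1) left by `shift`; L(h, g) is (-inf, threshold);
    # per-object profit decays exponentially with the shift (assumption).
    profit = base_profit * math.exp(-penalty * shift)
    lo, hi = -10.0, threshold        # lower tail below -10 is negligible
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):               # midpoint rule over L(h, g)
        x = lo + (i + 0.5) * dx
        total += normal_pdf(x, 3.0 - shift) * dx
    return profit * total

payoff = adversary_payoff(shift=2.0, threshold=2.0)
```

A larger shift moves more mass into the passing region but shrinks the per-object profit, which is exactly the trade-off the payoff function encodes.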
Within the vast literature of game theory, the extensive game provides a suitable framework for us to model the sequential structure of the adversary's and the data miner's actions. Specifically, the Stackelberg game with two players suits our need. In a Stackelberg game, one of the two players (the leader) chooses an action a_b first and the second player (the follower), after observing the action of the leader, chooses an action a_g. The game ends with payoffs to each player based on their utility functions and actions. In our model, we assume all players act rationally throughout the game. For the Stackelberg game, this implies that the follower responds with the action a_g that maximizes
u_g given the action a_b of the first player. The assumption of acting rationally at every stage of the game eliminates the Nash equilibria with non-credible threats and creates an equilibrium called the subgame perfect equilibrium.

We assume that each player has perfect information about the other. Here, in this context, "perfect information" means that each player knows the other player's utility function. Furthermore, the follower observes a_b before choosing an action. This assumption is not unreasonable since data and other information are publicly available in many applications, such as spam filtering. The utilities can be estimated in some application areas. For the case study in Sect. 9, the penalties for transformation can be measured by the amount of money the adversary spends to improve its profile. For botnet detection the penalties for transformation can be the reduction in the amount of traffic generated by the adversary. In addition, different penalties can be used to run what-if analysis. For example, in loan applications, to prevent an adversary transforming the home ownership attribute (falsely claiming to own a home), we can put verification requirements in place. Such verification will make it costly to transform the home ownership attribute in a loan application. Our model can be re-run with different penalties to predict the effect of instituting such a verification process.
An interesting special case of the Stackelberg game is the zero-sum game, when the two utility functions have the following relationship: u_b(T, h) = −u_g(T, h) (Basar and Olsder 1999). In that case, the Stackelberg solution concept for adversarial classification corresponds to the minimax solution studied in depth by many authors (Lanckriet et al. 2003; El Ghaoui et al. 2003). A general Stackelberg solution for the adversarial classification game automatically handles the minimax solution concept.2
The game theoretic framework we propose is different from the well known strategic games, such as non-cooperative strategic games. In a strategic game, each player is not informed about the other player's plan of action. Players take "simultaneous" actions. The famous Nash equilibrium concept (Osborne and Rubinstein 1999) captures the steady state of such a game. In strategic games, a player cannot change its chosen action after it learns the other player's action. Consequently, if one player chooses the equilibrium strategy while the other does not, the result can be bad for both of them.
Compared with strategic games with "simultaneous" actions, we choose the Stackelberg game to emphasize the sequential actions of the two players. We assume that the data miner monitors a certain set of attributes through a classifier and the adversary is aware of the existence of the classifier before the game starts. The data miner empirically sets the parameters of the classifier in its initial state. This initial action does not need to be directly modeled by the proposed Stackelberg game framework. When the game starts, first the adversary transforms the attributes being monitored by the classifier to increase its payoff. In the second step, after observing the transformation employed by the adversary, the data miner optimizes the parameter values of the classifier. The proposed Stackelberg game mimics the parameter tuning action, because the data miner enjoys the freedom to adjust the classifier parameters after it observes a significant change in the data. Although the adversary is the leader in the game, the data miner chooses a certain set of attributes and builds a classifier

2 Of course, more efficient methods for the minimax solution concept could be found using some of the existing techniques such as the ones given in Lanckriet et al. (2003). Here we focus on the general framework.
before the game starts. The data miner sets the tone for the game and therefore has the advantage over the adversary.
We define the Adversarial Classification Stackelberg Game G = (N, H, P, u_b, u_g): N = {adversary, data miner}. Set of sequences H = {∅, (T), (T, h)} s.t. T ∈ S and h ∈ C, where S is the set of all admissible transformations for the adversary, and C is the set of all possible classification rules given a certain type of classifier. Function P assigns a player to each sequence in H: P(∅) = adversary and P((T)) = data miner. Equivalently there exists a corresponding function A that assigns an action space to each sequence in H: A(∅) = S, A((T)) = C, and A((T, h)) = ∅. Payoff functions u_b and u_g are defined as above.
In our framework we need the distribution of the "bad" class to understand the transformation being employed and to assess the penalty for the adversary. Depending on the type of the classifier being used, knowledge of the distributions may or may not be necessary to obtain the optimal classification rule employed by the data miner. The classifier can be re-trained using a data set containing the transformed bad instances that are collected or simulated.
However, we assume that the data miner sticks to one type of classifier while the adversary can choose from all the transformations in the strategy space. For example, if the data miner chooses a Bayesian classifier, it adjusts the Bayesian classifier with new weights in order to defeat the adversary's transformations. The data miner will not use a Bayesian classifier facing certain transformations and a decision tree against other transformations. This is a realistic assumption because of development costs: adjusting parameters in a model (or even retraining the model) is much less expensive than switching to a new model.
We note that the strategy space S of adversary transformations can be quite complex. In practice, however, defenders must cope with whatever transformation the adversary employs; this is well documented and can be observed from data. In such cases, a formal model provides a systematic approach to dealing with the various transformations.
Examining the classifier's performance at equilibrium, where the adversary maximizes its gain given the classifier being optimized to defeat its action, is a sensible choice in a Stackelberg game. For a fixed transformation, it is true that the problem reduces to a regular learning problem. On the other hand, given a certain set of attributes monitored through a classifier, every possible transformation and the optimized classification rule to defeat this transformation generate a corresponding error rate. The main issue is to know which of the potential transformations are more likely to be adopted in practice and what to do about it if such a transformation occurs. One approach is to select a set of attributes that minimizes the worst case error rate under all possible transformations. For some application areas such as online lending in Sect. 9, the worst case scenario is unlikely to happen because of the heavy penalties for transformation. Equilibrium information offers an alternative to the minimax approach. When considering the whole strategy space of all the possible transformations and the penalties for transformation, an equilibrium state offers insight into the error rate to be expected from a classifier in the long run. As suggested by our paper, such insights are used to guide attribute selection.
In this game, we assume that the adversary acts by first applying a transformation T. After observing T being applied to the "bad" class, i.e., f_b^T(x), the optimal classification
rule becomes h_T(x). h_T(x) is the best response of the data miner facing a transformation T. Let L(h_T, g) be the region where the objects are classified as π_g given h_T. Define the adversary gain of applying transformation T as:

W(T) = u_b(T, h_T) = ∫_{L(h_T, g)} g(T, x) f_b^T(x) dx = E_{f_b^T}[ I_{L(h_T, g)}(x) g(T, x) ].  (2.1)

W(T) is the expected value of the profit generated by the "bad" objects that pass detection under transformation T and the data miner's optimal classification rule against T. When both parties are rational players, both attempt to maximize their payoff. Therefore we can write a subgame perfect equilibrium as (T^e, h_{T^e}), where

T^e = argmax_{T ∈ S} W(T).  (2.2)

Game theory (Osborne and Rubinstein 1999) establishes that the solution of the above maximization problem is a subgame perfect equilibrium. Furthermore, if the strategy space S is compact and W(T) is continuous, the maximization problem has a solution.

The above formulation can accommodate any well-defined set of transformations S, any appropriate distributions with densities f_g(x) and f_b(x), and any meaningful profit function g(T, x).
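Since Eq. 2.2 reduces equilibrium search to an optimization over S, a small strategy space can even be searched exhaustively. The sketch below is a hypothetical one-dimensional instance in which all densities, priors, the rule family, and the penalty model are assumptions made for illustration: for each candidate shift the data miner's best-response threshold is computed, and T^e is the shift that maximizes the resulting adversary gain W.

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Illustrative 1-D game (all parameters are assumptions): good class N(0,1),
# bad class N(3,1) shifted left by the transformation T = shift, and a rule
# family h_t that classifies x < t as "good".
p_g, p_b = 0.5, 0.5

def miner_cost(t, shift):
    false_pos = p_g * (1.0 - norm_cdf(t, 0.0))   # good objects blocked
    false_neg = p_b * norm_cdf(t, 3.0 - shift)   # bad objects passed
    return false_pos + false_neg

def best_response(shift):
    # The data miner's optimal rule h_T: minimize cost over a threshold grid.
    ts = [i / 100.0 for i in range(-300, 601)]
    return min(ts, key=lambda t: miner_cost(t, shift))

def W(shift, penalty=0.5):
    # Adversary gain under the miner's best response (Eq. 2.1), with an
    # assumed exponential per-object profit decay as the penalty model.
    t = best_response(shift)
    return math.exp(-penalty * shift) * norm_cdf(t, 3.0 - shift)

# Eq. 2.2 by grid search over a hypothetical strategy space S = [0, 2.9].
shifts = [i / 10.0 for i in range(0, 30)]
T_e = max(shifts, key=W)
```

The interior maximizer reflects the trade-off in the model: shifting further past T^e moves more bad objects past the best-response rule but costs more in per-object profit than it gains.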
We solve the above equations by exploiting the structure of the game: To search for an equilibrium in a Stackelberg game is equivalent to solving an optimization problem. We present a general solution based on stochastic search in Sect. 3, and a heuristic solution based on an approximation of the classification region for the minimal cost Bayesian classifier in Sect. 6, for high dimensional tasks.
3 Solving for the equilibrium
To search for a subgame perfect equilibrium, the underlying problem is converted to an optimization problem similar to the one defined by Eq. 2.2 (Basar and Olsder 1999). Although there are computational game theory tools to find subgame perfect equilibria for finite games (McKelvey et al. 2007), searching for subgame perfect equilibria is a hard problem in general (Basar and Olsder 1999). Therefore, optimization techniques, such as genetic algorithms, have been applied to search for subgame perfect equilibria (Vallee and Basar 1999). To the best of our knowledge, none of the existing computational game algorithms can be applied to our case due to the special structure of the adversary gain W(T). Since the integration region L(h_T, g) for the adversary gain W(T) is a function of transformation T, finding an analytical solution to the maximization problem is challenging. In addition, even calculating the integration analytically for a specific transformation is not possible for high dimensional data. We have to evaluate W(T) numerically. Because of such difficulties, we consider stochastic search algorithms for finding an approximate solution. A typical stochastic search algorithm for optimization problems works as follows: The algorithm starts with a random initial
point and then searches the solution space by moving to different points based on some selection criterion. This process involves evaluating the target function at the selected points in the solution space. Clearly, this implies a computationally efficient method for calculating W(T) is required, since the function will be evaluated at thousands of transformations in S. Furthermore, a stochastic search algorithm with the ability to converge to a global optimal solution is highly desirable. In the rest of this section, a Monte Carlo integration method is introduced to compute W(T) and a simulated annealing algorithm is implemented to solve for a subgame perfect equilibrium.
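The simulated annealing search referred to here can be sketched generically: a maximizer that accepts occasional downhill moves at high temperature so it can escape local maxima. The toy objective and all tuning constants below are illustrative assumptions, not the paper's actual search over transformations.

```python
import math
import random

def simulated_annealing(objective, x0, step=2.0, t0=1.0, cooling=0.995,
                        iters=4000, seed=0):
    # Generic maximizer: always accept improvements, accept worse moves
    # with probability exp(delta / temperature), and cool slowly so early
    # iterations can escape local maxima.
    rng = random.Random(seed)
    x, fx = x0, objective(x0)
    best, f_best = x, fx
    temp = t0
    for _ in range(iters):
        cand = x + rng.gauss(0.0, step)   # random neighbour proposal
        f_cand = objective(cand)
        delta = f_cand - fx
        if delta > 0 or rng.random() < math.exp(delta / temp):
            x, fx = cand, f_cand
            if fx > f_best:
                best, f_best = x, fx      # track the best point visited
        temp *= cooling
    return best, f_best

# Toy objective with a local peak at x = -2 and the global peak at x = 2.
f = lambda x: 2.0 * math.exp(-(x - 2.0) ** 2) + math.exp(-(x + 2.0) ** 2)
x_star, f_star = simulated_annealing(f, x0=-2.0)  # start on the local peak
```

A greedy hill climber started at x = -2 would stay on the lower peak; the temperature-controlled acceptance of worse moves is what lets the search cross the valley to the global maximum.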
3.1 Monte Carlo integration
The Monte Carlo integration technique converts an integration problem to computing an expected value. Assume that we would like to calculate ∫ g(x) dx. If we can find a probability density function f(x) (∫ f(x) dx = 1) that is easy to sample from, then

∫ g(x) dx = ∫ (g(x)/f(x)) × f(x) dx = E_f[g(x)/f(x)].

∫ g(x) dx is equal to the expected value of g(x)/f(x) with respect to the density f(x). The expectation of g(x)/f(x) is estimated by computing a sample mean. Generate m samples {x_i} from f(x) and calculate μ_m = (1/m) × Σ_{i=1}^{m} g(x_i)/f(x_i). When the sample size m is large enough, μ_m provides an accurate estimate of ∫ g(x) dx.
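The identity above translates directly into code. The following sketch estimates a known integral, ∫_0^1 x² dx = 1/3, with f taken as the Uniform(0, 1) density; the target function and sample size are illustrative choices.

```python
import random

def mc_integral(g, sample_f, density_f, m=200_000, seed=1):
    # ∫ g(x) dx = E_f[g(x)/f(x)]: draw x_i ~ f and average g(x_i)/f(x_i).
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = sample_f(rng)
        total += g(x) / density_f(x)
    return total / m

# Sanity check on a known integral: ∫_0^1 x^2 dx = 1/3, with f = Uniform(0,1),
# whose density is identically 1 on (0, 1).
est = mc_integral(lambda x: x * x, lambda rng: rng.random(), lambda x: 1.0)
```

The error of the sample mean shrinks as 1/√m, so the accuracy of the estimate is controlled directly by the sample size.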
m
TheadversarygainW(T)canbewrittenas:
(cid:2)
(cid:3) (cid:4)
W(T)= IL(hT,g)(x)×g(T,x) fbT(x)dx.
Intheaboveformula,IL(hT,g)(x)isanindicatorfunction.Itreturns1ifatransformed
“bad”objectxisclassifiedintoπ ,elseitreturns0. fT(x)isnaturallyaprobability
g b
density function. Therefore W(T) could be calculated by sampling m points from
fT(x),andtakingtheaverageof g(T,x)ofthesamplepointsthatfallin L(h ,g).
b T
Thepseudo-codeforMonteCarlointegrationisgiveninAlgorithm3.1.
Algorithm3.1MonteCarloIntegration
{EvaluatingW(T)fo(cid:11)rag(cid:12)iventransformationT}
Generatemsamples xi from fT(x)
b
sum=0
fori=1tomdo
ifxi ∈L(hT,g)th(cid:13)en (cid:14)
sum=sum+g T,xi
endif
endfor
returnsum/m
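A direct implementation of Algorithm 3.1 might look as follows; the density, classification region, and profit function passed in at the bottom are made-up one-dimensional stand-ins, not quantities from the paper.

```python
import random

def monte_carlo_W(sample_fbT, in_L_good, profit_g, m=100_000, seed=2):
    # Algorithm 3.1: draw x_i ~ f_b^T, sum g(T, x_i) over the samples that
    # fall in the region L(h_T, g), and return sum/m.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = sample_fbT(rng)
        if in_L_good(x):
            total += profit_g(x)
    return total / m

# Made-up one-dimensional instance: transformed bad class N(1, 1), rule
# "x < 2 is good", constant per-object profit of 1. The true value is
# P(x < 2 | N(1, 1)) = Phi(1), roughly 0.8413.
W_hat = monte_carlo_W(
    sample_fbT=lambda rng: rng.gauss(1.0, 1.0),
    in_L_good=lambda x: x < 2.0,
    profit_g=lambda x: 1.0,
)
```

Note that the samples falling outside L(h_T, g) contribute zero to the sum but still count in the divisor m, which is exactly the indicator-function form of W(T) above.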