7. Generative Adversarial Networks

Adversarial examples

Consider a case where an adversary knows some combination of
- the training data
- the trained model weights
- the trained model as a black box

The goal of the adversary is to make the classifier fail (sometimes with emphasis on particular classes or examples).

Timeline:
- "Adversarial Classification", Dalvi et al., 2004: fool a spam filter
- "Evasion Attacks Against Machine Learning at Test Time", Biggio et al., 2013: fool neural nets
- Szegedy et al., 2013: fool ImageNet classifiers imperceptibly
- Goodfellow et al., 2014: cheap, closed-form attack

Adversarial testing examples

Recall the back-propagation algorithm, which can also be used to perform gradient descent over the input example; the resulting input-space gradient is itself hard to interpret.

Consider an experiment where we do gradient ascent on the cross-entropy loss to minimize the probability that the example is correctly classified. Alternatively, we could do gradient descent on the loss of a particular target class. Concretely, perturb the image slightly by taking the sign of the gradient with a small scaling constant (a code sketch is given at the end of this section).

In another experiment, you can start from random noise and take one gradient step; this often produces a confident classification. In the accompanying figure, the images outlined in yellow are classified as "airplane" with >50% confidence.

In another experiment, you can construct targeted adversarial examples that are misclassified as a specific target class. Here the adversarial examples are misclassified as ostriches, and in the middle we show the perturbation magnified ten times.

Consider a variational autoencoder for images: one can create adversarial images that are reconstructed (after compression) as an entirely different image.

Adversarial examples were first reported in ["Intriguing properties of neural networks", 2013, Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus]. This led to serious security concerns; for example,
- one can create road signs that fool a self-driving car into acting in a certain way.

This is serious because
- there is no reliable defense against adversarial examples
- adversarial examples transfer to different networks, trained on disjoint subsets of the training data
- you do not need access to the model parameters; you can train your own model and create adversarial examples on it
- you only need black-box access via APIs (MetaMind, Amazon, Google)

["Practical Black-Box Attacks against Machine Learning", 2016, Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami]: no access to the actual classifier; it is treated only as a black box (see the substitute-model sketch at the end of this section).

You can fool a classifier by taking a picture of a print-out ["Adversarial examples in the physical world", 2016, Alexey Kurakin, Ian Goodfellow, Samy Bengio]. One can potentially print over a stop sign to fool a self-driving car.
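The gradient-sign perturbation described above fits in a few lines. Below is a minimal sketch, assuming a PyTorch classifier model, an input batch x scaled to [0, 1], and integer class labels y; the function name fgsm_perturb, the default epsilon, and the clamping range are illustrative choices, not taken from the slides.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.01, target=None):
    # Back-propagation gives the gradient of the loss w.r.t. the *input*,
    # just as it gives the gradient w.r.t. the weights during training.
    x = x.clone().detach().requires_grad_(True)
    labels = y if target is None else target
    loss = F.cross_entropy(model(x), labels)
    loss.backward()
    if target is None:
        # Untargeted: one gradient-ascent step on the loss of the true label.
        x_adv = x + eps * x.grad.sign()
    else:
        # Targeted: one gradient-descent step towards the chosen target class.
        x_adv = x - eps * x.grad.sign()
    # Keep pixels in a valid range (assumes inputs scaled to [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()

Because only the sign of the gradient is used, a single forward and backward pass suffices, which is why the Goodfellow et al. 2014 attack in the timeline is described as a cheap, closed-form attack.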
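For the black-box setting of the Papernot et al. paper cited above, the core idea is to label your own inputs by querying the victim model through its API, fit a local substitute model to those labels, and then craft adversarial examples on the substitute, relying on transferability. The sketch below is only schematic: query_blackbox, the substitute architecture, and the training loop are illustrative assumptions, and the actual paper additionally uses Jacobian-based dataset augmentation to choose its queries.

import torch
import torch.nn as nn
import torch.nn.functional as F

def train_substitute(query_blackbox, seed_inputs, num_classes, epochs=20):
    # Black-box access only: query_blackbox(x) is assumed to return predicted
    # class indices (no gradients, no confidences), e.g. via a remote API.
    with torch.no_grad():
        labels = query_blackbox(seed_inputs)

    # A small local stand-in model; its architecture need not match the victim's.
    substitute = nn.Sequential(
        nn.Flatten(),
        nn.Linear(seed_inputs[0].numel(), 256),
        nn.ReLU(),
        nn.Linear(256, num_classes),
    )
    opt = torch.optim.Adam(substitute.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(substitute(seed_inputs), labels)
        loss.backward()
        opt.step()
    return substitute

Adversarial examples crafted on the substitute (for example with fgsm_perturb above) often transfer to the original classifier, which is the transferability property listed earlier and the reason black-box API access is enough to mount the attack.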