
De-Anonymization, Classification and Analysis PDF

129 Pages·2017·1.59 MB·English
by Rebecca Sorla Portnoff

Preview De-Anonymization, Classification and Analysis

The Dark Net: De-Anonymization, Classification and Analysis

Rebecca Portnoff
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2018-5
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-5.html
March 7, 2018

Copyright © 2018, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

The Dark Net: De-Anonymization, Classification and Analysis
by Rebecca Sorla Portnoff
B.S. Princeton University

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor David Wagner, Chair
Professor Vern Paxson
Professor David Bamman

Spring 2018

The dissertation of Rebecca Sorla Portnoff, titled The Dark Net: De-Anonymization, Classification and Analysis, is approved.

Copyright © 2018 by Rebecca Sorla Portnoff

Abstract

The Internet facilitates interactions among human beings all over the world, with greater scope and ease than we could have ever imagined. However, it does this for both well-intentioned and malicious actors alike. This dissertation focuses on these malicious persons and the spaces online that they inhabit and use for profit and pleasure.
Specifically, we focus on three main domains of criminal activity on the clear web and the Dark Net: classified ads advertising trafficked humans for sexual services, cyber black-market forums, and Tor onion sites hosting forums dedicated to child sexual abuse material (CSAM).

In the first domain, we develop tools and techniques that can be used separately and in conjunction to group Backpage sex ads by their true author (and not the claimed author in the ad). Sites for online classified ads selling sex are widely used by human traffickers to support their pernicious business. The sheer quantity of ads makes manual exploration and analysis unscalable. In addition, discerning whether an ad is advertising a trafficked victim or an independent sex worker is a very difficult task. Very little concrete ground truth (i.e., ads definitively known to be posted by a trafficker) exists in this space. In the first chapter of this dissertation, we develop a machine learning classifier that uses stylometry to distinguish between ads posted by the same vs. different authors with 90% TPR and 1% FPR. We also design a linking technique that takes advantage of leakages from the Bitcoin mempool, blockchain and sex ad site, to link a subset of sex ads to Bitcoin public wallets and transactions. Finally, we demonstrate, via a 4-week proof of concept using Backpage as the sex ad site, how an analyst can use these automated approaches to potentially find human traffickers.

In the second domain, we develop machine learning tools to classify and extract information from cyber black-market forums. Underground forums are widely used by criminals to buy and sell a host of stolen items, datasets, resources, and criminal services. These forums contain important resources for understanding cybercrime. However, the number of forums, their size, and the domain expertise required to understand the markets make manual exploration of these forums unscalable.
In the second chapter of this dissertation, we propose an automated, top-down approach for analyzing underground forums. Our approach uses natural language processing and machine learning to automatically generate high-level information about underground forums, first identifying posts related to transactions, and then extracting products and prices. We also demonstrate, via a pair of case studies, how an analyst can use these automated approaches to investigate other categories of products and transactions. We use eight distinct forums to assess our tools: Antichat, Blackhat World, Carders, Darkode, Hack Forums, Hell, L33tCrew and Nulled. Our automated approach is fast and accurate, achieving over 80% accuracy in detecting post category, product, and prices.

In the third domain, we develop a set of features for a principal component analysis (PCA) based anomaly detection system to extract producers (those actively abusing children) from the full set of users on Tor CSAM forums. These forums are visited by tens of thousands of pedophiles daily. The sheer quantity of users and posts makes manual exploration and analysis unscalable. In the final chapter of this dissertation, we demonstrate how to extract producers from unlabeled, public forum data. We use four distinct forums to assess our tools; these forums remain unnamed to protect law enforcement investigative efforts.

We have released our code written for the first two domains, as well as the proof of concept data from the first domain, and a subset of the labeled data from the second domain, allowing replication of our results.[1]

Professor David Wagner, Dissertation Committee Chair

[1] As of the filing of this dissertation, our code and data are available online at https://github.com/rsportnoff/anti-trafficking-tools and https://evidencebasedsecurity.org/forums/

To my daughter, Esther Eunmee, and all the other kids.

Contents

Contents
List of Figures
List of Tables

1 Introduction
    1.1 Classified Ads Selling Sex
    1.2 Cyber Black-Market Forums
    1.3 Child Sexual Abuse Material Forums

2 Classified Ads Selling Sex
    2.1 Introduction
    2.2 Related Work
        2.2.1 Sex Trafficking Online
        2.2.2 Bitcoin
    2.3 Datasets
        2.3.1 Backpage
        2.3.2 Bitcoin
    2.4 Author Classifier
        2.4.1 Labeling Ground Truth
        2.4.2 Models
        2.4.3 Validation Results
    2.5 Linking Ads to Bitcoin Transactions
    2.6 Grouping Ads by Owner
        2.6.1 Grouping by Shared Author
        2.6.2 Grouping by Persistent Bitcoin Identities
    2.7 Case Study
        2.7.1 Price Reconstruction
        2.7.2 Linking Backpage Ads to Bitcoin Transactions
        2.7.3 Results
    2.8 Discussion
    2.9 Conclusion

3 Cyber Black-Market Forums
    3.1 Introduction
    3.2 Related Work
        3.2.1 Ecosystem Analysis
        3.2.2 Classification for Black-Markets
        3.2.3 NLP Tools
    3.3 Forum Datasets
    3.4 Automated Processing
        3.4.1 Type-of-Post Classification
        3.4.2 Product Extraction
        3.4.3 Price Extraction
        3.4.4 Currency Exchange Extraction
    3.5 Analysis
        3.5.1 End-to-end error analysis
        3.5.2 Broadly Characterizing a Forum
        3.5.3 Performance
    3.6 Case Studies
        3.6.1 Identifying Account Activity
        3.6.2 Currency Exchange Patterns
    3.7 Conclusion

4 CSAM Forums
    4.1 Introduction
    4.2 Related Work
        4.2.1 Anomaly Detection in Social Network Users
        4.2.2 CSAM
    4.3 Forum Datasets
    4.4 Extracting Producers
        4.4.1 Labeling Ground Truth
        4.4.2 Features
        4.4.3 Public Data
        4.4.4 Validation Results
    4.5 Conclusion

5 Conclusion

A
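
The abstract reports the same-vs.-different-author classifier in terms of true-positive rate (TPR) and false-positive rate (FPR). The following is a minimal sketch of how such a pairwise decision and its two rates could be computed. It is not the dissertation's classifier: it swaps in a simple cosine similarity over character trigram counts with a fixed threshold, and the function names (char_ngrams, cosine, same_author, tpr_fpr), the trigram size, and the threshold of 0.5 are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def same_author(text_a, text_b, threshold=0.5):
    """Hypothetical rule: predict 'same author' when similarity >= threshold."""
    return cosine(char_ngrams(text_a), char_ngrams(text_b)) >= threshold


def tpr_fpr(labels, predictions):
    """Return (TPR, FPR) for parallel lists of boolean labels and predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Toy labels only: True means the pair really shares an author.
labels = [True, True, False, False]
predictions = [True, False, False, True]
print(tpr_fpr(labels, predictions))  # prints (0.5, 0.5)
```

Raising the threshold in a rule like this trades TPR for a lower FPR, which is the kind of operating point a figure such as "90% TPR and 1% FPR" describes.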

Description:
... the authors conducted a small empirical evaluation with a balanced test set of 500 pairs of ads, achieving a 79% ... ECDSA signature scheme, where the owner of a bitcoin signs a statement agreeing to ... either written as if from the perspective of the victim herself, or describing the victim being sold ...