Table Of ContentALGORITHMS AND TOOLS FOR ANONYMIZATION OF
THE INTERNET TRAFFIC
by
Tanjila Farah
B.Sc., BRAC University, 2010
a Thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Applied Science
in the
School of Engineering Science
Faculty of Applied Sciences
(cid:13)c Tanjila Farah 2013
SIMON FRASER UNIVERSITY
Spring 2013
All rights reserved.
However, in accordance with the Copyright Act of Canada, this work may be
reproduced without authorization under the conditions for “Fair Dealing.”
Therefore, limited reproduction of this work for the purposes of private study,
research, criticism, review and news reporting is likely to be in accordance
with the law, particularly if cited appropriately.
APPROVAL
Name: Tanjila Farah
Degree: Master of Applied Science
Title of Thesis: AlgorithmsandToolsforAnonymizationoftheInternetTraf-
fic
Examining Committee: Veselin Jungic,
Adjunct Professor
Department of Mathematics
Chair
Ljiljana Trajkovi´c, Senior Supervisor
Professor
School of Engineering Science
Parvaneh Saeedi, Supervisor
Assistant Professor
School of Engineering Science
Steve Hardy, Internal Examiner
Professor Emeritus
School of Engineering Science
Date Approved:
ii
Partial Copyright Licence
iii
Abstract
Collecting network traffic traces from deployed networks is one of the basic steps in un-
derstanding communication networks and ensuring adequate quality of service. Traffic
traces may be used for in network management, traffic engineering, packet classification,
and for analyzing user behavior. They may also be used for identifying and tracking net-
work anomalies and for maintaining network security. For privacy and security reasons,
monitored traffic traces should be anonymization before they may be shared. The goal of
anonymization is to preserve traffic properties while enforcing the privacy policies. Various
tools and techniques have been implemented for trace anonymization such as: Crypto-
PAn, Anontool, FLAIM, ip2anonip, IP-Anonymous, and TCPdprive. These tools use a
number of anonymization algorithms such as black-marker, random permutations, trunca-
tion, pseudo-anonymization, and prefix-preserving pseudo-anonymization. In this thesis, we
present a survey of anonymization tools and algorithms. We propose and implement a new
anonymization tool called Anonym, which allows the user to execute multi-level anonymiza-
tion and to display analysis results. Finally, we evaluate the impact of the anonymization
on the anonymized traffic data.
iv
Acknowledgments
This thesis would not have been possible without the support of several thoughtful and gen-
erous individuals. My advisor, Professor Ljiljana Trajkovi´c, who has provided tremendous
insight and guidance both within and outside of the realm of communication networks. My
sincere thanks to Prof. Emeritus Steve Hardy, Prof. Parvaneh Saeedi, and Prof. Veselin
Jungic for being my committee members and for providing valuable comments and sugges-
tions. Many thanks to Prof. Emeritus Kohshi Okumura for his valuable feedback.
My sincere thanks to Eva Mar´ıa Cavero for her wonderful friendship and support. I
would like to extend my appreciation to my colleagues Rajvir Gll, Soroush Haeri, Ravinder
Paul, and Nabil M. Al-Rousan for reviewing my thesis.
Mydeepestgratitudegoestomyfamilyfortheirunconditionalloveandsupportthrough-
out my life: my parents for their love, encouragement, advice, and support and my sisters
Trisha and Aurpa for being there of me.
v
Contents
Approval ii
Partial Copyright License iii
Abstract iv
Acknowledgments v
Contents vi
List of Tables ix
List of Figures xi
List of Programs xiii
1 Introduction 1
1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Implementation of the Anonym Tool . . . . . . . . . . . . . . . . . . . 3
1.2.2 Validation of the Anonym Tool . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Analysis of Un-anomymized and Anomymized Datasets . . . . . . . . 4
1.3 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Collection of Network Traffic 5
2.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Traffic Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
vi
2.1.2 Discovering the Internet Network Topology . . . . . . . . . . . . . . . 7
2.1.3 Network Security Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Network Traffic Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.1 BCNET Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 CAIDA Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.3 Route Views Data Collection . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.4 RIPE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3 Anonymization 18
3.1 Anonymization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Fields and Relative Anonymization Process . . . . . . . . . . . . . . . . . . . 22
3.2.1 IP Addresses Anonymization . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 MAC Addresses Anonymization. . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Time-stamps Anonymization . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.4 Counter Anonymization . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.5 Port Numbers and Protocol Numbers Anonymization . . . . . . . . . 27
3.3 Anonymization Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Crypto-PAn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.2 Anontool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3.3 FLAIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 The Anonym Tool 36
4.1 Validation Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.1.1 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2 The Functionalities of Anonym . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.1 Converting pcap and mrt Files . . . . . . . . . . . . . . . . . . . . . . 37
4.2.2 Anonymization Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2.3 Anonymization Options . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2.4 Option for the Kolmogorov-Smirnov (K-S) Test . . . . . . . . . . . . . 42
4.2.5 Additional Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.6 Data Analysis Options . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Graphical User Interface of the Anonym tool . . . . . . . . . . . . . . . . . . 60
4.3.1 Operational Diagram of Prefix-preserving option . . . . . . . . . . . . 61
vii
5 Conclusions 63
5.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Appendix A List of Programs 65
Appendix B Abbreviations 73
Bibliography 75
viii
List of Tables
2.1 Network attacks and layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Summary of connection types and the number of computers affected by the
Witty worm(March 2004) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3 Summary of the Witty worm collected in March 2004 . . . . . . . . . . . . . 12
2.5 A sample of the collected Route Views dataset (September 2012) . . . . . . . 14
2.6 A sample of BGP dataset collected from RIPE (July 2003) . . . . . . . . . . 16
2.4 Security dataset on the Code Red worms (August 2001) . . . . . . . . . . . . 17
3.1 Enumeration algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 AnexampleoftheinputandoutputofCrypto-PAnthatusesprefix-preserving
pseudonymization algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 IP address anonymization algorithms. . . . . . . . . . . . . . . . . . . . . . . 22
3.4 MAC address anonymization algorithms. . . . . . . . . . . . . . . . . . . . . . 23
3.5 Time-stapms anonymization algorithms. . . . . . . . . . . . . . . . . . . . . . 25
3.6 Counter anonymization algorithms. . . . . . . . . . . . . . . . . . . . . . . . . 26
3.7 Port numbers and protocol numbers anonymization algorithms. . . . . . . . . 27
3.8 The input and output of Crypto-PAn. . . . . . . . . . . . . . . . . . . . . . . 29
3.9 Anonymization with Lucent’s Crypto-PAn. . . . . . . . . . . . . . . . . . . . 30
3.10 Algorithms supported by Anontool. . . . . . . . . . . . . . . . . . . . . . . . . 31
3.11 Anontool anonymization results. . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.12 Algorithms supported by FLAIM. . . . . . . . . . . . . . . . . . . . . . . . . 34
3.13 FLAIM anonymization results. . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1 The results of Anontool anonymization are shown in blue color. . . . . . . . . 38
4.2 The results of FLAIM anonymization are shown in blue color. . . . . . . . . . 39
ix
4.3 The results of Anonym anonymization are shown in blue color. . . . . . . . . 40
4.4 Results of the Anonym algorithms. . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 K-S test results for CDF of packet length data. . . . . . . . . . . . . . . . . . 44
4.6 Statistics of datasets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.7 FiveCDFdistributionsfittedtotheCDFoftheun-anonymizedandanonymized
datasets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
x
Description:Types of available log files are: pacp, bgpdump, nfdump, iptable, pccat, and netflow. The log files differ in their structure Anonym is a MATLAB-based tool that supports pcap and mrt format input files and provides options to convert