DATA ANALYSIS FOR NETWORK CYBER-SECURITY

editors
Niall Adams • Nicholas Heard
Imperial College London
Heilbronn Institute for Mathematical Research, University of Bristol

Imperial College Press

Published by
Imperial College Press
57 Shelton Street
Covent Garden
London WC2H 9HE

Distributed by
World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

DATA ANALYSIS FOR NETWORK CYBER-SECURITY
Copyright © 2014 by Imperial College Press
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN 978-1-78326-374-5
Typeset by Stallion Press
Email: [email protected]
Printed in Singapore

Preface

The contents of this volume are contributions from invited speakers at a workshop entitled “Data Analysis for Cyber-Security”, hosted by the University of Bristol in March 2013. We are grateful for the generous support of the Heilbronn Institute for Mathematical Research, an academic research unit of the University of Bristol with interests related to cyber-security.

Cyber-security, the task of defending computers and people from electronic attack, is a pressing concern. For example, a report sponsored by the UK government in 2012 estimated the cost of cyber-attack to the UK economy as £29 billion. This cost is attributed to various types of attack, including extortion, fiscal fraud and identity theft. Notably, the largest category, intellectual property theft, accounted for around £9 billion. The scale of cyber-attack provoked the UK government to highlight cyber-security as a top priority for national security in 2013. From a UK point of view, based on recent figures, the problem is increasing: 78% of large organisations were subject to external attack in 2012, up from 73% in the previous year, while 63% of small businesses were subject to such attack over the same period, an increase of 41% on the previous year.

Cyber-security is a broad field, covering a range of academic disciplines including computer science, computer and network architecture, and statistics. This volume is concerned with network cyber-security, and particularly with the analysis of data that are observed in relation to a network of either computers or people. As an exemplar, consider an institutional computer network, in which communicating devices (computers, printers, etc.) are nodes and communications between devices are events that occur on edges between nodes.
Numerous types of cyber-attack have been observed in this context. A variety of such attacks are well described in Davidoff and Ham (2012) from the point of view of network forensics.

There are a number of problems that can be addressed by analysing network data. A primary example is constructing anomaly detection methods to identify when unusual traffic occurs. These complement signature-based methods, such as those embodied in the Snort intrusion detection system (Caswell et al., 2007). We believe the next generation of detection tools will have to utilise more information about past and present traffic behaviour, particularly with respect to temporal aspects. A particular advantage of anomaly detection methods over rule-based approaches is their potential to detect new, so-called zero-day, attacks.

There are significant challenges when addressing network data analysis problems. In the context of cyber-security, data sets are typically big, consisting of a large number of nodes and a great volume of traffic on existing edges. This alone raises significant computational challenges. If anomaly detection is the objective, then timeliness becomes an issue, and the velocity of the traffic on edges becomes an important factor. The combination of volume and velocity makes much of network cyber-security a “big-data” problem. From a statistical or machine learning point of view, many cyber-security problems are unsupervised, which raises generic problems of model selection and control of hyperparameters and decision boundaries. These data analysis problems are generally exacerbated by increases in the volume, velocity and heterogeneity of network data. The precise timing of events on edges is a more subtle aspect that has only recently begun to be seriously explored. This latter aspect is extensively addressed in this volume.

The above discussion is intended to make the case that cyber-security is both an important and interesting research problem.
Some of the research opportunities are discussed in more detail in Meza et al. (2009). The chapters in this volume provide a view of this exciting and diverse area. To begin, Patrick Wolfe and Benjamin Olding give an introduction to the problem of statistical inference on graphs, and make first steps toward formal inferential procedures.

Alex Tartakovsky considers the problem of quickest change detection in the context of statistical anomaly detection. A key concern is to minimise the detection delay, an aspect of great importance in many practical network cyber problems. This chapter features a hybrid anomaly-spectral-signature-based system useful for efficient traffic filtering.

Joshua Neil and co-authors are concerned with aspects of network traffic that are localised in both time and graph space. In particular, they develop a scan statistic-based methodology for finding connected subgraphs that are locally connected in time and which have deviated from historic behaviour. The methodology is illustrated on large-scale network traffic data.

Sumeet Dua and Pradeep Chowriappa address situational awareness by considering user sentiment in social media. Such sentiments can provide indicators and precursors for cyber-threat. A novel data-mining approach is proposed to identify aspects that influence the dynamics of the network over time.

Céline Lévy-Leduc considers both centralised and decentralised network versions of change detection for network traffic data. This includes both dimension reduction to simplify network traffic data to a manageable representation and nonparametric change detection adapted for censoring.

Finally, Nick Heard and Melissa Turcotte develop Bayesian anomaly detection methodology suited to the computational demands of large networks. A screening methodology, based on simple node- and edge-based statistical models, is developed.
While these models are relatively simple, numerous technical details are addressed to enable their successful deployment.

Niall Adams, Nick Heard

References

Caswell, B., Beale, J. and Baker, A. (2007). Snort Intrusion Detection and Prevention Toolkit (Syngress Media).
Davidoff, S. and Ham, J. (2012). Network Forensics: Tracking Hackers Through Cyberspace (Prentice Hall, Upper Saddle River, NJ).
Meza, J., Campbell, S. and Bailey, D. (2009). Mathematical and statistical opportunities in cyber security, CoRR. Available at: http://arxiv.org/abs/0904.1616.

Contents

Preface

Chapter 1. Inference for Graphs and Networks: Adapting Classical Tools to Modern Data
Benjamin P. Olding and Patrick J. Wolfe

Chapter 2. Rapid Detection of Attacks in Computer Networks by Quickest Changepoint Detection Methods
Alexander G. Tartakovsky

Chapter 3. Statistical Detection of Intruders Within Computer Networks Using Scan Statistics
Joshua Neil, Curtis Storlie, Curtis Hash and Alex Brugh

Chapter 4. Characterizing Dynamic Group Behavior in Social Networks for Cybernetics
Sumeet Dua and Pradeep Chowriappa

Chapter 5. Several Approaches for Detecting Anomalies in Network Traffic Data
Céline Lévy-Leduc

Chapter 6. Monitoring a Device in a Communication Network
Nick A. Heard and Melissa J. Turcotte

Index