Data Analytics

Iván Palomares Carrascosa, Harsha Kumara Kalutarage, Yan Huang
Editors

Data Analytics and Decision Support for Cybersecurity

Data Analytics

Series editors
Longbing Cao, Advanced Analytics Institute, University of Technology, Sydney, Broadway, NSW, Australia
Philip S. Yu, University of Illinois at Chicago, Chicago, IL, USA

Aims and Goals: Building and promoting the field of data science and analytics in terms of publishing work on theoretical foundations, algorithms and models, evaluation and experiments, applications and systems, case studies, and applied analytics in specific domains or on specific issues.

Specific Topics: This series encourages proposals on cutting-edge science, technology and best practices in the following topics (but not limited to):
Data analytics, data science, knowledge discovery, machine learning, big data, statistical and mathematical methods for data and applied analytics,
New scientific findings and progress ranging from data capture, creation, storage, search, sharing, analysis, and visualization,
Integration methods, best practices and typical examples across heterogeneous, interdependent complex resources and modals for real-time decision-making, collaboration, and value creation.
More information about this series at http://www.springer.com/series/15063

Iván Palomares Carrascosa • Harsha Kumara Kalutarage • Yan Huang
Editors

Data Analytics and Decision Support for Cybersecurity
Trends, Methodologies and Applications

Editors
Iván Palomares Carrascosa, University of Bristol, Bristol, UK
Harsha Kumara Kalutarage, Centre for Secure Information Technologies, Queen’s University of Belfast, Belfast, UK
Yan Huang, Queen’s University Belfast, Belfast, UK

ISSN 2520-1859
ISSN 2520-1867 (electronic)
Data Analytics
ISBN 978-3-319-59438-5
ISBN 978-3-319-59439-2 (eBook)
DOI 10.1007/978-3-319-59439-2
Library of Congress Control Number: 2017946318

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Cybersecurity has been classically understood as the protection of any form of information potentially exposed on the Internet. This notion of cybersecurity has progressively evolved into an umbrella term, covering a broad range of areas concerning the protection of hardware, software, networks and their underlying information, and considering the socio-economic, legal and ethical impact that failing to guarantee such protection may have on systems, data and users. A vital aspect of cybersecurity is to ensure that confidentiality, integrity and availability requirements are met for assets and their related users.

From both a development and a research perspective, cybersecurity has undoubtedly attained enormous importance within the last few years. This is largely a consequence of the massive growth experienced by available data in cyberspace, in their volume, value, heterogeneity and, importantly, degree of exposure. This explosion of accessible data of diverse nature accentuates the potential, and sometimes critical, vulnerability of the information they harbour. Specifically, such vulnerabilities become critical when valuable information or knowledge might be illegitimately accessed by the wrong people (e.g. attackers) with malicious purposes. Unsurprisingly, both researchers and practitioners presently show an increasing interest in defining, developing and deploying computational and artificial intelligence (AI)-based approaches as the driving force towards a more cyber-resilient society.

With the advent of big data paradigms in the last few years, there has been a rise in data science approaches and, in parallel, a higher demand for effective data-driven models that support decision-making at a strategic level. This motivates the need for defining novel data analytics and decision support approaches in a myriad of real-life scenarios and problems, with cybersecurity-related domains being no exception. For instance, Fig. 1 illustrates the interrelationship between several data management, analytics and decision support techniques and methods commonly adopted in cybersecurity-oriented frameworks in recent years.

[Fig. 1: Overview of data analytics and decision support processes and techniques in cybersecurity scenarios. Layers shown, from bottom to top: data sources (social network data, network traffic data, user applications data, ...) across sectors (government, finance, law enforcement, insurance); 1. Data Management (feature selection, data fusion), yielding aggregated/pre-processed information; 2. Analytics (incomplete information management, uncertainty handling), yielding domain-specific extracted knowledge; 3. Decision Support (multi-criteria decision making, visualisation, monitoring, risk analysis), with human review.]

This edited volume comprises nine chapters, covering a compilation of recent advances in cybersecurity-related applications of data analytics and decision support approaches. Besides theoretical studies and overviews of existing relevant literature, this book comprises a number of highly application-oriented research chapters. The investigations undertaken across these chapters focus on diverse cybersecurity problems and scenarios situated at the forefront of interest for academics and professionals. The ultimate purpose of this volume is twofold:

1. To bring together and disseminate some important “lessons learnt” within a young but surprisingly fast-moving field
2. To emphasize the increasing importance that data science and AI techniques (particularly those related to machine learning, data visualization and decision support) are attaining in overcoming the major limitations and challenges currently arising in the IT security landscape

Therefore, this book brings together ideas and discussions from leading experts in these rapidly growing fields, in order to align key directions of work that researchers and professionals alike would be encouraged to follow to further consolidate such fields.

Part I: Regular Chapters

The first seven chapters present both theoretical and practical-industrial contributions related to emergent cybersecurity research. Particular emphasis is put on data analysis approaches and their relationship with decision-making and visualization techniques to provide reliable decision support tools.

In Chap. 1 [1], Markus Ring et al. present a novel toolset for anomaly-based network intrusion detection. Motivated by challenges that frequently hinder the applicability of anomaly-based intrusion detection systems in real-world settings, the authors propose a flexible framework comprising diverse data mining algorithms. Their approach applies online analysis to flow-based data describing meta-information about network communications, along with domain knowledge extraction, to augment the value of network information before analysing it from multiple perspectives. To overcome the problem of data availability, the framework is also conceived to emulate realistic user activity, with a particular focus on the insider threat problem, and to generate readily available flow-based data.

Legg reflects in Chap. 2 [2] on the problem of detecting insider threats in organizations and the major challenges faced by insider threat detection systems, such as the difficulty of reducing false alarms.
The chapter investigates the importance of combining visual analytics approaches with machine learning methods to enhance an iterative process of mutual feedback between detection system and human analyst, so as to rationally capture the dynamic, continuously evolving boundaries between normal and insider behaviours and make informed decisions. In this work, the author demonstrates how generated visual knowledge can significantly help analysts to reason and make optimal decisions. The chapter concludes with a discussion that aligns current challenges and future directions of research on the insider threat problem.

Malware detection is no longer a problem pertaining solely to desktop computer systems. With the enormous rise of mobile device technologies, the presence and impact of malware have rapidly expanded to the mobile computing landscape. Măriuca et al. in Chap. 3 [3] report on Android mobile systems as a potentially major target for collusion attacks, i.e. attacks that result from “combining” permissions from multiple apps to pave the way for attackers to carry out serious threats. The authors present two analysis methods to assess the potential danger of apps that may become part of a collusion attack. Apps assessed as suspicious are subsequently analysed in further detail to confirm whether an actual collusion exists. Previous work by the authors is adopted as a guideline to provide a general overview of the state-of-the-art research in app collusion analysis.

In Chap. 4 [4], Carlin et al. focus on a recent strategy to fight against the devastating effects of malware: the dynamic analysis of run-time opcodes. An opcode is a low-level, human-readable machine language instruction, and it can be obtained by disassembling the software program being analysed. Carlin et al. demonstrate in their work the benefits of dynamic opcode analysis for detecting malicious software with significant accuracy in real practical applications. One such notable advantage is the ability of dynamic analysis techniques to observe malware behaviour at runtime.
The model presented by the authors, which uses n-gram analysis on extracted opcodes, is validated on a large data set containing highly representative malware instances. Results show superior malware classification accuracy when compared with previous similar research.

Moustafa et al. in Chap. 5 [5] present a scalable and lightweight intrusion detection system framework, characterized by using a statistical decision-making engine to identify suspicious patterns of activity in network systems. This framework recognizes abnormal network behaviours based on the density characteristics of generated statistical distributions, namely, Dirichlet distributions. The authors illustrate how statistical analysis based on lower-upper interquartile ranges helps in unveiling the normality patterns in the network data being analysed. Consequently, normality patterns make it possible to flexibly define a suitable statistical model to classify the data at hand, thus making classification decisions more intelligently. A performance evaluation and a comparative study are conducted using two data sets, showing that the proposed framework outperforms other similar techniques.

Chapter 6 [6] shows the vital role of cybersecurity in e-learning systems, particularly when it comes to ensuring robust and reliable online student assessment processes. Sabbah presents in this chapter two frameworks for overcoming cheating in e-examination processes. The reliability of both systems is tested and validated, showing how both of them outperform existing approaches in terms of risk analysis. A variety of cheating actions and violations, such as impersonation during an e-examination, messaging patterns and webcam actions, are carefully considered in the multi-stage design of the cheating detection system architectures. Biometric authentication (e.g. via fingerprints) is adopted as a reliable authentication method to ensure cheating-free examinations.

Classification techniques undeniably play a major role in many cybersecurity-oriented data analytics approaches. Noisy data have long been deemed an influential factor in the accuracy and error rates of classification models. In Chap. 7 [7], Indika revisits a number of popular classification approaches (support vector machines, principal component analysis and random forest ensembles) and presents a comparative study of them from the perspective of noisy data and its impact on classification accuracy. The author introduces in this chapter a noise removal algorithm and also analyses how skewness, appropriate sample ratios and the chosen classification technique jointly influence the overall classification performance.

Part II: Invited Chapters

Besides the regular book chapters outlined above, this book also provides two invited chapters authored by scientists with promising research trajectories in IT security, data analytics and decision support approaches, as well as experience of engaging with industry and the public sector to join forces against current cybersecurity challenges.

In Chap. 8 [8], Alamaniotis and Tsoukalas highlight the central role that cybersecurity approaches should play in large-scale smart power systems such as SCADA systems. Power consumption and forecasting information have been identified as key indicators for detecting cyber-attacks in these contexts. Based on this, Alamaniotis and Tsoukalas present an intelligent system method based on Gaussian process regression and fuzzy logic inference, aimed at analysing, reasoning and learning from load demand information in smart grids. The underlying statistical and fuzzy reasoning process allows the system to make highly confident decisions autonomously as to whether current levels of power load demand are legitimate or manipulation by an attacker is taking place.

Garae and Ko provide in Chap. 9 [9] an insightful closure to this book, with a comprehensive overview of analytics approaches based on data provenance, effective security visualization techniques, cybersecurity standards and decision support applications. In their study, Garae and Ko investigate the notion of data provenance as the ability to track data from its conception to its deletion and to reconstruct its provenance as a means to explore cyber-attack patterns. The authors argue for the potential benefits of integrating data provenance and visualization techniques to analyse data and support decision-making in IT security scenarios. They also present a novel security visualization standard, describing its major guidelines and law enforcement implications in detail.

For ease of reference, the table below summarizes the techniques covered and cybersecurity domains targeted by each one of the chapters comprising the book.

Chapter 1: Markus Ring et al.
  Technique(s) covered: Stream data analytics; Classification, clustering; Data visualization
  Cybersecurity domain: Intrusion detection; Insider threats
Chapter 2: Philip A. Legg
  Technique(s) covered: Human-machine systems; Decision support; Data visualization
  Cybersecurity domain: Insider threats
Chapter 3: Irina Măriuca Asăvoae et al.
  Technique(s) covered: Mobile computing; First-order logic; Probabilistic models; Software model checking
  Cybersecurity domain: Android app collusion detection
Chapter 4: Domhnall Carlin et al.
  Technique(s) covered: Dynamic opcode analysis; n-Gram analysis
  Cybersecurity domain: Runtime malware detection
Chapter 5: Nour Moustafa et al.
  Technique(s) covered: Statistical decision models; Normality analysis
  Cybersecurity domain: Intrusion detection
Chapter 6: Yousef W. Sabbah
  Technique(s) covered: Multi-modal data analysis; Authentication methods
  Cybersecurity domain: e-Learning; Online examinations
Chapter 7: R. Indika P. Wickramasinghe
  Technique(s) covered: Classification (SVM, PCA); Ensemble classifiers; Noisy data management
  Cybersecurity domain: Cybersecurity noisy data removal
Chapter 8: Miltiadis Alamaniotis and Lefteri H. Tsoukalas
  Technique(s) covered: Gaussian process regression; Fuzzy logic inference
  Cybersecurity domain: Secure smart power systems
Chapter 9: Jeffery Garae and Ryan K. L. Ko
  Technique(s) covered: Data provenance; Data visualization standards
  Cybersecurity domain: Security visualization and monitoring

We would like to thank the Springer editorial assistants for the confidence placed in this book and their continuous support in materializing it before, during and after its elaboration. We are also very grateful to the authors of selected chapters for their efforts during the preparation of the volume. Without their valuable ideas and contributions, finishing this project would not have been possible. Likewise, we acknowledge all the scientists and cybersecurity experts who generously volunteered to review the chapters included in the book. Finally, we would like to express our special thanks to Dr. Robert McCausland, principal engineer and R&D manager in the Centre for Secure Information Technologies (CSIT), Queen’s University Belfast, for firmly believing in our initiative and strongly supporting it since its inception.

Bristol, UK    Iván Palomares Carrascosa
Belfast, UK    Harsha Kumara Kalutarage
Belfast, UK    Yan Huang
April 2017

References

1. M. Ring, S. Wunderlich, D. Grüdl, D. Landes, A. Hotho. A Toolset for Intrusion and Insider Threat Detection.
2. P.A. Legg. Human-Machine Decision Support Systems for Insider Threat Detection.
3. I. Măriuca Asăvoae, J. Blasco, T.M. Chen, H.K. Kalutarage, I. Muttik, H.N. Nguyen, M. Roggenbach, S.A. Shaikh. Detecting malicious collusion between mobile software applications - the Android case.
4. D. Carlin, P. O’Kane, S. Sezer. Dynamic Analysis of Malware using Run-Time Opcodes.
5. N. Moustafa, G. Creech, J. Slay. Big Data Analytics for Intrusion Detection Systems: Statistical Decision-Making using Finite Dirichlet Mixture Models.
6. Y. Sabbah. Security of Online Examinations.
7. R. Indika P. Wickramasinghe. Attribute Noise, Classification Technique and Classification Accuracy: A Comparative Study.
8. M. Alamaniotis, L.H. Tsoukalas. Learning from Loads: An Intelligent System for Decision Support in Identifying Nodal Load Disturbances of Cyber-Attacks in Smart Power Systems using Gaussian Processes and Fuzzy Inference.
9. J. Garae, R. Ko. Visualization and Data Provenance Trends in Decision Support for Cybersecurity.