Annalisa Appice · Michelangelo Ceci · Corrado Loglisci · Elio Masciari · Zbigniew W. Raś (Eds.)

New Frontiers in Mining Complex Patterns

5th International Workshop, NFMCP 2016
Held in Conjunction with ECML-PKDD 2016
Riva del Garda, Italy, September 19, 2016
Revised Selected Papers

Lecture Notes in Artificial Intelligence 10312
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/1244

Editors
Annalisa Appice, Università degli Studi di Bari Aldo Moro, Bari, Italy
Michelangelo Ceci, Università degli Studi di Bari Aldo Moro, Bari, Italy
Corrado Loglisci, Università degli Studi di Bari Aldo Moro, Bari, Italy
Elio Masciari, ICAR-CNR, Rende, Italy
Zbigniew W. Raś, University of North Carolina, Charlotte, NC, USA

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-319-61460-1
ISBN 978-3-319-61461-8 (eBook)
DOI 10.1007/978-3-319-61461-8
Library of Congress Control Number: 2017944352
LNCS Sublibrary: SL7 – Artificial Intelligence

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Modern automatic systems are able to collect huge volumes of data, often with a complex structure (e.g., multi-table data, XML data, Web data, time series and sequences, graphs, and trees). This fact poses new challenges for current information systems with respect to storing, managing, and mining these big sets of complex data.

The 5th International Workshop on New Frontiers in Mining Complex Patterns (NFMCP 2016) was held in Riva del Garda in conjunction with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2016) on September 19, 2016. The purpose of this workshop was to bring together researchers and practitioners in data mining who are interested in the advances and latest developments in the area of extracting patterns from big and complex data sources.
The workshop was aimed at integrating recent results from existing fields such as data mining, statistics, machine learning, and relational databases to discuss and introduce new algorithmic foundations and representation formalisms in complex pattern discovery.

This book features a collection of revised and significantly extended versions of papers accepted for presentation at the workshop. These papers went through a rigorous review process to ensure compliance with Springer’s high-quality publication standards. The individual contributions of this book illustrate advanced data-mining techniques that preserve the informative richness of complex data and allow for efficient and effective identification of complex information units present in such data.

The book is composed of five parts and a total of 16 chapters.

Part I analyzes Feature Selection and Induction in the presence of complex data. It consists of two chapters. Chapter 1 introduces an unsupervised algorithm for feature construction based on tree ensembles. It defines an informative data representation that is able to handle complex data structures, combining information from multiple sources. Chapter 2 presents a graph-based algorithm for feature selection. It ranks features by identifying the most important ones within an arbitrary set of cues.

Part II focuses on Classification and Prediction by illustrating some complex predictive problems. It consists of five chapters. Chapter 3 tackles the problem of pruning rule classifiers while retaining their descriptive properties. It uses confirmation measures as representatives of interestingness measures designed to select rules with desirable descriptive properties. Chapter 4 studies the problem of automatically recognizing speed changes from audio data recorded in controlled conditions.
The classification of the audio data is performed using random forests, deep learning architectures, and support vector machines. Chapter 5 describes a classification task that aims at determining whether two voices belong to the same person or not. It illustrates an algorithm that performs the classification by evaluating the dissimilarity between a speech sample and a set of known models. Chapter 6 investigates the problem of interpreting rules induced from imbalanced data. It proposes three different strategies that combine Bayesian confirmation measures in order to select rules having good descriptive characteristics. Chapter 7 addresses the problem of modeling trust network evolution through social communications among users in a social media site. It introduces a link prediction algorithm based on mediating-objects and analyzes the effect of time-decay in creating trust-links.

Part III analyzes issues posed by Clustering in the presence of complex data. It consists of four chapters. Chapter 8 investigates the adoption of cluster analysis to predict the primary medical procedure for a patient. The processed patients are clustered according to their set of diagnoses. This cluster knowledge is then used to identify other existing patients that are considered similar to the new patient. Chapter 9 describes a clustering algorithm allowing us to group features that are likely to take extreme values simultaneously. It exploits the graphical structure stemming from the definition of the clusters. Chapter 10 presents a latent-factor-based approach whose goal is to profile users according to their behavior. It considers the actions as sets of features instead of single atomic elements. Chapter 11 proposes a multiview clustering methodology that determines clusters of patients with similar symptoms and detects patterns of medication changes that lead to the improvement or decline of patients’ quality of life.

Part IV presents algorithms for Pattern Discovery. It consists of three chapters. Chapter 12 introduces an approach to extract recurrent deviations from historical logging data and generate anomalous patterns representing high-level deviations. It applies a frequent subgraph mining technique together with an ad hoc conformance-checking technique. Chapter 13 investigates the task of detecting weather changes, which are periodically repeated over time and space. It introduces a spatiotemporal pattern to represent a periodic change and describes a computational solution to discover this kind of pattern. Chapter 14 investigates the problem of user authentication based on keystroke timing patterns. It proposes a simple, robust, and nonparameterized nearest-neighbor regression-based feature-ranking algorithm for anomaly detection.

Finally, Part V gives a general overview of Applications in sensor network and game scenarios. It contains two chapters. Chapter 15 provides a formalization of a graph-based approach that extends a directed weighted graph using a sequential state transformation function. It interprets the graph to model state transition matrices and describes an algorithm for deriving these interpretations in large-scale real-world sensor networks. Chapter 16 checks whether, and to what extent, advanced process mining techniques can support efficient and effective knowledge discovery in chess playing. It also provides interesting insight into the game rules and strategies, and/or may support effective game playing in future matches.

We would like to thank all the authors who submitted papers for publication in this book and all the workshop participants and speakers. We are also grateful to the members of the Program Committee and additional reviewers for their excellent work in reviewing submitted and revised contributions with expertise and patience.
We would like to thank Jaakko Hollmen for his invited talk “On Models, Patterns, and Prediction.” Special thanks are due to both the ECML PKDD Workshop Chairs and the ECML PKDD organizers, who made the event possible.

We would like to acknowledge the support of the European Commission through the projects MAESTRA – Learning from Massive, Incompletely Annotated, and Structured Data (Grant number ICT-2013-612944) and TOREADOR – Trustworthy Model-Aware Analytics Data Platform (Grant number H2020-688797). Last but not least, we thank Alfred Hofmann of Springer for his continuous support.

March 2017

Annalisa Appice
Michelangelo Ceci
Corrado Loglisci
Elio Masciari
Zbigniew Raś

Organization

Program Chairs

Annalisa Appice, University of Bari Aldo Moro, Bari, Italy
Michelangelo Ceci, University of Bari Aldo Moro, Bari, Italy
Corrado Loglisci, University of Bari Aldo Moro, Bari, Italy
Elio Masciari, ICAR-CNR, Rende, Italy
Zbigniew Raś, University of North Carolina, Charlotte, USA and Warsaw University of Technology, Poland

Program Committee

Nicola Barbieri, Yahoo Research Barcelona, Spain
Elena Bellodi, University of Ferrara, Italy
Petr Berka, University of Economics, Prague, Czech Republic
Jorge Bernardino, Polytechnic Institute of Coimbra, Portugal
Claudia Diamantini, Polytechnic University of Marche, Italy
Bettina Fazzinga, CNR-ICAR, Italy
Stefano Ferilli, University of Bari, Italy
Hongyu Guo, National Research Council Canada, Canada
Dino Ienco, IRSTEA, France
Dragi Kocev, Jozef Stefan Institute, Slovenia
Ruggero G. Pensa, University of Turin, Italy
Nhathai Phan, University of Oregon, USA
Nico Piatkowski, Technische Universität Dortmund, Germany
Gianvito Pio, University of Bari, Italy
Domenico Redavid, Artificial Brain, Italy
Rita P. Ribeiro, University of Porto, Portugal
Jerzy Stefanowski, Poznan University of Technology, Poland
Herna Viktor, University of Ottawa, Canada
Alicja Wieczorkowska, Polish-Japanese Institute of Information Technology, Poland
Wlodek Zadrozny, University of North Carolina, Charlotte, USA

Additional Reviewers

Angelastro, Sergio
Branco, Paula
Ferilli, Stefano
Genga, Laura
Mileski, Vanja
Mircoli, Alex
Pazienza, Andrea

On Models, Patterns, and Prediction (Invited Talk)

Jaakko Hollmen
Department of Computer Science, Aalto University, Espoo, Finland

Abstract. Pattern discovery has been the center of attention of data mining research for a long time, with pattern languages varying from simple to complex, according to the needs of the applications and the format of the data. In this talk, I will take a view on pattern mining that combines elements from neighboring areas. More specifically, I will describe our previous research work at the intersection of three areas: probabilistic modeling, pattern mining, and predictive modeling. Clustering in the context of pattern mining will be explored, as well as linguistic summarization of patterns. Multiresolution pattern mining, semantic pattern discovery, and pattern visualization will also be visited. Time permitting, I will speak about patterns of missing data and their implications for predictive modeling.