Andrea Calì, Peter Wood, Nigel Martin, Alexandra Poulovassilis (Eds.)

Data Analytics

31st British International Conference on Databases, BICOD 2017
London, UK, July 10–12, 2017, Proceedings

Lecture Notes in Computer Science 10365
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, Lancaster, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Zurich, Switzerland
John C. Mitchell, Stanford University, Stanford, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/7409
Editors
Andrea Calì, Birkbeck, University of London, London, UK
Peter Wood, Birkbeck, University of London, London, UK
Nigel Martin, Birkbeck, University of London, London, UK
Alexandra Poulovassilis, Birkbeck, University of London, London, UK

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-319-60794-8
ISBN 978-3-319-60795-5 (eBook)
DOI 10.1007/978-3-319-60795-5
Library of Congress Control Number: 2017943849
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

This volume contains research papers presented at BICOD 2017: the 31st British International Conference on Databases held July 10–12, 2017, at Birkbeck, University of London.

The BICOD Conference is an international venue with a long tradition of presentation and discussion of research in the broad area of data management. The theme of BICOD 2017 was “Data Analytics,” that is, the process of deriving higher-level information from large sets of raw data.

The conference featured keynotes from three distinguished speakers: Sihem Amer-Yahia, Laboratoire d’Informatique de Grenoble; Tim Furche, University of Oxford and Wrapidity; and Elena Baralis, Politecnico di Torino. There were also two invited tutorials: by Vasiliki Kalavri, ETH Zürich, on distributed graph processing; and by Leopoldo Bertossi, Carleton University, on declarative approaches to data quality assessment and cleaning.

BICOD 2017 attracted paper submissions from Australia, Austria, Canada, Colombia, Germany, Ireland, Italy, Russia, Tunisia, and the UK. The Program Committee accepted 16 papers to appear in the proceedings covering topics such as data cleansing, data integration, data wrangling, data mining and knowledge discovery, graph data and knowledge graphs, intelligent data analysis, approximate and flexible querying, data provenance, and ontology-based data access.

A prize was awarded for the best paper presented by a PhD student, which was generously sponsored by Neo Technology. The authors of a small number of selected best papers were invited to submit extended versions of their work to a special issue of The Computer Journal.
We would like to thank all the authors for submitting their work to BICOD, the Program Committee members for their contribution to selecting an inspiring set of research papers, and the distinguished invited speakers for accepting our invitation to present their work in London. We also gratefully thank everyone involved in the local organization, including Tara Orlanes-Angelopoulou, Phil Gregg, Lucy Tallentire, Catherine Griffiths, and Matthew Jayes, for all their hard work and dedication in making this conference possible.

May 2017
Andrea Calì
Peter Wood
Nigel Martin
Alexandra Poulovassilis

Organization

Program Committee

Andrea Calì, Birkbeck, University of London, UK (PC Co-chair)
Alvaro Fernandes, University of Manchester, UK
George Fletcher, Eindhoven University of Technology, The Netherlands
Floris Geerts, University of Antwerp, Belgium
Sven Helmer, Free University of Bozen-Bolzano, Italy
Anne James, Coventry University, UK
Mike Jackson, Birmingham City University, UK
Sebastian Maneth, University of Edinburgh, UK
Nigel Martin, Birkbeck, University of London, UK
Werner Nutt, Free University of Bozen-Bolzano, Italy
Dan Olteanu, University of Oxford, UK
Andreas Pieris, University of Edinburgh, UK
Alessandro Provetti, Birkbeck, University of London, UK
Alexandra Poulovassilis, Birkbeck, University of London, UK
Mark Roantree, Dublin City University, Ireland
Maria-Esther Vidal, Universidad Simón Bolívar, Venezuela
John Wilson, University of Strathclyde, UK
Peter Wood, Birkbeck, University of London, UK (PC Co-chair)

Keynotes

Wrapping Millions of Documents Per Day - and How That’s Just the Beginning

Tim Furche
Department of Computer Science, University of Oxford, Oxford, UK
CTO, Wrapidity Ltd., Oxford, UK
[email protected]

Abstract. Companies and researchers have been painfully maintaining manually created programs for wrapping (aka scraping) websites for decades, employing hundreds of engineers to wrap thousands or even tens of thousands of sources.
Where wrappers break, engineers must scramble to fix the wrappers manually or face the ire of their users. In DIADEM we demonstrated the effectiveness of hybrid AIs for automatically generating wrapper programs in an academic setting. Our hybrid approach combined knowledge-based rule systems with large-scale analytics founded in techniques from the NLP and ML communities.

Recently, we commercialized that technology into Wrapidity Ltd. and quickly joined up with Meltwater, the leading global media-intelligence company. At Meltwater, we are applying Wrapidity’s techniques and insight to the wrapping of tens of thousands of news sources, processing over 10M unique documents per day. But that’s just the beginning… Together with Meltwater engineers and researchers from San Francisco, Stockholm, Bangalore, and Budapest, we are on the way to making the collection and analysis of “outside” data a breeze through fairhair.ai, a platform for accessing and enriching the vast content stored and collected by Meltwater every day.

Toward Interactive User Data Analytics

Sihem Amer-Yahia
CNRS, University Grenoble-Alps, Grenoble, France

Abstract. User data can be acquired from various domains. This data is characterized by a combination of demographics such as age and occupation and user actions such as rating a movie, reviewing a restaurant or buying groceries. User data is appealing to analysts in their role as data scientists who seek to conduct large-scale population studies, and gain insights on various population segments. It is also appealing to novice users in their role as information consumers who use the social Web for routine tasks such as finding a book club or choosing a restaurant.

User data exploration has been formulated as identifying group-level behavior such as Asian women who publish regularly in databases. Group-level exploration enables new findings and addresses issues raised by the peculiarities of user data such as noise and sparsity. I will review our work on one-shot and interactive user data exploration.
I will then describe the challenges of developing a visual analytics tool for finding and connecting users and groups.