Nathalie Japkowicz · Stan Matwin (Eds.)

Discovery Science
18th International Conference, DS 2015
Banff, AB, Canada, October 4–6, 2015
Proceedings

Lecture Notes in Artificial Intelligence 9356
Subseries of Lecture Notes in Computer Science

LNAI Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Yuzuru Tanaka, Hokkaido University, Sapporo, Japan
Wolfgang Wahlster, DFKI and Saarland University, Saarbrücken, Germany

LNAI Founding Series Editor
Joerg Siekmann, DFKI and Saarland University, Saarbrücken, Germany

More information about this series at http://www.springer.com/series/1244

Editors
Nathalie Japkowicz, University of Ottawa, Ottawa, ON, Canada
Stan Matwin, Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada

ISSN 0302-9743
ISSN 1611-3349 (electronic)
Lecture Notes in Artificial Intelligence
ISBN 978-3-319-24281-1
ISBN 978-3-319-24282-8 (eBook)
DOI 10.1007/978-3-319-24282-8

Library of Congress Control Number: 2015948779
LNCS Sublibrary: SL7 – Artificial Intelligence
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2015

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)

Preface

This year’s International Conference on Discovery Science, DS 2015, was the 18th event in this series. As in previous years, the conference was co-located with the International Conference on Algorithmic Learning Theory, ALT 2015, which is already in its 26th year. Started in 2001, ALT/DS is one of the longest-running series of co-located events in computer science. The combination of recent advances in the development and analysis of methods for discovering scientific knowledge, coming from machine learning, data mining, and intelligent data analysis, as well as their application in various scientific domains, on the one hand, with algorithmic advances in machine learning theory, on the other, makes every instance of this joint event unique and attractive. This volume contains the papers presented at the 18th International Conference on Discovery Science, while the papers of the 26th International Conference on Algorithmic Learning Theory are published by Springer in a companion volume (LNCS Vol. 9355).

The 18th Discovery Science conference received 44 international submissions. Each submission was reviewed by at least two committee members. The committee decided to accept 28 papers, of which 16 are long and 12 are short papers. This results in a 36% acceptance rate for long papers (16 of 44 submissions).

As is the tradition of the Discovery Science and the Algorithmic Learning Theory conferences, invited talks were shared between the two meetings.
This year’s DS invited talks were “Turning Prediction Tools into Decision Tools” by Cynthia Rudin from MIT, and “Overcoming Obstacles to the Adoption of Machine Learning by Domain Experts” by Kiri Wagstaff from the Jet Propulsion Laboratory, while the ALT invited talks were “Finding Hidden Structure in Data with Tensor Decompositions” by Sham Kakade from Microsoft and the University of Washington, and “Bilinear Prediction Using Low Rank Models” by Inderjit Dhillon from the University of Texas at Austin. Abstracts of all four invited talks are included in these proceedings.

We would like to thank all authors of submitted papers, the Program Committee members, and the additional reviewers for their efforts in evaluating the submitted papers, as well as the invited speakers and tutorial presenters. Support and advice from Randy Goebel, the General Chair of both conferences, were essential every step of the way. We are grateful to Kamalika Chaudhuri, Claudio Gentile, Sandra Zilles, and Csaba Szepesvari for ensuring smooth coordination with ALT. We are indebted to Jonathan Amyot from the Faculty of Computer Science, Dalhousie University, for putting up and maintaining our website with great competence and efficiency.

We are grateful to the people behind EasyChair for making the system available free of charge. It was an essential tool in the paper submission and evaluation process, as well as in the preparation of the Springer proceedings. We are also grateful to Springer for their continuing support of Discovery Science and for publishing the conference proceedings since its inception.

This year, both conferences were held on October 4–6 in the picturesque setting of Banff, Alberta, and were organized by Sandra Zilles and Csaba Szepesvari. We are very grateful to ISM Canada, an IBM company, to Alberta Innovates - Technology Futures (AITF), to the Canadian Artificial Intelligence Association (CAIAC), and to the Faculty of Computer Science at Dalhousie University for their sponsorship of both conferences.
October 2015
Nathalie Japkowicz
Stan Matwin

Organization

Program Committee

Aijun An, York University, Canada
Vincent Barnabe-Lortie, University of Ottawa, Canada
Colin Bellinger, University of Ottawa, Canada
Sabine Bergler, Concordia University, Canada
Albert Bifet, University of Waikato, New Zealand
Hendrik Blockeel, K.U. Leuven, Belgium
Ivan Bratko, University of Ljubljana, Slovenia
Michelangelo Ceci, Università degli Studi di Bari, Italy
Tapio Elomaa, Tampere University of Technology, Finland
Johannes Fürnkranz, TU Darmstadt, Germany
Dragan Gamberger, Rudjer Boskovic Institute, Croatia
Howard Hamilton, University of Regina, Canada
Geoffrey Holmes, University of Waikato, New Zealand
Diana Inkpen, University of Ottawa, Canada
Aminul Islam, Dalhousie University, Canada
Nathalie Japkowicz, SITE, University of Ottawa, Canada
Ross King, University of Manchester, UK
Svetlana Kiritchenko, NRC, Canada
William Klement, UHN, University of Toronto, Canada
Philippe Langlais, Université de Montréal, Canada
Guy Lapalme, RALI-DIRO, Université de Montréal, Canada
Donato Malerba, Università degli Studi di Bari “Aldo Moro”, Italy
Stan Matwin, University of Ottawa, Canada
Robert Mercer, The University of Western Ontario, Canada
Evangelos Milios, Dalhousie University, Canada
Zoran Obradovic, Temple University, USA
Bernhard Pfahringer, University of Waikato, New Zealand
Fred Popowich, Simon Fraser University, Canada
Doina Precup, McGill University, Canada
Marko Robnik-Sikonja, University of Ljubljana, FRI, Slovenia
Mohak Shah, GE Global Research, USA
Marina Sokolova, University of Ottawa and Institute for Big Data Analytics, Canada
Jerzy Stefanowski, Poznań University of Technology, Poland
Einoshin Suzuki, Kyushu University, Japan
Maguelonne Teisseire, Cemagref - UMR Tetis, France
Herna Viktor, University of Ottawa, Canada
Harry Zhang, University of New Brunswick, Canada
Min-Ling Zhang, Southeast University, China
Nur Zincir-Heywood, Dalhousie University, Canada
Blaz Zupan, University of Ljubljana, Slovenia

Additional Reviewers

Alharthi, Haifa
Cao, Xi Hang
Christoff, Zoe
Corizzo, Roberto
Davoudi, Heidar
Ellert, Bradley
Fass, Dan
Fok, Ricky
Grinberg, Nastasiya
Han, Chao
Kirinde Gamaarachchige, Prasadith
Kurup, Unmesh
Lanotte, Pasqua Fabiana
Odilinye, Lydia
Olier, Ivan
Tofiloski, Milan

Bilinear Prediction Using Low Rank Models

Inderjit S. Dhillon
Department of Computer Science, University of Texas at Austin, Austin, USA
[email protected]

Linear prediction methods, such as linear regression and classification, form the bread and butter of modern machine learning. The classical scenario involves data with multiple features and a single target variable. However, many recent scenarios involve multiple target variables, for example predicting bid words for a web page (where each bid word acts as a target variable) or predicting diseases linked to a gene. In many of these scenarios, the target variables might themselves be associated with features. In these scenarios, we propose the use of bilinear prediction with low-rank models. The low-rank models serve a dual purpose: (i) they enable tractable computation even in the face of millions of data points as well as target variables, and (ii) they exploit correlations among the target variables, even when there are many missing observations. We illustrate our methodology on two modern machine learning problems, multi-label learning and inductive matrix completion, and show results on two applications: predicting Wikipedia labels and predicting gene-disease relationships. This is joint work with Prateek Jain, Nagarajan Natarajan, Hsiang-Fu Yu and Kai Zhong.
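
The abstract above does not state a formula, so the following is only a minimal sketch under a common assumption: a bilinear model scores a data point with features x against a target with features z as s(x, z) = x^T W z, and the low-rank restriction writes W = U V^T with a small inner dimension k. The names U, V, x, z and both helper functions are hypothetical and chosen for illustration; the talk's actual model may differ. The sketch shows why the factorization helps with computation: the score can be evaluated from two k-dimensional projections without ever forming the full matrix W.

```python
def matvec_t(M, v):
    """Return M^T v for a matrix M stored as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [sum(M[i][j] * v[i] for i in range(rows)) for j in range(cols)]


def bilinear_score_low_rank(x, z, U, V):
    """Score x^T (U V^T) z using only the rank-k factors U and V."""
    px = matvec_t(U, x)  # U^T x, a k-dimensional projection of the data features
    pz = matvec_t(V, z)  # V^T z, a k-dimensional projection of the target features
    return sum(a * b for a, b in zip(px, pz))


def bilinear_score_full(x, z, U, V):
    """Same score, but materialising W = U V^T first (for comparison only)."""
    k = len(U[0])
    W = [[sum(U[i][r] * V[j][r] for r in range(k)) for j in range(len(V))]
         for i in range(len(U))]
    return sum(x[i] * W[i][j] * z[j]
               for i in range(len(U)) for j in range(len(V)))


# Toy, made-up numbers: 3 data features, 2 target features, rank k = 2.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2 factor (assumed)
V = [[2.0, 0.0], [0.0, 3.0]]              # 2 x 2 factor (assumed)
x = [1.0, 2.0, 3.0]                       # features of one data point
z = [1.0, 1.0]                            # features of one target variable

print(bilinear_score_low_rank(x, z, U, V))  # 23.0
```

With d_x data features and d_z target features, the factored form stores (d_x + d_z) * k numbers instead of d_x * d_z, which is one way to read the abstract's claim about tractable computation at large scale.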