Lecture Notes in Computer Science 5831
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

Yishai A. Feldman, Donald Kraft, Tsvi Kuflik (Eds.)
Next Generation Information Technologies and Systems
7th International Conference, NGITS 2009
Haifa, Israel, June 16-18, 2009
Revised Selected Papers

Volume Editors

Yishai A. Feldman
IBM Haifa Research Lab
Haifa University Campus, Mount Carmel, Haifa 31905, Israel
E-mail: [email protected]

Donald Kraft
U.S. Air Force Academy
Department of Computer Science
2354 Fairchild Drive, Suite 6G-101, Colorado Springs, CO 80840, USA
E-mail: [email protected]

Tsvi Kuflik
The University of Haifa
Management Information Systems Department
Mount Carmel, Haifa 31905, Israel
E-mail: [email protected]

Library of Congress Control Number: 2009935905
CR Subject Classification (1998): H.4, H.3, H.5, H.2, D.2.12, C.2.4
LNCS Sublibrary: SL 3 – Information Systems and Application, incl. Internet/Web and HCI
ISSN 0302-9743
ISBN-10 3-642-04940-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-04940-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12772971 06/3180 543210

Foreword

Information technology is a rapidly changing field in which researchers and developers must continuously set their vision on the next generation of technologies and the systems that they enable. The Next Generation Information Technologies and Systems (NGITS) series of conferences provides a forum for presenting and discussing the latest advances in information technology. NGITS conferences are international events held in Israel; previous conferences have taken place in 1993, 1995, 1997, 1999, 2002, and 2006.

In addition to 14 reviewed papers, the conference featured two keynote lectures and an invited talk by notable experts. The selected papers may be classified roughly in five broad areas:

• Middleware and Integration
• Modeling
• Healthcare/Biomedical
• Service and Information Management
• Applications

NGITS 2009 also included a demonstration session and an industrial track focusing on how to make software development more efficient by cutting expenses with technology and infrastructures.

This event is the culmination of efforts by many talented and dedicated individuals. We are pleased to extend our thanks to the authors of all submitted papers, the members of the program committee, and the external reviewers. Many thanks are also due to Nilly Schnapp for local organization and logistics, and to Eugeny Myunster for managing the web site and all other technical things.
Finally, we are pleased to acknowledge the support of our institutional sponsors: The University of Haifa, the Faculty of Social Sciences and the MIS Department at the University of Haifa, the IBM Haifa Research Lab, and the Technion.

June 2009
Yishai A. Feldman
Donald Kraft
Tsvi Kuflik

Organization

General Chair
Tsvi Kuflik

Steering Committee
Opher Etzion
Avigdor Gal
Amihai Motro

Program Committee Chairs
Yishai A. Feldman
Donald Kraft

Program Committee
Nabil Adam
Hamideh Afsarmanesh
Mathias Bauer
Iris Berger
Dan Berry
Elisa Bertino
Gloria Bordogna
Patrick Bosc
Rebecca Cathey
Jen-Yao Chung
Alessandro D'Atri
Asuman Dogac
Ophir Frieder
Mati Golani
Paolo Giorgini
Enrique Herrera-Viedma
David Konopnicki
Manolis Koubarakis
Maria Jose Martin-Bautista
Amnon Meisels
Naftaly Minsky
George Papadopoulos
Gabriella Pasi
Mor Peleg
Haggai Roitman
Doron Rotem
Steve Schach
Pnina Soffer
Bracha Shapira
Bernhard Thalheim
Eran Toch
Yair Wand
Ouri Wolfson
Amiram Yehudai

Additional Reviewers
Gunes Aluc
Joel Booth
Alessio Maria Braccini
Mariangela Contenti
Ekatarina Ermilova
Gokce Banu Laleci Erturkmen
Stefania Marrara
Simon Samwel Msanjila
Cagdas Ocalan
Aabhas Paliwal
Andrea Resca
Basit Shafiq
Michal Shmueli-Scheuer
Venkatakumar Srinivasan
Fulya Tuncer
Stefano Za

Local Arrangements
Nilly Schnapp

Website Manager
Eugeny Myunster

Table of Contents

Keynote Lectures
Searching in the “Real World”
  Ophir Frieder
Structured Data on the Web
  Alon Y. Halevy

1 Middleware and Integration
Worldwide Accessibility to Yizkor Books
  Rebecca Cathey, Jason Soo, Ophir Frieder, Michlean Amir, and Gideon Frieder
Biomedical Information Integration Middleware for Clinical Genomics
  Simona Rabinovici-Cohen

2 Modeling
Interpretation of History Pseudostates in Orthogonal States of UML State Machines
  Anna Derezińska and Romuald Pilitowski
System Grokking – A Novel Approach for Software Understanding, Validation, and Evolution
  Maayan Goldstein and Dany Moshkovich
Refactoring of Statecharts
  Moria Abadi and Yishai A. Feldman

3 Healthcare/Biomedical
Towards Health 2.0: Mashups to the Rescue
  Ohad Greenshpan, Ksenya Kveler, Boaz Carmeli, Haim Nelken, and Pnina Vortman
Semantic Warehousing of Diverse Biomedical Information
  Stefano Bianchi, Anna Burla, Costanza Conti, Ariel Farkash, Carmel Kent, Yonatan Maman, and Amnon Shabo
InEDvance: Advanced IT in Support of Emergency Department Management
  Segev Wasserkrug, Ohad Greenshpan, Yariv N. Marmor, Boaz Carmeli, Pnina Vortman, Fuad Basis, Dagan Schwartz, and Avishai Mandelbaum

4 Service and Information Management
Enhancing Text Readability in Damaged Documents
  Gideon Frieder
ITRA under Partitions
  Aviv Dagan and Eliezer Dekel
Short and Informal Documents: A Probabilistic Model for Description Enrichment
  Yuval Merhav and Ophir Frieder

5 Applications
Towards a Pan-European Learning Resource Exchange Infrastructure
  David Massart
Performance Improvement of Fault Tolerant CORBA Based Intelligent Transportation Systems (ITS) with an Autonomous Agent
  Woonsuk Suh, Soo Young Lee, and Eunseok Lee
A Platform for Life Event Development in a eGovernment Environment: The PLEDGE Project
  Luis Álvarez Sabucedo, Luis Anido Rifón, and Ruben Míguez Pérez
Online Group Deliberation for the Elicitation of Shared Values to Underpin Decision Making
  Faezeh Afshar, Andrew Stranieri, and John Yearwood

Author Index

Searching in the “Real World”
(Abstract of Invited Plenary Talk)

Ophir Frieder
Information Retrieval Laboratory
Department of Computer Science
Illinois Institute of Technology
[email protected]

For many, "searching" is considered a mostly solved problem. In fact, for text processing, this belief is factually based. The problem is that most "real world" search applications involve "complex documents", and such applications are far from solved. Complex documents, or less formally, "real world documents", comprise a mixture of images, text, signatures, tables, logos, watermarks, stamps, etc., and are often available only in scanned hardcopy formats. Search systems for such document collections are currently unavailable.

We describe our efforts at building a complex document information processing (CDIP) prototype. This prototype integrates "point solution" (mature) technologies, such as OCR capability, signature matching and handwritten word spotting techniques, search and mining approaches, among others, to yield a system capable of searching "real world documents". The described prototype demonstrates the adage that "the whole is greater than the sum of its parts".

To evaluate our CDIP prototype as well as to provide an evaluation platform for future CDIP systems, we also introduced a complex document benchmark. This benchmark is currently in use by the National Institute of Standards and Technology (NIST) Text REtrieval Conference (TREC) Legal Track. The details of our complex document benchmark are similarly presented.
Having described the global approach, we describe some additional point solutions developed in the IIT Information Retrieval Laboratory. These include an Arabic stemmer and a natural language source integration fabric called the Intranet Mediator. In terms of stemming, we developed and commercially licensed an Arabic stemmer and search system. Our approach was evaluated using benchmark Arabic collections and favorably compared against the state of the art. We also focused on source integration and ease of user interaction. By integrating structured and unstructured sources, we designed, implemented, and commercially licensed our mediator technology that provides a single, natural language interface to querying distributed sources. Rather than providing a set of links as possible answers, the described approach actually answers the posed question. Both the Arabic stemmer and the mediator efforts are likewise discussed.

A summary of the efforts discussed is found in [1].

Reference

1. Frieder, O.: On Searching in the ‘Real World’. In: Argamon, S., Howard, N. (eds.) Computational Methods for Counterterrorism, ch. 1. Springer, Heidelberg (2009) ISBN: 978-3-642-01140-5

Y.A. Feldman, D. Kraft, and T. Kuflik (Eds.): NGITS 2009, LNCS 5831, p. 1, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Structured Data on the Web

Alon Y. Halevy
Google Inc., 1600 Amphitheatre Parkway, Mountain View, California, 94043, USA
[email protected]

Abstract of Plenary Talk

Though search on the World-Wide Web has focused mostly on unstructured text, there is an increasing amount of structured data on the Web and growing interest in harnessing such data. I will describe several current projects at Google whose overall goal is to leverage structured data and better expose it to our users.

The first project is on crawling the deep web. The deep web refers to content that resides in databases behind forms, but is unreachable by search engines because there are no links to these pages.
I will describe a system that surfaces pages from the deep web by guessing queries to submit to these forms, and entering the results into the Google index [1]. The pages that we generated using this system come from millions of forms, hundreds of domains and over 40 languages. Pages from the deep web are served in the top-10 results on google.com for over 1000 queries per second.

The second project considers the collection of HTML tables on the web. The WebTables Project [2] built a corpus of over 150 million tables from HTML tables on the Web. The WebTables System addresses the challenges of extracting these tables from the Web, and offers search over this collection of tables. The project also illustrates the potential of leveraging the collection of schemas of these tables.

Finally, I'll discuss current work on computing aspects of queries in order to better organize search results for exploratory queries.

Keywords: Deep web, structured data, heterogeneous databases, data integration.

References

1. Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., Halevy, A.: Google's deep-web crawl. In: Proc. of VLDB, pp. 1241–1252 (2008)
2. Cafarella, M.J., Halevy, A., Zhang, Y., Wang, D.Z., Wu, E.: WebTables: Exploring the Power of Tables on the Web. In: VLDB (2008)
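
As a rough illustration of the query-guessing idea mentioned in the abstract above, the following Python sketch enumerates candidate GET submissions for a single form from a small set of candidate values per input. It is not the system described in [1], whose details are not given here; the function name `candidate_submissions`, the example URL, and the field names and values are all hypothetical.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_submissions(action_url, field_values, limit=100):
    """Return up to `limit` GET URLs, one per combination of candidate values.

    Hypothetical helper for illustration only: `field_values` maps each form
    input name to the candidate values a crawler might try for that input.
    """
    names = sorted(field_values)  # fixed field order keeps the output deterministic
    combos = product(*(field_values[name] for name in names))
    return [
        action_url + "?" + urlencode(list(zip(names, combo)))
        for combo in islice(combos, limit)  # cap the number of submissions
    ]


# Example with made-up form fields and values.
urls = candidate_submissions(
    "http://example.com/search",
    {"make": ["ford", "honda"], "zip": ["60616"]},
)
for url in urls:
    print(url)
```

A real crawler would also have to decide which forms and which values are worth trying; the `limit` parameter here is only a stand-in for that kind of budget.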