
Web Dynamics: Adapting to Change in Content, Size, Topology and Use PDF

456 Pages·2004·14.273 MB·English

Preview Web Dynamics: Adapting to Change in Content, Size, Topology and Use

Web Dynamics
Adapting to Change in Content, Size, Topology and Use

Mark Levene
Alexandra Poulovassilis

With 76 Figures and 29 Tables

Springer-Verlag Berlin Heidelberg GmbH

Mark Levene
Alexandra Poulovassilis
School of Computer Science and Information Systems
Birkbeck University of London
Malet Street
London WC1E 7HX
United Kingdom

Library of Congress Cataloging-in-Publication Data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Bibliographic information published by Die Deutsche Bibliothek. Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at <http://dnb.ddb.de>.

ACM Subject Classification (1998): H.3.3, H.3.5, H.5.4

ISBN 978-3-642-07377-9
ISBN 978-3-662-10874-1 (eBook)
DOI 10.1007/978-3-662-10874-1

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag Berlin Heidelberg GmbH. Violations are liable for prosecution under the German Copyright Law.

springeronline.com
© Springer-Verlag Berlin Heidelberg 2004
Originally published by Springer-Verlag Berlin Heidelberg New York in 2004
Softcover reprint of the hardcover 1st edition 2004

The use of designations, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Cover Design: KünkelLopka, Heidelberg
Typesetting: Computer to film by author's data
Printed on acid-free paper 45/3142PS 543210

Preface

The World Wide Web has become a ubiquitous global tool, used for finding information, communicating ideas, carrying out distributed computation and conducting business, learning and science.

The Web is highly dynamic in both the content and quantity of the information that it encompasses. In order to fully exploit its enormous potential as a global repository of information, we need to understand how its size, topology and content are evolving. This then allows the development of new techniques for locating and retrieving information that are better able to adapt and scale to its change and growth.

The Web's users are highly diverse and can access the Web from a variety of devices and interfaces, at different places and times, and for varying purposes. We thus also need techniques for personalising the presentation and content of Web-based information depending on how it is being accessed and on the specific user's requirements.

As well as being accessed by human users, the Web is also accessed by applications. New applications in areas such as e-business, sensor networks, and mobile and ubiquitous computing need to be able to detect and react quickly to events and changes in Web-based information. Traditional approaches using query-based 'pull' of information to find out if events or changes of interest have occurred may not be able to scale to the quantity and frequency of events and changes being generated, and new 'push'-based techniques are needed.

In January 2001, we organised a workshop on 'Web Dynamics' in London to explore some of these issues (see www.dcs.bbk.ac.uk/WebDyn). Following on from that workshop, we co-edited a special issue of the Computer Networks journal on Web Dynamics (vol. 39, no. 3).
In May 2002 we organised a second Web dynamics workshop, this time co-located with WWW'2002 in Hawaii (see www.dcs.bbk.ac.uk/WebDyn2). Our aim with the second workshop was to continue the momentum built up from the first workshop and the special issue, and to identify major new research advances and challenges. The topics discussed fell into four main areas: models of evolution of the Web's structure, locating and retrieving Web information, Web applications, and adaptive hypermedia. Several of the contributors to this workshop have written chapters for this book, and we gratefully acknowledge the contributions of all the chapters' authors.

To our knowledge, this is the first book that has an explicit focus on the dynamics of the Web, and our intended audience is all those who have a professional or personal interest in this. For practitioners, the book gives a critical discussion of the current state of the art as well as the most recent research advances, and our aim is that it will be a valuable reference in evaluating current and emerging technologies. For researchers and developers, we hope the book will motivate new research, prototypes and products. For teachers and students, our aim is that the book will be used to support teaching of graduate-level courses on Web technologies, and also by PhD students as a comprehensive reference resource.

Each chapter covers one particular aspect of Web dynamics, first giving a critical review of the state of the art in that area, then taking a more detailed look at one or two particular solutions and concluding with a discussion of some of the major open problems that still need to be addressed.

Chapter 1 gives an introduction and overview that sets the scene for the rest of the book. The chapters that follow it are divided into four parts, essentially covering the same four topics as emerged from the second Web Dynamics workshop:

• evolution of the Web's structure and content (Chaps. 2-5),
• searching and navigating the Web (Chaps. 6-9),
• handling events and change on the Web (Chaps. 10-14), and
• personalised access to the Web (Chaps. 15-18).

Each part is preceded by a short review written by us, summarising that part and pointing to links, similarities and differences between the techniques discussed in the chapters it contains.

Part I focusses on the structural and statistical properties of the Web and covers techniques for measuring the size of the Web (Chap. 2), identifying 'communities' of related Web pages (Chap. 3), models of evolution of the Web viewed as a graph (Chap. 4), and techniques for measuring the rate of change of Web pages (Chap. 5).

Part II focusses on locating and retrieving Web information and covers Web navigation tools (Chap. 6), Web crawlers (Chap. 7), link and content analysis for ranking Web pages (Chap. 8) and techniques for evaluating Web search engines based on the freshness of their indexes (Chap. 9).

Part III discusses techniques for handling events and change in Web applications, and covers languages for defining event-condition-action rules over XML repositories (Chaps. 10 and 11), embedding calls to Web services within XML documents in order to support peer-to-peer data integration (Chap. 12), detection and notification of changes to Web pages (Chap. 13), and reliable middleware for detecting and reacting to events in heterogeneous distributed environments (Chap. 14).

Part IV discusses personalisation of Web information to how it is being accessed and to the user's specific requirements and preferences. It covers architectures for adaptive hypermedia systems (Chap. 15), adaptive educational hypermedia (Chap. 16), personalisation techniques for the mobile internet (Chap. 17), and techniques for learning and predicting users' browsing patterns (Chap. 18).

Many open research issues remain in these areas, and new challenges are continuously arising.
We hope that this book will serve to introduce readers to this exciting field and to the possibilities and potential of the dynamic Web.

Mark Levene
Alex Poulovassilis
October 2003

Contents

Web Dynamics - Setting the Scene
Mark Levene, Alexandra Poulovassilis

Part I Evolution of Web Structure and Content

Introduction

How Large Is the World Wide Web?
Adrian Dobra, Stephen E. Fienberg

Methods for Mining Web Communities: Bibliometric, Spectral, and Flow
Gary William Flake, Kostas Tsioutsiouliklis, Leonid Zhukov

Theory of Random Networks and Their Role in Communications Networks
Jose Fernando Mendes

Web Dynamics, Structure, and Page Quality
Ricardo Baeza-Yates, Carlos Castillo, Felipe Saint-Jean

Part II Searching and Navigating the Web

Introduction

Navigating the World Wide Web
Mark Levene, Richard Wheeldon

Crawling the Web
Gautam Pant, Padmini Srinivasan, Filippo Menczer

Combining Link and Content Information in Web Search
Matthew Richardson, Pedro Domingos

Search Engine Ability to Cope With the Changing Web
Judit Bar-Ilan

Part III Events and Change on the Web

Introduction

An Event-Condition-Action Language for XML
James Bailey, George Papamarkos, Alexandra Poulovassilis, Peter T. Wood

Active XQuery
Angela Bonifati, Stefano Paraboschi

Active XML: A Data-Centric Perspective on Web Services
Serge Abiteboul, Omar Benjelloun, Ioana Manolescu, Tova Milo, Roger Weber

WebVigiL: An Approach to Just-In-Time Information Propagation in Large Network-Centric Environments
Jyoti Jacob, Anoop Sanka, Naveen Pandrangi, Sharma Chakravarthy

DREAM: Distributed Reliable Event-Based Application Management
Alejandro Buchmann, Christof Bornhövd, Mariano Cilia, Ludger Fiege, Felix Gärtner, Christoph Liebig, Matthias Meixner, Gero Mühl

Part IV Personalized Access to the Web

Introduction

A Survey of Architectures for Adaptive Hypermedia
Mario Cannataro, Andrea Pugliese

Adaptive Web-Based Educational Hypermedia
Paul De Bra, Lora Aroyo, Alexandra Cristea

Mp3 - Mobile Portals, Profiles and Personalization
Barry Smyth, Paul Cotter

Learning Web Request Patterns
Brian D. Davison

Index

Web Dynamics - Setting the Scene

Mark Levene and Alexandra Poulovassilis
School of Computer Science and Information Systems, Birkbeck University of London, Malet Street, London, WC1E 7HX, U.K.
{mark,ap}@dcs.bbk.ac.uk

1 Introduction

The World Wide Web is the largest hypertext in existence, with a hyperlinked collection of over four billion accessible Web pages, as of late 2003. It is also probably the fastest growing network of this scale the world has ever known. If we include 'deep' Web information, i.e. information inaccessible to search engines, such as information stored in databases that can only be accessed through a specialised interface, then the Web is between 400 and 550 times larger than this estimate for the 'shallow' Web (see [12]). The Web is becoming all-pervasive in our lives, and we use it for finding information, communication, computation, leisure, business, learning and science. About 10% of the world's population now has access to the Web, and its number of users is continuously increasing (see www.glreach.com/globstats).

The Web is thus dynamic in several ways. First, it is a growing network. However, pages and links are not only being added but are also being removed. The contents of Web pages are also being modified. Service providers such as search engines and portals are fighting a continuous battle to deliver high-quality service in the face of this change and growth. Building mathematical models that can predict accurately how the Web is evolving is important, as it then allows us to develop techniques for locating and retrieving Web information that can adapt and scale up to its change and growth.

Second, Web-based applications in areas such as information monitoring and subscription, order processing, business process integration, auctions and negotiation need to be able to detect and automatically react to events or changes in information content by carrying out some further processing. The traditional approach where information consumers periodically submit retrieval requests to information producers in order to find out if events or changes of interest have occurred may not scale up to the volumes of events being generated in such applications. Instead, new techniques are being employed in which information producers automatically notify consumers when events or changes of interest to them occur.

Third, the Web's users are highly diverse.
They have a wide variety of information needs that may change over time, and they can access the Web from a variety of devices
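
The contrast drawn above between consumers that periodically 'pull' information and producers that 'push' notifications can be illustrated with a minimal Python sketch. It is not taken from the book: the `Producer` class, the `run_pull` and `run_push` functions and the request counter are hypothetical names introduced only for illustration, and the simulation runs in a single process with no real network.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Producer:
    """Hypothetical information producer (illustrative only)."""

    def __init__(self) -> None:
        self._content: Dict[str, str] = {}
        self._subscribers: List[Callable[[str, str], None]] = []
        self.requests_served = 0  # counts pull requests only

    def get(self, key: str) -> Optional[str]:
        """Pull interface: every call costs the producer one request."""
        self.requests_served += 1
        return self._content.get(key)

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        """Push interface: register a consumer callback."""
        self._subscribers.append(callback)

    def update(self, key: str, value: str) -> None:
        """Store a value and notify subscribers only if it changed."""
        changed = self._content.get(key) != value
        self._content[key] = value
        if changed:
            for notify in self._subscribers:
                notify(key, value)


def run_pull(updates: List[str], polls_per_update: int) -> Tuple[List[str], int]:
    """Consumer polls repeatedly and records each change it notices."""
    producer = Producer()
    observed: List[str] = []
    last: Optional[str] = None
    for value in updates:
        producer.update("page", value)
        for _ in range(polls_per_update):
            current = producer.get("page")
            if current is not None and current != last:
                observed.append(current)
                last = current
    return observed, producer.requests_served


def run_push(updates: List[str]) -> Tuple[List[str], int]:
    """Consumer subscribes once and is told about each change."""
    producer = Producer()
    observed: List[str] = []
    producer.subscribe(lambda key, value: observed.append(value))
    for value in updates:
        producer.update("page", value)
    return observed, producer.requests_served


if __name__ == "__main__":
    print(run_pull(["v1", "v1", "v2"], polls_per_update=3))
    print(run_push(["v1", "v1", "v2"]))
```

In this toy setting both consumers observe the same two changes, but the polling consumer issues a retrieval request on every poll whether or not anything changed, while the subscribed consumer issues none. The scaling concern described in the text is about exactly this kind of overhead, although real systems also have to deal with delivery failures, ordering and subscription management, none of which the sketch models.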
