Computer Communications and Networks
For other titles published in this series, go to
http://www.springer.com/
The Computer Communications and Networks series is a range of textbooks,
monographs and handbooks. It sets out to provide students, researchers and non-specialists
alike with a sure grounding in current knowledge, together with comprehensible access to
the latest developments in computer communications and networking.
Emphasis is placed on clear and explanatory styles that support a tutorial approach, so that
even the most complex of topics is presented in a lucid and intelligible manner.
Ian J. Taylor Andrew B. Harrison
From P2P and Grids
to Services on the Web
Evolving Distributed Communities
Second Edition
Ian Taylor, BSc, PhD
Andrew Harrison, BA, MSc, PhD
School of Computer Science, Cardiff University, UK
Series Editor
Professor A.J. Sammes, BSc, MPhil, PhD, FBCS, CEng
CISM Group, Cranfield University,
RMCS, Shrivenham, Swindon SN6 8LA, UK
CCN Series ISSN 1617-7975
ISBN 978-1-84800-122-0 2nd edition e-ISBN 978-1-84800-123-7 2nd edition
ISBN 978-1-85233-869-5 1st edition
DOI 10.1007/978-1-84800-123-7
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2008939016
© Springer-Verlag London Limited 2009
First published 2004
Second edition 2009
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted
under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or
transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case
of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing
Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore free for
general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that
may be made.
Printed on acid-free paper
springer.com
Adina, I thank you for your support and encouragement during
the writing of this book, amidst a somewhat laborious time as
my words fell ever more quickly off the end of these chapters.
And as one delivery ended, another began – to George, our son.
Ian
To P & U again (sigh). I will now keep my promise to break my
worst habit.
Andrew
Preface
Over the past several years, Internet usage patterns have shifted from
predominantly client/server-based interactions with Web servers towards more
decentralized applications, in which users contribute more equally to the
application as a whole, and further towards distributed communities built
around the Web. Distributed systems take many forms, appear in many areas and
range from truly decentralized systems like Gnutella, Skype and Jxta, through
centrally indexed brokered systems like Web services and Jini, to centrally
coordinated systems like SETI@home.
From P2P and Grids to Services on the Web: Evolving Distributed
Communities provides a comprehensive overview of the emerging trends in
peer-to-peer (P2P), distributed objects, Web Services, the Web, and Grid
computing technologies, which have redefined the way we think about
distributed computing and the Internet. The book has four main themes:
distributed environments; protocols and architectures for applications;
protocols and architectures focusing on middleware; and, finally, deployment
of these middleware systems, with real-world examples of their usage.
Within the context of applications, examples of many diverse architectures
are provided, including decentralized systems like Gnutella and Freenet and,
in terms of data distribution, BitTorrent; brokered ones like Napster; and
centralized applications like SETI@home, as well as the core technologies
used by the Web. For middleware, the book covers Jxta as a programming
infrastructure for P2P computing, core Web Services protocols, Grid
computing paradigms (e.g., Globus and OGSA), distributed-object
architectures (e.g., Jini) and recent developments on the Web, including
Microformats, Ajax, the Atom protocols and the Web Application Description
Language. Each technology is described in detail, including source code
where appropriate.
To maintain coherency, each system is discussed in terms of the generalized
taxonomy and the transport and data format protocols outlined in the first
part of the book. This taxonomy serves as a frame of reference for the
systems presented in the book and gives an overview of the organizational
differences between the various approaches. Most of the systems are
discussed at a high level, particularly addressing the organization and
topologies of the distributed resources. However, some (e.g., Jxta, Jini,
Web Services, the Atom syndication and publishing protocols and, to some
extent, Gnutella) are discussed in much more detail, with practical
programming tutorials for their use. Security is paramount throughout and
is introduced in a dedicated chapter outlining the many approaches to
security within distributed systems.
Why did we decide to write this book?
The first edition of this book was written for Ian’s lecture course on
Distributed Systems in the School of Computer Science at Cardiff University.
For this second edition, however, we wanted to broaden the scope, not only
to include more recent advances in distributed systems technologies, such
as BitTorrent and Web 2.0, but also to focus on the underlying protocols for
the Web, discovery and data transport. So:
Who should read this book?
This book, we believe, has a wide-ranging scope. It was initially written
for BSc students, with an extensive computing background, and for MSc
students, who had little or no prior computing experience; indeed, some
students had never written a line of code in their lives! This book should
therefore appeal to people with a range of programming abilities, to the
casual reader who is simply interested in recent advances in the distributed
systems world, and even to those who resent the invasion of distributed
computing into their lives. It provides an easy-to-follow path through the
maze of often conflicting architectures and paradigms.
Readers will learn about the various distributed systems that are available
today. For a designer of new applications, this will provide a good
reference. For students, this text can accompany any course on distributed
computing to give a broader context of the subject area. For a casual reader
interested in P2P, Web Services, the Web or Grid computing, the book gives a
broad overview of the field and specifics about how such systems operate in
practice, without delving into the low-level details. All chapters should be
of interest to both casual and programming-level readers, although some
parts of the Gnutella chapter and some sections of the deployment chapters
are more tuned to the lower-level mechanisms and are therefore targeted more
at programmers.
Organization
Chapter 1: Introduction: In this chapter, an introduction to distributed
systems is given, paying particular attention to the role of middleware. A
taxonomy is constructed that places distributed systems on a scale from
centralized to decentralized, depending on how resources or services are
organized and discovered and how they communicate with each other. This
taxonomy serves as an underlying theme for understanding the various
applications and middleware discussed in this book.
Chapter 2: Discovery Protocols: This chapter discusses the local and
global mechanisms for the discovery of services across the Internet. It
provides a foundation for the middleware and applications because each
employs some aspect of discovery in its implementation. We describe both
the Unicast protocols (UDP and TCP) and the Multicast protocol, giving some
detail about how each operates. We then conclude with a brief description of
SLP (the Service Location Protocol), which makes use of these underlying
protocols and of directory services to discover services on a network.
Chapter 3: Structured Document Types: This chapter provides an
overview of the related technologies of XML, HTML and XHTML. These
technologies are widely used across many distributed systems and hence
are referred to frequently throughout the book. A description of the
primary technologies used to model and validate these documents is also
provided.
Chapter 4: Distributed Security Techniques: This chapter covers the
basic elements of security in a distributed system. It describes the various
ways in which a third party can gain access to data and the design issues
involved in building a distributed security system. It then gives a basic
overview of cryptography and describes the various ways in which secure
channels can be set up, using public-key pairs or symmetric keys (e.g.,
shared secret keys or session keys). Finally, secure mobile code is
discussed in the context of sandboxing.
Chapter 5: The Web: This chapter describes how the Web came into being
and what its underlying principles and technologies are, including Uniform
Resource Identifiers, Hypermedia and the Hypertext Transfer Protocol (HTTP).
The architectural style of Representational State Transfer (REST), which is
derived from observations of the Web, is also described. The Web is the most
ubiquitous distributed system of all and has therefore influenced many of
the systems described in this book, both architecturally and
technologically. The advent of Web 2.0 has re-ignited interest in the Web as
a distributed platform among communities that did not previously consider it
feasible.
Chapter 6: Peer-2-Peer Environments: This chapter gives a brief history
of client/server and peer-to-peer computing. The current definition of P2P
is stated, and the characteristics of the P2P environment that distinguish it
from client/server are described, e.g., transient nodes, multi-hop
communication, NAT and firewalls. Several examples of P2P technologies are
given, along with application scenarios for their use and categorizations of
their behaviour within the taxonomy described in the first chapter.
Chapter 7: Web Services: This chapter introduces the concept of
machine-to-machine communication and how it fits in with existing Web
technologies and future directions. This leads on to a high-level overview
of Web Services, which illustrates the core concepts without getting bogged
down in the deployment details.
Chapter 8: Distributed Objects and Agent Technologies: This chapter
provides a brief introduction to distributed objects, using CORBA as an
example, as well as the related field of Mobile Agents. The relationship
between three important abstractions used in distributed systems, namely
Objects, Services and Resources, is also discussed, giving the reader an
understanding of the underlying differences between architectural styles.
Chapter 9: Grid Computing: This chapter introduces the idea of a
computational Grid environment, which is typically composed of a number of
heterogeneous resources that may be owned and managed by different
administrators. The concept of a “virtual organization” is discussed along
with its security model, which employs a single sign-on mechanism. The
Globus toolkit, the reference implementation that can be used to program
computational Grids, is then outlined, along with some typical scenarios.
Chapter 19: On the Horizon: This chapter provides a glimpse into possible
future environments based on technologies that are already being deployed,
specifically cloud computing infrastructures and ubiquitous systems, as well
as conjectures by current commentators on where distributed systems are
likely to lead in the coming decade.
Chapter 10: Gnutella: This chapter combines a conceptual overview of
Gnutella with the details of the actual Gnutella protocol specification.
Many empirical studies are then outlined that illustrate the behaviour of
the Gnutella network in practice and show the many issues that need to be
overcome for this decentralized structure to succeed. Finally, the
advantages and disadvantages of this approach are discussed.
Chapter 11: Scalability: In this chapter, we look at scalability issues by
analysing the manner in which peers are organized within popular P2P
networks, using both structured and unstructured approaches. First, we look
at social networks and compare them with their P2P counterparts. We then
explore the use of decentralized P2P networks within the context of file
sharing and how and why hybrid (centralized/decentralized) networks are
used. Finally, we discuss distributed hash table technology, which offers a
more structured approach to this problem.
Chapter 12: Freenet: This chapter gives a concise description of the
Freenet distributed information storage system, a real-world example of how
the various technologies discussed so far can be integrated and used within
a single system. For example, Freenet is designed to work within a P2P
environment; it addresses scalability through an adaptive routing algorithm
that dynamically creates a centralized/decentralized network topology; and
it addresses a number of privacy issues by using a combination of hash
functions and public/private key encryption.
Chapter 13: BitTorrent: This chapter describes the BitTorrent protocol.
It discusses how BitTorrent uses a tracker to create a group or swarm