SpringerBriefs in Computer Science

Reaz Ahmed • Raouf Boutaba

Collaborative Web Hosting
Challenges and Research Directions

Series Editors: Stan Zdonik, Peng Ning, Shashi Shekhar, Jonathan Katz, Xindong Wu, Lakhmi C. Jain, David Padua, Xuemin Shen, Borko Furht, V.S. Subrahmanian, Martial Hebert, Katsushi Ikeuchi, Bruno Siciliano

For further volumes: http://www.springer.com/series/10028

Reaz Ahmed
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, ON, Canada

Raouf Boutaba
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, ON, Canada

ISSN 2191-5768
ISSN 2191-5776 (electronic)
ISBN 978-3-319-03806-3
ISBN 978-3-319-03807-0 (eBook)
DOI 10.1007/978-3-319-03807-0
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013955570
© The Author(s) 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The Web has tremendous importance worldwide. It has arguably become the world's greatest resource for information, and its success has fostered a variety of new ways for people to share information, communicate, and interact. Over the past decade, a wave of cultural phenomena, including Facebook, Google+, Flickr, YouTube, and MySpace, have all used the Web as their interface. However, cloud-based solutions for online storage, backup, and sharing of multimedia content over the Web have inherent privacy perils. Users have to place their trust in the cloud-service providers. Service providers dictate the terms of usage and potentially gain control over users' content. Besides the privacy concern, transporting huge volumes of user-generated multimedia content to distant data centers may not be bandwidth friendly for unpopular content. A peer-to-peer (P2P) Web-based content sharing architecture can alleviate these problems. This book investigates the challenges in P2P Web hosting and presents a potential solution named pWeb. Three major challenges have been addressed in pWeb: (a) persistent naming of Web content over non-persistent P2P networks, (b) decentralized Web content searching and distributed ranking of search results, and (c) ensuring content availability with minimal replication overhead.
pWeb will allow free hosting of websites and multimedia Web content, without limitation on content type or size. This will give anybody the opportunity to publish to the masses, rather than being restricted by economics. In addition, freedom of speech is a valued principle; however, many around the world strive to block access to certain information. The distributed approach of pWeb is inherently resistant to censorship and will help to spread this freedom worldwide.

Waterloo, ON, Canada
Reaz Ahmed
Raouf Boutaba

Contents

1 Introduction
  1.1 Importance of P2P Web Hosting
  1.2 Challenges
    1.2.1 Naming
    1.2.2 P2P Web Search
    1.2.3 Ensuring Content Availability
  1.3 Organization
  References
2 Plexus: Routing and Indexing
  2.1 Core Concepts in Plexus
  2.2 Plexus Routing
  2.3 Using Plexus in pWeb
  References
3 Naming
  3.1 Requirements
  3.2 Who Needs a Name
  3.3 Naming in Peer-to-Peer Systems
    3.3.1 File Sharing Systems
    3.3.2 BitTorrent
    3.3.3 P2P DNS
  3.4 A Collaborative Naming Scheme
    3.4.1 Entities and Requirements
    3.4.2 pWeb Naming System
    3.4.3 Naming Scheme
    3.4.4 Naming Authority
    3.4.5 Name Resolution
    3.4.6 Methods for Selecting Web ID
  3.5 Summary
  References
4 Collaborative Web Search
  4.1 Requirements
  4.2 Web Search in P2P Networks
    4.2.1 Similar Keyword Search
    4.2.2 Distributed Relevance Ranking
  4.3 A Collaborative Approach
    4.3.1 Network Architecture
    4.3.2 Indexing Architecture
    4.3.3 Resolving Web Query
  4.4 Summary
  References
5 Availability
  5.1 Requirements
  5.2 Availability in P2P Systems
  5.3 Conceptual Overview
    5.3.1 Architecture
    5.3.2 Availability Vector
  5.4 S-DATA Protocol Details
    5.4.1 Terminology
    5.4.2 Indexing Availability Information
    5.4.3 Group Formation
    5.4.4 Group Maintenance
    5.4.5 Content Indexing and Lookup
  5.5 Summary
  References
6 Conclusion

Chapter 1
Introduction

The mode of information production, dissemination, and consumption is gaining new momentum with the advent of cheaper and more powerful home entertainment devices (such as set-top boxes, home gateways, network-attached storage, and gaming consoles) and hand-held devices (such as smartphones, tablets, and portable gaming devices). By combining the powerful multimedia capabilities of hand-held devices with the persistent uptime of home entertainment devices, users can have an elevated Internet experience in a cost-effective manner.

The number of hand-held devices is expected to increase dramatically in the coming years, as suggested by the tech industry's prominent shift toward the hand-held device market, especially smartphones and tablet PCs. Equipped with powerful multimedia (e.g., HD video camera, audio, GPS) and networking (e.g., Wi-Fi, 4G LTE, Bluetooth) capabilities, these devices are generating voluminous content. They contribute significantly to popular social networking sites (e.g., Facebook) and online multimedia streaming portals (e.g., YouTube). As of February 2010, YouTube served one billion videos per day and, more interestingly, the videos uploaded to YouTube each minute would take 35 h to watch.
Online storage and backup solutions are yet another class of Internet applications that are consistently gaining popularity. These solutions offer reliable online storage and ease of access over the Internet.

Cloud-based solutions for online storage, backup, and sharing of multimedia content over the Web have a few inherent drawbacks, as pointed out in [3]. First, voluminous multimedia content has to be uploaded to the cloud stores, which generates a significant amount of Internet traffic. Second, building new data centers will put more pressure on the energy sector; as of February 2009, Microsoft's data center in Quincy, Washington was consuming 48 MW of electricity, sufficient to power around 40,000 homes [1].

R. Ahmed and R. Boutaba, Collaborative Web Hosting, SpringerBriefs in Computer Science, DOI 10.1007/978-3-319-03807-0__1, © The Author(s) 2014

Transporting huge volumes of user-generated multimedia content to distant data centers is not a scalable solution. Rather, semi-persistent devices like set-top boxes, home gateways, network-attached storage (NAS) devices, etc., with network and storage capabilities and residing near multimedia content production and consumption