ebook img

Advanced Techniques in Web Intelligence - I PDF

277 Pages·2010·5.152 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Advanced Techniques in Web Intelligence - I

JuanD.Vela´squezandLakhmiC.Jain(Eds.) AdvancedTechniquesinWebIntelligence–1 StudiesinComputationalIntelligence,Volume311 Editor-in-Chief Prof.JanuszKacprzyk SystemsResearchInstitute PolishAcademyofSciences ul.Newelska6 01-447Warsaw Poland E-mail:[email protected] Furthervolumesofthisseriescanbefoundonour homepage:springer.com Vol.299.VassilSgurev,MinchoHadjiski,and JanuszKacprzyk(Eds.) Vol.288.I-HsienTing,Hui-JuWu,Tien-HwaHo(Eds.) IntelligentSystems:FromTheorytoPractice,2010 MiningandAnalyzingSocialNetworks,2010 ISBN978-3-642-13427-2 ISBN978-3-642-13421-0 Vol.300.BaodingLiu(Ed.) UncertaintyTheory,2010 Vol.289.AnneHa˚kansson,RonaldHartung,and ISBN978-3-642-13958-1 NgocThanhNguyen(Eds.) AgentandMulti-agentTechnologyforInternetand Vol.301.GiulianoArmano,MarcodeGemmis, EnterpriseSystems,2010 GiovanniSemeraro,andEloisaVargiu(Eds.) ISBN978-3-642-13525-5 IntelligentInformationAccess,2010 ISBN978-3-642-13999-4 Vol.290.WeiliangXuandJohnBronlund Vol.302.BijayaKetanPanigrahi,AjithAbraham, MasticationRobots,2010 andSwagatamDas(Eds.) ISBN978-3-540-93902-3 ComputationalIntelligenceinPowerEngineering,2010 ISBN978-3-642-14012-9 Vol.291.ShimonWhiteson AdaptiveRepresentationsforReinforcementLearning,2010 Vol.303.JoachimDiederich,CengizGunay,and ISBN978-3-642-13931-4 JamesM.Hogan RecruitmentLearning,2010 Vol.292.FabriceGuillet,GilbertRitschard, ISBN978-3-642-14027-3 HenriBriand,DjamelA.Zighed(Eds.) Vol.304.AnthonyFinnandLakhmiC.Jain(Eds.) AdvancesinKnowledgeDiscoveryandManagement,2010 InnovationsinDefenceSupportSystems–1,2010 ISBN978-3-642-00579-4 ISBN978-3-642-14083-9 Vol.293.AnthonyBrabazon,MichaelO’Neill,and Vol.305.StefaniaMontaniandLakhmiC.Jain(Eds.) DietmarMaringer(Eds.) SuccessfulCase-BasedReasoningApplications–1,2010 NaturalComputinginComputationalFinance,2010 ISBN978-3-642-14077-8 ISBN978-3-642-13949-9 Vol.306.TruHoangCao Vol.294.ManuelF.M.Barros,JorgeM.C.Guilherme,and ConceptualGraphsandFuzzyLogic,2010 NunoC.G.Horta ISBN978-3-642-14086-0 AnalogCircuitsandSystemsOptimizationbasedon Vol.307.AnupamShukla,RituTiwari,andRahulKala EvolutionaryComputationTechniques,2010 TowardsHybridandAdaptiveComputing,2010 ISBN978-3-642-12345-0 ISBN978-3-642-14343-4 Vol.295.RogerLee(Ed.) Vol.308.RogerNkambou,JacquelineBourdeau,and SoftwareEngineering,ArtificialIntelligence,Networkingand RiichiroMizoguchi(Eds.) Parallel/DistributedComputing,2010 AdvancesinIntelligentTutoringSystems,2010 ISBN978-3-642-13264-3 ISBN978-3-642-14362-5 Vol.296.RogerLee(Ed.) Vol.309.IsabelleBichindaritz,LakhmiC.Jain,SachinVaidya, SoftwareEngineeringResearch,Managementand andAshleshaJain(Eds.) Applications,2010 ComputationalIntelligenceinHealthcare4,2010 ISBN978-3-642-13272-8 ISBN978-3-642-14463-9 Vol.310.DiptiSrinivasanandLakhmiC.Jain(Eds.) Vol.297.TaniaTronco(Ed.) InnovationsinMulti-AgentSystemsandApplications–1, NewNetworkArchitectures,2010 2010 ISBN978-3-642-13246-9 ISBN978-3-642-14434-9 Vol.298.AdamWierzbicki Vol.311.JuanD.Vela´squezandLakhmiC.Jain(Eds.) TrustandFairnessinOpen,DistributedSystems,2010 AdvancedTechniquesinWebIntelligence–1,2010 ISBN978-3-642-13450-0 ISBN978-3-642-14460-8 Juan D.Vela´squezand Lakhmi C.Jain (Eds.) Advanced Techniques in Web Intelligence – 1 123 Dr.JuanD.Vela´squez DepartmentofIndustrialEngineering SchoolofEngineeringandScience UniversityofChileRepublica701 Santiago,P.C.837-0720 Chile E-mail:[email protected] Prof.LakhmiC.Jain SchoolofElectricalandInformationEngineering UniversityofSouthAustralia Adelaide MawsonLakesCampus SouthAustralia Australia E-mail:[email protected] ISBN 978-3-642-14460-8 e-ISBN 978-3-642-14461-5 DOI 10.1007/978-3-642-14461-5 Studiesin Computational Intelligence ISSN1860-949X Library of Congress Control Number:2010932016 (cid:2)c 2010 Springer-VerlagBerlin Heidelberg Thisworkissubjecttocopyright.Allrightsarereserved,whetherthewholeorpart of the material is concerned, specifically therights of translation, reprinting,reuse ofillustrations, recitation,broadcasting, reproductiononmicrofilm orinanyother way, and storage in data banks. Duplication of this publication or parts thereof is permittedonlyundertheprovisionsoftheGermanCopyrightLawofSeptember9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution undertheGerman Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typeset&CoverDesign:ScientificPublishing ServicesPvt. Ltd., Chennai, India. Printed on acid-free paper 9 8 7 6 5 4 3 2 1 springer.com Preface The term Web Intelligence is defined as a new line of scientific research and development,whichisusedtoexplorethefundamentalrolesandpracticalim- pactofArtificialIntelligencetogetherwithadvancedInformationTechnology and its effect on the future generations of Web-empowered products. These include systems, services, amongst other activities, all of which are carried out by the Web Intelligence Consortium (http://wi-consortium.org/). Web Intelligence wasfirstcoinedinthe late1999’s.Fromthat time,many newalgorithms,methodsandtechniquesweredevelopedandusedextracting both knowledge and wisdom from the data originating from the Web. A number of initiatives have been adopted by the world communities in this areaofstudy.Theseincludebooks,conferenceseries,andjournals.Thislatest book encomposes a variaty of up to date state of the art approaches in Web Intelligence. Furthermore, it hightlights successful applications in this area of researchwithin a practical context. The present book aims to introduce a selection of research applications in the area of Web Intelligence. We have selected a number of researchers around the world, all of which are experts in their respective research areas. Each chapter focuses on a specific topic in the field of Web Intelligence. Furthermorethebookconsistsofanumberofinnovativeproposalswhichwill contribute to the development of web science and technology for the long- term future, rendering this collective work a valuable piece of knowledge. It was a great honour to have collaborated with this team of very talented experts. We also wish to express our grattitude to those who reviewed this book offering their constructive feedbacks. Our thanks are also due to the Springer-Verlag staff for their excellent support given to us during the preparation of this manuscript. VI Preface We are also indebted to the staff of the Millennium Institute on Complex EngineeringSystems(ICM:P-05-004-F,CONICYT:FBO16),whichalsopar- tially funded this work. Santiago, Chile, Juan D. Vela´squez Adelaide, Australia Lakhmi C. Jain June 2010 Contents 1 Innovations in Web Intelligence.......................... 1 Gasto´n L’Huillier, Juan D. Vel´asquez, Lakhmi C. Jain 1.1 Introduction......................................... 1 1.2 An Overview of the Advanced Techniques Used in Web Intelligence .......................................... 2 1.3 Chapters Included in the Book......................... 4 1.4 Summary ........................................... 5 1.5 Resources ........................................... 5 1.5.1 Journals ..................................... 5 1.5.2 Conferences .................................. 7 1.5.3 Conferences Proceedings ....................... 7 1.5.4 Books ....................................... 13 References ................................................ 16 2 Advanced Techniques in Web Data Pre-processing and Cleaning................................................. 19 Pablo E. Rom´an, Robert F. Dell, Juan D. Vela´squez 2.1 Introduction......................................... 19 2.2 The Nature of the Web Data: General Characteristics and Quality Issues.................................... 21 2.2.1 Web Content ................................. 21 2.2.2 Web Site Structure............................ 22 2.2.3 Web User Session ............................. 23 2.2.4 Privacy Issues ................................ 24 2.2.5 Quality Measures ............................. 25 2.3 Transforming Hyperlinks to a Graph Representation...... 27 2.3.1 Hyperlink Retrieval Issues...................... 28 2.3.2 Crawler Processing............................ 28 2.3.3 Large Sparse Distributed Storage ............... 29 2.4 Transforming Web Content into a Feature Vector......... 29 VIII Contents 2.4.1 Cleaning Web Content......................... 30 2.4.2 Vector Representation of Content ............... 30 2.4.3 Web Object Extraction ........................ 31 2.5 Web Session Reconstruction ........................... 33 2.5.1 Representation of the Trails .................... 34 2.5.2 Proactive Sessionization ....................... 34 2.5.3 Reactive Sessionization ........................ 37 2.5.4 Sessions in Dynamic Environments.............. 41 2.5.5 Identifying Session Outliers .................... 41 2.6 Summary ........................................... 41 References ................................................ 42 3 Web Pattern Extraction and Storage .................... 49 V´ıctor L. Rebolledo, Gasto´n L’Huillier, Juan D. Vel´asquez 3.1 Introduction......................................... 50 3.1.1 From Data to Knowledge ...................... 51 3.1.2 About Knowledge Representation ............... 52 3.1.3 General Terms and Definition of Terms .......... 54 3.2 Feature Selection for Web Data ........................ 55 3.2.1 Feature Selection Techniques ................... 56 3.2.2 Feature Extraction Techniques.................. 57 3.3 Pattern Extraction from Web Data ..................... 59 3.3.1 Supervised Learning Techniques ................ 59 3.3.2 Unsupervised Techniques ...................... 62 3.3.3 Ensemble Meta-algorithms ..................... 63 3.4 Web Mining Model Assessment ........................ 64 3.4.1 Evaluation of Classifiers ....................... 64 3.4.2 Evaluation of Regression Models ................ 66 3.4.3 MDL Principle ............................... 67 3.4.4 Evaluation of Clustering Models ................ 67 3.4.5 Evaluating Association Rules ................... 68 3.4.6 Other Evaluation Criteria...................... 68 3.5 A Pattern Webhouse Application....................... 68 3.5.1 Data Webhouse Overview...................... 69 3.5.2 About PMML ................................ 70 3.5.3 Application .................................. 71 3.6 Summary ........................................... 73 References ................................................ 74 4 Web Content Mining Using MicroGenres ................ 79 V´aclav Sn´aˇsel, Miloˇs Kudˇelka, Zdenˇek Hor´ak 4.1 Introduction......................................... 79 4.2 Web Content Mining Summary ........................ 80 4.3 Web Usability Basics ................................. 82 4.3.1 Web Design Pattern Basics..................... 85 4.4 Recent Methods...................................... 87 Contents IX 4.5 MicroGenre ......................................... 96 4.5.1 Pattrio Method............................... 97 4.5.2 Analysis ..................................... 98 4.6 Experiments......................................... 100 4.6.1 Accuracy of Pattrio Method.................... 101 4.6.2 Analysis by Nonnegative Matrix Factorization .... 104 4.7 Summary ........................................... 106 References ................................................ 106 5 Web Structure Mining................................... 113 Ricardo Baeza-Yates, Paolo Boldi 5.1 Introduction......................................... 113 5.2 The Web as a Graph: Facts, Myths, and Traps........... 114 5.2.1 The Web Graph .............................. 115 5.2.2 Some Structural Properties of the Web .......... 117 5.2.3 Web Graph Models ........................... 119 5.3 Link Analysis........................................ 122 5.3.1 PageRank.................................... 123 5.3.2 HITS........................................ 126 5.3.3 Spam-Related LAR Algorithms ................. 127 5.4 Structural Clustering and Communities ................. 128 5.5 Algorithmic Issues.................................... 129 5.5.1 Streaming and Semistreaming Computation Models ...................................... 130 5.5.2 Web Graph Compression....................... 133 5.6 Summary ........................................... 138 References ................................................ 138 6 Web Usage Mining ...................................... 143 Pablo E. Rom´an, Gasto´n L’Huillier, Juan D. Vel´asquez 6.1 Introduction......................................... 143 6.2 Characterizing the Web User Browsing Behaviour ........ 144 6.2.1 Representative Variables ....................... 145 6.2.2 Empirical Statistics Studies about Web Usage .... 148 6.2.3 Amateur and Expert Users..................... 149 6.3 Representing the Web User Browsing Behaviour and Preferences.......................................... 150 6.3.1 Vector Representations ........................ 150 6.3.2 Incorporating Content Valuations ............... 151 6.3.3 Web Object Valuation......................... 151 6.3.4 Graph Representation ......................... 152 6.3.5 The High Dimensionality of Representation ...... 152 6.4 Extracting Patterns from Web User Browsing Behaviour ........................................... 153 6.4.1 Clustering Analysis ........................... 153 6.4.2 Decision Rules................................ 154 X Contents 6.4.3 Integer Programming.......................... 155 6.4.4 Markov Chain Models ......................... 155 6.4.5 Mixture of Markov Models ..................... 156 6.4.6 Hidden Markov Models ........................ 156 6.4.7 Conditional Random Fields .................... 156 6.4.8 Variable Length Markov Chain (VLMC) ......... 156 6.4.9 Biology Inspired Web User Model............... 157 6.4.10 Ant Colony Models ........................... 158 6.4.11 Matrix Factorization Methods .................. 158 6.5 Application of Web Usage Mining ...................... 158 6.5.1 Adaptive Web Sites ........................... 159 6.5.2 Web Personalization........................... 159 6.5.3 Recommendation ............................. 159 6.6 Summary ........................................... 160 References ................................................ 160 7 User-Centric Web Services for Ubiquitous Computing ... 167 In-Young Ko, Hyung-Min Koo, Angel Jimenez-Molina 7.1 Introduction......................................... 168 7.2 Essential Requirements for Providing Web Services in Ubicomp ............................................ 169 7.2.1 User Centricity ............................... 169 7.2.2 Context Awareness............................ 170 7.2.3 Composability and Reusability.................. 172 7.2.4 Dynamicity .................................. 172 7.3 Current Research in Ubicomp Web Services.............. 173 7.3.1 Gaia ........................................ 173 7.3.2 Aura ........................................ 174 7.3.3 ABC Framework.............................. 174 7.3.4 IST Amigo................................... 175 7.4 Task-Oriented Service Framework for Ubiquitous Computing .......................................... 175 7.4.1 Task and Action Definition..................... 175 7.4.2 OverallArchitecture of the Framework .......... 176 7.4.3 Task and Action Semantic Representation Model ....................................... 178 7.4.4 Processes for Web Service Composition and Execution in Ubicomp ......................... 181 7.4.5 Implementation and Evaluation................. 183 7.5 Summary ........................................... 186 References ................................................ 187

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.