Forward, Back and Home Again
Analyzing User Behavior on the Web

Eelco Herder

Graduation Committee
Prof.dr.ir. W.H.M. Zijm (University of Twente), chairman and secretary
Prof.dr.ir. A. Nijholt (University of Twente), promotor
Dr. E.M.A.G. van Dijk (University of Twente), assistant promotor
Prof.dr. P.M.E. De Bra (Eindhoven University of Technology)
Prof.dr. N. Henze (University of Hannover)
Prof.dr. T.W.C. Huibers (University of Twente)
Prof.dr. M.A. Neerincx (Delft University of Technology)
Dr. S. Weibelzahl (National College of Ireland)
Dr. J. Zwiers (University of Twente)

CTIT Ph.D. Thesis Series No. 06-83. Centre for Telematics and Information Technology, University of Twente, P.O. Box 217, 7500 AE Enschede, The Netherlands.

SIKS Dissertation Series No. 2006-08. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems.

The research reported in this thesis was carried out in the project Personal Assistance for onLine Services (PALS) Anywhere. This project was supported by grant MMI0122NC from the Dutch Ministry of Economic Affairs, as part of the Dutch Innovative Research Program IOP-MMI.

Copyright © 2006 by Eelco Herder

Herder, Eelco
Forward, Back and Home Again - Analyzing User Behavior on the Web
Ph.D. Thesis, University of Twente, 2006
Includes bibliographical references
Cover design by Eelco Herder
Printed by F&N Boekservice, Amsterdam
ISBN-10: 90-73838-73-8
ISBN-13: 978-90-73838-73-4
ISSN: 1381-3617, No. 06-83 (CTIT Ph.D. Thesis Series)

FORWARD, BACK AND HOME AGAIN
ANALYZING USER BEHAVIOR ON THE WEB

Dissertation to obtain the doctor’s degree at the University of Twente, on the authority of the rector magnificus, prof.dr. W.H.M. Zijm, on account of the decision of the graduation committee, to be publicly defended on Thursday April 13, 2006 at 16.45 by Eelco Herder, born on December 14, 1977 in Leeuwarden, The Netherlands

This dissertation is approved by the promotor, prof.dr.ir. A. Nijholt, and by the assistant promotor, dr. E.M.A.G. van Dijk.

Preface

Many people from my generation and younger can hardly imagine what life would be like without the World Wide Web. Currently, a large population uses the Web for keeping up-to-date, planning trips, communication, entertainment, shopping, banking, and many other activities on a daily basis. At first sight, the Web interface is extremely simple and intuitive: one types in a Web address or a search query, and there you go. Whereas this might be true for many relatively simple activities, many issues arise when working on more complex, unfamiliar, or interrelated tasks. A large number of users would still feel extremely challenged if they had to arrange all necessary steps for booking a trip to London - flight, hotel, travel plan, theatre tickets, restaurants, attractions - online. And even though many users might not know it, they often employ coping strategies in their Web interaction.

In October 2001, I started working as a PhD student on the PALS Anywhere project, which was aimed at developing concepts for personal assistance for online services. In the initial phase of the project it became clear that existing knowledge on how users interact with online information and services on the Web was quite limited. Even though several user studies have been carried out, and even though research in the fields of adaptive hypermedia and user modeling is specifically aimed at understanding users and their actions, no answers - or conflicting answers - could be found for even very obvious questions. To me, it was intriguing that so little was known about activities that we carry out on a daily basis.
This is what sparked my interest in the analysis of user interaction with the Web. Given the infancy of the Web, which has existed only since the early 1990s, it is not surprising that many interface concepts are not yet fully developed, all the more since the Web is still subject to constant change in terms of technologies, services and usage. However, the differences in how users approach similar tasks, and the differences in performance, were striking. Moreover, a closer analysis of interaction patterns revealed aspects that might seem completely obvious in retrospect, but which have apparently been left unnoticed and which were clearly not supported by any interface concept. This thesis reflects my exploration of these lesser known yet interesting things in everyday life, an exploration which I definitely could not have carried out on my own.

There are many people who I would like to thank for their direct or indirect contributions to this thesis, to the research behind the thesis, and to the life behind the research. Undoubtedly, if I tried to mention all of them, I would forget to include some names that should be included, or I might even include some names that should be forgotten. For this reason, I will refrain from mentioning names, with a few exceptions.

First of all I would like to thank my supervisors for their support, the promotion committee for their involvement, and my colleagues at the Human Media Interaction group for a large variety of reasons. Further acknowledgments go to my fellow PALS project members from the University of Twente, Utrecht University, the TNO research institute in Soesterberg, and the companies involved in the project. Interesting ideas, broader insights, and valuable feedback were given to me by various researchers that I met at conferences and workshops.

I have been fortunate to have had a number of close collaborations.
In Ion Juvina from Utrecht University I found a partner for the two laboratory studies reported in this thesis. Our substantially differing backgrounds often resulted in good discussions and new insights, and his experience in statistics and experiment design proved to be invaluable and instructive. For the long-term study I found partners in Harald Weinreich, Matthias Mayer and Hartmut Obendorf from the University of Hamburg. The collaboration was intensive, at some points accompanied by frustration and arguments. For the most part, however, the cooperation was extremely pleasant and fruitful. I also would like to thank all those who participated in the user studies.

Much support and motivation has also come from my family and friends, who enriched my life behind the research. I intensely enjoyed the rich cultural life at the University Campus. I made many friends at the classical student choir Drienerloos Vocaal Ensemble, and I will sorely miss the rehearsals, concerts, trips, social events, and many other activities - some better left unmentioned. As a cofounder of the piano club Utopiano - one of a kind in the Netherlands - I would like to thank all who organized, participated in, or facilitated its activities; the concert registrations will remain a precious keepsake. I am happy that my two closest friends, Arend de Haan and Maarten Gijselaar, will stand by me as paranymphs during the defense of this thesis. So as not to disappoint anyone, thanks to all friends yet unmentioned. And, last but not least, I would like to express my gratitude to my father, mother and brother - Oebele, Betty and Rudmer; it feels great that ‘Huize Beemdgras’ is still a place that I call home.

Hannover, March 2006
Eelco Herder

Contents

Preface

1 Introduction
  1.1 A very brief history of the Web
    1.1.1 The Internet
    1.1.2 Hypermedia
    1.1.3 The World Wide Web
  1.2 A bird’s eye view on Web technologies
  1.3 Web usability and personalization
    1.3.1 Lost in hyperspace
    1.3.2 Coherence in the Web
    1.3.3 Missing hypermedia features on the Web
  1.4 Research approach and thesis overview

2 Adaptive hypermedia and personalization on the Web
  2.1 Introduction
  2.2 Adaptive hypermedia
    2.2.1 Goals of adaptive hypermedia
    2.2.2 Personalization versus overall design improvement
    2.2.3 Where can it be used
    2.2.4 To what can it be adapted
    2.2.5 What can be adapted
  2.3 User modeling
    2.3.1 Data acquisition
    2.3.2 Knowledge representation
    2.3.3 Statistical methods
  2.4 Structure and evaluation of adaptive systems
    2.4.1 Layered Evaluation
    2.4.2 Separation and reintegration of concerns

3 Empirical and theoretical models of Web navigation
  3.1 Introduction
  3.2 Laboratory studies
  3.3 Field studies
    3.3.1 Observational studies
    3.3.2 Long-term studies
  3.4 Theoretical models
    3.4.1 Information foraging theory
    3.4.2 The CoLiDeS model
    3.4.3 Cognitive architectures
  3.5 Putting the parts together
    3.5.1 Searching, browsing and backtracking
    3.5.2 Recurrent behavior and revisits
    3.5.3 Sniffing around for enhanced information scent
    3.5.4 Conclusion and outlook

4 Web usage mining - finding patterns in Web navigation
  4.1 Introduction
  4.2 Data collection and data preparation
    4.2.1 Data collection
    4.2.2 Data cleaning and data enrichment
    4.2.3 Data representation and enrichment
  4.3 A graph-based framework
    4.3.1 Graph models of hypermedia structure and user navigation
    4.3.2 General Web structures
    4.3.3 Page measures
    4.3.4 Global measures
    4.3.5 Navigation measures
  4.4 Aggregated measures and predictive modeling
    4.4.1 Aggregate measures
    4.4.2 Machine learning
  4.5 The Navigation Visualizer
    4.5.1 Background and related work
    4.5.2 Design rationale
    4.5.3 System overview
    4.5.4 Summary
  4.6 Concluding remarks

5 Laboratory studies
  5.1 Introduction
  5.2 Discovery of individual navigation styles
    5.2.1 Individual differences in Web navigation
    5.2.2 User navigation styles
    5.2.3 Experimental setup
    5.2.4 Results
    5.2.5 Discussion
  5.3 The impact of link suggestions
    5.3.1 Individual differences, disorientation and navigation styles
    5.3.2 Experimental setup
    5.3.3 Results
    5.3.4 Discussion
  5.4 Concluding remarks

6 Long-term study
  6.1 Introduction
  6.2 Data collection
    6.2.1 Data capturing procedure
  6.3 General statistics
    6.3.1 Activities carried out on the Web
  6.4 Revisitation on the Web
    6.4.1 How often do people revisit Web pages
    6.4.2 Within-session and cross-session revisits
    6.4.3 Characterizing page revisits
    6.4.4 Support for recent and frequent revisits
    6.4.5 Popular sites and pages in popular sites
    6.4.6 Support for less frequent revisits
  6.5 Search
    6.5.1 General statistics
    6.5.2 Query formulation and modification
    6.5.3 Query length
    6.5.4 Differences in search strategies
    6.5.5 Search and recurrent behavior
  6.6 Browser tools used
    6.6.1 General statistics
    6.6.2 Form submission and backtracking
    6.6.3 Multiple windows and the back button
    6.6.4 Multiple windows
    6.6.5 Windows, tabs and backtracking
  6.7 Categorization of navigation sessions
  6.8 Discussion
    6.8.1 Support for backtracking
    6.8.2 Support for recurrent behavior
    6.8.3 Concluding remarks

7 Conclusions
  7.1 Answers to research questions
  7.2 Theoretical insights
    7.2.1 Models of Web navigation
    7.2.2 Navigation styles as evaluation measures
    7.2.3 Client-side Web usage analysis
  7.3 Directions for navigation support
    7.3.1 User support for information gathering
    7.3.2 Support for backtracking
    7.3.3 Supporting recurrent behavior
    7.3.4 Adapt to the user’s task context
    7.3.5 Summary - the future browser
  7.4 Concluding remarks

Bibliography
Abstract
Samenvatting (summary in Dutch)