Academic Search Engines

CHANDOS INFORMATION PROFESSIONAL SERIES
Series Editor: Ruth Rikowski (email: [email protected])

Chandos’ new series of books is aimed at the busy information professional. They have been specially commissioned to provide the reader with an authoritative view of current thinking. They are designed to provide easy-to-read and (most importantly) practical coverage of topics that are of interest to librarians and other information professionals. If you would like a full listing of current and forthcoming titles, please visit www.chandospublishing.com.

New authors: we are always pleased to receive ideas for new titles; if you would like to write a book for Chandos, please contact Dr Glyn Jones on [email protected] or telephone +44 (0) 1865 843000.

Academic Search Engines
A quantitative outlook

JOSÉ LUIS ORTEGA

AMSTERDAM • BOSTON • CAMBRIDGE • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Chandos Publishing is an imprint of Elsevier

Chandos Publishing
Elsevier Limited
The Boulevard
Langford Lane
Kidlington
Oxford OX5 1GB
UK
store.elsevier.com/Chandos-Publishing-/IMP_207/

Tel: +44 (0) 1865 843000
Fax: +44 (0) 1865 843010
store.elsevier.com

First published in 2014

ISBN 978-1-84334-791-0 (print)
ISBN 978-1-78063-472-2 (online)

Chandos Information Professional Series
ISSN: 2052-210X (print) and ISSN: 2052-2118 (online)

Library of Congress Control Number: 2014946174

© J.L. Ortega, 2014

British Library Cataloguing-in-Publication Data.
A catalogue record for this book is available from the British Library.

All rights reserved.
No part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted, in any form, or by any means (electronic, mechanical, photocopying, recording or otherwise) without the prior written permission of the publishers. This publication may not be lent, resold, hired out or otherwise disposed of by way of trade in any form of binding or cover other than that in which it is published without the prior consent of the publishers. Any person who does any unauthorised act in relation to this publication may be liable to criminal prosecution and civil claims for damages.

The publishers make no representation, express or implied, with regard to the accuracy of the information contained in this publication and cannot accept any legal responsibility or liability for any errors or omissions.

The material contained in this publication constitutes general guidelines only and does not represent to be advice on any particular matter. No reader or purchaser should act on the basis of material contained in this publication without first taking professional advice appropriate to their particular circumstances. All screenshots in the publication are the copyright of the website owner(s), unless indicated otherwise.

Typeset by Domex e-Data Pvt. Ltd., India
Printed in the UK and USA.

To everyone who was interested in this book – family, friends, colleagues – and especially my mother and my father, who insisted that this book see the light of day

Contents

List of figures and tables
Preface
About the author

1 Introduction
    What is an academic search engine?
    Challenges for an academic search engine
    The evolution of academic search engines
    Future perspectives

2 CiteSeerx: a scientific engine for scientists
    Autonomous citation indexing
    A focus on computer science
    A searchable digital library
    Searching for authors and citations
    Parsing mistakes
    Other ‘Seers’: the CiteSeerx lab
    A pioneer in citation indexing

3 Scirus: a multi-source searcher
    Web pages and authoritative sources
    Crawling and data extraction
    Source filtering
    Ranking on links
    A missed opportunity

4 AMiner: science networking as an information source
    A networked engine
    A chaotic design
    Based on bibliographic databases
    Searching only in documents
    Exhaustive author profiles
    PatentMiner
    A half-academic search engine

5 Microsoft Academic Search: the multi-object engine
    The object-level vertical search engine
    Slow content updating speed
    Filtering across the directory
    A multidimensional ranking
    A composition based on profiles
    Visualization: graphs as research assessment tools
    More a directory than an engine

6 Google Scholar: on the shoulders of a giant
    A specialization of Google
    Feeding back its own sources
    The opacity of results
    Google Scholar’s additional services
    The most exhaustive academic search engine

7 Other academic search engines
    BASE
    Q-Sensei Scholar
    WorldWideScience

8 A comparative analysis
    Functioning
    Structure
    Coverage
    Searching
    A heterogeneous sample

9 Final remarks

References
Index