Practical Recommender Systems
Kim Falk

Manning
Shelter Island

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]

© 2019 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Helen Stergius
Production editor: Janet Vail
Copy editors: Katie Petito and Frances Buran
Proofreader: Elizabeth Martin
Technical proofreaders: Valentin Crettaz and Furkan Kamaci
Typesetter: Dottie Marsico
Cover designer: Marija Tudor

ISBN 9781617292705
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – SP – 24 23 22 21 20 19

To the loves of my life: my wife, Sara, and my son, Peter, the small Superhero

brief contents

PART 1 GETTING READY FOR RECOMMENDER SYSTEMS
    1 What is a recommender?
    2 User behavior and how to collect it
    3 Monitoring the system
    4 Ratings and how to calculate them
    5 Non-personalized recommendations
    6 The user (and content) who came in from the cold

PART 2 RECOMMENDER ALGORITHMS
    7 Finding similarities among users and among content
    8 Collaborative filtering in the neighborhood
    9 Evaluating and testing your recommender
    10 Content-based filtering
    11 Finding hidden genres with matrix factorization
    12 Taking the best of all algorithms: implementing hybrid recommenders
    13 Ranking and learning to rank
    14 Future of recommender systems

contents

preface
acknowledgments
about this book
about the author
about the cover illustration

PART 1 GETTING READY FOR RECOMMENDER SYSTEMS

1 What is a recommender?
1.1 Real-life recommendations
    Recommender systems are at home on the internet
    The long tail
    The Netflix recommender system
    Recommender system definition
1.2 Taxonomy of recommender systems
    Domain
    Purpose
    Context
    Personalization level
    Whose opinions
    Privacy and trustworthiness
    Interface
    Algorithms
1.3 Machine learning and the Netflix Prize
1.4 The MovieGEEKs website
    Design and specification
    Architecture
1.5 Building a recommender system