
Mahout in Action PDF

415 Pages·2011·8.7 MB·English

Preview Mahout in Action

Mahout in Action
Sean Owen, Robin Anil, Ted Dunning, Ellen Friedman
Manning, Shelter Island

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: [email protected]

©2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964

Development editor: Katharine Osborne
Copyeditor: Andy Carroll
Proofreader: Melody Dolab
Typesetter: Dottie Marsico
Cover designer: Marija Tudor

ISBN 9781935182689
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 16 15 14 13 12 11

brief contents

1 ■ Meet Apache Mahout

PART 1 RECOMMENDATIONS
2 ■ Introducing recommenders
3 ■ Representing recommender data
4 ■ Making recommendations
5 ■ Taking recommenders to production
6 ■ Distributing recommendation computations

PART 2 CLUSTERING
7 ■ Introduction to clustering
8 ■ Representing data
9 ■ Clustering algorithms in Mahout
10 ■ Evaluating and improving clustering quality
11 ■ Taking clustering to production
12 ■ Real-world applications of clustering

PART 3 CLASSIFICATION
13 ■ Introduction to classification
14 ■ Training a classifier
15 ■ Evaluating and tuning a classifier
16 ■ Deploying a classifier
17 ■ Case study: Shop It To Me

contents

preface
acknowledgments
about this book
about multimedia extras
about the cover illustration

1 Meet Apache Mahout
  1.1 Mahout’s story
  1.2 Mahout’s machine learning themes
      Recommender engines ■ Clustering ■ Classification
  1.3 Tackling large scale with Mahout and Hadoop
  1.4 Setting up Mahout
      Java and IDEs ■ Installing Maven ■ Installing Mahout ■ Installing Hadoop
  1.5 Summary

PART 1 RECOMMENDATIONS

2 Introducing recommenders
  2.1 Defining recommendation
  2.2 Running a first recommender engine
      Creating the input ■ Creating a recommender ■ Analyzing the output
  2.3 Evaluating a recommender
      Training data and scoring ■ Running RecommenderEvaluator ■ Assessing the result
  2.4 Evaluating precision and recall
      Running RecommenderIRStatsEvaluator ■ Problems with precision and recall
  2.5 Evaluating the GroupLens data set
      Extracting the recommender input ■ Experimenting with other recommenders
  2.6 Summary

3 Representing recommender data
  3.1 Representing preference data
      The Preference object ■ PreferenceArray and implementations ■ Speeding up collections ■ FastByIDMap and FastIDSet
  3.2 In-memory DataModels
      GenericDataModel ■ File-based data ■ Refreshable components ■ Update files ■ Database-based data ■ JDBC and MySQL ■ Configuring via JNDI ■ Configuring programmatically
  3.3 Coping without preference values
      When to ignore values ■ In-memory representations without preference values ■ Selecting compatible implementations
  3.4 Summary

4 Making recommendations
  4.1 Understanding user-based recommendation
      When recommendation goes wrong ■ When recommendation goes right
  4.2 Exploring the user-based recommender
      The algorithm ■ Implementing the algorithm with GenericUserBasedRecommender ■ Exploring with GroupLens ■ Exploring user neighborhoods ■ Fixed-size neighborhoods ■ Threshold-based neighborhood
  4.3 Exploring similarity metrics
      Pearson correlation–based similarity ■ Pearson correlation problems ■ Employing weighting ■ Defining similarity by Euclidean distance ■ Adapting the cosine measure similarity ■ Defining similarity by relative rank with the Spearman correlation ■ Ignoring preference values in similarity with the Tanimoto coefficient ■ Computing smarter similarity with a log-likelihood test ■ Inferring preferences
  4.4 Item-based recommendation
      The algorithm ■ Exploring the item-based recommender
  4.5 Slope-one recommender
      The algorithm ■ Slope-one in practice ■ DiffStorage and memory considerations ■ Distributing the precomputation
  4.6 New and experimental recommenders
      Singular value decomposition–based recommenders ■ Linear interpolation item–based recommendation ■ Cluster-based recommendation
  4.7 Comparison to other recommenders
      Injecting content-based techniques into Mahout ■ Looking deeper into content-based recommendation ■ Comparison to model-based recommenders
  4.8 Summary

5 Taking recommenders to production
  5.1 Analyzing example data from a dating site
  5.2 Finding an effective recommender
      User-based recommenders ■ Item-based recommenders ■ Slope-one recommender ■ Evaluating precision and recall ■ Evaluating performance
  5.3 Injecting domain-specific information
      Employing a custom item similarity metric ■ Recommending based on content ■ Modifying recommendations with IDRescorer ■ Incorporating gender in an IDRescorer ■ Packaging a custom recommender
  5.4 Recommending to anonymous users
      Temporary users with PlusAnonymousUserDataModel ■ Aggregating anonymous users
  5.5 Creating a web-enabled recommender
      Packaging a WAR file ■ Testing deployment

Description:
Summary: Mahout in Action is a hands-on introduction to machine learning with Apache Mahout. Following real-world examples, the book presents practical use cases and then illustrates how Mahout can be applied to solve them. Includes a free audio- and video-enhanced ebook.

About the Technology: A compute

