Hadoop in Practice PDF

537 Pages·2012·21.07 MB·English

Preview Hadoop in Practice

Hadoop in Practice
Alex Holmes
Manning, Shelter Island

Download from Wow! eBook <www.wowebook.com>

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact:

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: [email protected]

©2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Cynthia Kane
Copyeditors: Bob Herbtsman, Tara Walsh
Proofreader: Katie Tennant
Typesetter: Gordan Salinovic
Illustrator: Martin Murtonen
Cover designer: Marija Tudor

ISBN 9781617290237
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 17 16 15 14 13 12

To Michal, Marie, Oliver, Ollie, Mish, and Anch

Brief contents

PART 1 BACKGROUND AND FUNDAMENTALS
  1 Hadoop in a heartbeat
PART 2 DATA LOGISTICS
  2 Moving data in and out of Hadoop
  3 Data serialization—working with text and beyond
PART 3 BIG DATA PATTERNS
  4 Applying MapReduce patterns to big data
  5 Streamlining HDFS for big data
  6 Diagnosing and tuning performance problems
PART 4 DATA SCIENCE
  7 Utilizing data structures and algorithms
  8 Integrating R and Hadoop for statistics and more
  9 Predictive analytics with Mahout
PART 5 TAMING THE ELEPHANT
  10 Hacking with Hive
  11 Programming pipelines with Pig
  12 Crunch and other technologies
  13 Testing and debugging

Contents

preface
acknowledgments
about this book
PART 1 BACKGROUND AND FUNDAMENTALS
  1 Hadoop in a heartbeat
    1.1 What is Hadoop?
    1.2 Running Hadoop
    1.3 Chapter summary
PART 2 DATA LOGISTICS
  2 Moving data in and out of Hadoop
    2.1 Key elements of ingress and egress
    2.2 Moving data into Hadoop
      TECHNIQUE 1 Pushing system log messages into HDFS with Flume
      TECHNIQUE 2 An automated mechanism to copy files into HDFS
      TECHNIQUE 3 Scheduling regular ingress activities with Oozie
      TECHNIQUE 4 Database ingress with MapReduce
      TECHNIQUE 5 Using Sqoop to import data from MySQL

Description:
Purchase of Hadoop in Practice includes free access to a private web forum run by Manning.