Introduction to Apache Hadoop & Pig - Indiana University PDF

149 Pages·2010·6.4 MB·English

Preview Introduction to Apache Hadoop & Pig - Indiana University

Introduction to Apache Hadoop & Pig
Milind Bhandarkar ([email protected]) Y!IM: gridsolutions
Wednesday, 22 September 2010

Agenda
• Apache Hadoop
• Map-Reduce
• Distributed File System
• Writing Scalable Applications
• Apache Pig
• Q & A

Hadoop: Behind Every Click At Yahoo!

Hadoop At Yahoo! (Some Statistics)
• 40,000+ machines in 20+ clusters
• Largest cluster is 4,000 machines
• 6 Petabytes of data (compressed, unreplicated)
• 1000+ users
• 200,000+ jobs/day

[Figure: chart; axis labels and values not recoverable from the extracted text]

Sample Applications
• Data analysis is the inner loop at Yahoo!
• Data ⇒ Information ⇒ Value
• Log processing: Analytics, reporting, buzz
• Search index
• Content Optimization, Spam filters
• Computational Advertising

BEHIND EVERY CLICK

Who Uses Hadoop?

Description:
• Input: Web pages
input.FileInputFormat: Total input paths to process : 4
mapred.JobClient: Running job: job_200904270516_5709
mapred.JobClient: map 0% reduce 0%
