Introduction to Apache Hadoop & Pig
Milind Bhandarkar ([email protected])
Y!IM: gridsolutions
Wednesday, 22 September 2010

Agenda
• Apache Hadoop
• Map-Reduce
• Distributed File System
• Writing Scalable Applications
• Apache Pig
• Q & A

Hadoop: Behind Every Click At Yahoo!

Hadoop At Yahoo! (Some Statistics)
• 40,000+ machines in 20+ clusters
• Largest cluster is 4,000 machines
• 6 Petabytes of data (compressed, unreplicated)
• 1000+ users
• 200,000+ jobs/day

[Figure: Hadoop servers (thousands) and Hadoop storage (PB) at Yahoo!, 2006 to 2010 ("Today"). Callouts: 38K servers, 170 PB storage, 1M+ monthly jobs.]

Sample Applications
• Data analysis is the inner loop at Yahoo!
• Data ⇒ Information ⇒ Value
• Log processing: Analytics, reporting, buzz
• Search index
• Content Optimization, Spam filters
• Computational Advertising

BEHIND EVERY CLICK

Who Uses Hadoop?