
Conquering Big Data with Apache Spark

94 Pages·2015·9.01 MB·English

Preview Conquering Big Data with Apache Spark

Conquering Big Data with Apache Spark
Ion Stoica
November 1st, 2015
UC Berkeley

The Berkeley AMPLab (Algorithms, Machines, People)
January 2011 – 2017
•  8 faculty
•  > 50 students
•  3 software engineer team
Organized for collaboration
•  3 day retreats (twice a year)
•  AMPCamp (since 2012): 400+ campers (100s companies)

The Berkeley AMPLab
Governmental and industrial funding
Goal: Next generation of open source data analytics stack for industry & academia: Berkeley Data Analytics Stack (BDAS)

Generic Big Data Stack
•  Processing Layer
•  Resource Management Layer
•  Storage Layer

Hadoop Stack
•  Processing Layer: Hive, Pig, …, Hadoop MR
•  Resource Management Layer: Yarn
•  Storage Layer: HDFS

BDAS Stack (BDAS components plus 3rd party components)
•  Processing Layer: Sample Clean, BlinkDB, SparkR, GraphX, MLBase, Spark Streaming, Velox, SparkSQL, MLlib, Spark Core
•  Resource Management Layer: Mesos, Hadoop Yarn
•  Storage Layer: Succinct, Tachyon, HDFS, S3, Ceph, …

Today's Talk
Overview
1.  Introduction
2.  RDDs
3.  Generality of RDDs (e.g. streaming)
4.  DataFrames
5.  Project Tungsten

A Short History
•  Started at UC Berkeley in 2009
•  Open Source: 2010
•  Apache Project: 2013
•  Today: most popular big data project

Description:
Spark Streaming, SparkSQL, GraphX, MLlib, MLBase, BlinkDB, Sample Clean, SparkR, Velox. Powerful APIs in Scala, Python, Java, R. Spark Core, Spark Streaming, SparkSQL, MLlib .. Thread.sleep(10000); cluster.shutdown();