Spark Algebra: SQL in Map Reduce
Mohamed-Amine Baazizi
http://www-bd.lip6.fr/wiki/site/enseignement/master/bdle/start

Outline
• Some Scala complements for Spark
• Spark algebra: transformations and actions
  – Simple RDDs
  – Key-value RDDs
• SQL queries in Map Reduce
  – The join operator in Map Reduce
  – Complex SQL queries

Scala syntactic sugar
Binary function passed to reduce: reduce(f: (T,T) ⇒ T): RDD[T] ⇒ T
If f is addition, then f: (T,T) ⇒ T can be written as _+_
scala> val tab = sc.parallelize(Array(1,3,5,7,9))
tab: org.apache.spark.rdd.RDD[Int] = …
scala> tab.reduce(_+_)
res3: Int = 25
The same shorthand also works with reduceByKey.

Pattern matching in Scala
Condition expressed in map: map(f: T ⇒ U): RDD[T] ⇒ RDD[U]
f can be written as case(exp) => U
scala> val zipcode = sc.parallelize(Array(("Paris", 75), ("Lyon", 69)))
zipcode: org.apache.spark.rdd.RDD[(String, Int)] = …
scala> val city = zipcode.map(x=>x._1)
city: org.apache.spark.rdd.RDD[String] = …
scala> val city = zipcode.map{case(ville,code)=>ville}
city: org.apache.spark.rdd.RDD[String] = …
Advantage: the code is more readable!
This also applies to other methods, e.g. filter.

Pattern matching in Scala
Condition expressed in filter: filter(f: T ⇒ Boolean): RDD[T] ⇒ RDD[T]
f can be written as case(exp) => Boolean
scala> val zipcode = sc.parallelize(Array(("Paris", 75), ("Lyon", 69), ("Cayenne", 973)))
zipcode: org.apache.spark.rdd.RDD[(String, Int)] = …
scala> val dom = zipcode.filter{case(ville,code)=>code>100}
dom: org.apache.spark.rdd.RDD[(String, Int)] = …
Here dom keeps only the overseas départements, whose codes are above 100.
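The shorthand and pattern-matching forms above can be tried without a cluster. The sketch below is a minimal illustration under one assumption: ordinary Scala collections stand in for RDDs, so no SparkContext is needed. On a real RDD, sc.parallelize(...) would build the data and the same lambdas would be passed to reduce, map and filter. For the per-key case, Spark's pair RDDs offer reduceByKey(_ + _). Locally, the collection method groupMapReduce (Scala 2.13+) computes the same per-key sums. The (city, count) pairs are hypothetical illustration data.

```scala
// Sketch: plain Scala collections stand in for RDDs (assumption: no SparkContext).

// reduce with the _+_ shorthand, equivalent to reduce((a, b) => a + b)
val tab = Seq(1, 3, 5, 7, 9)
val total = tab.reduce(_ + _)

// Per-key reduction. On an RDD[(K, V)] this would be pairs.reduceByKey(_ + _);
// locally, groupMapReduce groups by key, keeps the value, and sums per key.
// Hypothetical illustration data:
val pairs = Seq(("Paris", 2), ("Lyon", 1), ("Paris", 3))
val perKey = pairs.groupMapReduce(_._1)(_._2)(_ + _)

// Pattern matching in map: positional access vs. named tuple components
val zipcode = Seq(("Paris", 75), ("Lyon", 69), ("Cayenne", 973))
val cityByIndex = zipcode.map(x => x._1)
val cityByPattern = zipcode.map { case (ville, code) => ville }

// Pattern matching in filter: keep codes above 100
val dom = zipcode.filter { case (ville, code) => code > 100 }
```

Both map forms produce the same cities. The pattern version names the tuple components, which is the readability gain the slide points out.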
Spark API: simple RDDs (figure taken from [Spark])

Map, filter and flatMap
map(f: T ⇒ U): RDD[T] ⇒ RDD[U]
filter(f: T ⇒ Boolean): RDD[T] ⇒ RDD[T]
flatMap(f: T ⇒ Seq[U]): RDD[T] ⇒ RDD[U]
Sample of /user/cours/mesures.txt:
7,2010,04,27,75
12,2009,01,31,78
41,2009,03,25,95
……
scala> val lines = sc.textFile("/user/cours/mesures.txt")
lines: org.apache.spark.rdd.RDD[String] = …
scala> val tab = lines.map(x=>x.split(","))
tab: org.apache.spark.rdd.RDD[Array[String]] = ...
scala> val tup = tab.map(x=>(x(1).toInt, x(3).toDouble))
tup: org.apache.spark.rdd.RDD[(Int, Double)] = ...
scala> val tup_bisext = tup.filter{case(annee, temp) => (annee%4==0 && annee%100!=0) || (annee%400==0)}
tup_bisext: org.apache.spark.rdd.RDD[(Int, Double)] = ...
scala> val till_now = tup.flatMap{case(annee,temp) => annee.to(2015)}
till_now: org.apache.spark.rdd.RDD[Int] = ...
tup_bisext keeps only the leap years. till_now expands each year into every year from that year up to 2015.
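The map/filter/flatMap pipeline above can be reproduced locally as a rough sketch. It assumes that a Seq holding only the three sample lines shown on the slide replaces sc.textFile("/user/cours/mesures.txt"). The rest of the file is not given. Field positions 1 and 3 follow the slide's code. The helper isLeap is a hypothetical name that factors out the slide's leap-year predicate.

```scala
// Sketch: a local Seq with the three sample lines from the slide stands in for
// sc.textFile("/user/cours/mesures.txt"); the full file contents are unknown.
val lines = Seq("7,2010,04,27,75", "12,2009,01,31,78", "41,2009,03,25,95")

// map: split each CSV line, then keep (annee, temp) from fields 1 and 3
val tab = lines.map(x => x.split(","))
val tup = tab.map(x => (x(1).toInt, x(3).toDouble))

// filter: same leap-year predicate as on the slide (hypothetical helper name)
def isLeap(annee: Int): Boolean =
  (annee % 4 == 0 && annee % 100 != 0) || annee % 400 == 0
val tupBisext = tup.filter { case (annee, temp) => isLeap(annee) }

// flatMap: each year expands into every year from it up to 2015 inclusive
val tillNow = tup.flatMap { case (annee, temp) => annee.to(2015) }
```

None of the three sample years (2010, 2009, 2009) is a leap year, so tupBisext is empty here. flatMap yields 6 + 7 + 7 = 20 years, which shows how one input element can produce several output elements.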


Spark Algebra (PDF): 65 pages, 2016, 4.15 MB, in French
