Tutorial - Big Data Analyses with R O’Reilly Strata Conference London Dr. rer. nat. Markus Schmidberger @cloudHPC [email protected] November 13th, 2013 M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 1/119 Dr. rer. nat. Markus Schmidberger High Performance Computing mongoDB Parallel ComputingGentleman Lab Technomathematics AWS Bioconductor Data Quality ManagemeNntoRSQHLDaBaBitodaalov gSoaiccraioila eDnaptcae MPIBigh uDsbaandtaDeMvOupnsich LMU comSysto GmbH SnowPboharDdinagff2yP kaidras TU Munich Map Reduce Cloud Computing M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 2/119 comSysto GmbH based in Munich, Germany Lean Company. Great Chances! software company specialized in lean business, technology development and Big Data focuses on open source frameworks Meetup organizer for Munich (cid:73) http://www.meetup.com/Hadoop-User-Group-Munich/ (cid:73) http://www.meetup.com/Muenchen-MongoDB-User-Group/ (cid:73) http://www.meetup.com/munich-useR-group/ http://www.comsysto.com M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 3/119 Project: Telefo´nica Dynamic Insights ’SmartSteps’ ”Big decisions made Better” http://dynamicinsights.telefonica.com/488/smart-steps M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 4/119 Motivation and Goals Today, there exists a lot of data and a huge pool of analyses methods R is a great tool for your Big Data analyses Provide an overview of Big Data technologies in R Hands-on code and exercises to get started M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 5/119 Outline 1 Big Data 2 R Intro and R Big Data Packages 3 R and Databases 4 R and Hadoop M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 6/119 Outline 1 Big Data 2 R Intro and R Big Data Packages 3 R and Databases 4 R and Hadoop M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 7/119 Big Data a big hype topic everything is big data everyone wants to work with big data Wikipedia: ”... a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications ...” M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 8/119 Big Data my view on data access to many different data sources (Internet) storage is cheep - store everything today, CIOs get interested in the power of all their data ⇒ a lot of different and complex data have to be stored ⇒ NoSQL M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 9/119 NoSQL - MongoDB NoSQL: databases with less constrained consistency models ⇒ schema-less MongoDB: (cid:73) open source, cross-platform document-oriented database system (cid:73) most popular NoSQL database system (cid:73) supported MongoDB Inc. (cid:73) stores structured data as JSON-like documents with dynamic schemas (cid:73) MongoDB as a German / European Service http://www.mongodb.org http://www.mongosoup.de M.Schmidberger Tutorial-BigDataAnalyseswithR November13th,2013 10/119
Description: