
Scalable Data-Intensive Processing for Science on Clouds PDF

36 Pages·2015·11.73 MB·English

Preview Scalable Data-Intensive Processing for Science on Clouds

Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow
Lessons Learned and Future Directions
Gabriel Antoniu, Inria
Joint work with Radu Tudoran, Benoit Da Mota, Alexandru Costan, Elena Apostol, Bertrand Thirion (co-PI for A-Brain), Ji Liu, Luis Pineda, Esther Pacitti, Patrick Valduriez (co-PI for Z-CloudFlow) and the Microsoft Azure team from MSR ATL Europe
EIT Digital Future Cloud Symposium, Rennes, 19-20 October 2015

Inria Teams Involved in Cloud-Related Projects of the MSR-Inria Joint Centre
[Map of Inria centres: Lille Nord Europe, Paris Rocquencourt, Nancy Grand Est, Rennes Bretagne Atlantique, Saclay Île-de-France, Grenoble Rhône-Alpes, Bordeaux Sud-Ouest, Sophia Antipolis Méditerranée]
•  KERDATA: Data Storage and Processing
•  PARIETAL: Neuroimaging
•  ZENITH: Scientific Data Management

KerData's Focus: How to efficiently store and share data at large scale for next-generation, data-intensive applications?
•  Scientific challenges
   –  Massive data
   –  Geographically distributed
   –  Fine-grain access (MB) for reading and writing
   –  High concurrency
   –  Without locking
•  Major goal: high throughput under heavy concurrency
•  Our contribution
   –  Design and implementation of distributed algorithms
   –  Validation with real apps on real platforms with real users

Motivating Application: A-Brain
Detect risk factors for brain diseases by finding associations between brain images and genetic data: p(brain image, genetic data)
•  Genetic data (10^6)
   –  DNA array (SNP/CNV)
   –  Gene expression data
   –  Others...
•  Brain image (10^6)
   –  Anatomical MRI
   –  Functional MRI
   –  Diffusion MRI
•  >2000 subjects
IEEE Cluster'15, Chicago, USA, 10 September 2015

Approach: A-Brain as Map-Reduce Processing
[Figure]

Challenges: Overview
[Diagram centred on "High-Performance Big Data Management Across Cloud Data Centers"]
•  Multi-site MapReduce
•  Scaling the processing
•  Enabling scientific discovery
•  Enabling large-scale scientific processing
•  Data management across sites
•  Optimize inter-site transfers
•  High-performance streaming
•  Configurable cost-performance tradeoffs
•  Cloud-provided Transfers Service
•  Streaming across cloud sites

Data Management on Public Clouds
[Diagram: compute nodes in the cloud, separate from the cloud-provided storage service]
•  Computation-to-data latency is high!

TomusBlobs: Leverage Virtual Disks
•  Collocating computation and data in PaaS clouds:
   –  Federate virtual disks of compute nodes
   –  Self-configuration, automatic deployment and scaling of the data management system
   –  Apply to MapReduce and Workflow processing

Leveraging TomusBlobs for MapReduce Processing
[Diagram: a client, three Map workers and two Reduce workers coordinated through Azure Queues]
•  New MapReduce prototype (no Hadoop at that point on Azure)
•  Relies on versioning to support high throughput under heavy concurrency, leveraging BlobSeer (KerData, Inria, Rennes)
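The last slide above describes Map and Reduce workers that pick up work from queues and exchange data through shared storage. The following is a minimal sketch of that general pattern, not the prototype presented in the talk. It assumes in-process stand-ins throughout: Python's `queue.Queue` plays the role of the task queues, a plain dictionary plays the role of the shared store, and the names `run_mapreduce`, `word_map` and `word_reduce` are hypothetical. It also uses a lock for simplicity, whereas the slides describe a versioning-based approach without locking.

```python
import queue
import threading
from collections import defaultdict


def _run_workers(target, count):
    """Start `count` threads running `target` and wait for all of them."""
    threads = [threading.Thread(target=target) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def run_mapreduce(chunks, map_fn, reduce_fn, n_mappers=3, n_reducers=2):
    """Toy queue-driven MapReduce. All names here are illustrative."""
    map_tasks = queue.Queue()     # stand-in for a cloud task queue
    reduce_tasks = queue.Queue()  # stand-in for a second task queue
    store = {}                    # stand-in for shared intermediate storage
    lock = threading.Lock()       # simplification: the slides avoid locking

    # The client enqueues one map task per input chunk.
    for task_id, chunk in enumerate(chunks):
        map_tasks.put((task_id, chunk))

    def mapper():
        # All tasks are enqueued before workers start, so an empty
        # queue means there is no more work for this phase.
        while True:
            try:
                task_id, chunk = map_tasks.get_nowait()
            except queue.Empty:
                return
            pairs = list(map_fn(chunk))
            with lock:
                store[("map", task_id)] = pairs

    _run_workers(mapper, n_mappers)

    # Shuffle: group intermediate values by key, one reduce task per key.
    grouped = defaultdict(list)
    for (phase, _), pairs in store.items():
        if phase == "map":
            for key, value in pairs:
                grouped[key].append(value)
    for key, values in grouped.items():
        reduce_tasks.put((key, values))

    def reducer():
        while True:
            try:
                key, values = reduce_tasks.get_nowait()
            except queue.Empty:
                return
            result = reduce_fn(key, values)
            with lock:
                store[("reduce", key)] = result

    _run_workers(reducer, n_reducers)

    return {k[1]: v for k, v in store.items() if k[0] == "reduce"}


def word_map(text):
    """Emit (word, 1) for each whitespace-separated word."""
    for word in text.split():
        yield word, 1


def word_reduce(key, values):
    """Sum the counts collected for one word."""
    return sum(values)


counts = run_mapreduce(["a b a", "b c", "a"], word_map, word_reduce)
```

With the three input chunks shown, `counts` maps each word to its total number of occurrences. In a real deployment the queues and the store would be remote services, and workers would need a termination protocol rather than relying on an empty queue.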

Description:
Inria teams involved in cloud-related projects. Scientific discovery. Data management across sites. Configurable cost-performance tradeoffs. Streaming across cloud sites. Cloud-provided Transfers Service. High-performance Big Data.