Scalable Data-Intensive Processing for Science on Clouds PDF

36 Pages·2015·11.73 MB·English
Preview Scalable Data-Intensive Processing for Science on Clouds

Scalable Data-Intensive Processing for Science on Clouds: A-Brain and Z-CloudFlow
Lessons Learned and Future Directions

Gabriel Antoniu, Inria

Joint work with Radu Tudoran, Benoit Da Mota, Alexandru Costan, Elena Apostol, Bertrand Thirion (co-PI for A-Brain), Ji Liu, Luis Pineda, Esther Pacitti, Patrick Valduriez (co-PI for Z-CloudFlow) and the Microsoft Azure team from MSR ATL Europe
EIT Digital Future Cloud Symposium, Rennes, 19-20 October 2015

Inria Teams Involved in Cloud-Related Projects of the MSR-Inria Joint Centre
[Map of Inria research centres: Lille Nord Europe, Paris Rocquencourt, Nancy Grand Est, Rennes Bretagne Atlantique, Saclay Île-de-France, Grenoble Rhône-Alpes, Bordeaux Sud-Ouest, Sophia Antipolis Méditerranée]
•  KERDATA: Data Storage and Processing
•  PARIETAL: Neuroimaging
•  ZENITH: Scientific Data Management

KerData's Focus: How to efficiently store and share data at large scale for next-generation, data-intensive applications?
•  Scientific challenges
   –  Massive data
   –  Geographically distributed
   –  Fine-grain access (MB) for reading and writing
   –  High concurrency
   –  Without locking
•  Major goal: high throughput under heavy concurrency
•  Our contribution
   –  Design and implementation of distributed algorithms
   –  Validation with real apps on real platforms with real users

Motivating Application: A-Brain
Detect risk factors for brain diseases by finding associations between genetic data and brain images: p(genetic data, brain image)
•  Genetic data (10^6)
   –  DNA array (SNP/CNV)
   –  Gene expression data
   –  Others...
•  Brain image (10^6)
   –  Anatomical MRI
   –  Functional MRI
   –  Diffusion MRI
•  >2000 subjects
IEEE Cluster'15, Chicago, USA, 10 September 2015

Approach: A-Brain as Map-Reduce Processing
[Diagram]

Challenges: Overview
[Diagram labels]
•  Enabling scientific discovery
•  Scaling the processing
•  Enabling large-scale scientific processing
•  Multi-site MapReduce
•  Data management across sites
•  High-Performance Big Data Management Across Cloud Data Centers
•  High-performance streaming
•  Optimize inter-site transfers
•  Configurable cost-performance tradeoffs
•  Cloud-provided Transfers Service
•  Streaming across cloud sites

Data Management on Public Clouds
[Diagram: cloud compute nodes connected to a cloud-provided storage service]
•  Computation-to-data latency is high!

TomusBlobs: Leverage Virtual Disks
•  Collocating computation and data in PaaS clouds:
   –  Federate the virtual disks of compute nodes
   –  Self-configuration, automatic deployment and scaling of the data management system
   –  Apply to MapReduce and Workflow processing

Leveraging TomusBlobs for MapReduce Processing
[Diagram: a client submits work through Azure Queues to three Map workers, whose output feeds two Reduce workers]
•  New MapReduce prototype (no Hadoop at that point on Azure)
•  Relies on versioning to support high throughput under heavy concurrency, leveraging BlobSeer (KerData, Inria, Rennes)
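The queue-driven MapReduce pattern named on the last slide can be illustrated with a minimal sketch. This is not the prototype from the talk: it assumes an in-process `queue.Queue` as a stand-in for Azure Queues, threads as a stand-in for Map workers, and a hypothetical helper `run_mapreduce` invented here for illustration. It only shows the general shape of the technique: tasks are pulled from a queue, mapped independently, and the intermediate results are folded by a reduce step.

```python
import queue
import threading
from collections import Counter


def run_mapreduce(chunks, map_fn, reduce_fn, n_mappers=3):
    """Apply map_fn to every chunk using worker threads, then fold with reduce_fn.

    Hypothetical helper: the queues below are in-process stand-ins for a
    cloud job queue and for intermediate storage.
    """
    tasks = queue.Queue()
    results = queue.Queue()
    for chunk in chunks:
        tasks.put(chunk)

    def mapper():
        # Each mapper pulls tasks until the queue is drained.
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return
            results.put(map_fn(chunk))

    workers = [threading.Thread(target=mapper) for _ in range(n_mappers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Reduce phase: all mappers have finished, so the results queue is stable.
    if results.empty():
        return None
    acc = results.get_nowait()
    while not results.empty():
        acc = reduce_fn(acc, results.get_nowait())
    return acc


def count_words(text):
    """Map step: count the words in one chunk of text."""
    return Counter(text.split())


def merge_counts(a, b):
    """Reduce step: merge two partial counts."""
    return a + b


totals = run_mapreduce(["a b a", "b c", "a"], count_words, merge_counts)
```

The reduce function here is commutative and associative, so the order in which mappers finish does not affect the result. A real deployment would also need fault handling (for example, re-queuing tasks whose worker fails), which this sketch leaves out.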

Description:
Inria teams involved in cloud-related projects. Scientific discovery. Data management across sites. Configurable cost-performance tradeoffs. Streaming across cloud sites. Cloud-provided Transfers Service. High-performance Big Data management.