Information Technology
Cloud Computing with e-Science Applications
The amount of data in everyday life has been exploding. This data increase
has been especially significant in scientific fields, where substantial amounts
of data must be captured, communicated, aggregated, stored, and analyzed.
Cloud Computing with e-Science Applications explains how cloud
computing can improve data management in data-heavy fields such as
bioinformatics, earth science, and computer science.
The book begins with an overview of cloud models supplied by the
National Institute of Standards and Technology (NIST), and then:
• Discusses the challenges imposed by big data on scientific data
infrastructures, including security and trust issues
• Covers vulnerabilities such as data theft or loss, privacy concerns,
infected applications, threats in virtualization, and cross-virtual
machine attack
• Describes the implementation of workflows in clouds, proposing an
architecture composed of two layers: platform and application
• Details infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS),
and software-as-a-service (SaaS) solutions based on public, private,
and hybrid cloud computing models
• Demonstrates how cloud computing aids in resource control, vertical
and horizontal scalability, interoperability, and adaptive scheduling
Featuring significant contributions from research centers, universities,
and industries worldwide, Cloud Computing with e-Science Applications
presents innovative cloud migration methodologies applicable to a variety of
fields where large data sets are produced. The book provides the scientific
community with an essential reference for moving applications to the cloud.
EDITED BY OLIVIER TERZO • LORENZO MOSSUCCA
K20498
Cloud Computing
with
e-Science Applications
EDITED BY
OLIVIER TERZO
ISMB, TURIN, ITALY
LORENZO MOSSUCCA
ISMB, TURIN, ITALY
Boca Raton London New York
CRC Press is an imprint of the
Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20141212
International Standard Book Number-13: 978-1-4665-9116-5 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Acknowledgments
About the Editors
List of Contributors
1 Evaluation Criteria to Run Scientific Applications in the Cloud
Eduardo Roloff, Alexandre da Silva Carissimi, and Philippe Olivier Alexandre Navaux
2 Cloud-Based Infrastructure for Data-Intensive e-Science Applications: Requirements and Architecture
Yuri Demchenko, Canh Ngo, Paola Grosso, Cees de Laat, and Peter Membrey
3 Securing Cloud Data
Sushmita Ruj and Rajat Saxena
4 Adaptive Execution of Scientific Workflow Applications on Clouds
Rodrigo N. Calheiros, Henry Kasim, Terence Hung, Xiaorong Li, Sifei Lu, Long Wang, Henry Palit, Gary Lee, Tuan Ngo, and Rajkumar Buyya
5 Migrating e-Science Applications to the Cloud: Methodology and Evaluation
Steve Strauch, Vasilios Andrikopoulos, Dimka Karastoyanova, and Karolina Vukojevic-Haupt
6 Closing the Gap between Cloud Providers and Scientific Users
David Susa, Harold Castro, and Mario Villamizar
7 Assembling Cloud-Based Geographic Information Systems: A Pragmatic Approach Using Off-the-Shelf Components
Muhammad Akmal, Ian Allison, and Horacio González–Vélez
8 HCloud, a Healthcare-Oriented Cloud System with Improved Efficiency in Biomedical Data Processing
Ye Li, Chenguang He, Xiaomao Fan, Xucan Huang, and Yunpeng Cai
9 RPig: Concise Programming Framework by Integrating R with Pig for Big Data Analytics
MingXue Wang and Sidath B. Handurukande
10 AutoDock Gateway for Molecular Docking Simulations in Cloud Systems
Zoltán Farkas, Péter Kacsuk, Tamás Kiss, Péter Borsody, Ákos Hajnal, Ákos Balaskó, and Krisztián Karóczkai
11 SaaS Clouds Supporting Biology and Medicine
Philip Church, Andrzej Goscinski, Adam Wong, and Zahir Tari
12 Energy-Aware Policies in Ubiquitous Computing Facilities
Marina Zapater, Patricia Arroba, José Luis Ayala Rodrigo, Katzalin Olcoz Herrero, and José Manuel Moya Fernandez
Preface
Interest in cloud computing in both industry and research is continuously increasing, as it addresses new challenges in data management, computational requirements, and flexibility driven by the needs of scientific communities, such as custom software environments and architectures. Cloud computing provides platforms on which users interact with applications remotely over the Internet, which brings several advantages for sharing data among applications and end users. It can supply computing power, computing infrastructure, applications, business processes, storage, and interfaces as services, wherever and whenever they are needed.
Cloud computing offers four essential characteristics: elasticity; scalability; dynamic provisioning of applications, storage, and resources; and billing and metering of service usage in a pay-as-you-go model. This flexibility in management and resource optimization is also what draws the major scientific communities to migrate their applications to the cloud.
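To make the pay-as-you-go and elasticity points concrete, here is a minimal Python sketch; it is an illustration, not a method from any chapter of this book. It assumes a single hypothetical hourly price per virtual machine (`RATE_PER_VM_HOUR`) and hour-by-hour metering; real providers publish their own rate cards and billing granularities. The sketch compares the metered cost of an elastic run that scales out for a peak against static provisioning sized for that peak.

```python
# Minimal sketch of metered, pay-as-you-go billing (illustrative only).
# RATE_PER_VM_HOUR is a hypothetical price, not any real provider's rate.
RATE_PER_VM_HOUR = 0.10


def metered_cost(vms_per_hour, rate=RATE_PER_VM_HOUR):
    """Return (VM-hours consumed, cost) for a list of per-hour VM counts.

    Each entry is the number of VMs running during one metered hour, so
    the sum is the total VM-hours billed under pay-as-you-go.
    """
    vm_hours = sum(vms_per_hour)
    return vm_hours, round(vm_hours * rate, 2)


# Elastic run: scale out from 2 to 8 VMs for a two-hour peak, then scale back in.
elastic = [2, 2, 8, 8, 2]
# Static provisioning sized for the peak: 8 VMs for all five hours.
static = [8] * 5

print(metered_cost(elastic))  # (22, 2.2)
print(metered_cost(static))   # (40, 4.0)
```

Under these assumptions the elastic run is billed for 22 VM-hours instead of 40. The saving comes entirely from releasing resources after the peak, which is the kind of flexibility described in the paragraph above.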
Scientific applications often depend on access to large legacy data sets and application software libraries. These applications usually run in dedicated high-performance computing (HPC) centers with low-latency interconnects. For such applications, the main cloud features, such as customized environments, flexibility, and elasticity, could provide significant benefits.
Because the amount of data is growing explosively every day, this book describes how cloud computing technology can help scientific communities such as bioinformatics, earth science, and many others, especially in domains where large data sets are produced. In more and more scenarios, data must be captured, communicated, aggregated, stored, and analyzed, which opens new challenges in developing tools for data and resource management, such as the federation of cloud infrastructures and the automatic discovery of services.
Cloud computing has become a platform for scalable services and delivery in the field of services computing. Our intention is to emphasize scientific applications that use solutions based on the public, private, and hybrid cloud computing models, with innovative methods for data capture, storage, sharing, analysis, and visualization in the scientific algorithms needed across a variety of fields. The intended audience includes industry practitioners, students, professors, and researchers in information technology, computer science, computer engineering, bioinformatics, science, and business.
Migrating applications to the cloud is now common, but a deep analysis is needed of key aspects such as security, privacy, flexibility, resource optimization, and energy consumption.
This book has 12 chapters; the first two propose strategies for moving applications to the cloud. The other chapters present a selection of applications running on the cloud, including simulations of public transport, biological analysis, geographic information system (GIS) applications, and more. The chapters come from research centers, universities, and industries worldwide: Singapore, Australia, China, Hong Kong, India, Brazil, Colombia, the Netherlands, Germany, the United Kingdom, Hungary, Spain, and Ireland. All contributions are significant; most of the research leading to these results has received funding from European and regional projects.
After a brief overview of the cloud models provided by the National Institute of Standards and Technology (NIST), Chapter 1 presents several criteria for meeting user requirements in e-science fields. The cloud computing model allows many possible combinations; the public cloud, for example, lets users avoid the up-front cost of buying dedicated hardware. A preliminary analysis of user requirements against specific criteria greatly helps users develop e-science services in the cloud.
Chapter 2 discusses the challenges that big data imposes on scientific data infrastructures. It gives a definition of big data, presenting its main application fields and its characteristics: volume, velocity, variety, value, and veracity. After identifying research infrastructure requirements, the chapter introduces an e-science data infrastructure that uses cloud technology to meet future big data requirements. It focuses on security and trust issues in handling data and summarizes specific requirements for data access. These requirements, defined by the European Research Area (ERA), cover infrastructure facilities, data-processing and management functionalities, access control, and security.
Security is certainly one of the most important aspects of the cloud, because of the use of personal and sensitive information, derived mainly from social networks and health information. Chapter 3 presents a set of important vulnerability issues, such as data theft or loss, privacy issues, infected applications, threats in virtualization, and cross-virtual machine attacks. Many techniques are used to protect data from cloud service providers, such as homomorphic encryption, access control using attribute-based encryption, and data auditing through provable data possession and proofs of retrievability. The chapter highlights issues that are still open, such as security in the mobile cloud, distributed data auditing for clouds, and secure multiparty computation on the cloud.
Many e-science applications can be modeled as workflow applications, defined as a set of interdependent tasks. Cloud technology and platforms are a possible solution for hosting these applications. Chapter 4 discusses implementation aspects of executing workflows in clouds. The proposed architecture is composed of two layers: platform and application. The first layer, described as the scientific workflow platform, enables operations such as dynamic resource provisioning, automatic scheduling of applications, fault tolerance, security, and privacy in data access. The second layer defines data analytics applications that enable simulation of the public transport system of
Singapore and the effect of unusual events in its network. This application