Monitoring in the Clouds: Comparison of
ECO2Clouds and EXCESS Monitoring Approaches
Pavel Skvortsov, Dennis Hoppe, Axel Tenschert, Michael Gienger
HLRS, Universität Stuttgart, Stuttgart, Germany
skvortsov@hlrs.de, hoppe@hlrs.de, tenschert@hlrs.de, gienger@hlrs.de
Abstract: With the increasing adoption of private cloud infrastructures by providers and enterprises, the monitoring of these infrastructures is becoming crucial. The rationale behind monitoring is manifold: reasons include saving energy, lowering costs, and better maintenance. In the e-Science sector, moreover, the collection of infrastructure and application-specific data at high resolutions is essential. In this paper, we present two monitoring approaches implemented in two European projects: ECO2Clouds and EXCESS. The ECO2Clouds project aims to minimize CO2 emissions caused by the execution of applications on the cloud infrastructure. In order to allow for eco-aware deployment and scheduling of applications, the ECO2Clouds monitoring framework provides the necessary set of metrics on different layers, including the physical, virtual, and application layers. In turn, the EXCESS project introduces new energy-aware execution models that improve energy-efficiency on a software level. With in-depth knowledge about the energy consumption and overall behavior of applications on a given infrastructure, subsequent executions can be optimized to save energy. To achieve this goal, the EXCESS monitoring framework provides APIs allowing developers to collect application-specific data in addition to infrastructure data at run-time. We perform a comparative analysis of both monitoring approaches and highlight use cases, including a hybrid approach which benefits from both monitoring solutions.

Keywords: Cloud computing, metrics, performance, eco-awareness, deployment, scheduling

I. INTRODUCTION

Private cloud infrastructures are being increasingly adopted by providers and enterprises. They include heterogeneous computing, storage, and networking resources, which are utilized by a variety of applications with diverse requirements. Thus, the monitoring of these infrastructures is becoming important. The reasons for sophisticated and timely monitoring include not only maintenance-related technical parameters, but also the analysis of advanced metrics which indicate how to utilize the infrastructure in order to save energy, lower costs, and reduce emissions. At the same time, cloud infrastructures are often used for e-Science research [1], which deals with sophisticated computer simulations and Big Data use cases. In such cases, the collection of infrastructure and application-specific data at high resolution is essential.

Energy-awareness in cloud computing, HPC, and embedded systems has become a priority in recent years [2]. Although performance remains the topmost objective for companies, data centers have shown a high interest in reducing their energy consumption, which continues to increase for several reasons [3], [4], [5]: these include the advancement of Big Data, Fast Data (i.e., Big Data sets which have to be processed in real time), the Internet of Things (IoT), and the recent trend of employing Graphics Processing Units (GPUs) in order to speed up the execution of compute-intensive scientific applications. The need for monitoring in high performance computing, and in cloud computing in particular, is driven by resource planning and management, data center management, SLA administration, billing, and performance management [6]. Aceto et al. [6] highlight that energy-efficiency is "a major driver of monitoring data analysis for planning, provisioning and management of resources". Since increased energy consumption leads to extra operational costs, reducing energy is a driving factor for data centers [2].

In this paper, we present two different monitoring approaches focusing on energy-efficiency, which were implemented in the European projects ECO2Clouds and EXCESS, respectively.

The ECO2Clouds project aims to minimize CO2 emissions caused by the execution of applications on the cloud infrastructure. In order to allow for eco-aware deployment and scheduling of applications, the ECO2Clouds monitoring framework provides the necessary set of metrics on different layers, including the physical, virtual, and application layers. The basic feature allowing for minimizing CO2 emissions is the detailed mix of energy sources which the cloud provider sites EPCC (Edinburgh Parallel Computing Centre), INRIA (French Institute for Research in Computer Science and Automation), and HLRS (High Performance Computing Center Stuttgart) are able to support. Based on the power consumption of physical nodes measured by the installed power distribution units (PDUs), the resulting emissions were computed. The differences between the energy mixes of the sites, as well as the heterogeneity of the computing resources within sites, allow applications to be scheduled such that the overall carbon footprint (i.e., CO2 emissions) of an application is minimized.

Another implementation of a monitoring framework aiming at energy efficiency is proposed in the EU-funded project EXCESS (Execution Models for Energy-Efficient Computing Systems) [7]. Optimizing the energy consumption on an application level requires a deep understanding of the run-time behavior of applications. The EXCESS project introduces new energy-aware execution models that improve energy-efficiency. These execution models need to be validated and verified based on real monitoring data. The rationale behind the monitoring solution developed in the EXCESS
project is to raise energy-awareness among software developers. With in-depth knowledge about the energy consumption and overall behavior of their applications on a given infrastructure, subsequent executions can be optimized to save energy. To achieve this goal, the EXCESS monitoring framework provides APIs allowing developers to collect application-specific data in addition to infrastructure data at run-time. The EXCESS monitoring framework provides fine-grained monitoring information without introducing performance overhead that would interfere with the application. EXCESS is concerned with a holistic analysis of systems, including the application, system software, and hardware stack, in order to detect preventable energy dissipation. The EXCESS monitoring framework enables improving the energy-efficiency in high performance computing and embedded systems, as well as in cloud computing.

In this paper, we perform a comparative analysis of the ECO2Clouds and EXCESS monitoring approaches. We highlight the target use cases, including a hybrid approach which benefits from both monitoring solutions.

The rest of this paper is structured as follows: we give an overview of the related work in Section II. In Sections III and IV, we describe the ECO2Clouds and EXCESS approaches, respectively. In Section V, we compare the presented approaches and discuss their suitable use cases. Finally, we conclude the paper in Section VI.

II. RELATED WORK

Next, we provide a brief overview of existing frameworks for monitoring in computing systems.

Existing systems for monitoring can be distinguished along three dimensions: agent-less, agent-based, and hybrid [8]. Agent-based monitoring systems have a client-server architecture. Prior to monitoring a component, agents, often light-weight applications, are installed remotely on target computers. Each agent then collects specific metric data such as the CPU usage over time. Periodically, collected data is sent to a monitoring server for storage and analysis. This approach is crucial for supporting custom, non-standard metrics such as the amount of time a server needs to respond to a request. Agent-less monitoring systems, by contrast, do not deploy agents on target computers to collect data; the data is rather gathered through remote calls, e.g., via the Simple Network Management Protocol (SNMP). As a consequence, this monitoring strategy is limited to basic hardware information available on all platforms by default. The advantages of such an approach are that no initial software has to be deployed on target computers and that the impact on system performance at run-time is low. Hybrid solutions, finally, support both agent-based and agent-less monitoring. The ATOM framework of the EXCESS project follows an agent-based architecture in order to support application-specific metrics. In turn, the ECO2Clouds monitoring framework relies on Zabbix-based agents.

Aside from the architecture, monitoring frameworks can be classified via the following nine key properties: scalability, non-intrusiveness, timeliness, granularity, extensibility, data storage, visualization, adaptability, and predictability, as well as via the advanced metrics they support [6], [9], [10]. The meanings of these properties, together with the architecture, are listed below.
• Architecture: agent-based (producer-consumer principle), agent-less, or hybrid;
• Non-Intrusiveness: low to very low impact on the system at run-time;
• Scalability: high update frequencies of metrics while being low-intrusive;
• Timeliness: near-real time update rates of metrics, allowing for snapshots at a given point in time;
• Granularity: profiling at different levels of granularity (e.g., functions);
• Extensibility: a plug-in system to add extra metrics; support for application-specific metrics; a language-independent interface to transfer data;
• Data Storage: low-intrusive, efficient data storage at run-time; data export using common formats (e.g., CSV or JSON); filtering of the data based on time intervals;
• Visualization: basic visualization functionality;
• Adaptability: enabling/disabling the profiling of specific components and the configuration of plug-ins at run-time;
• Predictability: predictors via extensions (e.g., CO2 prediction);
• Metrics: support for non-standard metrics; external measurements to increase data accuracy; profiling of specialized hardware.

These demands, in particular the latter one, are not easily met by existing monitoring frameworks such as Ganglia, Lattice, Nagios, OpenNMS, Zenoss, or Zabbix. For example, key requirements that need to be satisfied by a monitoring framework in the case of the EXCESS project include support for profiling at different granularity levels, low intrusiveness of the monitoring, and the possibility of code instrumentation. In the case of the ECO2Clouds project, it is required to support non-standard CO2-related metrics, to predict the future energy usage and emissions, and to provide data storage for historic monitoring information.

III. ECO2CLOUDS APPROACH

The major goal of the ECO2Clouds project is to enable eco-aware automated deployment and execution of applications in the cloud, which optimizes the CO2 emissions caused by the physical infrastructure. In order to achieve this goal, the ECO2Clouds project introduces an approach to monitor the eco-metrics of the running applications on the different layers and performs the applications' scheduling based on the retrieved monitoring values. Next, we describe the monitoring approach developed in the ECO2Clouds project. For more details about the scheduling used in ECO2Clouds, we refer the interested reader to the work of Wajid et al. [11].

A. Architecture of the ECO2Clouds Monitoring Approach

The ECO2Clouds monitoring system is based on the monitoring framework for federated clouds which was developed during the course of the EU-funded BonFIRE project [12]. The general architecture of the ECO2Clouds monitoring system
and its relation to the other project components is depicted in Figure 1 [13]. On the cloud provider side, a Zabbix server [14] running on a dedicated virtual machine (VM) collects the monitoring information from the Zabbix agents installed on physical nodes. The system monitors the metrics from (a) the physical infrastructure, (b) VMs, and (c) applications. The Zabbix monitoring data is gathered and structured in an Accounting SQL database by the Accounting Service, which runs on a separate VM. The Accounting DB allows for various combinations of desired experiment, physical host, provider site, VM, and time frame to be specified in the selection queries. The monitoring data from the Accounting DB is then used by the ECO2Clouds Scheduler and Application Controller to provide eco-aware deployment of applications [11]. On the highest level, users manage the experiments (i.e., applications) and the cloud resources by using the RESTful BonFIRE API, which is based on HTTP commands [15].

Fig. 1. Architecture of the ECO2Clouds monitoring approach [13]

B. ECO2Clouds Technology for Metrics Collection

The underlying ECO2Clouds solution which enables the consideration of eco-aware metrics is the equipment of physical hosts with power distribution units (PDUs). PDUs provide monitoring systems with exact values of power consumption measurements, which are later transformed into the carbon footprint (i.e., CO2 emissions). This calculation of the carbon footprint based on power consumption parameters is possible, since the distribution of power sources is known for each cloud infrastructure provider.

Zabbix clients are installed on each physical node and VM. They collect the pre-defined metrics, including the usual metrics such as memory utilization, I/O throughput, and CPU utilization for VMs, as well as PDU-based metrics for physical nodes (see Figure 2). The measurement interval ranges from 5 to 30 seconds for various metrics, and the resulting data set was observed to add up to 100 Mbytes per day, depending on the number of running experiments. The Zabbix server (i.e., metrics aggregator) installed on a dedicated VM (one per experiment) gathers all the metrics information from the Zabbix clients within a given experiment.

Fig. 2. Aggregation and calculation of metrics in ECO2Clouds [13]

Different hardware configurations and different conditions regarding the energy mix used by the providers (EPCC, INRIA, and HLRS) were challenging. In order to adapt the monitoring metrics implementation at all provider sites, Zabbix templates as well as bash, Python, and Ruby scripts were implemented.

C. List of ECO2Clouds Metrics

One of the major contributions of the ECO2Clouds approach is the calculation of a VM's power consumption. It is determined through a formula depending on both physical and VM metrics. The metrics required for this include the size of a VM defined through the used memory, the data I/O (i.e., the send and receive activities), the disk activity (i.e., the read and write operations), and the consumed CPU seconds. The list of major ECO2Clouds metrics is presented in Tables I and II. Each metric is marked to indicate whether it is also supported by the EXCESS monitoring framework. We can see that the VM-related metrics are only supported by the ECO2Clouds project, while the EXCESS approach provides energy metrics on the physical level. For more details about the ECO2Clouds metrics, we refer to our previous work [11].

IV. EXCESS APPROACH

The key idea behind EXCESS is a holistic system analysis incorporating high performance and embedded computing [7]. The iterative hardware/software co-design process implemented in the EXCESS project is illustrated in Figure 3. The major component of the EXCESS system is ATOM (neAr-real Time mOnitoring fraMework) [16]. ATOM is a monitoring framework which is suitable for both HPC and cloud computing. ATOM tackles the challenge of analyzing the system's run-time context and overcomes the drawbacks of existing solutions. ATOM is designed to be low-intrusive and flexible, while allowing users to collect, visualize, and analyze relevant application and infrastructure data in near-real time. The following items are crucial for optimizing the energy consumption and performance of applications at run-time, as well as for improving energy-awareness in software engineering: a) monitoring applications at run-time, b) collecting large amounts of performance and energy-related data at high frequencies, and c) providing a user-friendly interface for data analysis. The main contributions of ATOM are:
• Low-intrusive, scalable architecture based on open-source tools and libraries: Node.js, Elasticsearch, D3.js, and Kibana.
• Flexible, language-independent plug-in system, which supports collecting specific data, including the energy consumption of embedded systems, or connecting to external power measurement devices.
• Light-weight and easy-to-grasp user library that allows code instrumentation in order to gather application-specific data not accessible globally.
• Integration with the PBS resource manager to allow executing and monitoring applications without any prior knowledge by end users or software developers.
• Interactive front-end allowing for near-real time exploration of collected data as well as exporting historical data for further analysis.

Fig. 3. Iterative hardware/software co-design process implemented in the EXCESS project [7]

TABLE I
ECO2CLOUDS METRICS (PART 1)

Metric and layer | Definition | Unit | Supported by EXCESS MF
Task execution time (app. layer) | Time taken to execute a specific task | s | Yes
Application execution time (app. layer) | Time taken to execute the whole application | s | Yes
Power consumption (app. layer) | Power currently consumed by the application | W | Yes
Response time (app. layer) | Average time taken from user request to service response | s | No
A-PUE (Application PUE) | Application power usage effectiveness; ratio between the total amount of power (P) required by all VMs of application i and the power used to execute the i-th application task | Ratio | No
Application Energy Productivity (A-EP) | Ratio between the number of executions of tasks hosted by all VMs of the i-th application and the total energy for execution of the VM | W^-1 | No
Application Green Efficiency (A-GE) | Share of green energy used to run the i-th application | W | No
CPU usage (virt. layer) | Processor utilization (inside a VM) for running VM | % | No
Storage usage (virt. layer) | Storage utilization on the corresponding storage device; ratio between the used disk space and the allocated disk space | % | Supportable

A. Agent-based Architecture

ATOM's architecture, as depicted in Figure 4, is agent-based. ATOM is composed of two main components: a server (MONITORing server, MONITOR) and multiple, light-weight agents (Atom CollecTORS, ACTORS). The purpose of the ACTORS is to continuously sample node- and application-specific data and send this information to MONITOR using a user-defined update rate (e.g., every 100 milliseconds). MONITOR collects the data being sent by each target node and offers functionalities to query, visualize, and analyze historical data.

The cluster consists of several compute nodes, including embedded hardware such as the Movidius Myriad 2 chip. When users start a job on the cluster, an ACTOR is started on each node to monitor relevant data while the job is running. ACTORS then send sampled data at run-time through a high-bandwidth network (InfiniBand) to the MONITOR server. MONITOR manages incoming data, stores it in the database, and also provides a web interface accessible by users.

B. ATOM Collectors (ACTORS)

An ACTOR is deployed on each target node. ACTORS have a set of plug-ins that can be implemented in any programming language; the only constraint is on the format used for transferring data to MONITOR. We selected the programming language C for ACTORS and plug-ins, since high-level languages such as C++ or Java impose a performance overhead at run-time and would hence be too intrusive. Since ACTORS are extensible by default, developers can easily add plug-ins in order to sample additional metrics. Plug-ins do not have any limitations on the type of metric to be collected, as long as the data is sent to the server in a pre-defined data format; we selected JSON objects to hold metric data.

Figure 5 illustrates the three building blocks of an ACTOR: a plug-in manager, a thread handler, and a data management component in terms of a FIFO queue connected to libCURL. When ACTORS are started, a default or user-defined configuration file is parsed in order to activate/deactivate plug-ins, define the relevant metrics to be collected, and to set
the sample rate for each plug-in. During initialization, an execution ID is requested from MONITOR; the execution ID is used to uniquely associate metric data with specific application executions. The thread handling routines then trigger the plug-in discovery and start a thread for each of the plug-ins using the sample rates previously defined in the configuration file. In addition to the threads shown in Figure 5, two extra threads are created: one aggregates and sends metric data to MONITOR, and another checks the configuration file for changes, which allows sample rates to be changed at run-time.

Fig. 4. Architecture of the ATOM monitoring framework deployed on the EXCESS cluster

C. ATOM Monitoring Server (MONITOR)

The MONITOR server is deployed on a separate node in the EXCESS cluster. As illustrated in Figure 5, MONITOR is composed of three building blocks: a data store, a web service, and a visualization component. As a data storage component, Elasticsearch is used [17]. Elasticsearch is a flexible and powerful open-source, real-time search and analytics engine. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. As web server, the server-side JavaScript runtime Node.js is selected [18]. Since Node.js is based on Google's V8 engine, which is written primarily in C++, Node.js qualifies to be used for data-intensive real-time applications that run across distributed devices. Basic visualization functionality is provided via the JavaScript library D3.js. This is a popular data visualization tool [19], which is also used by Datameer, a company concerned with visualization of big data [20].

The benefits of using these frameworks are manifold. Firstly, the common shared data format JSON eases the integration of components. Secondly, employing Elasticsearch as (historical) data storage bypasses a drawback of existing monitoring frameworks such as Nagios or Zabbix: they rely on traditional databases such as MySQL, which restrict third-party developers to pre-defined data models and cannot handle vast amounts of data writes and reads in near-real time. With ATOM, we offer developers the flexibility of selecting their custom JSON objects to store metric data, as long as they provide a few mandatory fields (cf. Section IV-D). In addition, NoSQL databases such as Elasticsearch offer a feature called auto-sharding, which distributes data automatically and on demand across multiple nodes in order to balance query and data load. This feature, among others, makes Elasticsearch an excellent choice to be used in HPC environments.

D. Communication Layer
JSON is selected as the primary data format for data exchange between all ATOM components; the actual communication between the ACTORS and MONITOR is realized via HTTP. MONITOR exposes a RESTful interface that each ACTOR can access through simple cURL commands.
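To make the communication layer more concrete, the following Python sketch shows how a single metric sample could be wrapped in a JSON document and prepared as an HTTP POST to MONITOR. It is an illustration under stated assumptions, not ATOM's implementation: the real ACTORS are written in C and use libCURL, and the endpoint URL, the path layout, and the field names (`execution_id`, `host`, `timestamp`) are hypothetical, since the exact REST routes and mandatory fields are not specified here.

```python
import json
import time
import urllib.request
from typing import Any, Dict, Optional

# Hypothetical MONITOR endpoint; the actual REST routes of ATOM are not given here.
MONITOR_URL = "http://monitor.example:3000/executions"


def build_metric_request(execution_id: str, host: str, metrics: Dict[str, Any],
                         timestamp: Optional[float] = None) -> urllib.request.Request:
    """Wrap one sample in a JSON document and prepare an HTTP POST to MONITOR."""
    document = {
        # Assumed mandatory fields: execution ID, origin host, and time stamp.
        "execution_id": execution_id,
        "host": host,
        "timestamp": time.time() if timestamp is None else timestamp,
        **metrics,  # plug-in specific key/value pairs
    }
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        f"{MONITOR_URL}/{execution_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 2.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the payload format testable without a running MONITOR; a call such as `send(build_metric_request("exec-42", "node01", {"cpu_util": 12.5}))` would then perform the actual POST against whatever endpoint MONITOR really exposes.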
E. Monitoring Hierarchy
Since EXCESS follows a holistic approach to energy reduction, we are interested in profiling different hierarchy levels of the hardware and software stack: infrastructure, applications, and functions. Profiling functions, in particular, requires code instrumentation. Code instrumentation is hardly supported by existing monitoring frameworks, because it introduces an additional performance overhead. To date, ATOM supports the following metrics by default: PAPI-C [21], RAPL [22], Likwid [23], /proc/meminfo, iostat, and hw_power [24]. It should be noted that metric support will be extended in the future; the current plug-ins were selected because they represent a minimal set of basic metrics to be considered for profiling.

Fig. 5. Detailing the interaction between ACTORS (left) and MONITOR (right). Each ACTOR loads a list of plug-ins at startup in order to sample relevant metrics. Metric data is buffered in a FIFO queue before it is sent via libCURL to the data store. MONITOR is deployed on a separate node in order to reduce the performance overhead at run-time.
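Among these default metrics, RAPL exposes cumulative energy counters rather than instantaneous power, so a sampling plug-in has to derive power from the difference of two readings. The Python sketch below illustrates this under the assumption that the counters are read through the Linux powercap sysfs interface (`energy_uj` and `max_energy_range_uj` under `/sys/class/powercap/intel-rapl:0`); it is not the ATOM RAPL plug-in itself, and the domain path is an assumption about the target node.

```python
import time
from pathlib import Path

# Assumed location of the first RAPL package domain on a Linux node (not taken
# from ATOM). Reading energy_uj may require elevated privileges.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(name: str, domain: Path = RAPL_DOMAIN) -> int:
    """Read one integer sysfs attribute (values are in microjoules)."""
    return int((domain / name).read_text().strip())


def average_power_watts(e0_uj: int, e1_uj: int, dt_s: float,
                        max_range_uj: int) -> float:
    """Average power between two cumulative energy readings.

    The counter wraps around after max_range_uj, so a smaller second reading
    is treated as a single overflow (valid for short sampling intervals).
    """
    if dt_s <= 0:
        raise ValueError("sampling interval must be positive")
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:
        delta_uj += max_range_uj
    return delta_uj / 1e6 / dt_s  # microjoules -> joules -> watts


def sample_power(interval_s: float = 0.1, domain: Path = RAPL_DOMAIN) -> float:
    """Take two counter readings interval_s apart and return the average watts."""
    max_range = read_counter("max_energy_range_uj", domain)
    e0, t0 = read_counter("energy_uj", domain), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = read_counter("energy_uj", domain), time.monotonic()
    return average_power_watts(e0, e1, t1 - t0, max_range)
```

The 100 ms default interval mirrors the example update rate given for ACTORS in Section IV-A. Keeping the arithmetic in a pure function separates the wraparound handling from file access, so it can be checked without RAPL hardware.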
TABLE II
ECO2CLOUDS METRICS (PART 2)

Metric and layer | Definition | Unit | Supported by EXCESS MF
I/O usage (virt. layer) | Percentage of process execution time in which the disk is busy with read/write activity | % | No
Memory usage (virt. layer) | Ratio of memory size used by the VM to total memory available | % | No
Power consumption (virt. layer) | Power currently consumed by the given VM | W | No
Disk IOPS | Assessment of I/O operations of a virtual resource | Ops/s | No
Site utilization (infr. layer) | Current utilization of a single site; ratio of available cores to total cores | % | No
Storage utilization (infr. layer) | Percentage of frontend storage used | % | No
Availability (infr. layer, site) | Shows whether the OpenNebula OCCI server answers the requests | Bool | No
PUE (infr. layer) | Energy efficiency of a site; measured as ratio between total facility power and power used by computing equipment | Ratio | No
Power consumption (infr. layer) | Power consumed by a given host in a given time period | W | Supportable
Disk IOPS (infr. layer) | I/O operations of the disk within a host | IO/s | No
CPU utilization (infr. layer) | Average utilization of CPUs inside a host | % | No
Availability (infr. layer, host) | Availability of host | Bool | No
Green Efficiency Coefficient (GEC) (accounting) | Share of energy consumed by a site that is produced by green energy sources | % | No
Site Infrastructure Efficiency (SIE) (accounting) | Share of information technology equipment power in total facility power | % | No
Carbon Usage Effectiveness (CUE) (accounting) | Weighted average of the CEFs related to the energy sources used in a site, where CEF is the Carbon Dioxide Emission Factor taken from literature for each energy source | Ratio | No

a) Infrastructure: ATOM currently includes the following plug-ins in order to monitor the infrastructure. For performance data, we rely on PAPI-C and /proc/meminfo. In order to monitor the energy consumption at run-time, we implemented plug-ins for RAPL and Likwid. In addition, an external power measurement system is installed on the EXCESS cluster (cf. Figure 4) to monitor the energy consumption of specific components, such as the CPU, the GPU, or the whole computational node, with high accuracy.

b) Applications: In order to profile applications, PAPI-C and iostat can be used to measure performance. Although PAPI-C is a so-called first-person monitoring tool, i.e., it monitors a specific process or thread, we implemented a third-person variant collecting data on a per-core basis. To the best of our knowledge, there currently exists no approach to sample the energy consumption of specific applications or smaller building blocks. We tackle this challenge by monitoring the execution of particular functions using code instrumentation.

c) Functions: Performing a thorough analysis of applications requires monitoring the run-time behavior of functions and collecting additional application-specific metric data that is normally not supported by existing performance or energy measurement tools. Such data includes, for example, the number of user requests to a web service. As a consequence, we have implemented a light-weight library for source code instrumentation, which acts like an extra plug-in loaded by an ACTOR. The library has the following key features:
• sending application-specific data,
• profiling specific code fragments (e.g., monitoring execution times of functions), and
• retrieving historic metric data for code optimization at run-time.

In particular, the last two items are of great interest for EXCESS, because we are interested in optimizing applications in order to yield a better trade-off between energy and performance at run-time. By profiling critical code fragments and directly getting feedback about performance and energy consumption, we can directly use the data as input for the optimization of future prediction models.

F. Extensibility

Custom plug-ins are language-independent for the following three reasons: a) the entire communication is based on HTTP, b) the data exchange format is JSON, and c) MONITOR provides a RESTful interface. ACTORS are currently written in C, and plug-ins are loaded via shared objects at run-time. The plug-in for iostat is realized as a Unix shell script using the streaming support of awk to parse the output of iostat in order to continuously send new metric data to MONITOR.
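The iostat plug-in itself is an awk script; as a hedged illustration of the same streaming idea, the Python sketch below turns `iostat -d -x` output into one metric document per device row. It assumes that column names can be taken from the most recent header line starting with `Device`, and the document fields `execution_id`, `timestamp`, and `plugin` are hypothetical rather than ATOM's actual schema.

```python
import json
from typing import Dict, Iterable, Iterator, List, Optional, Union

Metric = Dict[str, Union[str, float]]


def parse_iostat_stream(lines: Iterable[str]) -> Iterator[Metric]:
    """Yield one metric dict per device row of `iostat -d -x` output.

    Column names are taken from the most recent header line starting with
    "Device" (older versions print "Device:"), so the parser does not
    hard-code the column layout of a particular iostat version.
    """
    columns: Optional[List[str]] = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # blank line between two reports
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]
            continue
        if columns is None or len(fields) != len(columns) + 1:
            continue  # banner line or a row that does not match the header
        try:
            values = [float(v.replace(",", ".")) for v in fields[1:]]
        except ValueError:
            continue  # not a numeric device row
        metric: Metric = dict(zip(columns, values))
        metric["device"] = fields[0]
        yield metric


def to_document(metric: Metric, execution_id: str, timestamp: float) -> str:
    """Serialize one sample as JSON; the extra field names are hypothetical."""
    document: Metric = {
        "execution_id": execution_id,
        "timestamp": timestamp,
        "plugin": "iostat",
    }
    document.update(metric)
    return json.dumps(document, sort_keys=True)
```

Because the parser consumes any iterable of lines, the same function works on a live process such as `subprocess.Popen(["iostat", "-d", "-x", "1"], stdout=subprocess.PIPE, text=True).stdout`, forwarding each report as soon as iostat prints it, in the spirit of the awk-based streaming described above.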
V. COMPARISON AND ANALYSIS

Concerning the system architecture, both monitoring frameworks rely on an agent-server architecture. EXCESS utilizes monitoring agents called ACTORS, which are written in C, while ECO2Clouds uses slower yet more advanced Zabbix agents. Next, we consider the major differences between the proposed approaches.
In contrast to the ECO2Clouds project, the EXCESS project is concerned with a holistic analysis of systems, including the application, system software, and hardware stack, in order to detect preventable energy dissipation (cf. Figure 3). Thus, its first goal is not eco-aware automated deployment, but rather enabling software developers to write energy-efficient applications across different infrastructures. EXCESS sees a need for a sophisticated co-design process between software and hardware during development.
Optimizing the energy consumption involves a thorough analysis of all layers involved, in particular the hardware and software layers. From a hardware point of view, EXCESS develops new energy-saving components such as the Movidius Myriad 2 platform [25]. On the software side, EXCESS evaluates sophisticated solutions with respect to energy-efficient libraries and algorithms.

Fig. 6. Web interface of the ECO2Clouds portal
The EXCESS approach provides near-real time monitoring: the maximum possible update frequency is 10 Hz, while the Zabbix system of the ECO2Clouds monitoring framework updates data only once every 5 seconds for most metrics. As our experiments have shown, the CPU performance overhead of the EXCESS monitoring framework does not exceed 3%, assuming that sampling is not more frequent than once per second. In turn, the CPU overhead caused by the ECO2Clouds Zabbix agents is closer to 10% if the number of monitored metrics is higher than 15. However, a system based on Zabbix also supports virtual-layer metrics in addition to the physical level.
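The difference in update frequency translates directly into the number of samples collected per metric. The short Python calculation below compares the 10 Hz maximum of ATOM with one update every 5 seconds in Zabbix; it counts samples only and makes no assumption about the size of an individual sample.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def samples_per_day(rate_hz: float, metrics: int = 1) -> float:
    """Number of samples collected per day at a given update rate."""
    return rate_hz * SECONDS_PER_DAY * metrics


excess_atom = samples_per_day(10.0)         # maximum ATOM update rate: 10 Hz
eco2clouds_zabbix = samples_per_day(1 / 5)  # one Zabbix update every 5 s
ratio = excess_atom / eco2clouds_zabbix

print(f"ATOM:   {excess_atom:,.0f} samples per metric and day")
print(f"Zabbix: {eco2clouds_zabbix:,.0f} samples per metric and day")
print(f"ratio:  {ratio:.0f}x")
```

The result is 864,000 versus 17,280 samples, a 50-fold difference. Even at 1 Hz, the rate for which the 3% overhead figure was measured, ATOM would still collect 86,400 samples per metric and day, five times more than Zabbix at its typical rate.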
The monitoring server (MONITOR component) of the EXCESS monitoring framework collects the data and stores it in an Elasticsearch database. The ECO2Clouds monitoring framework, in turn, uses a monitoring component based on a standard Zabbix aggregator relying on a MySQL DB, which allows for maintaining more complex data models with interconnected tables.

Definition of custom metrics is possible within the EXCESS monitoring framework by simply editing a C configuration file. This is also possible within the ECO2Clouds monitoring framework by using the corresponding functionality of Zabbix and the energy-related values provided by the PDUs.

A GUI was developed for the ECO2Clouds approach in order to visualize the predicted CO2 emissions of the application and thereby to help users choose the optimal deployment (cf. Figure 6). In addition, the ECO2Clouds monitoring framework allows for using the standard Zabbix-based visualization of metrics (cf. Figure 7).

A similar GUI was developed on top of the EXCESS monitoring framework, which shows a diagram of the specified metric values for the given time frame (cf. Figure 8).

To summarize our comparison, we consider the key properties which characterize monitoring frameworks according to the literature [6], [9], [10]. These properties are listed for the ECO2Clouds and EXCESS monitoring frameworks in Table III. We can conclude that the EXCESS monitoring approach is better suited for lightweight monitoring of physical hardware, especially if it has to be conducted in real time, while the ECO2Clouds monitoring approach is better suited for collecting sophisticated user-defined metrics on the virtual and application levels.

Fig. 7. Web interface of the Zabbix monitoring system used by the ECO2Clouds monitoring framework

TABLE III
KEY PROPERTIES OF THE ECO2CLOUDS AND EXCESS MONITORING FRAMEWORKS

Key property | ECO2Clouds monitoring framework | EXCESS monitoring framework
Architecture | agent-based | agent-based
Non-intrusiveness | medium | low-intrusive
Scalability | low update frequency | high update frequency
Timeliness | one update per second | near-real time update
Extensibility | extra metrics possible via Zabbix | extra metrics possible via custom plug-ins
Data Storage | MySQL DB | Elasticsearch
Visualization | Zabbix-based | web-based
Adaptability | run-time configuration | run-time configuration
Predictability | no | no
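To make the Data Storage row of Table III more concrete, the following minimal sketch stores one monitoring sample in both styles. It is an illustration under stated assumptions, not the frameworks' actual schemas. The JSON document stands in for what might be indexed in Elasticsearch. SQLite stands in for MySQL, and the three tables are only loosely modeled on Zabbix's `hosts`, `items` and `history` tables. The host name, metric key and values are hypothetical.

```python
import json
import sqlite3

# One monitoring sample as an agent might report it (hypothetical values).
sample = {"host": "node01", "metric": "cpu.util", "unit": "%",
          "timestamp": 1443657600, "value": 42.5}

# (a) Document-oriented storage (Elasticsearch-style, as in EXCESS):
# every sample is one self-contained JSON document, and context such as
# host name and unit is repeated (denormalized) in each document.
document = json.dumps(sample, sort_keys=True)

# (b) Relational storage (MySQL-style, as behind Zabbix in ECO2Clouds):
# context lives in separate tables linked by keys. SQLite replaces MySQL
# here, and the schema is a simplified illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE hosts   (hostid INTEGER PRIMARY KEY, host TEXT UNIQUE);
    CREATE TABLE items   (itemid INTEGER PRIMARY KEY,
                          hostid INTEGER REFERENCES hosts(hostid),
                          key_ TEXT, units TEXT);
    CREATE TABLE history (itemid INTEGER REFERENCES items(itemid),
                          clock INTEGER, value REAL);
""")
db.execute("INSERT INTO hosts VALUES (1, ?)", (sample["host"],))
db.execute("INSERT INTO items VALUES (10, 1, ?, ?)",
           (sample["metric"], sample["unit"]))
db.execute("INSERT INTO history VALUES (10, ?, ?)",
           (sample["timestamp"], sample["value"]))

# Reassembling the full sample requires joining the interconnected tables.
row = db.execute("""
    SELECT h.host, i.key_, i.units, hi.clock, hi.value
    FROM history hi
    JOIN items i ON i.itemid = hi.itemid
    JOIN hosts h ON h.hostid = i.hostid
""").fetchone()

print(document)
print(row)  # prints: ('node01', 'cpu.util', '%', 1443657600, 42.5)
```

The document form can be appended and searched without joins, which suits high-rate time series. The relational form avoids repeating host and metric metadata in every sample and lets related entities be queried together, at the cost of a join per lookup.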
Fig. 8. Interactive visualization of running and historic experiments as provided by the web front end of the EXCESS monitoring framework

VI. CONCLUSION

In this paper, we described two alternative monitoring solutions originating from the ECO2Clouds and EXCESS EU projects.

The EXCESS monitoring framework is well suited for gathering fine-grained metrics at the physical host level. Relying on an agent-based architecture, it provides low-intrusive, real-time monitoring.

In turn, the ECO2Clouds monitoring framework provides more advanced support for energy-aware metrics at the physical, virtual and application levels. It allows the scheduler to optimize application deployment across cloud provider sites as well as within a single cloud in order to minimize the resulting CO2 emissions.

ACKNOWLEDGMENT

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements 611183 (EXCESS Project) and 318048 (ECO2Clouds Project).

REFERENCES

[1] J. Gray, "Jim Gray on e-Science: A Transformed Scientific Method," NRC-CSTB, Jan. 2007.
[2] G. Valentini, W. Lassonde, S. U. Khan, N. Min-Allah, S. A. Madani, J. Li, L. Zhang, L. Wang, N. Ghani, J. Kolodziej, H. Li, A. Y. Zomaya, C.-Z. Xu, P. Balaji, A. Vishnu, F. Pinel, J. E. Pecero, D. Kliazovich, and P. Bouvry, "An overview of energy efficiency techniques in cluster computing systems," Cluster Computing, vol. 16, no. 1, pp. 3–15, 2013.
[3] R. Basmadjian, H. De Meer, R. Lent, and G. Giuliani, "Cloud Computing and its Interest in Saving Energy: The Use Case of a Private Cloud," Journal of Cloud Computing, vol. 1, no. 1, pp. 1–25, 2012.
[4] L. Liu, H. Wang, X. Liu, X. Jin, W. B. He, Q. B. Wang, and Y. Chen, "GreenCloud: A New Architecture for Green Data Center," in Proceedings of the 6th International Conference Industry Session on Autonomic Computing and Communications Industry Session, ser. ICAC-INDST'09. New York, NY, USA: ACM, 2009, pp. 29–38.
[5] M. S. Obaidat, A. Anpalagan, and I. Woungang, Handbook of Green Information and Communication Systems, 1st ed. Academic Press, 2012.
[6] G. Aceto, A. Botta, W. De Donato, and A. Pescapè, "Cloud Monitoring: A Survey," Computer Networks, vol. 57, no. 9, pp. 2093–2115, 2013.
[7] B. Koller, U. Küster, Y. Sandoval, D. Khabi, and M. Gienger, "EXCESS: Execution Models for Energy-Efficient Computing Systems," in inSiDE: Journal of Innovative Supercomputing in Germany, vol. 12, no. 2, 2014.
[8] up.time Software, "The Truth about Agent vs. Agentless Monitoring," http://www.uptimesoftware.com/pdfs/TruthAboutAgentVsAgentLess.pdf, accessed: 2015-03-13.
[9] G. Katsaros, R. Kubert, G. Gallizo, and T. Wang, "Monitoring: A Fundamental Process to Provide QoS Guarantees in Cloud-based Platforms," in Cloud Computing: Methodology, System, and Applications. CRC Press, Taylor and Francis Group, 2011.
[10] A. Telesca, F. Carena, W. Carena, S. Chapeland, V. Chibante Barroso, F. Costa, E. Dénes, R. Divià, U. Fuchs, A. Grigore, C. Ionita, C. Delort, G. Simonetti, C. Soós, P. Vande Vyvre, and B. von Haller, "System Performance Monitoring of the ALICE Data Acquisition System with Zabbix," Journal of Physics: Conference Series, vol. 513, no. 6, Jun. 2014.
[11] U. Wajid, C. Cappiello, P. Plebani, B. Pernici, N. Mehandjiev, M. Vitali, M. Gienger, K. Kavoussanakis, D. Margery, D. Perez, and P. Sampaio, "On achieving energy efficiency and reducing CO2 footprint in cloud computing," IEEE Transactions on Cloud Computing, vol. PP, no. 99, pp. 1–1, 2015.
[12] Y. Al-Hazmi, K. Campowsky, and T. Magedanz, "A monitoring system for federated clouds," in CLOUDNET. IEEE, 2012, pp. 68–74.
[13] A. Tenschert, P. Skvortsov, and M. Gienger, "Eco-efficient cloud resource monitoring and analysis," in Joint Workshop Proceedings of the 2nd International Conference on ICT for Sustainability 2014, Stockholm, Sweden, August 24-27, 2014, pp. 14–17. [Online]. Available: http://ceur-ws.org/Vol-1203/EES-paper4.pdf
[14] Zabbix, "The enterprise-class monitoring solution," http://www.zabbix.com/, September 2015.
[15] B. E. Project, "BonFIRE API Documentation," http://doc.bonfire-project.eu/R4.0.5/reference/bonfire-api-spec.html, accessed: 2015-11-01.
[16] D. Hoppe, Y. Sandoval, and M. Gienger, "Atom: A near-real time monitoring framework for HPC and embedded systems," July 2015.
[17] Elastic, "Elasticsearch: The Definitive Guide," http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/, accessed: 2015-10-01.
[18] The OpenNMS Group, Inc., "The OpenNMS Project," http://www.opennms.org/, accessed: 2015-03-24.
[19] M. Bostock, J. Heer, and V. Ogievetsky, "D3.js: Data-Driven Documents," http://d3js.org/, accessed: 2015-10-01.
[20] C. Viau, "What's behind our Business Infographics Designer? D3.js of course." http://www.datameer.com/blog/uncategorized/whats-behind-our-business-infographics-designer-d3-js-of-course-2.html, accessed: 2015-03-24.
[21] D. Terpstra, H. Jagode, H. You, and J. Dongarra, "Collecting Performance Data with PAPI-C," in Tools for High Performance Computing 2009, M. S. Müller, M. M. Resch, W. E. Nagel, and A. Schulz, Eds. 3rd Parallel Tools Workshop, Dresden, Germany: Springer, 2009, pp. 157–173.
[22] M. Hähnel, B. Döbel, M. Völp, and H. Härtig, "Measuring Energy Consumption for Short Code Paths Using RAPL," SIGMETRICS Perform. Eval. Rev., vol. 40, no. 3, pp. 13–17, Jan. 2012.
[23] J. Treibig, G. Hager, and G. Wellein, "LIKWID: A Lightweight Performance-oriented Tool Suite for x86 Multicore Environments," in Proceedings of the 1st International Workshop on Parallel Software Tools and Tool Infrastructures, San Diego, CA, 2010.
[24] D. Khabi, "D5.2: Prototype of an Energy-aware System based on Conventional HPC Technology," The EXCESS Project (FP7/2013-2016 grant agreement no. 611183), Public Deliverable, 2014.
[25] Movidius Ltd., "Myriad 2 Vision Processor," http://assets5.movidius.com/wp-content/uploads/2014/07/MYRIAD2 MA2100A ProductBrief.pdf, accessed: 2015-03-30.