SECURE HARDWARE RESOURCE MONITORING, USAGE OPTIMIZATION AND AFFIRMATION FOR DATABASE OPERATIONS IN VIRTUALIZED CLOUD ENVIRONMENT

TAN CHEE HENG

THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENT FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR

2014

UNIVERSITY MALAYA
ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: TAN CHEE HENG (I.C/Passport No: 751022-08-5517)
Registration/Matric No: WHA060001
Name of Degree: DOCTOR OF PHILOSOPHY
Title of Project Paper/Research Report/Dissertation/Thesis (“this Work”): SECURE HARDWARE RESOURCE MONITORING, USAGE OPTIMIZATION AND AFFIRMATION FOR DATABASE OPERATIONS IN VIRTUALIZED CLOUD ENVIRONMENT
Field of Study: DATA MANAGEMENT

I do solemnly and sincerely declare that:

(1) I am the sole author/writer of this Work;
(2) This Work is original;
(3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;
(4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work;
(5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (“UM”), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;
(6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.
Candidate’s Signature
Date

Subscribed and solemnly declared before,

Witness’s Signature
Date

Name:
Designation:

ABSTRACT

Hardware resource management is an important topic in the Information Technology (IT) industry. This is due to the increasing demand for computing power from ever-evolving applications, especially those bound by Service Level Agreements (SLAs). Undeniably, hardware costs have fallen significantly in recent years. However, this does not translate into savings in the capital and operational costs of businesses, because the computing resource requirements of new applications outpace the reduction in hardware cost. Hence, the cloud computing paradigm evolved from conventional grid and utility computing to meet these aggressive computational demands. To better serve hosting in cloud environments, particularly in industries where data sensitivity and privacy are major concerns, better resource management mechanisms are needed. The mechanisms proposed in this research meet the objectives of effective hardware resource administration while avoiding access to the real data in the database. This research examines hardware resource management in virtualized cloud environments. The topics of interest are resource utilization monitoring, optimization and affirmation. The proposed mechanisms provide alternatives to the conventional methods commonly adopted across the IT industry today. The aim is to provide approaches that are simpler than these conventional tools, while also being faster and more accurate.

In the resource utilization monitoring area, the metadata of the actual data is characterized to build an understanding of the database workload. This understanding then informs the planning of hardware provisioning and de-provisioning activities, as well as resource scaling arrangements. The research then proposes a mechanism to serve the resource utilization optimization objective.
In this area, hardware fault and failure analysis is investigated in order to provide an optimal operating environment for database transactions. The analysis of hardware fault and failure symptoms is performed on the output of iterative executions of TPC-H queries from the Transaction Processing Performance Council (TPC). A baseline is established, and the parameter values obtained from subsequent tests on the same set of queries are compared with the baseline values to gain insight into the hardware state. Next, the resource utilization affirmation theme proposes establishing a stress-testing scenario in the Virtual Machine (VM). The work constructs an environment in the VM in which transaction response times can be validated at the hypothetical resource-constraining point of the VM. This validation serves two situations: when the VM undergoes a hardware change, and during normal operations. Verification is performed by stressing the VM to the resource-constraining point using the proposed method; SLA-bound transactions are then sent to the database, and their response times are examined and compared with the expected response times. The proposed mechanism also incorporates a technique to determine the resource threshold from the perspective of database transactions. Resource utilization monitoring uses metadata from a representative workload, whereas the resource utilization optimization and affirmation mechanisms use the hypothetical data and queries of the TPC-H benchmark, thereby achieving the objective of avoiding access to real data. These deliverables emphasize consistency, stability and accuracy.

ABSTRAK

Computer hardware resource management is an important topic in the Information Technology industry.
This is due to the increasing demand for information processing power, especially from rapidly evolving applications bound by strict Service Level Agreements. Undeniably, the cost of computer hardware has fallen significantly in recent times. Even so, this does not translate into savings in the capital and operational costs of businesses, because the computing resource requirements of new applications outweigh the reduction in hardware cost. Therefore, the cloud computing paradigm evolved from conventional grid and utility computing to provision the computing power needed to meet this aggressive demand. In cloud environments, especially in industries where data sensitivity and privacy are important factors, better mechanisms are needed in the area of computing resource management. The mechanisms proposed in this study avoid access to the real data in the database while meeting the objective of effective computer hardware resource administration. This study examines computer hardware resource management in cloud computing environments. The topics studied are the monitoring, optimization and affirmation of computer processing resource utilization. The proposed mechanisms provide alternatives to the conventional methods widely practised in the IT industry today. The objective is to provide approaches that are simpler than the conventional methods, while faster and more accurate operation are also targets of the study.

Within computer processing resource management, the utilization monitoring topic examines the metadata of the real data to build an understanding of the database workload. This understanding then contributes to decisions in planning the addition or removal of computing hardware.
After the monitoring topic, the next topic is to produce a mechanism that meets the objective of optimizing computer processing resource utilization. In this area of study, computing hardware failures and the analysis of potential failures are investigated. Such activities are very important so that database transactions can operate in an optimal environment. This thesis carries out an experimental analysis of actual and potential computing hardware failures using output obtained with the TPC benchmark. The data used is from the TPC-H standard. A baseline is established, and comparing the parameter values obtained from subsequent tests on the same data set with the baseline values provides insight into the operating state of the computing hardware. The next study aims to affirm the capability of computer processing resources in cloud computing environments. The experiments aim to establish an operating system environment that uses computer processing resources aggressively. Once this condition is reached, critical transactions are submitted to the database so that their processing and response times can be compared with expectations. Such validation can be performed in two situations: when the operating system undergoes a change in the quantity and quality of its hardware, or during normal operation in which there is no change at all in the computing hardware. Validation is performed by stressing the operating system to the point at which computer processing resources become constrained. The proposed mechanism also incorporates a technique to determine the limit of computing hardware capacity from the perspective of database transactions.
Computer processing resource utilization monitoring uses metadata from a workload that represents real conditions, whereas the resource utilization optimization and affirmation mechanisms use hypothetical data and data from the TPC-H standard. Such input sources achieve the objective of avoiding access to real data. The experiments emphasize consistency, stability and accuracy while achieving the desired objectives.

ACKNOWLEDGEMENT

First and foremost, I would like to express my special appreciation to my supervisor, Associate Professor Dr. Teh Ying Wah. You have been a tremendous mentor to me. I would like to thank you for your patience throughout this lengthy journey, and for improving and refining my views on many topics. Your perspectives and advice will remain valuable assets throughout my life.

Special thanks to my family. Words cannot express how grateful I am to my wife, Lee Choo, for her faith in me and her incredible patience with me. Words cannot describe my love for my daughter, Ke Ying, and my boys, Yong Hung, Yong Keat and Yong Jen. You have made my everyday life immensely joyous and fulfilling.

Lastly, I would like to express my gratitude to my parents for their unwavering support. Your kindness and confidence were critical to the completion of this thesis.

TABLE OF CONTENTS

ABSTRACT
ABSTRAK
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS AND ACRONYMS
1. INTRODUCTION
   1.1 Background: Secure Resource Management
   1.2 Overview of cloud computing
   1.3 Motivation
   1.4 Problem statements
      1.4.1 SLA compliance and monitoring systems
      1.4.2 Dynamic scalability issue for Parallel Database
      1.4.3 Continuous fault analysis
      1.4.4 Shortcoming of benchmarks
      1.4.5 Data security issue
      1.4.6 Insufficient measurement methods
   1.5 Current practices
   1.6 Research questions
   1.7 Research objectives
   1.8 Scope of research
   1.9 Chapter organization
2. LITERATURE REVIEW
   2.1 Introduction
      2.1.1 Resource utilization monitoring
      2.1.2 Resource utilization optimization
      2.1.3 Resource utilization affirmation
   2.2 Virtualized cloud infrastructure
   2.3 Data security
   2.4 Resource utilization monitoring
      2.4.1 Monitoring models and on-demand resource scaling
      2.4.2 Resource scalability in Parallel Database architecture
      2.4.3 Statistical modeling and benchmarking
         Proof of concept – the linear correlation
         Mathematical models
         Linear regression
         Machine learning
         Fuzzy computing
         Linear Programming and Simplex Method
         TPC benchmark
      2.4.4 Measurement methods
         Hierarchical clustering
         K-mean Clustering
         Maximum Likelihood Estimation
         Goodness of Fit
      2.4.5 Workload characterization
   2.5 Resource utilization optimization
      2.5.1 Fault analysis and failure prediction
      2.5.2 Resource utilization optimization models
         Task scheduling

