
Data Generation for the Simulation of Artifact-Centric Processes

126 Pages·2014·2.06 MB·English


Data Generation for the Simulation of Artifact-Centric Processes

Emona Nakuçi

Supervisor: Alberto Abelló
Advisors: Petar Jovanovic and Vasileios Theodorou

MASTER ON INFORMATION TECHNOLOGIES FOR BUSINESS INTELLIGENCE
Universitat Politècnica de Catalunya
Barcelona, July 2014

ACKNOWLEDGEMENTS

I would like to dedicate a few words to the people who helped me accomplish this thesis, which represents not only my work, but also that of those who supported me throughout this research. It gives me great pleasure to acknowledge the support and help of my advisors, Petar Jovanovic and Vasileios Theodorou, and to thank them for their continuous guidance and useful advice. I would like to express my deepest gratitude to my supervisor, Alberto Abelló, for believing in me and helping me think deeply about the problem. Many thanks go to my dearest friends, who always made me smile and think positively even in the hardest times. Finally, my special thanks go to my wonderful family, to whom I dedicate this thesis. I thank my parents for always being by my side, although often virtually. I would like to especially thank my brother, Enor, for always believing in me and encouraging me to achieve more. None of this would have been possible without your love and support.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
1 INTRODUCTION
  1.1 Problem Statement
  1.2 Motivation
  1.3 Objectives of the Study
  1.4 Scientific Contribution
  1.5 Outline of the Thesis
2 BACKGROUND
  2.1 Business Processes
    2.1.1 BPM
    2.1.2 BPMN
  2.2 Data Processes
    2.2.1 ETL Processes
    2.2.2 ETL Modeling
  2.3 Quantitative and Qualitative Analysis of Business Processes
3 RELATED WORK
  3.1 ETL Tools
  3.2 ETL Evaluation and Optimization
  3.3 ETL Process Benchmarking
  3.4 Data Generators for Benchmarking Purposes
  3.5 Constraint-based Data Generation
  3.6 BPMN Simulators
4 FORMALIZING THE PROBLEM OF DATA GENERATION
  4.1 Defining the ETL Operation
  4.2 Data Integration Tools Selection
  4.3 Literature Review on ETL Taxonomies
  4.4 Proposed ETL Taxonomy
    4.4.1 Defining List of ETL Operations
    4.4.2 Proposed ETL Taxonomy
    4.4.3 Value Level Operations
    4.4.4 Attribute Level Operations
    4.4.5 Tuple Level Operations
    4.4.6 Dataset Level Operations
    4.4.7 Schema Level Operations
    4.4.8 Relation Level Operations
  4.5 ETL Operation Semantics Definition
  4.6 Solution Limitations
    4.6.1 Atomic Operations
    4.6.2 Logical Predicates
    4.6.3 Custom-defined Transformations
    4.6.4 Tool-specific Operations
  4.7 Extensibility
5 ARCHITECTURE DESIGN
  5.1 ETL Data Generation Architecture
    5.1.1 Input
    5.1.2 Model Parsing
    5.1.3 Model Extraction
    5.1.4 Model Analysis
    5.1.5 Data Generation
    5.1.6 Presentation Layer
    5.1.7 Controller
    5.1.8 Output
  5.2 Data Generation Algorithm
    5.2.1 Process Flow
    5.2.2 Algorithm
    5.2.3 Algorithm Illustration
6 PROTOTYPING
  6.1 Implementation
    6.1.1 Technologies Used
    6.1.2 Implemented Architecture
  6.2 Experimental Setup and Testing
  6.3 Experimental Results
  6.4 Discussion
7 CONCLUSIONS AND FUTURE WORK
  7.1 Contribution
  7.2 Future Work
BIBLIOGRAPHY
APPENDIX
  A.1 ETL Operation Semantics Definition

LIST OF FIGURES

1.1 Thesis positioning and contribution
2.1 An ETL process
2.2 Logical modeling layer of ETL flow. Example taken from [1]
2.3 ETL flow modeled in BPMN. Example taken from [2]
2.4 Performance taxonomy taken from [3]
3.1 BP simulation workflow
4.1 Example of an ETL atom and molecule taken from [4]
4.2 Adapting taxonomy from [4] to our scope
4.3 Relation access level taxonomy
4.4 Proposed ETL taxonomy
4.5 Value-level operations
4.6 Attribute-level operations
4.7 Tuple-level operations
4.8 Dataset-level operations
4.9 Schema-level operations
4.10 Relation-level operations
5.1 ETL data generator framework architecture
5.2 Model parsing layer
5.3 Model extraction layer
5.4 Model analyzer layer
5.5 Data generation layer
5.6 Presentation layer
5.7 Data generation process flow
5.8 ETL flow example
5.9 Attribute parameters structure, AP
5.10 Operation parameters structure, OP
5.11 Table of constraints, TC
5.12 First iteration
5.13 First iteration result
5.14 Second iteration
5.15 Second iteration result
5.16 Third iteration
5.17 Third iteration result
5.18 Fourth iteration
5.19 Fourth iteration result
5.20 Final result
6.1 Referenced architecture
6.2 Example of an expression tree
6.3 Implementation architecture
6.4 Basic ETL scenario
6.5 Incremental ETL scenario
6.6 Generation time wrt flow complexity
6.7 Generation time wrt load size
6.8 Linear trend of the data generation performance for flows with complexity 4 and 9 as load size increases

LIST OF TABLES

2.1 BI definitions
3.1 Data generator tools' feature support
4.1 Source and target operations
4.2 List of operations considered in the framework
4.3 Transformations provided by four main ETL tools
4.4 Table of operation semantics notations
1 Table of ETL operations semantics

LIST OF ABBREVIATIONS

BI    Business Intelligence
BPEL  Business Process Execution Language
BPM   Business Process Management
BPMN  Business Process Modeling Notation
BPS   Business Process Simulation
CSV   Comma-Separated Values
DAG   Directed Acyclic Graph
DBMS  Database Management System
DW    Data Warehouse
ETL   Extract, Transform and Load
OWB   Oracle Warehouse Builder
SOA   Service-Oriented Architecture
SSIS  SQL Server Integration Services
SQL   Structured Query Language
UML   Unified Modeling Language
XML   Extensible Markup Language

