Analyzing Time Interval Data: Introducing an Information System for Time Interval Data Analysis

250 pages · 2016 · English

Philipp Meisen
Aachen, Germany

D82 (Diss. RWTH Aachen University, 2015)

ISBN 978-3-658-15727-2
ISBN 978-3-658-15728-9 (eBook)
DOI 10.1007/978-3-658-15728-9
Library of Congress Control Number: 2016952631

Springer Vieweg
© Springer Fachmedien Wiesbaden GmbH 2016

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer Vieweg imprint is published by Springer Nature
The registered company is Springer Fachmedien Wiesbaden GmbH
The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany

Acknowledgments

For Edison and Isaac

First of all, I want to thank all the people who helped me make this work possible.
In particular, I want to mention Sabina Jeschke for her supervision and advice; my managing director, friend, and brother Tobias Meisen for sharing his knowledge and experience and for pushing me whenever needed; my co-worker and friend Christian Kohlschein for listening, for endless discussions, and for reviewing my work; Angelika Reimer for creating the illustrations; and Diane Wittman for helping me format the book.

I also want to give some special thanks and dedications to the people who have followed me my whole life like my own shadow: my elder brother Holger, who helped me whenever I was in doubt; my already mentioned twin brother Tobias, for all the “Schokostreuselbrötchen” (chocolate-sprinkle rolls) and discussions; my parents, for making all this possible by having, loving, and supporting me; and my dearest friends Tummel, Hoomer, Christian, Diane, and Marco, for every talk, time-out, and drink we had. Thank you all for being there for me whenever needed.

Last but not least, I want to express my deepest gratitude to my wife Deborah for her support whenever it was needed. Without her, this work would never have been possible.

Philipp

Abstract

Time interval data is data that associates information with a specific time range (i.e., a time window) defined by a start and an end time point. Thus, time intervals are a generalization of time points: each time point is a time interval whose start and end time points are identical.

Nowadays, huge sets of time interval data are collected in various situations, e.g., personnel deployment, equipment usage, process control, or process management. Common systems are not capable of analyzing these amounts of time interval data. Questions like “How many resources were utilized on Mondays, on an annual average?” or “Which days match the planning most closely and which deviate diametrically from it?” cannot be answered with modern systems, or they require extensive data integration processes.

In this thesis, a model to analyze time interval data (TIDAMODEL) is introduced.
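The definition of time interval data above can be made concrete with a small sketch. It assumes a discrete time axis of integer granules (for example, minutes of a day) and shows that a time point is simply an interval whose start and end coincide; counting the intervals active in each granule then answers a basic “how many resources were in use” question. The class `TimeInterval` and the function `count_per_granule` are hypothetical illustrations, not part of TIDAMODEL or TIDAIS.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TimeInterval:
    """Hypothetical interval on a discrete time axis; start and end are inclusive granules."""
    start: int
    end: int
    resource: str  # the information associated with the time window

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("end must not precede start")

    @classmethod
    def point(cls, t: int, resource: str) -> "TimeInterval":
        # A time point is the special case start == end.
        return cls(t, t, resource)


def count_per_granule(intervals: Iterable[TimeInterval], axis_size: int) -> List[int]:
    """Return, for each granule 0..axis_size-1, how many intervals cover it."""
    counts = [0] * axis_size
    for iv in intervals:
        # Clip the interval to the axis before counting.
        for g in range(max(iv.start, 0), min(iv.end, axis_size - 1) + 1):
            counts[g] += 1
    return counts


shifts = [
    TimeInterval(2, 5, "worker-A"),
    TimeInterval(4, 7, "worker-B"),
    TimeInterval.point(6, "worker-C"),  # a time point, i.e., a one-granule interval
]
print(count_per_granule(shifts, 10))  # [0, 0, 1, 1, 2, 2, 2, 1, 0, 0]
```

Treating points as degenerate intervals lets one counting routine handle both kinds of data; an aggregate such as an annual Monday average would then average these per-granule counts over the selected days.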
Based on this model, a query language (TIDAQL) is defined, which can be used to answer complex questions such as those presented above. Furthermore, a similarity measure based on different types of distance measures (TIDADISTANCE) is presented. This similarity measure enables users to search for similar situations within a time interval database. These solutions are combined to design and realize the central result of the thesis: an information system to analyze time interval data (TIDAIS). The system uses several bitmap-based indexes, which enable it to handle huge amounts of data. The results of the evaluation show that the presented implementation fulfills the requirements formulated by different stakeholders. In addition, it outperforms state-of-the-art solutions (e.g., solutions based on the Oracle database management system, icCube, or TimeDB).

Zusammenfassung (Summary)

Time interval data are data that are recorded within a time window, i.e., between a start and an end time point, and represent a generalization of time point data. Nowadays, large amounts of time interval data are increasingly collected in areas such as personnel scheduling, equipment usage, process control, or planning. Evaluating these data poses major challenges for common analysis systems. Questions such as “How many resources were needed in manufacturing on Mondays, distributed over the day, on an annual average?” or “Which days are the most accurate with respect to the planning, and which run diametrically opposed to it?” usually cannot be modeled with modern systems at all, or can only be answered by means of lengthy integration processes. This thesis first presents a model based on discrete time axes (TIDAMODEL).
Based on this model, a query language (TIDAQL) is then defined, which enables answering complex questions such as those indicated above. Besides answering questions, the search for similar situations is an important capability of information systems. To enable this similarity search, the thesis presents a similarity measure (TIDADISTANCE). These individual partial results are used to design and realize the central result of the thesis, an information system for analyzing time interval data (TIDAIS). The presented system is based on bitmaps, which enable the evaluation of large amounts of time interval data. The evaluation results show that the presented system outperforms other solutions (e.g., solutions based on icCube, TimeDB, or modern database management systems such as Oracle) in terms of evaluation performance.

Table of Contents

Acknowledgments
Abstract
Zusammenfassung
Table of Contents
List of Abbreviations
List of Figures
List of Tables
List of Listings
List of Definitions
1 Introduction and Motivation
2 Time Interval Data Analysis
  2.1 Time
    2.1.1 Time Intervals
    2.1.2 Time Interval Data Aggregation
    2.1.3 Temporal Models
    2.1.4 Temporal Operators
    2.1.5 Temporal Concepts
    2.1.6 Special Characteristics of Time
  2.2 Features of Time Interval Data Analysis Information System
    2.2.1 Analytical Capabilities
    2.2.2 Time Interval Data Analysis Process
    2.2.3 User Interface, Visualization, and User Interactions
  2.3 Summary
3 State of the Art
  3.1 Analytical Information Systems
  3.2 Analyzing Time Interval Data: Different Approaches
    3.2.1 On-Line Analytical Processing
    3.2.2 Temporal Pattern Mining & Association Rule Mining
    3.2.3 Visual Analytics
  3.3 Performance Improvements
    3.3.1 Indexing Time Interval Data
    3.3.2 Aggregating Time Interval Data
    3.3.3 Caching Time Interval Data
  3.4 Analytical Query Languages for Temporal Data
  3.5 Similarity of Time Interval Data
  3.6 Summary
4 TIDAMODEL: Modeling Time Interval Data
  4.1 Time Axis
  4.2 Descriptors
  4.3 Time Interval Database
  4.4 Dimensional Modeling
  4.5 Summary
5 TIDAQL: Querying for Time Interval Data
  5.1 Data Control Language
  5.2 Data Definition Language
  5.3 Data Manipulation Language
    5.3.1 Insert, Delete, & Update Statements
    5.3.2 Get & Alive Statements
    5.3.3 Select Statements
  5.4 Summary
6 TIDADISTANCE: Similarity of Time Interval Data
  6.1 Temporal Order Distance
  6.2 Temporal Relational Distance
  6.3 Temporal Measure Distance
  6.4 Temporal Similarity Measure
7 TIDAIS: An Information System for Time Interval Data
  7.1 System’s Architecture, Components, and Implementation
    7.1.1 Data Repository
    7.1.2 Cache & Storage
  7.2 Configuration
    7.2.1 Model Configuration
    7.2.2 System Configuration
  7.3 Data Structures & Algorithms
    7.3.1 Model Handling
    7.3.2 Indexes
    7.3.3 Caching & Storage
    7.3.4 Aggregation Techniques
    7.3.5 Distance Calculation
  7.4 User Interfaces
  7.5 Summary
8 Results & Evaluation
  8.1 Requirements & Features
  8.2 Performance
    8.2.1 High Performance Collections
    8.2.2 Load Performance
    8.2.3 Selection Performance
    8.2.4 Distance Performance
    8.2.5 Proprietary Solutions vs. TIDAIS
  8.3 Summary
9 Summary and Outlook
Appendix
  Pipelined Table Functions (PL/SQL Oracle)
  A Complete Sample Model-Configuration-File
  A Complete Sample Configuration-File
  Detailed Overview of the Runtime Performance
  3-NN of the Temporal Relational Similarity
Bibliography
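The abstract states that TIDAIS relies on bitmap-based indexes, and Section 7.3.2 of the contents is devoted to indexes. The following is one possible way bitmaps can answer a counting query on a discrete time axis, under assumed design choices: one bitmap per time granule marking the active records, one bitmap per descriptor value, and Python integers used as uncompressed bitsets. The class `BitmapIndex` and its methods are hypothetical and do not reproduce the book's data structures.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class BitmapIndex:
    """Hypothetical bitmap index over a discrete time axis (granules 0..axis_size-1)."""

    def __init__(self, axis_size: int) -> None:
        self.axis_size = axis_size
        # time_bitmaps[g] has bit r set if record r is active in granule g.
        self.time_bitmaps: List[int] = [0] * axis_size
        # descriptor_bitmaps[(name, value)] has bit r set if record r carries that value.
        self.descriptor_bitmaps: Dict[Tuple[str, str], int] = defaultdict(int)
        self.next_id = 0

    def insert(self, start: int, end: int, descriptors: Dict[str, str]) -> int:
        """Add a record covering granules start..end (inclusive); return its id."""
        record_id = self.next_id
        self.next_id += 1
        bit = 1 << record_id
        for g in range(max(start, 0), min(end, self.axis_size - 1) + 1):
            self.time_bitmaps[g] |= bit
        for name, value in descriptors.items():
            self.descriptor_bitmaps[(name, value)] |= bit
        return record_id

    def count(self, granule: int, name: str, value: str) -> int:
        """Count records active in `granule` whose descriptor `name` equals `value`."""
        matches = self.time_bitmaps[granule] & self.descriptor_bitmaps.get((name, value), 0)
        return bin(matches).count("1")  # population count of the intersection


index = BitmapIndex(axis_size=10)
index.insert(2, 5, {"department": "assembly"})
index.insert(4, 7, {"department": "assembly"})
index.insert(4, 9, {"department": "paint"})
print(index.count(4, "department", "assembly"))  # 2
print(index.count(6, "department", "assembly"))  # 1
```

Intersecting two bitmaps is a bitwise AND over whole machine words, which is why bitmap indexes suit selective filtering over large record sets; practical systems usually store compressed bitmaps instead of the plain integers used here.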
