Data Warehouses and OLAP: Concepts, Architectures and Solutions

Robert Wrembel
Poznań University of Technology, Poland

Christian Koncilia
Panoratio GmbH, Germany

IRM Press
Publisher of innovative scholarly and professional information technology titles in the cyberage
Hershey • London • Melbourne • Singapore

Acquisitions Editor: Kristin Klinger
Development Editor: Kristin Roth
Senior Managing Editor: Jennifer Neidig
Managing Editor: Sara Reed
Assistant Managing Editor: Sharon Berger
Copy Editor: April Schmidt
Typesetter: Diane Huskinson
Cover Design: Lisa Tosheff
Printed at: Integrated Book Technology

Published in the United States of America by
IRM Press (an imprint of Idea Group Inc.)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033-1240
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.irm-press.com

and in the United Kingdom by
IRM Press (an imprint of Idea Group Inc.)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanonline.com

Copyright © 2007 by Idea Group Inc. All rights reserved. No part of this book may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Product or company names used in this book are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data
Data warehouses and OLAP : concepts, architectures, and solutions / Robert Wrembel and Christian Koncilia, editors.
p. cm.
Summary: “This book provides an insight into important research and technological problems, solutions, and development trends in the field of data warehousing and OLAP.
It also serves as an up-to-date bibliography of published works for anyone interested in cutting-edge DW and OLAP issues”--Provided by publisher.
Includes bibliographical references and index.
ISBN 1-59904-364-5 (hardcover) -- ISBN 1-59904-365-3 (softcover) -- ISBN 1-59904-366-1 (ebook)
1. Data warehousing. 2. OLAP technology. I. Wrembel, Robert. II. Koncilia, Christian, 1969-
QA76.9.D37D392 2007
005.74--dc22
2006027721

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.

Table of Contents

Foreword
Preface

Section I: Modeling and Designing

Chapter I
Conceptual Modeling Solutions for the Data Warehouse
Stefano Rizzi, DEIS-University of Bologna, Italy

Chapter II
Handling Structural Heterogeneity in OLAP
Carlos A. Hurtado, Universidad de Chile, Chile
Claudio Gutierrez, Universidad de Chile, Chile

Chapter III
Data Quality-Based Requirements Elicitation for Decision Support Systems
Alejandro Vaisman, Universidad de Buenos Aires, Argentina

Section II: Loading and Refreshing

Chapter IV
Extraction, Transformation, and Loading Processes
Jovanka Adzic, Telecom Italia, Italy
Valter Fiore, Telecom Italia, Italy
Luisella Sisto, Telecom Italia, Italy

Chapter V
Data Warehouse Refreshment
Alkis Simitsis, National Technical University of Athens, Greece
Panos Vassiliadis, University of Ioannina, Greece
Spiros Skiadopoulos, University of Peloponnese, Greece
Timos Sellis, National Technical University of Athens, Greece

Section III: Efficiency of Analytical Processing

Chapter VI
Advanced Ad Hoc Star Query Processing
Nikos Karayannidis, National Technical University of Athens, Greece
Aris Tsois, National Technical University of Athens, Greece
Timos Sellis, National Technical University of Athens, Greece

Chapter VII
Bitmap Indices for Data Warehouses
Kurt Stockinger, Lawrence Berkeley National Laboratory, University of California, USA
Kesheng Wu, Lawrence Berkeley National Laboratory, University of California, USA

Chapter VIII
Indexing in Data Warehouses: Bitmaps and Beyond
Karen C. Davis, University of Cincinnati, USA
Ashima Gupta, University of Cincinnati, USA

Chapter IX
Efficient and Robust Node-Partitioned Data Warehouses
Pedro Furtado, Universidade de Coimbra, Portugal

Chapter X
OLAP with a Database Cluster
Uwe Röhm, University of Sydney, Australia

Chapter XI
Toward Integrating Data Warehousing with Data Mining Techniques
Rokia Missaoui, Université du Québec en Outaouais, Canada
Ganaël Jatteau, Université du Québec en Outaouais, Canada
Ameur Boujenoui, University of Ottawa, Canada
Sami Naouali, Université du Québec en Outaouais, Canada

Chapter XII
Temporal Semistructured Data Models and Data Warehouses
Carlo Combi, University of Verona, Italy
Barbara Oliboni, University of Verona, Italy

Chapter XIII
Spatial Online Analytical Processing (SOLAP): Concepts, Architectures, and Solutions from a Geomatics Engineering Perspective
Yvan Bédard, Laval University, Canada
Sonia Rivest, Laval University, Canada
Marie-Josée Proulx, Laval University, Canada

About the Editors
About the Authors
Index

Foreword

Data warehouse systems have become a key component of the corporate information system architecture, in which they play a crucial role in building business decision support systems. By collecting and consolidating data from a variety of enterprise internal and external sources, data warehouses try to provide a homogeneous information basis for enterprise planning and decision making.
We have recently witnessed rapid growth both in the number of data warehousing products and services offered and in the acceptance of these technologies by industry. In recent years, data warehouses have undergone a tremendous shift from simple centralized repositories used to store cash-register transactions to platforms for data integration, federation, and sophisticated data analysis. Nowadays, data warehousing technologies are successfully used in many industries, including retail, manufacturing, financial services, banking, telecommunication, healthcare, and so forth.

Data warehousing technology is currently a very active field of research. Research problems associated with creating, maintaining, and using data warehouse technology are partially similar to those specific to database systems. In fact, a data warehouse can be considered a “large” database system with additional functionality. However, the well-known problems of index selection, data partitioning, materialized view maintenance, data integration, and query optimization have received renewed attention in warehousing research. Some research problems are specific to data warehousing: data acquisition and data cleaning, data warehouse refreshment, evolution of the data warehouse schema, multidimensional and parallel query optimization, conceptual modeling for data warehouses, data quality management, and so forth.

This book addresses all the above-mentioned issues in the area of data warehousing from multiple perspectives, in the form of individual contributions written by prominent data warehouse technology researchers, and it also outlines new trends and future challenges in the context of next-generation data warehouse systems.

In reading the book, I was impressed by how much the field of data warehousing has advanced and matured.
The book describes different aspects of data warehousing technology and gives an insight into important research, technological, and practical problems and solutions related to it. The content of the book covers fundamental aspects of data warehousing technology such as the conceptual modeling and design of data warehouse systems, data warehouse refreshment, query optimization, indexes, and the integration of data warehouse technology with data mining techniques, and, finally, new trends in data warehousing such as temporal semistructured data models and spatial online analytical processing.

I am pleased to recommend this book to readers. If you are a researcher, a data warehouse developer, or just a keen reader wishing to understand important aspects of data warehouses and their potential, you will find that this book provides both a solid technical background and state-of-the-art knowledge on this interesting and important topic. The book is a valuable source of information for academics and practitioners who are interested in learning the key ideas in the field of data warehousing. This book is likely to become a standard reference in the field of data warehousing for many years.

Tadeusz Morzy
Poznań University of Technology, Poland
June 2006

Preface

Nowadays the economy is characterized by fast and continuously changing markets and business opportunities. Therefore, in order to be successful, it is essential for an enterprise to make the right business decisions and to make them fast. Business decisions are taken on the basis of analyses of the past and current condition of an enterprise as well as market analysis and predictions for the future. To this end, various business operational data collected during the lifetime of an enterprise are analyzed. Typically, operational data are stored within an enterprise in multiple data storage systems (subsystems) that are geographically distributed, heterogeneous, and autonomous.
The heterogeneity of data storage systems means that they come from different software vendors; they are implemented in different technologies (e.g., C, C++, .Net, Java, fourth-generation programming languages); they offer different functionality (e.g., fully-functional databases, ODBC data sources, spreadsheets, Web pages, text files); they use different data models (e.g., relational, object-relational, object-oriented, semistructured) and different storage techniques; and they are installed on different operating systems and use different communication protocols. The autonomy of data storage systems implies that they are often independent of each other and remain under separate, independent control; that is, a local system’s administrator can decide which local data are to be accessible from outside the system.

The management of an enterprise requires a comprehensive view of all aspects of a company; thus, it requires access to all possible data of interest stored in multiple subsystems. However, an analysis of data stored in distributed, heterogeneous, and autonomous subsystems is likely to be difficult, slow, and inefficient. Therefore, the ability to integrate information from multiple data sources is crucial for today’s business.

Data Warehouse and OLAP

One of the most important approaches to the integration of data sources is based on a data warehouse architecture. In this architecture, data coming from multiple external data sources (EDSs) are extracted, filtered, merged, and stored in a central repository, called a data warehouse (DW). Data are also enriched by historical and summary information. From a technological point of view, a data warehouse is a huge database, ranging from several hundred GB to several dozen TB. Thanks to this architecture, users operate on a local, homogeneous, and centralized data repository that reduces access time to data. Moreover, a data warehouse is independent of EDSs that may be temporarily unavailable.
However, a data warehouse has to be kept up to date with respect to the content of EDSs, by being periodically refreshed.

The content of a DW is analyzed by so-called online analytical processing (OLAP) applications for the purpose of discovering trends, patterns of behavior, and anomalies as well as for finding hidden dependencies between data. The outcomes of these analyses are then the basis for making various business decisions. The market analysis of demand and supply is one of the important steps in taking strategic decisions in a company. Likewise, an analysis of the development and course of diseases as well as the impact of different medications on the course of illnesses is indispensable in order to choose the most efficient methods of treatment. Other application areas include, among others, the stock market, banking, insurance, energy management, and science. Data warehouses and OLAP applications are core components of decision support systems.

Since the late 1980s, when data warehouse technology was developed, most large and midsize companies worldwide have been building their own DWs into their information system infrastructures and have been successfully applying this technology in business. Major commercially available database management systems (e.g., Oracle9i/10g, IBM DB2 UDB, Sybase IQ, Computer Associates CleverPath OLAP, NCR Teradata Database, Hyperion Essbase OLAP Server, MS SQL Server, SAP Business Warehouse, SAS Enterprise BI Server) include DW and OLAP technologies in their database engines. However, despite some substantial achievements in this technology, it still is and will remain a very active research and technological field. The OLAPReport (2004) estimates that the total worldwide OLAP market grew steadily from less than $1 billion in 1994 to less than $5 billion in 2005, and that it will grow up to $6 billion in 2007.
The survey by the META Group (currently Gartner) estimates that the OLAP market will be worth almost $10 billion in 2008 (EDWMarket, 2004). For these reasons, it is important to understand the core technological issues and challenges in the field of DW and OLAP.
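
To make the architecture described above more concrete, the following minimal sketch walks through the same steps on toy data: rows are extracted from two differently shaped external data sources, filtered, merged into one format, stored in a central repository, and then summarized by an OLAP-style aggregation. Everything in it is an assumption made for illustration only: the two sources, the `sales` table, the column names, and the use of an in-memory SQLite database as a stand-in for a data warehouse do not come from the book.

```python
import sqlite3

# Two hypothetical external data sources (EDSs) with different layouts.
# Source A: dictionary records, amounts in whole currency units.
source_a = [
    {"day": "2006-01-05", "store": "Poznan", "product": "tea", "amount": 120.0},
    {"day": "2006-01-05", "store": "Poznan", "product": "coffee", "amount": 80.0},
    {"day": "2006-02-11", "store": "Poznan", "product": "tea", "amount": None},
]
# Source B: spreadsheet-like tuples, amounts in cents.
source_b = [
    ("2006-01-07", "Berlin", "tea", 9000),
    ("2006-02-02", "Berlin", "coffee", 15000),
]


def extract_transform():
    """Extract rows from both sources, filter bad rows, unify the format."""
    rows = []
    for record in source_a:
        if record["amount"] is None:  # filter: drop incomplete records
            continue
        rows.append((record["day"], record["store"],
                     record["product"], float(record["amount"])))
    for day, store, product, cents in source_b:
        rows.append((day, store, product, cents / 100.0))  # unify units
    return rows


def load(conn, rows):
    """Store the merged rows in the central repository (the toy DW)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(day TEXT, store TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def monthly_sales_by_product(conn):
    """OLAP-style roll-up: aggregate daily facts to (month, product)."""
    cursor = conn.execute(
        "SELECT substr(day, 1, 7) AS sales_month, product, SUM(amount) "
        "FROM sales "
        "GROUP BY substr(day, 1, 7), product "
        "ORDER BY sales_month, product"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, extract_transform())
summary = monthly_sales_by_product(conn)
for row in summary:
    print(row)
```

Once the data are loaded, the analysis runs entirely against the local repository, so it no longer depends on the availability of either source. A periodic refresh would amount to re-running the extraction and loading steps for new source data.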