Data Quality

Jack E. Olson

Understanding the concepts of accurate data is fundamental to improving the ways we collect and use data.

Acquisitions Editor: Lothlórien Homet
Publishing Services Manager: Edward Wade
Editorial Assistant: Corina Derman
Project Management: Elisabeth Beller
Cover Design: Frances Baca
Cover Image: EyeWire of Getty Images
Text Design: Frances Baca
Technical Illustration: Dartmouth Publishing, Inc.
Composition: Nancy Logan
Copyeditor: Daril Bentley
Proofreader: Jennifer McClain
Indexer: Steve Rath
Interior Printer: The Maple-Vail Book Manufacturing Group
Cover Printer: Phoenix Color Corporation

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Morgan Kaufmann Publishers
An imprint of Elsevier Science
340 Pine Street, Sixth Floor
San Francisco, CA 94104-3205
www.mkp.com

© 2003 by Elsevier Science (USA)
All rights reserved.
Printed in the United States of America

07 06 05 04 03    5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, or otherwise) without the prior written permission of the publisher.

Library of Congress Control Number: 2002112508
ISBN: 1-55860-891-5

This book is printed on acid-free paper.

To Jean

Colin White
President, Intelligent Business Strategies

Over the past thirty years I have helped organizations in many different countries design and deploy a wide range of IT applications. Throughout this time, the topic of data accuracy and quality has been ever present during both the development and the operation of these applications.
In many instances, however, even though the IT development group and business managers recognized the need for improved data quality, time pressures to get the project into production prevented teams from addressing the data quality issue in more than a superficial manner.

Lack of attention to data quality and accuracy in enterprise systems can have downstream ramifications. I remember working with an overseas bank on a data warehousing system a few years ago. The bank was struggling to deliver consistent business intelligence to its business users. On one occasion, a business manager discovered that financial summary data in the data warehouse was wrong by many millions of dollars. I visited the bank several months later, and I was told that the reason for the error had still not been found. Trying to analyze a data quality problem caused by upstream applications is time consuming and expensive. The problem must be corrected at the source before the error is replicated to other applications.

Lack of accuracy in data not only erodes end-user confidence in IT applications, it can also have a significant financial impact on the business. As I write this, I am reading a report from the Data Warehousing Institute on data quality that estimates that poor-quality customer data costs U.S. businesses a staggering $611 billion a year in postage, printing, and staff overhead. The same report states that nearly 50% of the companies surveyed have no plans for managing or improving data quality. At the same time, almost half the survey respondents think the quality of their data is worse than everyone thinks. These results clearly demonstrate a gap between perception and reality regarding the quality of data in many corporations.
The report goes on to state that “although some companies understand the importance of high-quality data, most are oblivious to the true business impact of defective or substandard data.” To solve this problem, companies need to become more educated about the importance of both data quality and techniques to improve it. This is especially important given that the world economy is becoming more and more information driven. Companies with access to timely and accurate information have a significant business advantage over their competitors.

I must admit that when I was approached to write the foreword for this book, I had some reservations. As a practitioner, I have found that books on data quality are often very theoretical and involve esoteric concepts that are difficult to relate to real-world applications. In truth, I have a suspicion this may be one reason why less attention is given to data quality than in fact it deserves. We need education that enables designers and developers to apply data quality concepts and techniques easily and rapidly to application development projects.

When I read this book I was pleasantly surprised, and my concerns in this regard vanished. The author, Jack Olson, has a background that enables him to address the topic of data quality and accuracy from a practical viewpoint. As he states in the preface, “Much of the literature on data quality discusses what I refer to as the outside-in approach. This book covers the inside-out approach. To make the inside-out approach work, you need good analytical tools and a talented and experienced staff of data analysts . . . . You also need a thorough understanding of what the term inaccurate data means.”

The bottom line for me is that the book presents techniques that you can immediately apply to your applications projects. I hope that you will find the book as useful as I did and that the ideas presented will help you improve the quality and accuracy of the data in your organization.