
Business Intelligence Guidebook: From Data Integration to Analytics PDF

508 Pages·2014·34.654 MB·English


Business Intelligence Guidebook
From Data Integration to Analytics

Rick Sherman

AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO

Morgan Kaufmann is an imprint of Elsevier

Acquiring Editor: Steve Elliot
Editorial Project Manager: Lindsay Lawrence
Project Manager: Punithavathy Govindaradjane
Designer: Matthew Limbert

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA, 02451, USA

Copyright © 2015 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Sherman, Rick.
Business intelligence guidebook : from data integration to analytics/Rick Sherman.
pages cm
ISBN 978-0-12-411461-6
1. Business intelligence. I. Title.
HD38.7.S52 2014
658.4’72–dc23
2014031205

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-411461-6

For information on all MK publications visit our website at www.mkp.com

Foreword

There are many books explaining the need for BI, its methodology, and the myriad designs and implementation pathways that can be taken. The missing book is one that covers all of these from start to finish in a complete, detailed, and comprehensive fashion. From justifying the project, gathering requirements, developing the architectural framework, designing the proper approach for BI data models, integrating the data, generating advanced analytics, dealing with “shadow systems,” understanding and dealing with organizational relationships, managing the full project life cycle, and finally creating centers of excellence—this book covers the entire gamut of creating a sustainable BI environment. It is the one book you will need to create and maintain a world-class data warehouse environment.

Rick’s deep understanding of technical implementations is only matched by his understanding of the history behind many of the decision points in the development of the BI components. This history will help you determine the best deployment options for your specific situation—so invaluable in today’s confusing, mixed-message BI world!
I highly recommend this book to anyone who is just starting out in BI, has a legacy environment that needs renovating, or just wants to understand the entire implementation picture from start to finish. Rick’s mastery of all the critical implementation activities means you are receiving the best advice for creating a world-class BI environment that will last for the long haul. Nicely done, Rick.

Claudia Imhoff
President of Intelligent Solutions, Inc. and Founder of The Boulder Business Intelligence Brain Trust.

How to Use This Book

I wrote this book to fill in the gap between books that focus on the concepts of business intelligence (BI) and the nitty gritty of vendor tool training. It does not just provide a foundation; it shows you how to apply that foundation in order to actually get your work done. After this book, you should be ready to learn specific tools. As you do that, you’ll see how the concepts and the step-by-step instructions mesh together.

See www.BIguidebook.com for companion material such as templates, examples, vendor links, and updated research. There, you will be able to subscribe to an email list and receive notices of updates, additions, and other occasional news that relates to the book. I will also use my blog at www.datadoghouse.com to post updates.

If you are a professor choosing this book as a textbook, contact us at BIguidebook@athena-solutions.com for a syllabus.

Note: You may notice that this book does not use the word “user” any more than absolutely necessary. I explain my reasons in Chapter 17: People, Process and Politics, but, briefly, it is because we are building BI solutions for people. This is an important mind-set, seeing as BI projects are about people as much as they are about technology.

For this book and other related titles, see the publisher’s website: http://booksite.elsevier.com/9780124114616.

CHAPTER SUMMARIES

BI projects require the participation of both business and IT groups.
The simplistic view is that business people are the customers and IT people deliver the solution. The reality is that business people need to participate in the entire process. Below you will find summaries and guidance on how business and IT people can use this book.

Chapter 1: The Business Demand for Data, Information, and Analytics sets the stage, and is important background for all audiences. It explains how the deluge of data and its accompanying need for analysis makes BI critical for the success of today’s enterprise. There is a big difference between raw data and actionable information. While there are attempts to circumvent BI with operational systems, there really is no good substitute for true BI.

Chapter 2: Justifying BI helps the BI team make both the business and technical case to determine the need, identify the benefits, and, most importantly, set expectations. Identifying risks and an organization’s readiness is critical to determining realistic expectations. This chapter covers determining the scope, plan, budget, and return on investment.

Chapter 3: Defining Requirements—Business, Data, and Quality discusses the process of creating the foundation of a successful BI solution by documenting what you are planning to build. The development team then uses these requirements to design, develop, and deploy BI systems. This is one of the most people-oriented processes in a project, making it especially tricky. Use this chapter to understand the roles and workflow, and how to conduct the interviews that are the basis for the requirements you will be documenting.

Chapter 4: Architecture Introduction helps everyone understand the importance of a well-architected foundation. The architecture sets your directions and goals. It is a set of guiding principles, but is flexible enough to allow for incremental growth.
One of the key concepts discussed in this chapter is that of the accidental architecture, which is what happens when there is no plan.

Chapter 5: Information Architecture introduces the framework that defines the business context—“what, who, where, and why”—necessary for building successful BI solutions. An information architecture helps tame the deluge of data with a combination of processes, standards, people, and tools that establish information as a corporate asset. Using a data integration framework as a blueprint, you can transform data into consistent, quality, timely information for your business people to use in measuring, monitoring, and managing the enterprise. This gives your enterprise a better overall view of its customers and helps consolidate critical data.

Chapter 6: Data Architecture explains that data architecture is a blueprint that helps align your company’s data with its business strategies. This is where the book gets more technical, as it delves into the history of data architecture, the different choices available, and details on the analytical data architecture. It covers types of workflows, then explains operational data stores. It introduces the hybrid dimensional-normalized model.

Chapter 7: Technology and Product Architecture gets into the nitty gritty of the technology and product architectures and what you should know when you are evaluating them. It does not name products and vendors, as these can change frequently as companies merge and are acquired. If you want names, see the book Web site www.BIguidebook.com.

Chapter 8: Foundational Data Modeling, Chapter 9: Dimensional Modeling, and Chapter 10: BI Dimensional Modeling are aimed at technical members of the BI team who will be involved with creating data models, which are the cornerstone to building BI applications.

Chapter 8: Foundational Data Modeling describes the different levels of models (conceptual, logical, and physical), the workflow, and where they are used.
It explains entity relational (ER) modeling in depth, and covers normalization, the formal data modeling approach for validating a model.

Chapter 9: Dimensional Modeling compares ER versus dimensional modeling and provides details on the latter, which is better suited to BI. Dimensional modeling uses facts, dimensions, and attributes, which can be organized in different ways, called schemas. The chapter covers dimensional modeling concepts such as date, time, role-playing, and degenerative dimensions, as well as event and consolidated fact tables.

Chapter 10: BI Dimensional Modeling gives you a strong understanding of how to develop a dimensional model and how dimensional modeling fits in your enterprise. It covers hierarchies, slowly changing dimensions, rapidly changing dimensions, causal dimensions, multivalue dimensions, and junk dimensions. The chapter also takes a closer look at snowflakes, as well as concepts such as value-band reporting, heterogeneous products, and hot swappable dimensions.

Chapter 11: Data Integration Design and Development introduces data integration (DI), where the bulk of the work in a BI project lies. The best approaches are holistic, incremental, and iterative. The DI architecture represents the workflow of source data as it is transformed to become actionable information. The chapter covers the DI design steps for creating each stage’s process model and determining the design specifications. Standards are important for successful DI, and the chapter covers how to develop and apply them. Because historical data is often needed, it explains what to watch out for when loading it. It wraps up with discussions on prototyping and testing.

Chapter 12: Data Integration Processes takes the DI discussion to the next level, covering why it is best to use a DI tool (as opposed to hand-coding) and how to choose the best fit.
These tools cover many services, including access and delivery, data profiling, data transformation, data quality, process management, operations management, and data transport, as well as the data ingestion services of change data capture, slowly changing dimensions, and reference look-ups.

Chapter 13: BI Applications talks about specifying the content of the BI application you are building, and the idea of using personas to make sure it resonates with your intended audience. It guides you on designing the layout as well as the function and form of the data. Matching the type of visualizations you create to the analysis that will be done makes the application much more effective.

Chapter 14: BI Design and Development covers the next step, which is to design the BI application’s visual layout and how it interacts with its users. This includes creating and adhering to standards for the user interface and standards for how information is accessed from the perspectives of privacy and security. You will learn the different methods for working on the design of the components, prototyping, developing the application, and then testing.

Chapter 15: Advanced Analytics shows how you can use analytics not just to learn about what has happened, but also to gauge the future and act on predictions. Predictive analytics includes the processes by which you use analytics for forecasting and modeling. Analytical sandboxes and hubs are two self-service data environments that help business people manage their own analytical needs, although IT is needed to create the data backbone. The chapter addresses the challenges of Big Data analytics, including scoping, architecting, and staffing a program.

Chapter 16: Data Shadow Systems sheds light on these frequently-seen departmental systems (usually spreadsheets) that business groups create and use to gather and analyze data on their own when they do not want to work with IT or cannot wait for them.
Data shadow systems create silos, resulting in inconsistent data across the enterprise. The BI team needs to identify them and either replace them or incorporate them into the overall BI program.

Chapter 17: People, Process, and Politics delves into the stickiest of BI project issues. Technology is easy; it is people that are hard. This chapter lays out the relationship between the business and IT groups, discussing who does what, how they interact, and how to build the project management and development teams. It covers training for both business and IT people and provides a firm foundation in data governance—a people-centric program that is critical for transforming data into actionable information.

Chapter 18: Project Management stresses the need for an enterprise-wide BI program, which helps the BI project manager better plan and manage. The assessment is another arrow in the project manager’s quiver, helping to create a project plan that meets its requirements. The chapter gets into details on the work breakdown structure and project methodology choices. It guides you through all phases of the project and its schedule.

Chapter 19: Centers of Excellence discusses these organizational teams that address the problem of disconnected application and data silos across an enterprise. The business intelligence center of excellence (BI COE) coordinates and oversees BI activities including resources and expertise. The data integration COE (DI COE) team focuses specifically on data integration, establishing its scope, defining its architecture and vision, and helping to implement that vision.

Acknowledgments

Although this book is about learning, I am the one who has learned so much, starting with the folks at Project Software & Development, Inc., who took a chance on an industrial engineer whose only work experience was with companies that made grinding wheels and bubble gum.
My then manager said he could train an engineer to program but he could not train a programmer to be an engineer. Digital Equipment Corporation got me into data warehousing, with a 150-Gb warehouse, one of the largest nongovernment data warehouses in the world at the time—now it would fit on my smartphone.

The consulting I have done since then, including with CONNECT: The Knowledge Network (Maureen Clarry), PriceWaterhouseCoopers Consulting and through Athena IT Solutions has given me the privilege of working with customers and fellow consultants who have taught me so much. I have been fortunate to work with people who would later become thought leaders in industry research firms.

The people who have published my articles, podcasts, webinars, and videos over the years have been a great source of encouragement. Mary Jo Nott was the first, way back when DM Review was a thick, printed monthly magazine. Eric Kavanaugh, Hannah Smalltree, and Craig Stedman have been and continue to be strong supporters and I appreciate their enthusiasm.

The team at Morgan Kaufmann had great patience with my loose interpretation of deadlines. Steve Elliot and Lindsay Lawrence, particularly, somehow managed to keep the book on track despite me.

A special thanks to my editor and marketing communications expert, Andrea Harris, who has transformed this book, and all my other writing, from the prose of an engineering term paper to a more readable form for people who are not nerds (like me!). Andrea also deserves far more thanks from me, as she has been married to me for quite some time. And thanks to my sons, Jake and Josh, for humoring me and pretending that they are interested in what I do.
CHAPTER 1
THE BUSINESS DEMAND FOR DATA, INFORMATION, AND ANALYTICS

INFORMATION IN THIS CHAPTER:
• The data and information deluge
• The analytics deluge
• Data versus actionable information
• Data capture versus information analysis
• The five Cs of data
• Common terminology

JUST ONE WORD: DATA

“I just want to say one word to you. Just one word… Are you listening? … Plastics. There’s a great future in plastics.”
Mr. McGuire in the 1967 movie The Graduate.

The Mr. McGuires of the world are no longer advising newly-minted graduates to get into plastics. But perhaps they should be recommending data. In today’s digital world data is the key, the ticket, and the Holy Grail all rolled into one.

I do not just mean it’s growing in importance as a profession, although it is a great field to get into, and I’m thrilled that my sons Jake and Josh are pursuing careers in data and technology. Data is where the dollars are when it comes to company budgets. Every few years there is another report showing that business intelligence (BI) is at or near the top of the chief information officer’s (CIO) list of priorities.

Enterprises today are driven by data, or, to be more precise, information that is gleaned from data. It sheds light on what is unknown, it reduces uncertainty, and it turns decision-making from an art to a science.

But whether it’s Big Data or just plain old data, it requires a lot of work before it is actually something useful. You would not want to eat a cup of flour, but baked into a cake with butter, eggs, and sugar for the right amount of time at the right temperature it is transformed into something delicious. Likewise, raw data is unpalatable to the business person who needs it to make decisions. It is inconsistent, incomplete, outdated, unformatted, and riddled with errors. Raw data needs integration, design, modeling, architecting, and other work before it can be transformed into consumable information.
This is where you need data integration to unify and massage the data, data warehousing to store and stage it, and BI to present it to decision-makers in an understandable way. It can be a long and complicated process, but there is a path; there are guidelines and best practices. As with many things that are hard to do, there are promised shortcuts and “silver bullets” that you need to learn to recognize before they trip you up. It will take a lot more than just reading this book to make your project a success, but my hope is that it will help set you on the right path.

Business Intelligence Guidebook. http://dx.doi.org/10.1016/B978-0-12-411461-6.00001-0

WELCOME TO THE DATA DELUGE

In the business world, knowledge is not just power. It is the lifeblood of a thriving enterprise. Knowledge comes from information, and that, in turn, comes from data. It is up to a BI team to gather and manage the data to empower the company’s business groups with the information they need to gain knowledge—knowledge that helps them make informed decisions about every step the company takes.

Enterprises need this information to understand their operations, customers, competitors, suppliers, partners, employees, and stockholders. They need to learn about what is happening in the business, analyze their operations, react to internal and external pressures, and make decisions that will help them manage costs, grow revenues, and increase sales and profits.

Forrester Research sums it up perfectly:

“Data is the raw material of everything firms do, but too many have been treating it like waste material—something to deal with, something to report on, something that grows like bacteria in a petri dish. No more! Some say that data is the new oil—but we think that comparing data to oil is too limiting. Data is the new sun: it’s limitless and touches everything firms do.
Data must flow fast and rich for your organization to serve customers better than your competitors can. Firms must invest heavily in building a next-generation customer data management capability to grow revenue and profits in the age of the customer. Data is an asset that even CFOs will realize should have a line on the balance sheet right alongside property, plant, and equipment” [1].

It can be a problem, however, when there is more data than an enterprise can handle. Enterprises collect massive amounts of data every day internally and externally as they interact with customers, partners, and suppliers. They research and track information on their competitors and the marketplace. They put tracking codes on their websites so they can learn exactly how many visitors they get and where they came from. They store and track information required by government regulations and industry initiatives. Now there is the Internet of Things (IoT), with sensors embedded in physical objects such as pacemakers, thermostats, and dog collars where they collect data. It is a deluge of data (Figure 1.1).

DATA VOLUME, VARIETY, AND VELOCITY

It is not only that enterprises accumulate data in ever-increasing volumes; the variety and velocity of data are also increasing. Although the emerging “Big Data” databases can cause an enterprise’s ability to gather data to explode, the volume, velocity, and variety are all expanding no matter how “big” or “small” the data is.

Volume—According to many experts, 90% of the data in the world today was created in the last two years alone. When you hear that statistic you might think that it is coming from all the chatter on social media, but data is being generated by all manner of activities. For just one example, think about the emergence of radio frequency identification (RFID) to track products from manufacturing to purchase. It is a huge category of data that simply did not exist before.
Although not all of the data gathered is significant for an enterprise, it still leaves a massive amount of data with which to deal.

Velocity—Much of the data now is time sensitive, and there is greater pressure to decrease the time between when it is captured and when it is used for reporting. We now depend on the speed of some of this data. It is extremely helpful to receive an immediate notification from your bank, for example, when a fraudulent transaction is detected, enabling you to cancel your credit card immediately. Businesses across industry sectors are using current data when interacting with their customers, prospects, suppliers, partners, employees, and other stakeholders.

Variety—The sources of data continue to expand. Receiving data from disparate sources further complicates things. Unstructured data, such as audio, video, and social media, and semistructured data

FIGURE 1.1 Too much information. www.CartoonStock.com.
