It’s All Analytics!
The Foundations of AI, Big Data, and Data Science Landscape for Professionals in Healthcare, Business, and Government

Scott Burk, Ph.D.
Gary D. Miner, Ph.D.

First edition published 2020
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2021 Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

ISBN: 978-0-367-35968-3 (hbk)
ISBN: 978-0-367-49379-0 (pbk)
ISBN: 978-0-429-34398-8 (ebk)

Typeset in ITC Garamond STD by Deanta Global Publishing Services, Chennai, India

Contents

Foreword Number One
Foreword Number Two
Foreword Number Three
Preface
Endorsements
Authors

1 You Need This Book
Preamble
The Hip, the Hype, the Fears, the Intrigue, and the Reality:
Hype, Fear, and Intrigue No 1:
Hype, Fear, and Intrigue No 2:
Hype, Fear, and Intrigue No 3:
Professionals Need This Book
Introduction
Technology Keeps Raging, but We Need More Than Technology to Be Successful
Data and Analytics Explosion
A Bright Side of the Revolution
Where Is Someone to Turn for Information?
The Problem, Too Many Self-Interests: The Need for an Objective View
There Are Many Other Professional Stories That Are Concerned about Whether Analytics Is Important; Here Are a Few More Examples
What This Book Is Not:
Why This Book?
Sure, Business, but Why Healthcare, Public Policy, and Business?
How This Book Is Organized
References
Resources for the Avid Learner

2 Building a Successful Program
Preamble
The Hip, the Hype, the Fears, the Intrigue, and the Reality
The Hype
Reality
The Hype
Reality
The Hype
Reality
Introduction
Culture and Organization – Gaps and Limitations
Gaps in Analytics Programs
Characterizing Common Problems
Don’t Confuse Organizational Gaps for Project Gaps
Justifying a Data-Driven Organization
Motivations
Critical Business Events
Analytics as a Winning Strategy
Part I – New Programs and Technologies
Part II – More Traditional Methods of Justification
Positive Return on Investment
Scale
Productivity
Reliability
Sustainability
Designing the Organization for Program Success
Motivation / Communication and Commitment
Establish Clear Business Outcomes
Organization Structure and Design
The Organization and Its Goals – Alignment
Organizational Structure
Centralized Analytics
Decentralized or Embedded Analytics
Multidisciplinary Roles for Analytics
Data Scientists
Data Engineers
Citizen Data Scientists
Developers
Business Experts
Business Leaders
Project Managers
Analytics Oversight Committee (AOC) and Governance Committee (Board Report)
Postscript
References
Resources for the Avid Learner

3 Some Fundamentals – Process, Data, and Models
Preamble
The Hip, the Hype, the Fears, the Intrigue, and the Reality
The Hype
Reality
Introduction
Framework for Analytics – Some Fundamentals
Processes Drive Data
Models, Methods, and Algorithms
Models, Models, Models
Statistical Models
Rules of Thumb, Heuristic Models
A Note on Cognition
Algorithms, Algorithms, Algorithms
Distinction between Methods That Generate Models
There Is No Free Lunch
A Process Methodology for Analytics
CRISP-DM: The Six Phases
Last Considerations
Data Architecture
Analytics Architecture
Postscript
References
Resources for the Avid Learner

4 It’s All Analytics!
Preamble
Overview of Analytics – It’s All Analytics
Analytics of Every Form and Analytics Everywhere
Introduction
Analytics Mega List
Breaking It Down, Categorizing Analytics
Introduction
Gartner’s Classification
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Process Optimization
Some Additional Thoughts on Classifying Analytics
Fundamentals of Analytics – Data Basics
Introduction
Four Scales of Measurement
Data Formats
Data Stores
Provisioning Data for Analytics
Data Sourcing
Data Quality Assessment and Remediation
Integrate and Repeat
Exploratory Data Analysis (EDA)
Data Transformations
Data Reduction
Postscript
References
Resources for the Avid Learner

5 What Are Business Intelligence (BI) and Visual BI?
Preamble
Introduction
Background and Chronology
Basic (Digital) Reporting
A View inside the Data Warehouse and Interactive BI
Beyond the Data Warehouse and Enhanced Interactive Visual BI and More
Business Activity Monitoring and Alert-Based BI, Version 4.0
Strengths and Weaknesses of BI
Transparency and Single Version of the Truth
Summary
Postscript
References
Resources for the Avid Learner

6 What Are Machine Learning and Data Mining?
Preamble
Overview of Machine Learning and Data Mining
Is There a Difference?
A (Brief) Historical Perspective of Data Mining and Machine Learning
What Types of Analytics Are Covered by Machine Learning?
An Overview of Problem Types and Common Ground
The BIG Three!
Regression
Classification
Natural Language Processing (NLP)
Some (of Many) Additional Problem Classes
Association Rules and Recommender Systems
Clustering
Some Comments on Model Types
Some Popular Machine Learning Algorithm Classes
Trees 1.0: Classification and Regression Trees or Partition Trees
Trees 2.0: Advanced Trees: Boosted Trees and Random Forests, for Classification and Regression
Regression Model Trees and Cubist Models
Logistic and Constrained/Penalized (LASSO, Ridge, Elastic Net) Regression
Multivariate Adaptive Regression Splines
Support Vector Machines (SVMs)
Neural Networks in 1000 Flavors
K-Means and Other Clustering Algorithms
Directed Acyclic Graph Analytics (Optimization, Social Networks)
Association Rules
AutoML (Automated Machine Learning)
Transparency and Processing Time of Algorithms
Model Use and Deployment
Major Components of the Machine Learning Process