Interpretable Machine Learning
A Guide for Making Black Box Models Explainable

Christoph Molnar
2022-03-04

© 2022 Christoph Molnar, Germany, Munich
christophmolnar.com

For more information about permission to reproduce selections from this book, write to [email protected].

2022, Second Edition
ISBN 9798411463330 (paperback)

This book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Christoph Molnar, c/o Mucbook Clubhouse, Westendstraße 78, 80339 München, Germany

Contents

Preface by the Author
1 Introduction
    1.1 Story Time
    1.2 What Is Machine Learning?
    1.3 Terminology
2 Interpretability
    2.1 Importance of Interpretability
    2.2 Taxonomy of Interpretability Methods
    2.3 Scope of Interpretability
    2.4 Evaluation of Interpretability
    2.5 Properties of Explanations
    2.6 Human-friendly Explanations
3 Datasets
    3.1 Bike Rentals (Regression)
    3.2 YouTube Spam Comments (Text Classification)
    3.3 Risk Factors for Cervical Cancer (Classification)
4 Interpretable Models
    4.1 Linear Regression
    4.2 Logistic Regression
    4.3 GLM, GAM and more
    4.4 Decision Tree
    4.5 Decision Rules
    4.6 RuleFit
    4.7 Other Interpretable Models
5 Model-Agnostic Methods
6 Example-Based Explanations
7 Global Model-Agnostic Methods
    7.1 Partial Dependence Plot (PDP)
    7.2 Accumulated Local Effects (ALE) Plot
    7.3 Feature Interaction
    7.4 Functional Decomposition
    7.5 Permutation Feature Importance
    7.6 Global Surrogate
    7.7 Prototypes and Criticisms
8 Local Model-Agnostic Methods
    8.1 Individual Conditional Expectation (ICE)
    8.2 Local Surrogate (LIME)
    8.3 Counterfactual Explanations
    8.4 Scoped Rules (Anchors)
    8.5 Shapley Values
    8.6 SHAP (SHapley Additive exPlanations)
9 Neural Network Interpretation
    9.1 Learned Features
    9.2 Pixel Attribution (Saliency Maps)
    9.3 Detecting Concepts
    9.4 Adversarial Examples
    9.5 Influential Instances
10 A Look into the Crystal Ball
    10.1 The Future of Machine Learning
    10.2 The Future of Interpretability
11 Contribute to the Book
12 Citing this Book
13 Translations
14 Acknowledgements

Summary

Machine learning has great potential for improving products, processes and research. But computers usually do not explain their predictions, which is a barrier to the adoption of machine learning. This book is about making machine learning models and their decisions interpretable.

After exploring the concepts of interpretability, you will learn about simple, interpretable models such as decision trees, decision rules and linear regression. The focus of the book is on model-agnostic methods for interpreting black box models, such as feature importance and accumulated local effects, and on explaining individual predictions with Shapley values and LIME. In addition, the book presents methods specific to deep neural networks.

All interpretation methods are explained in depth and discussed critically. How do they work under the hood? What are their strengths and weaknesses? How can their outputs be interpreted? This book will enable you to select and correctly apply the interpretation method that is most suitable for your machine learning project. Reading the book is recommended for machine learning practitioners, data scientists, statisticians, and anyone else interested in making machine learning models interpretable.

About me: My name is Christoph Molnar. I’m a statistician and a machine learner. My goal is to make machine learning interpretable. Follow me on Twitter!
@ChristophMolnar (https://twitter.com/ChristophMolnar)

Cover by @YvonneDoinel (https://twitter.com/YvonneDoinel)

This book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/).

Preface by the Author

This book started as a side project when I was working as a statistician in clinical research. I worked four days a week, and on my “day off” I worked on side projects. Eventually, interpretable machine learning became one of my side projects. At first I had no intention of writing a book. Instead, I was simply interested in finding out more about interpretable machine learning and was looking for good resources to learn from. Given the success of machine learning and the importance of interpretability, I expected that there would be tons of books and tutorials on this topic. But I found only the relevant research papers and a few blog posts scattered around the internet, and nothing with a good overview. No books, no tutorials, no overview papers, nothing. This gap inspired me to start writing this book. I ended up writing the book I wished had been available when I began my study of interpretable machine learning. My intention with this book was twofold: to learn for myself and to share this new knowledge with others.

I received my bachelor’s and master’s degrees in statistics at the LMU Munich, Germany. Most of my knowledge about machine learning was self-taught through online courses, competitions, side projects and professional activities. My statistical background was an excellent basis for getting into machine learning, and especially for interpretability. In statistics, a major focus is on building interpretable regression models. After I finished my master’s degree in statistics, I decided not to pursue a PhD, because I did not enjoy writing my master’s thesis. Writing just stressed me out too much. So I took jobs as a data scientist in a fintech start-up and as a statistician in clinical research.
After these three years in industry I started writing this book, and a few months later I started a PhD in interpretable machine learning. By starting this book, I regained the joy of writing and it helped me to develop a passion for research.

This book covers many techniques of interpretable machine learning. In the first chapters, I introduce the concept of interpretability and motivate why interpretability is necessary. There are even some short stories! The book discusses the different properties of explanations and what humans think is a good explanation. Then we will discuss machine learning models that are inherently interpretable, for example regression models and decision trees. The main focus of this book is on model-agnostic interpretability methods. Model-agnostic means that these methods can be applied to any machine learning model and are applied after the model has been trained. Their independence from the model makes model-agnostic methods very flexible and powerful. Some techniques explain how individual predictions were made, like local interpretable model-agnostic explanations (LIME) and Shapley values. Other techniques describe the average behavior of the model across a dataset. Here we learn about the partial dependence plot, accumulated local effects, permutation feature importance and many other methods. A special category is example-based methods that produce data points as explanations. Counterfactual explanations, prototypes, influential instances and adversarial examples are example-based methods, which are discussed in this book. The book concludes with some reflections on what the future of interpretable machine learning might look like.

You do not have to read the book from cover to cover; you can jump back and forth and concentrate on the techniques that interest you most. I only recommend that you start with the introduction and the chapter on interpretability. Most chapters follow a similar structure and focus on one interpretation method.
The first paragraph summarizes the method. Then I try to explain the method intuitively without relying on mathematical formulas. Then we look at the theory of the method to get a deep understanding of how it works. You will not be spared here, because the theory will contain formulas. I believe that a new method is best understood using examples. Therefore, each method is applied to real data. Some people say that statisticians are very critical people. For me, this is true, because each chapter contains critical discussions about advantages and disadvantages of the respective interpretation method. This book is not an advertisement for the methods, but it should help you decide whether a method works well for your application or not. In the last section of each chapter, available software implementations are discussed.

Machine learning has received great attention from many people in research and industry. Sometimes machine learning is overhyped in the media, but there are many real and impactful applications. Machine learning is a powerful technology for products, research and automation. Today, machine learning is used, for example, to detect fraudulent financial transactions, recommend movies and classify images. It is often crucial that the machine learning models are interpretable. Interpretability helps the developer to debug and improve the model, build trust in the model, justify model predictions and gain insights. The increased need for machine learning interpretability is a natural consequence of an increased use of machine learning.

This book has become a valuable resource for many people. Instructors use the book to introduce their students to the concepts of interpretable machine learning. I received e-mails from various master’s and doctoral students who told me that this book was the starting point and most important reference for their theses. The book has helped applied researchers in the field of ecology, finance, psychology, etc.
who use machine learning to understand their data. Data scientists from industry told me that they use the “Interpretable Machine Learning” book for their work and recommend it to their colleagues. I am happy that many people can benefit from this book and become experts in model interpretation.

I would recommend this book to practitioners who want an overview of techniques to make their machine learning models more interpretable. It is also recommended to students and researchers (and anyone else) who are interested in the topic. To benefit from this book, you should already have a basic understanding of machine learning. You should also have a mathematical understanding at university entry level to be able to follow the theory and formulas in this book. It should also be possible, however, to understand the intuitive description of the method at the beginning of each chapter without mathematics.

I hope you enjoy the book!