Practicing Trustworthy Machine Learning
Consistent, Transparent, and Fair AI Pipelines
Yada Pruksachatkun, Matthew McAteer & Subhabrata Majumdar

With the increasing use of AI in high-stakes domains such as medicine, law, and defense, organizations spend a lot of time and money to make ML models trustworthy. Many books on the subject offer deep dives into theories and concepts. This guide provides a practical starting point to help development teams produce models that are secure, more robust, less biased, and more explainable.

Authors Yada Pruksachatkun, Matthew McAteer, and Subhabrata (Subho) Majumdar translate best practices in the academic literature for curating datasets and transforming models into blueprints for building industry-grade trusted ML systems. With this book, engineers and data scientists will gain a much-needed foundation for releasing trustworthy ML applications into a noisy, messy, and often hostile world.

You'll learn:

• Methods to explain ML models and their outputs to stakeholders
• How to recognize and fix fairness concerns and privacy leaks in an ML pipeline
• How to develop ML systems that are robust and secure against malicious attacks
• Important systemic considerations, like how to manage trust debt and which ML obstacles require human intervention
• The important features behind your model's decisions

Yada Pruksachatkun is a machine learning scientist at Infinitus, a conversational AI startup that automates calls in the healthcare system.

Matthew McAteer is the creator of 5cube Labs, an ML consultancy that has worked with over 100 companies in industries ranging from architecture to medicine to agriculture.

Subho Majumdar is a machine learning scientist at Twitch, where he leads applied science efforts in responsible ML.

Praise for Practicing Trustworthy Machine Learning

An excellent practical book with code examples on making AI systems more fair, private, explainable, and robust. Impressively, it has kept up with the ongoing Cambrian explosion of foundation models.
—Kush Varshney, Distinguished Research Scientist, Foundations of Trustworthy AI, IBM Research

This book is a valuable and conscientiously written introduction to the increasingly important fields of AI safety, privacy, and interpretability, filled with lots of examples and code snippets to make it of practical use to machine learning practitioners.
—Timothy Nguyen, deep learning researcher, host of The Cartesian Cafe podcast

This is an impressive book that feels simultaneously foundational and cutting-edge. It is a valuable reference work for data scientists and engineers who want to be confident that the models they release into the world are safe and fair.
—Trey Causey, Head of AI Ethics, Indeed

Practicing Trustworthy Machine Learning
Consistent, Transparent, and Fair AI Pipelines
Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar

Beijing  Boston  Farnham  Sebastopol  Tokyo

Practicing Trustworthy Machine Learning
by Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar

Copyright © 2023 Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar. All rights reserved.

Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Acquisitions Editor: Nicole Butterfield
Development Editor: Sarah Grey
Production Editor: Katherine Tozer
Copyeditor: Paula L. Fleming
Proofreader: Piper Editorial Consulting, LLC
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

January 2023: First Edition

Revision History for the First Release
2023-01-03: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781098120276 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Practicing Trustworthy Machine Learning, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the authors and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-098-12027-6
[LSI]

This book is dedicated to the memory of security researcher, internet privacy activist, and AI ethics researcher Peter Eckersley (1979 to 2022). Thanks for your work on tools such as Let's Encrypt, Privacy Badger, Certbot, HTTPS Everywhere, SSL Observatory, and Panopticlick, and for advancing AI ethics in a pragmatic, policy-focused, and actionable way. Thank you also for offering to proofread this book in what unexpectedly turned out to be your last months.

Table of Contents

Preface

1. Privacy
    Attack Vectors for Machine Learning Pipelines
    Improperly Implemented Privacy Features in ML: Case Studies
    Case 1: Apple's CSAM
    Case 2: GitHub Copilot
    Case 3: Model and Data Theft from No-Code ML Tools
    Definitions
    Definition of Privacy
    Proxies and Metrics for Privacy
    Legal Definitions of Privacy
    k-Anonymity
    Types of Privacy-Invading Attacks on ML Pipelines
    Membership Attacks
    Model Inversion
    Model Extraction
    Stealing a BERT-Based Language Model
    Defenses Against Model Theft from Output Logits
    Privacy-Testing Tools
    Methods for Preserving Privacy
    Differential Privacy
    Stealing a Differentially Privately Trained Model
    Further Differential Privacy Tooling
    Homomorphic Encryption
    Secure Multi-Party Computation
    SMPC Example
    Further SMPC Tooling
    Federated Learning
    Conclusion

2. Fairness and Bias
    Case 1: Social Media
    Case 2: Triaging Patients in Healthcare Systems
    Case 3: Legal Systems
    Key Concepts in Fairness and Fairness-Related Harms
    Individual Fairness
    Parity Fairness
    Calculating Parity Fairness
    Scenario 1: Language Generation
    Scenario 2: Image Captioning
    Fairness Harm Mitigation
    Mitigation Methods in the Pre-Processing Stage
    Mitigation Methods in the In-Processing Stage
    Mitigation Methods in the Post-Processing Stage
    Fairness Tool Kits
    How Can You Prioritize Fairness in Your Organization?
    Conclusion
    Further Reading

3. Model Explainability and Interpretability
    Explainability Versus Interpretability
    The Need for Interpretable and Explainable Models
    A Possible Trade-off Between Explainability and Privacy
    Evaluating the Usefulness of Interpretation or Explanation Methods
    Definitions and Categories
    "Black Box"
    Global Versus Local Interpretability
    Model-Agnostic Versus Model-Specific Methods
    Interpreting GPT-2
    Methods for Explaining Models and Interpreting Outputs
    Inherently Explainable Models
    Local Model-Agnostic Interpretability Methods
    Global Model-Agnostic Interpretability Methods
    Explaining Neural Networks
    Saliency Mapping
    Deep Dive: Saliency Mapping with CLIP
    Adversarial Counterfactual Examples