Practicing Trustworthy Machine Learning
Consistent, Transparent, and Fair AI Pipelines
Yada Pruksachatkun, Matthew McAteer & Subhabrata Majumdar
Practicing Trustworthy Machine Learning
With the increasing use of AI in high-stakes domains such as medicine, law, and defense, organizations spend a lot of time and money to make ML models trustworthy. Many books on the subject offer deep dives into theories and concepts. This guide provides a practical starting point to help development teams produce models that are secure, more robust, less biased, and more explainable.
Authors Yada Pruksachatkun, Matthew McAteer, and Subhabrata (Subho) Majumdar translate best practices in the academic literature for curating datasets and transforming models into blueprints for building industry-grade trusted ML systems. With this book, engineers and data scientists will gain a much-needed foundation for releasing trustworthy ML applications into a noisy, messy, and often hostile world.
You’ll learn:
• Methods to explain ML models and their outputs to stakeholders
• How to recognize and fix fairness concerns and privacy leaks in an ML pipeline
• How to develop ML systems that are robust and secure against malicious attacks
• Important systemic considerations, like how to manage trust debt and which ML obstacles require human intervention
• The important features behind your model’s decisions
“An excellent practical book with code examples on making AI systems more fair, private, explainable, and robust. Impressively, it has kept up with the ongoing Cambrian explosion of foundation models.”
—Kush Varshney, Distinguished Research Scientist, Foundations of Trustworthy AI, IBM Research
Yada Pruksachatkun is a machine learning scientist at Infinitus, a conversational AI startup that automates calls in the healthcare system.
Matthew McAteer is the creator of 5cube Labs, an ML consultancy that has worked with over 100 companies in industries ranging from architecture to medicine to agriculture.
Subho Majumdar is a machine learning scientist at Twitch, where he leads applied science efforts in responsible ML.
MACHINE LEARNING
US $79.99  CAN $99.99
ISBN: 978-1-098-12027-6
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
Praise for Practicing Trustworthy Machine Learning
An excellent practical book with code examples on making AI systems more fair,
private, explainable, and robust. Impressively, it has kept up with the ongoing
Cambrian explosion of foundation models.
—Kush Varshney, Distinguished Research Scientist,
Foundations of Trustworthy AI, IBM Research
This book is a valuable and conscientiously written introduction to the increasingly
important fields of AI safety, privacy, and interpretability, filled with lots of examples
and code snippets to make it of practical use to machine learning practitioners.
—Timothy Nguyen, deep learning researcher,
host of The Cartesian Cafe podcast
This is an impressive book that feels simultaneously foundational and cutting-edge.
It is a valuable reference work for data scientists and engineers who want to be
confident that the models they release into the world are safe and fair.
—Trey Causey, Head of AI Ethics, Indeed
Practicing Trustworthy Machine Learning
Consistent, Transparent, and Fair AI Pipelines
Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar
Beijing  Boston  Farnham  Sebastopol  Tokyo
Practicing Trustworthy Machine Learning
by Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar
Copyright © 2023 Yada Pruksachatkun, Matthew McAteer, and Subhabrata Majumdar. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are
also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional
sales department: 800-998-9938 or corporate@oreilly.com.
Acquisitions Editor: Nicole Butterfield
Development Editor: Sarah Grey
Production Editor: Katherine Tozer
Copyeditor: Paula L. Fleming
Proofreader: Piper Editorial Consulting, LLC
Indexer: nSight, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea
January 2023: First Edition
Revision History for the First Release
2023-01-03: First Release
See http://oreilly.com/catalog/errata.csp?isbn=9781098120276 for release details.
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Practicing Trustworthy Machine Learning, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors and do not represent the publisher’s views.
While the publisher and the authors have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.
978-1-098-12027-6
[LSI]
This book is dedicated to the memory of security researcher, internet privacy activist,
and AI ethics researcher Peter Eckersley (1979 to 2022). Thanks for your work on tools
such as Let’s Encrypt, Privacy Badger, Certbot, HTTPS Everywhere, SSL Observatory,
and Panopticlick, and for advancing AI ethics in a pragmatic, policy-focused, and actionable
way. Thank you also for offering to proofread this book in what unexpectedly turned out
to be your last months.
Table of Contents
Preface  xiii
1. Privacy  1
Attack Vectors for Machine Learning Pipelines 1
Improperly Implemented Privacy Features in ML: Case Studies 2
Case 1: Apple’s CSAM 3
Case 2: GitHub Copilot 4
Case 3: Model and Data Theft from No-Code ML Tools 5
Definitions 6
Definition of Privacy 6
Proxies and Metrics for Privacy 6
Legal Definitions of Privacy 8
k-Anonymity 8
Types of Privacy-Invading Attacks on ML Pipelines 8
Membership Attacks 9
Model Inversion 10
Model Extraction 11
Stealing a BERT-Based Language Model 13
Defenses Against Model Theft from Output Logits 17
Privacy-Testing Tools 19
Methods for Preserving Privacy 20
Differential Privacy 20
Stealing a Differentially Privately Trained Model 21
Further Differential Privacy Tooling 23
Homomorphic Encryption 23
Secure Multi-Party Computation 24
SMPC Example 25
Further SMPC Tooling 29
Federated Learning 29
Conclusion 30
2. Fairness and Bias  33
Case 1: Social Media 34
Case 2: Triaging Patients in Healthcare Systems 34
Case 3: Legal Systems 35
Key Concepts in Fairness and Fairness-Related Harms 36
Individual Fairness 37
Parity Fairness 37
Calculating Parity Fairness 38
Scenario 1: Language Generation 39
Scenario 2: Image Captioning 43
Fairness Harm Mitigation 45
Mitigation Methods in the Pre-Processing Stage 47
Mitigation Methods in the In-Processing Stage 47
Mitigation Methods in the Post-Processing Stage 49
Fairness Tool Kits 50
How Can You Prioritize Fairness in Your Organization? 52
Conclusion 52
Further Reading 53
3. Model Explainability and Interpretability  55
Explainability Versus Interpretability 55
The Need for Interpretable and Explainable Models 56
A Possible Trade-off Between Explainability and Privacy 57
Evaluating the Usefulness of Interpretation or Explanation Methods 58
Definitions and Categories 59
“Black Box” 59
Global Versus Local Interpretability 59
Model-Agnostic Versus Model-Specific Methods 59
Interpreting GPT-2 60
Methods for Explaining Models and Interpreting Outputs 68
Inherently Explainable Models 68
Local Model-Agnostic Interpretability Methods 79
Global Model-Agnostic Interpretability Methods 98
Explaining Neural Networks 99
Saliency Mapping 99
Deep Dive: Saliency Mapping with CLIP 100
Adversarial Counterfactual Examples 120