Computational Methods in Engineering & the Sciences

Huixiao Hong, Editor

Machine Learning and Deep Learning in Computational Toxicology

Series Editor: Klaus-Jürgen Bathe, Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

This Series publishes books on all aspects of computational methods used in engineering and the sciences. With emphasis on simulation through mathematical modelling, the Series accepts high quality content books across different domains of engineering, materials, and other applied sciences. The Series publishes monographs, contributed volumes, professional books, and handbooks, spanning across cutting edge research as well as basics of professional practice. The topics of interest include the development and applications of computational simulations in the broad fields of Solid & Structural Mechanics, Fluid Dynamics, Heat Transfer, Electromagnetics, Multiphysics, Optimization, Stochastics with simulations in and for Structural Health Monitoring, Energy Systems, Aerospace Systems, Machines and Turbines, Climate Prediction, Effects of Earthquakes, Geotechnical Systems, Chemical and Biomolecular Systems, Molecular Biology, Nano and Microfluidics, Materials Science, Nanotechnology, Manufacturing and 3D printing, Artificial Intelligence, Internet-of-Things.

Editor
Huixiao Hong
Division of Bioinformatics and Biostatistics
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, AR, USA

ISSN 2662-4869
ISSN 2662-4877 (electronic)
Computational Methods in Engineering & the Sciences
ISBN 978-3-031-20729-7
ISBN 978-3-031-20730-3 (eBook)
https://doi.org/10.1007/978-3-031-20730-3

This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2023

All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Preface

Owing to the recent explosive growth of the chemical industry, compounded by improved methods to rapidly synthesize compounds with more diverse chemical structures, there is an urgent need to efficiently assess the public health risk of chemicals and to evaluate the safety profile of chemical-containing products. Safety evaluation and risk assessment are needed to protect not only the human species, but also other living organisms and our ecosystem.
In addition to assessing the chemicals themselves, the field must also assess their byproducts and the interactions with other chemicals that can occur in living organisms or the environment. Until recently, the gold standard for toxicology hazard assessment was animal models. However, animal models have numerous fundamental limitations, including but not limited to high cost, low throughput, low accuracy, and ethical concerns. Thus, the scientific community as a whole has declared the importance of the three "Rs": Replace, Reduce, and Refine animal use. To meet any of these three R goals, alternative methods are required to assess the safety and hazard of chemicals.

Fortunately, the toxicology community is beginning to fully realize the issues associated with animal models at the same time as the field of computer science is advancing at an incredible pace. Thus, computational toxicologists are collaborating with computer scientists to spark the dawn of the computational toxicology era. This field has the goal of predicting and understanding the hazards of chemicals in a cost- and time-efficient manner with very low risk of ethical issues. As new computational methods are developed, they can be applied to toxicology challenges, which will allow for continual iteration and improvement.

Currently, artificial intelligence, mainly machine learning and deep learning, is in a Cambrian era, with new advanced algorithms being developed and deployed across a number of fields. This rapid advance has quickly transformed industry, jobs, and even society. While machine learning and deep learning were developed in the context of other fields, they have been successfully applied to computational toxicology.

This textbook will survey the landscape of the deployment of various advanced algorithms to solve important questions around the hazard and toxicity of chemicals.
As many of these algorithms come from other fields, this work will also cover important best practices to be aware of while deploying such algorithms to solve critical challenges. To give well-deserved attention to the many facets of computational toxicology, this book is divided into four parts. First, the text will detail the machine learning and deep learning algorithms most relevant to the field of computational toxicology and why they are relevant. Next, this will be expanded by describing tools and approaches to enable efficient application of advanced algorithms to toxicology. As examples are important to illustrate relevance, this textbook will then focus on the application of machine learning and deep learning to chemical toxicity prediction. This textbook will also delve into the future of nascent approaches and new progress in the area.

This textbook was authored with a wide audience in mind. For the classical toxicologist looking to dip a toe into computational approaches, computational concepts are thoroughly introduced and explained through clarifying examples. For the computer scientist looking to apply machine learning and deep learning to other fields, this textbook showcases numerous examples of how others have successfully applied advanced computation to solve toxicology problems. Lastly, for students and other trainees eager to further their careers, this body of work will survey a variety of computational toxicology topics and could provide a spark of inspiration for a career direction.

While the chapters that follow review a variety of computational toxicology topics, they do not contain computer code samples, chapter quizzes, or practice exams. For instructional purposes, developing quizzes and exams is left to the course instructor. The authors would humbly appreciate comments, feedback, and corrections from readers so future work can be improved.
The quantity, speed, and diversity of data relevant to computational toxicology have grown substantially in recent years. These increasingly rich datasets have fueled the growth of machine learning and deep learning approaches, which are hungry for large and high-quality datasets. As the fields of toxicology and computer science continue to intersect and synchronize in new ways, the world will benefit from improved prediction and understanding of the toxicity and hazard potential of chemicals.

This preface reflects the views of the authors and does not necessarily reflect those of the U.S. Food and Drug Administration.

Rebecca Kusko, Ph.D.
Head of Business Development and Corporate Affairs
Immuneering Corporation
Cambridge, MA, USA

Huixiao Hong, Ph.D.
SBRBPAS Expert
Chief Bioinformatics Branch
National Center for Toxicological Research
U.S. Food and Drug Administration
Jefferson, AR, USA

Contents

1 Machine Learning and Deep Learning Promote Computational Toxicology for Risk Assessment of Chemicals
Rebecca Kusko and Huixiao Hong

Part I Machine Learning and Deep Learning Methods for Computational Toxicology

2 Assessment of the Xenobiotics Toxicity Taking into Account Their Metabolism
Dmitry Filimonov, Alexander Dmitriev, Anastassia Rudik, and Vladimir Poroikov

3 Emerging Machine Learning Techniques in Predicting Adverse Drug Reactions
Yi Zhong, Shanshan Wang, Gaozheng Li, Ji Yang, Zuquan Weng, and Heng Luo

4 Drug Effect Deep Learner Based on Graphical Convolutional Network
Yunyi Wu, Shenghui Guan, and Guanyu Wang

5 AOP-Based Machine Learning for Toxicity Prediction
Wei Shi, Rong Zhang, and Haoyue Tan

6 Graph Kernel Learning for Predictive Toxicity Models
Youjun Xu, Chia-Han Chou, Ningsheng Han, Jianfeng Pei, and Luhua Lai

7 Optimize and Strengthen Machine Learning Models Based on in Vitro Assays with Mechanistic Knowledge and Real-World Data
Thilini V. Mahanama, Arpan Biswas, and Dong Wang

8 Multitask Learning for Quantitative Structure–Activity Relationships: A Tutorial
Cecile Valsecchi, Francesca Grisoni, Viviana Consonni, Davide Ballabio, and Roberto Todeschini

Part II Tools and Approaches Facilitating Machine Learning and Deep Learning Methods in Computational Toxicology

9 Isalos Predictive Analytics Platform: Cheminformatics, Nanoinformatics, and Data Mining Applications
Dimitra-Danai Varsou, Andreas Tsoumanis, Anastasios G. Papadiamantis, Georgia Melagraki, and Antreas Afantitis

10 ED Profiler: Machine Learning Tool for Screening Potential Endocrine-Disrupting Chemicals
Xianhai Yang, Huihui Liu, Rebecca Kusko, and Huixiao Hong

11 Quantitative Target-specific Toxicity Prediction Modeling (QTTPM): Coupling Machine Learning with Dynamic Protein–Ligand Interaction Descriptors (DyPLIDs) to Predict Androgen Receptor-mediated Toxicity
Sundar Thangapandian, Gabriel Idakwo, Joseph Luttrell, Huixiao Hong, Chaoyang Zhang, and Ping Gong

12 Mold2 Descriptors Facilitate Development of Machine Learning and Deep Learning Models for Predicting Toxicity of Chemicals
Huixiao Hong, Jie Liu, Weigong Ge, Sugunadevi Sakkiah, Wenjing Guo, Gokhan Yavas, Chaoyang Zhang, Ping Gong, Weida Tong, and Tucker A. Patterson

13 Applicability Domain Characterization for Machine Learning QSAR Models
Zhongyu Wang and Jingwen Chen

14 Controlling for Confounding in Complex Survey Machine Learning Models to Assess Drug Safety and Risk
Paul Rogers

15 Multivariate Curve Resolution for Analysis of Heterogeneous System in Toxicogenomics
Yuan Liu, Jinzhu Lin, Menglong Li, and Zhining Wen