METHODS IN MOLECULAR BIOLOGY™
Volume 275

Chemoinformatics
Concepts, Methods, and Tools for Drug Discovery

Edited by Jürgen Bajorath

METHODS IN MOLECULAR BIOLOGY™
John M. Walker, SERIES EDITOR

294. Cell Migration: Developmental Methods and Protocols, edited by Jun-Lin Guan, 2005
293. Laser Capture Microdissection: Methods and Protocols, edited by Graeme I. Murray and Stephanie Curran, 2005
292. DNA Viruses: Methods and Protocols, edited by Paul M. Lieberman, 2005
291. Molecular Toxicology Protocols, edited by Phouthone Keohavong and Stephen G. Grant, 2005
290. Basic Cell Culture, Third Edition, edited by Cheryl D. Helgason and Cindy Miller, 2005
289. Epidermal Cells, Methods and Applications, edited by Kursad Turksen, 2004
288. Oligonucleotide Synthesis, Methods and Applications, edited by Piet Herdewijn, 2004
287. Epigenetics Protocols, edited by Trygve O. Tollefsbol, 2004
286. Transgenic Plants: Methods and Protocols, edited by Leandro Peña, 2004
285. Cell Cycle Control and Dysregulation Protocols: Cyclins, Cyclin-Dependent Kinases, and Other Factors, edited by Antonio Giordano and Gaetano Romano, 2004
284. Signal Transduction Protocols, Second Edition, edited by Robert C. Dickson and Michael D. Mendenhall, 2004
283. Bioconjugation Protocols, edited by Christof M. Niemeyer, 2004
282. Apoptosis Methods and Protocols, edited by Hugh J. M. Brady, 2004
281. Checkpoint Controls and Cancer, Volume 2: Activation and Regulation Protocols, edited by Axel H. Schönthal, 2004
280. Checkpoint Controls and Cancer, Volume 1: Reviews and Model Systems, edited by Axel H. Schönthal, 2004
279. Nitric Oxide Protocols, Second Edition, edited by Aviv Hassid, 2004
278. Protein NMR Techniques, Second Edition, edited by A. Kristina Downing, 2004
277. Trinucleotide Repeat Protocols, edited by Yoshinori Kohwi, 2004
276. Capillary Electrophoresis of Proteins and Peptides, edited by Mark A. Strege and Avinash L. Lagu, 2004
275. Chemoinformatics, edited by Jürgen Bajorath, 2004
274. Photosynthesis Research Protocols, edited by Robert Carpentier, 2004
273. Platelets and Megakaryocytes, Volume 2: Perspectives and Techniques, edited by Jonathan M. Gibbins and Martyn P. Mahaut-Smith, 2004
272. Platelets and Megakaryocytes, Volume 1: Functional Assays, edited by Jonathan M. Gibbins and Martyn P. Mahaut-Smith, 2004
271. B Cell Protocols, edited by Hua Gu and Klaus Rajewsky, 2004
270. Parasite Genomics Protocols, edited by Sara E. Melville, 2004
269. Vaccinia Virus and Poxvirology: Methods and Protocols, edited by Stuart N. Isaacs, 2004
268. Public Health Microbiology: Methods and Protocols, edited by John F. T. Spencer and Alicia L. Ragout de Spencer, 2004
267. Recombinant Gene Expression: Reviews and Protocols, Second Edition, edited by Paulina Balbas and Argelia Johnson, 2004
266. Genomics, Proteomics, and Clinical Bacteriology: Methods and Reviews, edited by Neil Woodford and Alan Johnson, 2004
265. RNA Interference, Editing, and Modification: Methods and Protocols, edited by Jonatha M. Gott, 2004
264. Protein Arrays: Methods and Protocols, edited by Eric Fung, 2004
263. Flow Cytometry, Second Edition, edited by Teresa S. Hawley and Robert G. Hawley, 2004
262. Genetic Recombination Protocols, edited by Alan S. Waldman, 2004
261. Protein–Protein Interactions: Methods and Applications, edited by Haian Fu, 2004
260. Mobile Genetic Elements: Protocols and Genomic Applications, edited by Wolfgang J. Miller and Pierre Capy, 2004
259. Receptor Signal Transduction Protocols, Second Edition, edited by Gary B. Willars and R. A. John Challiss, 2004
258. Gene Expression Profiling: Methods and Protocols, edited by Richard A. Shimkets, 2004
257. mRNA Processing and Metabolism: Methods and Protocols, edited by Daniel R. Schoenberg, 2004

METHODS IN MOLECULAR BIOLOGY™

Chemoinformatics
Concepts, Methods, and Tools for Drug Discovery

Edited by
Jürgen Bajorath
Albany Molecular Research Inc., Bothell Research Center, Bothell, WA
and University of Washington, Seattle, WA

Humana Press
Totowa, New Jersey

© 2004 Humana Press Inc.
999 Riverview Drive, Suite 208
Totowa, New Jersey 07512
www.humanapress.com

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise without written permission from the Publisher.

Methods in Molecular Biology™ is a trademark of The Humana Press Inc.

The content and opinions expressed in this book are the sole work of the authors and editors, who have warranted due diligence in the creation and issuance of their work. The publisher, editors, and authors are not responsible for errors or omissions or for any consequences arising from the information or opinions presented in this book and make no warranty, express or implied, with respect to its contents.

This publication is printed on acid-free paper. ANSI Z39.48-1984 (American Standards Institute) Permanence of Paper for Printed Library Materials.

Cover illustration: Background: Figure 3 from Chapter 7. Foreground: Illustration supplied by Jürgen Bajorath.

Cover design by Patricia F. Cleary.
For additional copies, pricing for bulk purchases, and/or information about other Humana titles, contact Humana at the above address or at any of the following numbers: Tel.: 973-256-1699; Fax: 973-256-8341; E-mail: [email protected]; or visit our Website: www.humanapress.com

Photocopy Authorization Policy:
Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Humana Press Inc., provided that the base fee of US $25.00 per copy is paid directly to the Copyright Clearance Center at 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license from the CCC, a separate system of payment has been arranged and is acceptable to Humana Press Inc. The fee code for users of the Transactional Reporting Service is: [1-58829-261-4/04 $25.00].

Printed in the United States of America. 10 9 8 7 6 5 4 3 2 1

Library of Congress Cataloging in Publication Data
Chemoinformatics : concepts, methods, and tools for drug discovery / edited by Jürgen Bajorath.
p. cm. — (Methods in molecular biology ; v. 275)
Includes bibliographical references and index.
ISBN 1-58829-261-4 (alk. paper) eISBN 1-59259-802-1
1. Chemoinformatics. I. Bajorath, Jürgen. II. Series: Methods in molecular biology (Clifton, N.J.) ; v. 275.
RS418.C48 2004
615'.19—dc22
2004047477

Preface

In the literature, several terms are used synonymously to name the topic of this book: chem-, chemi-, or chemo-informatics.
A widely recognized definition of this discipline is the one by Frank Brown from 1998 (1), who defined chemoinformatics as the combination of “all the information resources that a scientist needs to optimize the properties of a ligand to become a drug.” In Brown’s definition, two aspects play a fundamentally important role: decision support by computational means and drug discovery. This focus distinguishes it from the term “chemical informatics,” which was introduced at least ten years earlier and described as the application of information technology to chemistry (without a specific focus on drug discovery). In addition, there is of course “chemometrics,” which is generally understood as the application of statistical methods to chemical data and the derivation of relevant statistical models and descriptors (2). The pharmaceutical focus of many developments and efforts in this area (and the current popularity of gene-to-drug or similar paradigms) is further reflected by the recent introduction of such terms as “discovery informatics” (3), which takes into account that gaining knowledge from chemical data alone is not sufficient to be ultimately successful in drug discovery. Such insights are well in accord with other views that the boundaries between bio- and chemoinformatics are fluid and that these disciplines should be closely combined or merged to significantly impact biotechnology or pharmaceutical research (4). Clearly, from an algorithmic or methodological point of view, bio- and chemoinformatics are much more similar to each other than many of their applications would suggest, at least at first glance. It is fair to assume that the application of information science and technology to chemical or biological problems will further develop and mature, as well as continue to define, and redefine, itself. If we wish to focus on chemoinformatics in a narrower sense, what should we really consider?
First, methods that support decision making in the context of pharmaceutical research (2) (such as compound design and selection) or methods that help interface computational and experimental programs (4) [such as virtual and biological screening (5)] are without doubt essential components. Second, building and maintaining computational infrastructures to collect, organize, manage, and analyze chemical data is as important as developing methods and research tools. Third, I would propose that it has also become increasingly difficult to distinguish between chemoinformatics and chemometrics, since statistical methods, models, and descriptors play a crucial role in, for example, similarity and diversity analysis or virtual screening. Fourth, approaches to explore (and exploit) structure–activity or structure–property relationships can hardly be excluded from chemoinformatics research, much of which aims at helping to identify or make better molecules. This means that approaches that have long been disciplines in their own right, such as QSAR or structure-based design, can (and perhaps should) also be considered to contribute to and belong to chemoinformatics. Lastly, evaluation of drug-likeness and prediction of downstream ADME characteristics of compounds have become highly relevant topics for chemoinformatics and drug discovery research and are approached using rather different concepts and algorithms.

Being confronted with the task of putting Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery together, I decided to focus on authors and their individual contributions, rather than trying to address everything possible that could be covered under the chemoinformatics umbrella, as discussed above. It was my sincere hope that this approach would do justice to this still evolving and rather diverse field.
Therefore, a variety of researchers (including well-recognized pioneers, senior scientists, and junior-level investigators) from diverse professional environments (academia, large pharmaceutical companies, and biotech companies) were asked to contribute. Chemoinformatics-relevant subject areas were initially outlined to provide some guidance, but authors were given as much freedom as possible in choosing their topics and designing their chapters. The result is the rather diverse array of chapters I had initially hoped for. Certainly, many chapters go well beyond the introduction of single methods and protocols that is a major theme of the Methods in Molecular Biology series, at least as far as experimental science is concerned. Our contributions range from the description of specific methods or applications to the discussion of fundamentally important concepts and extensive review articles. On the other hand, some of the topics I initially envisioned covering are missing, for example, neural network simulations or chemical genetics, to name just two. Conversely, some contributions present and discuss similar methods, for example, compound selection or library design, in rather different ways, which I find particularly interesting and stimulating.

Chemoinformatics: Concepts, Methods, and Tools for Drug Discovery begins with an elaborate theoretical discussion by Maggiora & Shanmugasundaram of the concept of molecular similarity, which is one of the origins and cornerstones of chemoinformatics as we understand it today. Chapter 2 by Willett follows up on this theme and extends the discussion to molecular diversity, a related, yet distinct, and equally fundamental concept. Following these methodological considerations, Bembenek & colleagues describe a computational infrastructure that enables pharmaceutical researchers to efficiently access basic chemoinformatics tools and supports decision-making.
Chapters 4 and 5 by Parker & Schreyer and Lajiness & Shanmugasundaram describe efforts to interface chemoinformatics approaches with high-throughput screening and with screening and medicinal chemistry, respectively. As discussed above, the formation of such interfaces is one of the major challenges (and opportunities) for chemoinformatics in pharmaceutical research.

Esposito & colleagues provide an extensive discussion of QSAR approaches in Chapter 6. The authors review basic principles and methods and then focus on the latest developments in multidimensional QSAR analysis. In the following chapter, Gomar & colleagues describe the development of a lipophilicity descriptor that alleviates the molecular alignment problem in QSAR and discuss exemplary applications. In general, the majority of chemoinformatics applications critically depend on the use of descriptors of molecular structure and properties, and Chapter 8 by Labute presents a good example of descriptor design. The author describes the generation of a novel class of molecular surface property descriptors that can be readily calculated from 2D representations of molecular structures.

The next four chapters focus on partitioning algorithms and classification methods that have become very popular for the analysis of large compound databases and screening sets and for virtual screening for active molecules. Xue & colleagues describe cell-based partitioning based on principal component analysis and, in contrast to such chemical space dimension reduction methods, Godden & Bajorath introduce a statistically based partitioning algorithm that directly operates in higher-dimensional, albeit simplified, chemical descriptor spaces. In the following back-to-back chapters, Lam & Welch first apply clustering and cell-based partitioning methods for the selection of active compounds from the HIV data set of the National Cancer Institute.
Based on their computational scheme and results, Young & Hawkins apply recursive partitioning (another statistical approach) to the same data set, thus enabling direct comparisons.

Following these compound classification and selection methods, Chapters 13–15 describe different approaches to compound library design. Gillet discusses a genetic algorithm-based method to simultaneously optimize multiple objectives or properties when designing libraries. Schnur & colleagues describe various approaches to focus compound libraries on families of therapeutic targets, which represents a major trend in drug discovery, and Zheng introduces simulated annealing as a stochastic approach to library design. In Chapter 16, Lavine & colleagues return to a compound classification problem by using a combination of principal component analysis and a genetic algorithm that is here applied to an optimization problem different from the one discussed by Gillet. In the next chapters, Crippen introduces novel ways of describing molecular chirality and conformational parameters with relevance for the analysis of structure–activity relationships, and Pick provides a brief review of scoring functions for structure-based virtual screening. The book ends with an extensive and detailed description by Jalaie & colleagues of different types of methods, including structure-based approaches, to predict the drug-like character of compounds and basic ADME properties based on modeling their putative interactions with cytochrome P450 isoforms, which are important drug-metabolizing enzymes. This discussion complements other major themes represented herein, including molecular similarity, structure–activity relationships, and compound classification and design.

First and foremost, I would like to thank our authors, whose diverse contributions have made this project a (hopefully, interesting!) reality.

Jürgen Bajorath

References

1. Brown, F. K. (1998) Chemoinformatics: What is it and how does it impact drug discovery. Ann. Rep. Med. Chem. 33, 375–384.
2. Goodman, J. M. (2003) Chemical informatics. Chem. Inf. Lett. 6 (2); http://www.ch.cam.ac.uk/MMRG/CIL/cil_v6n2.html#14
3. Claus, B. L. and Underwood, D. J. (2002) Discovery informatics: Its evolving role in drug discovery. Drug Discov. Today 7, 957–966.
4. Bajorath, J. (2001) Rational drug discovery revisited: Interfacing experimental programs with bio- and chemo-informatics. Drug Discov. Today 6, 989–995.
5. Bajorath, J. (2002) Integration of virtual and high-throughput screening. Nature Rev. Drug Discov. 1, 882–894.

Contents

Preface
Contributors
1  Molecular Similarity Measures
   Gerald M. Maggiora and Veerabahu Shanmugasundaram
2  Evaluation of Molecular Similarity and Molecular Diversity Methods Using Biological Activity Data
   Peter Willett
3  A Web-Based Chemoinformatics System for Drug Discovery
   Scott D. Bembenek, Brett A. Tounge, Steven J. Coats, and Charles H. Reynolds
4  Application of Chemoinformatics to High Throughput Screening: Practical Considerations
   Christian N. Parker and Suzanne K. Schreyer
5  Strategies for the Identification and Generation of Informative Compound Sets
   Michael S. Lajiness and Veerabahu Shanmugasundaram
6  Methods for Applying the Quantitative Structure–Activity Relationship Paradigm
   Emilio Xavier Esposito, Anton J. Hopfinger, and Jeffry D. Madura
7  3D-LogP: An Alignment-Free 3D Description of Local Lipophilicity for QSAR Studies
   Jérôme Gomar, Elie Giraud, David Turner, Roger Lahana, and Pierre Alain Carrupt
8  Derivation and Applications of Molecular Descriptors Based on Approximate Surface Area
   Paul Labute
9  Cell-Based Partitioning
   Ling Xue, Florence L. Stahura, and Jürgen Bajorath
10 Partitioning in Binary-Transformed Chemical Descriptor Spaces
   Jeffrey W. Godden and Jürgen Bajorath