Wendy A. Warr (Ed.)

Chemical Structures
The International Language of Chemistry

With 213 Figures and 18 Tables

Springer-Verlag Berlin Heidelberg New York London Paris Tokyo

Dr. Wendy A. Warr
Information Services Section
ICI Pharmaceuticals Division
Mereside, Alderley Park
Macclesfield, Cheshire SK10 4TG, UK

ISBN-13: 978-3-642-73977-4
e-ISBN-13: 978-3-642-73975-0
DOI: 10.1007/978-3-642-73975-0

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1988
Softcover reprint of the hardcover 1st edition 1988

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: Hope Services (Abingdon) Ltd.
215113140 -543210 - Printed on acid-free paper

PREFACE

This book constitutes the Proceedings of the conference 'Chemical Structures: The International Language of Chemistry' which was held at Leeuwenhorst Congress Centre, Noordwijkerhout in the Netherlands, between May 31 and June 4, 1987. The conference was jointly sponsored by the Chemical Structure Association, the American Chemical Society Division of Chemical Information, and the Chemical Information Groups of the Royal Society of Chemistry and the German Chemical Society.
The purpose of the conference was to bring together experts and an international professional audience to discuss and to further basic and applied research and development in the processing, storage, retrieval and use of chemical structures, to focus international attention on the importance of chemical information and the vital research being carried out in chemical information science and to foster co-operation among major chemical information organisations in North America and Europe. Subjects covered included integrated in-house databases, substructure searching methodology, spectral databanks, new technologies (microcomputers, CD-ROM, parallel processing and expert systems) and chemical reactions.

The keynote address was given by Mike Lynch of the University of Sheffield. In this, the opening chapter of the book, Mike discusses progress made in chemical information science in the last fifteen years and describes his own approach to research. In a plenary session, Myra Williams of Merck, Sharp and Dohme considered future trends from the point of view of the information manager and strategic planner in industry. She emphasises the need for integration, open architecture and a uniform user interface.

The next group of chapters is concerned with in-house chemical structure databases and related property data. Tom Hagadone has carried out a survey of current practice and emerging trends in the integration of in-house databases and he presents his results. Mike Allen describes Glaxo's Chemical and Biological Information System, CBIS, with particular emphasis on registration and stereochemistry. Deibel and de Jong explain Akzo's integration of DARC structure handling software with the database management system ORACLE. Fisons, however, have integrated OSAC with their System 1032-based data handling system ABACUS. Dave Magrill gives details. Pfizer have developed SOCRATES to store and retrieve chemical structures, reactions and related data.
SOCRATES, which exploits System 1032 for database management, but as a whole is proprietary to Pfizer, is described in Trevor Devon and David Bawden's paper. The next four chapters in this group all involve MACCS to a greater or lesser extent. Arnold Lurie describes both Kodak's private registry system with Chemical Abstracts Service and the use of MACCS for a separate chemical structure handling system. The considerable advantages of integrating these two systems are not addressed and remain the subject of interesting speculation. Harold Schlevin and his co-workers at Ciba Geigy integrate MACCS with a variety of commercially available software tools for handling and analysing data. They have written an in-house system, ChemSketch, to integrate text and chemical structures. Bill Henckler and his co-workers at Merck have converted their out-dated Chemical Structure Information System, CSIS, with its intricate database updating, to a new MACCS-based system. Finally, Erick Ahrens describes how customisable chemical database systems, and in particular MACCS-II, can serve the diverse information needs of the chemical and pharmaceutical industries.

Substructure searching methodology is covered in the next set of chapters. John Barnard's chapter gives an historical overview and outlines areas of current research interest. Peter Bruck and his co-workers demonstrate how fast very large files can be searched when the database is tree-structured. Peter Willett's team at Sheffield University have developed techniques for substructure searching in files of three-dimensional chemical structures. Their paper also describes hardware and software techniques for identifying the maximal common substructure in a set of three-dimensional chemical structures. David Bawden of Pfizer (in conjunction with Sheffield University) has developed techniques for measuring similarity among structures.
He describes associated ranking and clustering procedures and the concept of browsing rather than exact searching.

The next three chapters concern generic structure search. Mike Lynch reviews progress in the Sheffield University Generic Chemical Structures project and the problems that his team are at present tackling. Kathy Shenton and co-authors describe the construction of a database of generic structures and the 'Markush DARC' search software developed collaboratively by Derwent, Télésystèmes and the French Patent Office, INPI. Yoshihiro Kudo's poster paper illustrated a system for searching using both specific and generic structural formulae.

Three chapters about online databases follow. Clemens Jochum's concerns the construction of the Beilstein Online database. Of particular interest are the complete representation of stereochemistry and the replacement of bond orders by electronic information, allowing a global, and unique, representation of tautomers. Ole Norager describes the ECDIN project, in which data has been collected on environmental chemicals and made accessible online. Gerry Vander Stouw shows how the Chemical Abstracts Registry System Enhancements Feasibility Study project identified aspects of the CAS structuring conventions which create problems for users and he considers the improvements which may be recommended.

Two separate teams, those of Wolfgang Bremser and Henk van't Klooster, were chosen to cover the subject of spectral databases, computer-aided library search systems and expert systems for structure analysis.

The chapter by Ernst Meyer and his colleague describes techniques for detecting classes of substances which might prove to be biologically active. Bench chemists at BASF can apply the system to a large collection of biological test data in their quest for 'lead' compounds.

The next group of chapters arises from a session on new technologies.
Bill Town's chapter covers hardware and software developments in the microcomputer area and, in particular, the use of microcomputers for chemical structure input and display, for managing personal databases, for offline structural query negotiation and for downloading of chemical structures. Dan Meyer defines five major categories of microcomputer software for handling chemical structures and then compares a number of packages for managing databases of chemical structures. Patrick Gibbins outlines the benefits and limitations of publishing on CD-ROM. Todd Wipke and Mathew Hahn describe a symbolic, non-numerical approach to molecular model building and conformational analysis. The last two chapters in the new technology group both cover parallel processing techniques. Peter Jochum and his colleague describe an architecture consisting of a search machine with many independent parallel processing units administered by a master processor. Christian Zeidner and co-workers report on the completely new hardware and software for Chemical Abstracts' second-generation parallel structure searching system, which will involve inverted file searching at the screening stage (instead of the present serial search method).

Eleven chapters on the subject of chemical reactions follow. Peter Johnson gives an overview of current approaches to reaction indexing with emphasis on the Leeds University system ORAC. A chapter by Tom Moock et al. addresses newer features of the REACCS system, especially atom-atom mapping in reaction substructure search. A poster by Guenther Grethe and colleagues was also centred on REACCS. David Elrod's poster gave an example of the integrated use of CHIRON and REACCS. CHIRON (a program developed by Steve Hanessian of Montreal University) was used to predict chiral precursors for an anthracycline and then REACCS was used to work out a synthetic pathway starting from those chiral precursors.
Johnny Gasteiger and his co-workers seek to predict the course of complex organic reactions using empirical methods to quantify the electronic and thermochemical factors which influence chemical reactivity. Willi Sieber describes a method of integrating the complementary approaches of reaction retrieval and synthesis planning. The University of Nijmegen houses a Dutch national facility for both computer-assisted organic synthesis and computer-assisted molecular modelling. Jan Noordik describes the tools it makes available to the academic bench chemist. George Vladutz's contribution, co-authored by Scott Gould, examines whether a file organisation consisting of superimposed structures ('hypergraphs') and superimposed reaction graphs has potential for improved retrieval capabilities. A second hyperstructure of superimposed reaction skeleton graphs could be cross-referenced to the first hyperstructure. Rainer Herges has developed algorithms to search systematically for new reactions. With the computer system IGOR, he and his co-workers have predicted hitherto unknown reactions and realised them in the laboratory. Bob Dana and colleagues at Chemical Abstracts Service presented a poster on the machine generation of multi-step reactions for the CASREACT system on STN International. The online version of FIZ's ChemInform database will be available on STN International also. Alex Parlow gave a poster presentation on the creation of ChemInform both as an electronic database (for online and in-house use) and as a printed product.

The book concludes with two 'miscellaneous' chapters, Kurt Loening's on chemical nomenclature and Robert Fugmann's on grammar in chemical indexing languages.

In assembling the Conference programme, the organising committee hoped to cover most areas of chemical information science. Nearly all the 44 contributions were specifically invited by the committee.
In the few cases where a speaker did not supply a written version of his paper, I, the editor, have written a summary. We hope, therefore, that this book truly reflects the state of the art in chemical information science.

I am deeply indebted to the organising committee, without whose enthusiasm and hard work this book would not have been possible. The committee members were Charles Citroen (CID-TNO, The Netherlands), David Johnson (Exxon Research and Engineering, USA), Reiner Luckenbach (Beilstein Institute, Federal Republic of Germany), Peter Nichols (ISI, UK) and Bill Town (Hampden Data Services, UK). I myself had the pleasure of chairing the committee. The interest and support of the four sponsoring organisations is gratefully acknowledged.

I am also extremely grateful to my secretary, Mary Burgess, who has patiently and accurately word-processed every word of all forty-four chapters. The book was typeset from the floppy disks by Hope Services Ltd., of Clifton Hampden, Oxfordshire. Their helpfulness and efficiency have been much appreciated. Two members of ICI Pharmaceuticals' Information Services Section, Madeline Gray and Pat Holohan, have helped with online searching for literature references, standard journal abbreviations and key subject terms, and Frank Loftus, of the same Section, redrafted a number of diagrams. I would also like to make special mention of David Johnson of Exxon, USA, who spent much time and care preparing the index. Finally I particularly want to thank Bill Town of Hampden Data Services for managing the finances, for suggesting and implementing the electronic typesetting exercise and for supplying constant advice and support when needed.

May 1988
Wendy A Warr
ICI Pharmaceuticals
Macclesfield
Cheshire
UK

CONTENTS

Preface

Keynote Address
R&D in Chemical Information Science: Retrospect and Prospect
Michael F Lynch

Future Directions in Integrated Information Systems: Is There a Strategic Advantage?
Myra Williams and Gary Franklin

Current Approaches and New Directions in the Management of In-house Chemical Structure Databases
Tom R Hagadone

Development of an Integrated System for Handling Chemical Structures and Associated Data
Michael J Allen

(Inter)facing DARC-ORACLE
A J C M (Juus) de Jong and A M C (Twan) Deibel

Using OSAC to Keep Structures in Their Place
David S Magrill

Development of the Pfizer Integrated Research Data System SOCRATES
David Bawden, Trevor K Devon, D Tony Faulkner, Jeremy D Fisher, John M Leach, Robert J Reeves and Frank E Woodward

CAS and MDL Registry Systems at Eastman Kodak Company
Arnold P Lurie

Integration of Chemical Structures with Information in Support of Business Needs
Harold H Schlevin, Marc M Graham, David F Pennington and Werner von Wartburg

Poster Session: Moving to an Online Environment for Chemical Information at Merck and Company
William G Henckler, Debra L Allison, Patricia L Combs, Gary S Franklin, Walter B Gall and Susan J Sallamack

Customisation for Chemical Database Applications
Erick K F Ahrens

Problems of Substructure Search and Their Solution
John M Barnard

Substructure Search on Very Large Files Using Tree-Structured Databases
M Z Nagy, S Kozics, T Veszpremi and P Bruck

Substructure Searching in Files of Three-Dimensional Chemical Structures
Andrew T Brint, Eleanor Mitchell and Peter Willett

Browsing and Clustering of Chemical Structures
David Bawden

The Sheffield University Generic Chemical Structures Project: A Review of Progress and of Outstanding Problems
Geoffrey M Downs, Valerie J Gillet, John Holliday and Michael F Lynch

Generic Searching of Patent Information
Kathleen E Shenton, Peter Norton and E Anthony Ferns

Poster Session: A Search System of Generic Names in the List of Existing Chemical Substances of Japan with Generic Structures
Yoshihiro Kudo

Building a Structure-oriented Numerical Factual Database
Clemens Jochum

Poster Session: ECDIN, Environmental Chemicals Data and Information Network
Ole Norager

Poster Session: Potential Enhancements to the CAS Chemical Registry System
Gerald G Vander Stouw

Substructure Analysis as Basis for Intelligent Interpretation of Spectra
Wolfgang Bremser

Computer-aided Spectroscopic Structure Analysis of Organic Molecules Using Library Search and Artificial Intelligence
Henk A van't Klooster, Peter Cleij, Hendrik-Jan Luinge and Gerard J Kleywegt

Systematic Drug Structure-activity Evaluation Correlation
Ernst Meyer and Ehrhard Sens

The Impact of Microcomputers on Chemical Information Systems
William G Town

Use of Microcomputer Software to Access and Handle Chemical Data
Daniel E Meyer

CD-ROM: A New Electronic Publishing Medium
Patrick Gibbins

Analogy and Intelligence in Model Building (AIMB)
W Todd Wipke and Mathew A Hahn

Poster Session: Analogy and Intelligence in Model Building (AIMB)
Mathew A Hahn and W Todd Wipke

A Multiprocessor Architecture for Substructure Search
Peter Jochum and Thomas Worbs

The Evolution of the CAS Parallel Structure Searching Architecture
Nick Farmer, John Amoss, William Farel, Jerry Fehribach and Christian Zeidner

Reaction Indexing: An Overview of Current Approaches
A Peter Johnson

The Implementation of Atom-Atom Mapping and Related Features in the Reaction Access System (REACCS)
Tom E Moock, Jim G Nourse, David Grier and W Douglas Hounshell

Poster Session: Reaction Indexing in an Integrated Environment
Guenter Grethe, Donna del Rey, Judy G Jacobson and Melisande VanDuyne

Poster Session: Computer Assisted Synthesis Design Using Chiron and REACCS
David W Elrod

Prediction of Chemical Reactivity and Design of Organic Synthesis
Johann Gasteiger, Michael G Hutchings, Heinz Saller and Peter Low

Reaction Retrieval and Synthesis Planning
Willi Sieber

CAOS/CAMM Services: Synthesis Planning and Molecular Modelling Tools
Jan H Noordik

Joint Compound/Reaction Storage and Retrieval and Possibilities of a Hyperstructure-based Solution
George Vladutz and Scott R Gould

Reaction Planning (Computer Aided Reaction Design)
Rainer Herges