
Pattern Recognition in Chemistry PDF

231 Pages·1980·7.31 MB·English

Preview Pattern Recognition in Chemistry

Editors

Prof. Dr. Gaston Berthier, Université de Paris, Institut de Biologie Physico-Chimique, Fondation Edmond de Rothschild, 13, rue Pierre et Marie Curie, F-75005 Paris
Prof. Dr. Michael J. S. Dewar, Department of Chemistry, The University of Texas, Austin, Texas 78712/USA
Prof. Dr. Hanns Fischer, Physikalisch-Chemisches Institut der Universität Zürich, Rämistr. 76, CH-8001 Zürich
Prof. Kenichi Fukui, Kyoto University, Dept. of Hydrocarbon Chemistry, Kyoto/Japan
Prof. Dr. Hermann Hartmann, Akademie der Wissenschaften und der Literatur zu Mainz, Geschwister-Scholl-Straße 2, D-6500 Mainz
Prof. Dr. George G. Hall, Department of Mathematics, The University of Nottingham, University Park, Nottingham NG7 2RD/Great Britain
Prof. Dr. Hans H. Jaffe, Department of Chemistry, University of Cincinnati, Cincinnati, Ohio 45221/USA
Prof. Joshua Jortner, Institute of Chemistry, Tel-Aviv University, 61390 Ramat-Aviv, Tel-Aviv/Israel
Prof. Dr. Werner Kutzelnigg, Lehrstuhl für Theoretische Chemie der Universität Bochum, Postfach 102148, D-4630 Bochum 1
Prof. Dr. Klaus Ruedenberg, Department of Chemistry, Iowa State University, Ames, Iowa 50010/USA
Prof. Dr. Eolo Scrocco, Via Garibaldi 88, I-00153 Roma
Prof. Dr. Werner Zeil, Direktor des Instituts für Physikalische und Theoretische Chemie der Universität Tübingen, Klinglerstr. 16, D-7406 Mössingen bei Tübingen

Lecture Notes in Chemistry
Edited by G. Berthier, M. J. S. Dewar, H. Fischer, K. Fukui, G. G. Hall, H. Hartmann, H. H. Jaffe, J. Jortner, W. Kutzelnigg, K. Ruedenberg, E. Scrocco, W. Zeil

21

Kurt Varmuza
Pattern Recognition in Chemistry

Springer-Verlag Berlin Heidelberg New York 1980

Author
Kurt Varmuza, Institut für Allgemeine Chemie der Technischen Universität Wien, Lehargasse 4, A-1060 Wien

ISBN-13: 978-3-540-10273-1
e-ISBN-13: 978-3-642-93155-0
DOI: 10.1007/978-3-642-93155-0

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting, reproduction by photocopying machine or similar means, and storage in data banks. Under § 54 of the German Copyright Law where copies are made for other than private use, a fee is payable to the publisher, the amount of the fee to be determined by agreement with the publisher.

© by Springer-Verlag Berlin Heidelberg 1980
2152/3140-543210

Preface

Analytical chemistry of the recent years is strongly influenced by automation. Data acquisition from analytical instruments - and sometimes also controlling of instruments - by a computer are principally solved since many years. Availability of microcomputers made these tasks also feasible from the economic point of view.

Besides these basic applications of computers in chemical measurements scientists developed computer programs for solving more sophisticated problems for which some kind of "intelligence" is usually supposed to be necessary. Harmless numerical experiments on this topic led to passionate discussions about the theme "which jobs cannot be done by a computer but only by human brain?". If this question is useful at all it should not be answered a priori. Application of computers in chemistry is a matter of utility, sometimes it is a social problem, but it is never a question of piety for the human brain.

Automated instruments and the necessity to work on complex problems enhanced the development of automatic methods for the reduction and interpretation of large data sets.
Numerous methods from mathematics, statistics, information theory, and computer science have been extensively investigated for the elucidation of chemical information; a new discipline "chemometrics" has been established.

Three different approaches have been used for computer-assisted interpretations of chemical data.

1. Heuristic methods try to formulate computer programs working in a similar way as a chemist would solve the problem.
2. Retrieval methods have been successfully used for library search (an unknown spectrum is compared with a spectral library).
3. Pattern recognition methods are especially useful for the classification of objects (substances, materials) into discrete classes on the basis of measured features. A set of characteristic features (e.g. a spectrum) of an object is considered as an abstract pattern that contains information about a not directly measurable property (e.g. molecular structure or biological activity) of the object. Pure pattern recognition methods try to find relationships between the pattern and the "obscure property" without using chemical knowledge or chemical prejudices.

This book gives in the first part an introduction to some pattern recognition methods from a chemist's point of view. This introduction is by no means systematical or exhaustive but has been restricted to simple mathematical methods. No previous knowledge of pattern recognition should be necessary for the reader.

Chapter 1 gives some basic ideas of pattern recognition; this Chapter may be skipped by readers already familiar with pattern recognition.

Chapters 2 to 8 describe several pattern recognition methods with emphasis on binary classifiers and methods already applied in chemistry. Each Chapter is preceded by an introductory part showing the principles of the method. These Chapters should be readable in arbitrary sequence.

Chapters 9 and 10 deal with preprocessing of original data and feature selection.
These problems must be treated at the beginning of a pattern recognition application. These Chapters have not been positioned at the beginning of the text because a more detailed description of these subjects requires some basic knowledge of pattern recognition methods.

Chapter 11 deals extensively with objective methods for evaluating pattern classifiers - a subject which has been often neglected in chemical applications of pattern recognition methods.

After reading appropriate Chapters of the first part of this book a reader with some knowledge in computer programming should be able to apply simple versions of pattern recognition methods to actual problems without further studies.

The second part of this book presents a description of reported applications in chemistry. The aim of these Chapters was completeness; however, the boundaries of chemistry are diffuse and the papers are spread over many journals. It was not always possible to judge the actual merit of pattern recognition methods in distinct fields of chemistry: conclusions should be drawn by the reader himself considering specific and varying demands and previous knowledge of actual classification problems.

Chapter 12 gives an overview about pattern recognition applications in chemistry.

Chapters 13 to 20 extensively describe applications in spectral analysis, chromatography, electrochemistry, material classification, structure-activity-relationship research, clinical chemistry, environmental chemistry and classification of analytical methods.

A comprehensive list of literature references, an author and subject index should facilitate a retrieval of detailed and original information about pattern recognition in chemistry. All cited literature except a few has been used in original form.

Further developments on this field are necessary. It would be an honor for the author if this book was an aid to or stimulant of the reader.

Vienna, April 1980
K. Varmuza

Technical Remarks

This text has been typed by the author himself using a self-written simple text-editor-program running on a mass spectrometric data system. Therefore the author is responsible for all errors in the manuscript. For technical reasons vectors are denoted by over-lined characters; e.g. x̄ is a vector with components x1, x2, ... but not a mean value! The reference list and author list have been handled by a couple of self-written Fortran programs running at the Computer Centre of the Technical University of Vienna. The author apologizes for some unusual notations.

Acknowledgements

Many thanks to my colleague and friend Dr. Heinz Rotter for stimulating and critical discussions. Also thanks to Harold Urban for the drawings.

Contents

Part A  Introduction to Some Pattern Recognition Methods

1. Basic Concepts
    1.1. First Ideas of Pattern Recognition
    1.2. Pattern Space
    1.3. Binary Classifiers
    1.4. Training and Evaluation of Classifiers
    1.5. Additional Aspects
    1.6. Warning
    1.7. Applications of Pattern Recognition
    1.8. Literature
        1.8.1. General Pattern Recognition
        1.8.2. Pattern Recognition from the Chemist's Point of View

2. Computation of Binary Classifiers
    2.1. Classification by Distance Measurements to Centres of Gravity
        2.1.1. Principle
        2.1.2. Centres of Gravity in a d-Dimensional Space
        2.1.3. Classification by Distance Measurements
        2.1.4. Classification by the Symmetry Plane
        2.1.5. Classification by Mean Vectors
        2.1.6. Evaluation
        2.1.7. Projection of Pattern Points on a Hypersphere
        2.1.8. Distance Measurements in Pattern Space (Overview)
        2.1.9. Distance Measurements with Weighted Features (Generalized Distances)
        2.1.10. Chemical Applications
    2.2. Learning Machine
        2.2.1. Principle
        2.2.2. Initial Weight Vector
        2.2.3. Correction of the Weight Vector
        2.2.4. Methods of Training
        2.2.5. Restrictions
        2.2.6. Evaluation
        2.2.7. Dead Zone Training
        2.2.8. Chemical Applications
    2.3. Linear Regression (Least-Squares Classification)
        2.3.1. Principle
        2.3.2. Mathematical Treatment
        2.3.3. Characteristics and Variations of the Method
        2.3.4. Chemical Applications
    2.4. Simplex Optimization of Classifiers
        2.4.1. Principle
        2.4.2. Starting the Simplex
        2.4.3. Response Function
        2.4.4. Moving the Simplex
        2.4.5. Halting the Simplex
        2.4.6. Chemical Applications
    2.5. Piecewise-Linear Classifiers
    2.6. Implementation of Binary Classifiers
        2.6.1. Discrete or Continuous Response
        2.6.2. Classification by a Committee of Classifiers
        2.6.3. Multicategory Classification

3. K-Nearest Neighbour Classification (KNN-Method)
    3.1. Principle
    3.2. Maximum Risk of KNN-Classifications
    3.3. Characteristics and Variations of the KNN-Method
    3.4. Classification with Potential Functions
    3.5. KNN-Classification with a Condensed Data Set
    3.6. Chemical Applications

4. Classification by Adaptive Networks
    4.1. Perceptron
    4.2. Adaptive Digital Learning Network
    4.3. Chemical Applications

5. Parametric Classification Methods
    5.1. Principle
    5.2. Bayes- and Maximum Likelihood Classifiers
    5.3. Estimation of Probability Densities
    5.4. Bayes- and Maximum Likelihood Classifiers for Binary Encoded Patterns
    5.5. A Simple Sequential Classification Method Based on Probability Densities
    5.6. Chemical Applications

6. Modelling of Clusters
    6.1. Principle
    6.2. Modelling by a Hypersphere
    6.3. SIMCA-Method

7. Clustering Methods
    7.1. Principle
    7.2. Hierarchical Clustering
    7.3. Minimal Spanning Tree Clustering
    7.4. Chemical Applications

8. Display Methods
    8.1. Principle
    8.2. Linear Methods
    8.3. Nonlinear Methods
    8.4. Chemical Applications

9. Preprocessing
    9.1. Principle
    9.2. Scaling
    9.3. Weighting
    9.4. Transformation
    9.5. Combination of Features

10. Feature Selection
    10.1. Principle
    10.2. Feature Selection by Using Data Statistics
        10.2.1. Variance Weighting
        10.2.2. Fisher Weighting
        10.2.3. Use of Probability Density Curves
        10.2.4. Methods for Binary Encoded Patterns
        10.2.5. Elimination of Correlations
    10.3. Feature Selection by Using Classification Results
        10.3.1. Evaluation of the Components of a Single Weight Vector
        10.3.2. Feature Selection with the Learning Machine
        10.3.3. Other Methods
    10.4. Number of Intrinsic Dimensions and Number of Patterns (n/d - Problem)

11. Evaluation of Classifiers
    11.1. Principle
    11.2. Predictive Abilities
    11.3. Loss
    11.4. A posteriori Probabilities
    11.5. Terminology Problems
    11.6. Application of Information Theory
        11.6.1. Introduction to Information Theory
        11.6.2. Transinformation
        11.6.3. Figure of Merit
    11.7. Evaluation of Classifiers with Continuous Response
    11.8. Confidence of Predictive Abilities
    11.9. Comparison with the Capability of Chemists

Part B  Application of Pattern Recognition Methods in Chemistry

12. General Aspects of Pattern Recognition in Chemistry

13. Spectral Analysis
    13.1. Mass Spectrometry
        13.1.1. Survey
        13.1.2. Representation of Mass Spectra as Pattern Vectors
        13.1.3. Determination of Molecular Formulas and Molecular Weights
        13.1.4. Recognition of Molecular Structures
        13.1.5. Chemical Interpretation of Mass Spectral Classifiers
        13.1.6. Simulation of Mass Spectra
        13.1.7. Miscellaneous
    13.2. Infrared Spectroscopy
    13.3. Raman Spectroscopy
    13.4. Nuclear Magnetic Resonance Spectroscopy
    13.5. Gamma-Ray Spectroscopy
    13.6. Combined Spectral Data

14. Chromatography
    14.1. Gas Chromatography
    14.2. Thin Layer Chromatography
