
Clustering in Bioinformatics and Drug Discovery PDF

235 Pages·2010·5.811 MB

Preview Clustering in Bioinformatics and Drug Discovery

Clustering in Bioinformatics and Drug Discovery

CHAPMAN & HALL/CRC Mathematical and Computational Biology Series

Aims and scope: This series aims to capture new developments and summarize what is known over the entire spectrum of mathematical and computational biology and medicine. It seeks to encourage the integration of mathematical, statistical, and computational methods into biology by publishing a broad range of textbooks, reference works, and handbooks. The titles included in the series are meant to appeal to students, researchers, and professionals in the mathematical, statistical and computational sciences, fundamental biology and bioengineering, as well as interdisciplinary researchers involved in the field. The inclusion of concrete examples and applications, and programming techniques and examples, is highly encouraged.

Series Editors

N. F. Britton, Department of Mathematical Sciences, University of Bath
Xihong Lin, Department of Biostatistics, Harvard University
Hershel M. Safer
Maria Victoria Schneider, European Bioinformatics Institute
Mona Singh, Department of Computer Science, Princeton University
Anna Tramontano, Department of Biochemical Sciences, University of Rome La Sapienza

Proposals for the series should be submitted to one of the series editors above or directly to:
CRC Press, Taylor & Francis Group
4th Floor, Albert House
1-4 Singer Street
London EC2A 4BQ
UK

Published Titles

Algorithms in Bioinformatics: A Practical Introduction (Wing-Kin Sung)
Bioinformatics: A Practical Approach (Shui Qing Ye)
Biological Sequence Analysis Using the SeqAn C++ Library (Andreas Gogol-Döring and Knut Reinert)
Cancer Modelling and Simulation (Luigi Preziosi)
Cancer Systems Biology (Edwin Wang)
Cell Mechanics: From Single Scale-Based Models to Multiscale Modeling (Arnaud Chauvière, Luigi Preziosi, and Claude Verdier)
Clustering in Bioinformatics and Drug Discovery (John D. MacCuish and Norah E. MacCuish)
Combinatorial Pattern Matching Algorithms in Computational Biology Using Perl and R (Gabriel Valiente)
Computational Biology: A Statistical Mechanics Perspective (Ralf Blossey)
Computational Hydrodynamics of Capsules and Biological Cells (C. Pozrikidis)
Computational Neuroscience: A Comprehensive Approach (Jianfeng Feng)
Data Analysis Tools for DNA Microarrays (Sorin Draghici)
Differential Equations and Mathematical Biology, Second Edition (D.S. Jones, M.J. Plank, and B.D. Sleeman)
Engineering Genetic Circuits (Chris J. Myers)
Exactly Solvable Models of Biological Invasion (Sergei V. Petrovskii and Bai-Lian Li)
Gene Expression Studies Using Affymetrix Microarrays (Hinrich Göhlmann and Willem Talloen)
Glycome Informatics: Methods and Applications (Kiyoko F. Aoki-Kinoshita)
Handbook of Hidden Markov Models in Bioinformatics (Martin Gollery)
Introduction to Bioinformatics (Anna Tramontano)
Introduction to Computational Proteomics (Golan Yona)
An Introduction to Systems Biology: Design Principles of Biological Circuits (Uri Alon)
Kinetic Modelling in Systems Biology (Oleg Demin and Igor Goryanin)
Knowledge Discovery in Proteomics (Igor Jurisica and Dennis Wigle)
Meta-analysis and Combining Information in Genetics and Genomics (Rudy Guerra and Darlene R. Goldstein)
Modeling and Simulation of Capsules and Biological Cells (C. Pozrikidis)
Niche Modeling: Predictions from Statistical Distributions (David Stockwell)
Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems (Qiang Cui and Ivet Bahar)
Optimal Control Applied to Biological Models (Suzanne Lenhart and John T. Workman)
Pattern Discovery in Bioinformatics: Theory & Algorithms (Laxmi Parida)
Python for Bioinformatics (Sebastian Bassi)
Spatial Ecology (Stephen Cantrell, Chris Cosner, and Shigui Ruan)
Spatiotemporal Patterns in Ecology and Epidemiology: Theory, Models, and Simulation (Horst Malchow, Sergei V. Petrovskii, and Ezio Venturino)
Stochastic Modelling for Systems Biology (Darren J. Wilkinson)
Structural Bioinformatics: An Algorithmic Approach (Forbes J. Burkowski)
The Ten Most Wanted Solutions in Protein Bioinformatics (Anna Tramontano)

Clustering in Bioinformatics and Drug Discovery
John D. MacCuish
Norah E. MacCuish

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2011 by Taylor and Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-1-4398-1679-0 (Ebook-PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

List of Figures
List of Tables
Preface
Acknowledgments
About the Authors
List of Symbols
Foreword

1 Introduction
  1.1 History
  1.2 Bioinformatics and Drug Discovery
  1.3 Statistical Learning Theory and Exploratory Data Analysis
  1.4 Clustering Algorithms
  1.5 Computational Complexity
    1.5.1 Data Structures
    1.5.2 Parallel Algorithms
  1.6 Glossary
  1.7 Exercises

2 Data
  2.1 Types
    2.1.1 Binary Data
    2.1.2 Count Data
    2.1.3 Continuous Data
    2.1.4 Categorical Data
    2.1.5 Mixed Type Data
  2.2 Normalization and Scaling
  2.3 Transformations
  2.4 Formats
  2.5 Data Matrices
  2.6 Measures of Similarity
    2.6.1 Binary Data Measures
      2.6.1.1 The Tanimoto or Jaccard Measure
      2.6.1.2 The Baroni-Urbani/Buser Coefficient
      2.6.1.3 The Simple Matching Coefficient
      2.6.1.4 The Binary Euclidean Measure
      2.6.1.5 The Binary Cosine or Ochai Measure
      2.6.1.6 The Hamann Measure
      2.6.1.7 Other Binary Measures
    2.6.2 Count Data Measures
      2.6.2.1 The Tanimoto Count Measure
      2.6.2.2 The Cosine Count Measure
    2.6.3 Continuous Data Measures
      2.6.3.1 Continuous and Weighted Forms of Euclidean Distance
      2.6.3.2 Manhattan Distance
      2.6.3.3 L∞ or Supremum Norm
      2.6.3.4 Cosine
      2.6.3.5 Pearson Correlation Coefficient
      2.6.3.6 Mahalanobis Distance
    2.6.4 Mixed Type Data
  2.7 Proximity Matrices
  2.8 Symmetric Matrices
    2.8.1 Asymmetric Matrices
    2.8.2 Hadamard Product of Two Matrices
    2.8.3 Ultrametricity
    2.8.4 Positive Semidefinite Matrices
  2.9 Dimensionality, Components, Discriminants
    2.9.1 Principal Component Analysis (PCA)
      2.9.1.1 Covariance Matrix
      2.9.1.2 Singular Value Decomposition
    2.9.2 Non-Negative Matrix Factorization
    2.9.3 Multidimensional Scaling
    2.9.4 Discriminants
      2.9.4.1 Fisher’s Linear Discriminant
  2.10 Graph Theory
  2.11 Glossary
  2.12 Exercises

3 Clustering Forms
  3.1 Partitional
  3.2 Hierarchical
    3.2.1 Dendrograms and Heatmaps
  3.3 Mixture Models
  3.4 Sampling
  3.5 Overlapping
  3.6 Fuzzy
  3.7 Self-Organizing
  3.8 Hybrids
  3.9 Glossary
  3.10 Exercises

4 Partitional Algorithms
  4.1 K-Means
    4.1.1 K-Medoid
    4.1.2 K-Modes
    4.1.3 Online K-Means
  4.2 Jarvis-Patrick
  4.3 Spectral Clustering
  4.4 Self-Organizing Maps
  4.5 Glossary
  4.6 Exercises

5 Cluster Sampling Algorithms
  5.1 Leader Algorithms
  5.2 Taylor-Butina Algorithm
  5.3 Glossary
  5.4 Exercises

6 Hierarchical Algorithms
  6.1 Agglomerative
    6.1.1 Reciprocal Nearest Neighbors Class of Algorithms
      6.1.1.1 Complete Link
      6.1.1.2 Group Average
      6.1.1.3 Wards
    6.1.2 Others
  6.2 Divisive
  6.3 Glossary
  6.4 Exercises

7 Hybrid Algorithms
  7.1 Self-Organizing Tree Algorithm
  7.2 Divisive Hierarchical K-Means
  7.3 Exclusion Region Hierarchies
  7.4 Biclustering
  7.5 Glossary
  7.6 Exercises
