
Bioinformatics: Sequence alignment and Markov models PDF

338 Pages·2009·1.1 MB·English

Preview Bioinformatics: Sequence alignment and Markov models

Bioinformatics
Sequence Alignment and Markov Models

Kal Renganathan Sharma, Ph.D., P.E.
Adjunct Professor
Department of Chemical Engineering
Prairie View A&M University
Prairie View, Texas

New York Chicago San Francisco Lisbon London Madrid Mexico City Milan New Delhi San Juan Seoul Singapore Sydney Toronto

Copyright © 2009 by The McGraw-Hill Companies, Inc. All rights reserved. Manufactured in the United States of America. Except as permitted under the United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher.

0-07-159307-1

The material in this eBook also appears in the print version of this title: 0-07-159306-3.

All trademarks are trademarks of their respective owners. Rather than put a trademark symbol after every occurrence of a trademarked name, we use names in an editorial fashion only, and to the benefit of the trademark owner, with no intention of infringement of the trademark. Where such designations appear in this book, they have been printed with initial caps.

McGraw-Hill eBooks are available at special quantity discounts to use as premiums and sales promotions, or for use in corporate training programs. For more information, please contact George Hoare, Special Sales, at [email protected] or (212) 904-4069.

TERMS OF USE

This is a copyrighted work and The McGraw-Hill Companies, Inc. (“McGraw-Hill”) and its licensors reserve all rights in and to the work. Use of this work is subject to these terms. Except as permitted under the Copyright Act of 1976 and the right to store and retrieve one copy of the work, you may not decompile, disassemble, reverse engineer, reproduce, modify, create derivative works based upon, transmit, distribute, disseminate, sell, publish or sublicense the work or any part of it without McGraw-Hill’s prior consent. You may use the work for your own noncommercial and personal use; any other use of the work is strictly prohibited. Your right to use the work may be terminated if you fail to comply with these terms.

THE WORK IS PROVIDED “AS IS.” McGRAW-HILL AND ITS LICENSORS MAKE NO GUARANTEES OR WARRANTIES AS TO THE ACCURACY, ADEQUACY OR COMPLETENESS OF OR RESULTS TO BE OBTAINED FROM USING THE WORK, INCLUDING ANY INFORMATION THAT CAN BE ACCESSED THROUGH THE WORK VIA HYPERLINK OR OTHERWISE, AND EXPRESSLY DISCLAIM ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. McGraw-Hill and its licensors do not warrant or guarantee that the functions contained in the work will meet your requirements or that its operation will be uninterrupted or error free. Neither McGraw-Hill nor its licensors shall be liable to you or anyone else for any inaccuracy, error or omission, regardless of cause, in the work or for any damages resulting therefrom. McGraw-Hill has no responsibility for the content of any information accessed through the work. Under no circumstances shall McGraw-Hill and/or its licensors be liable for any indirect, incidental, special, punitive, consequential or similar damages that result from the use of or inability to use the work, even if any of them has been advised of the possibility of such damages. This limitation of liability shall apply to any claim or cause whatsoever whether such claim or cause arises in contract, tort or otherwise.

DOI: 10.1036/0071593063

This work is dedicated to my son R. Hari Subrahmanyan Sharma (alias Ramkishan, born August 13, 2001) with unconditional love.

About the Author

Kal Renganathan Sharma, Ph.D., P.E., has written five books, 12 journal articles, and 448 conference papers. He has earned three degrees in chemical engineering: a B.Tech. from the Indian Institute of Technology, Chennai, and an M.S. and a Ph.D. from West Virginia University, Morgantown. He has held a number of high-level positions at engineering colleges and universities. Dr. Sharma currently teaches at Prairie View A&M University in Prairie View, Texas.

Contents

Preface
Acknowledgments

1 Preliminaries
  1.1 Molecular Biology
    1.1.1 Amino Acids and Proteins
    1.1.2 Structures of Proteins
    1.1.3 Sequence Distribution of Insulin
    1.1.4 Bioseparation Techniques
    1.1.5 Nucleic Acids and Genetic Code
    1.1.6 Genomes: Diversity, Size, and Structure
  1.2 Probability and Statistics
    1.2.1 Three Definitions of Probability
    1.2.2 Bayes’ Theorem and Conditional Probability
    1.2.3 Independent Events and Bernoulli’s Theorem
    1.2.4 Discrete Probability Distributions
    1.2.5 Continuous Probability Distributions
    1.2.6 Statistical Inference and Hypothesis Testing
  1.3 Which Is Larger, 2^n or n^2?
  1.4 Big O Notation and Asymptotic Order of Functions
  Summary
  References and Sources
  Exercises

Part 1 Sequence Alignment and Representation

2 Alignment of a Pair of Sequences
  Objectives
  2.1 Introduction to Pairwise Sequence Alignment
  2.2 Why Study Sequence Alignment
  2.3 Alignment Grading Function
  2.4 Optimal Global Alignment of a Pair of Sequences
    2.4.1 Needleman and Wunsch Algorithm
  2.5 Dynamic Programming
  2.6 Time Analysis and Space Efficiency
  2.7 Dynamic Arrays and O(N) Space
  2.8 Subquadratic Algorithms for Longest Common Subsequence Problems
  2.9 Optimal Local Alignment of a Pair of Sequences
    2.9.1 Smith and Waterman Algorithm
  2.10 Affine Gap Model
  2.11 Greedy Algorithms for Pairwise Alignment
  2.12 Other Alignment Methods
  2.13 Pam and Blosum Matrices
  Summary
  References
  Further Reading
  Exercises

3 Sequence Representation and String Algorithms
  Objectives
  3.1 Suffix Trees
    3.1.1 Overview of Suffix Trees in Sequence Analysis
  3.2 Algorithm for Suffix Tree Representation of a Sequence
  3.3 Streaming a Sequence Against a Suffix Tree
  3.4 String Algorithms
    3.4.1 Rabin-Karp Algorithm
    3.4.2 Knuth-Morris-Pratt (KMP) Algorithm
    3.4.3 Boyer-Moore Algorithm
    3.4.4 Finite Automaton
  3.5 Suffix Trees in String Algorithms
  3.6 Look-up Tables
  Summary
  References
  Exercises

4 Multiple-Sequence Alignment
  Objectives
  4.1 What Is Multiple-Sequence Alignment?
  4.2 Definitions of Multiple Global Alignment and Sum of Pairs
    4.2.1 Multiple Global Alignment
    4.2.2 Sum of Pairs
  4.3 Optimal MSA by Dynamic Programming
  4.4 Theorem of Wang and Jiang [2]
  4.5 What Are NP Complete Problems?
  4.6 Center-Star-Alignment Algorithm [4]
    4.6.1 Time Analysis
  4.7 Progressive Alignment Methods
  4.8 The Consensus Sequence
  4.9 Greedy Method
  4.10 Geometry of Multiple Sequences
  Summary
  References
  Exercises

Part 2 Probability Models

5 Hidden Markov Models and Applications
  Objectives
  5.1 Introduction
  5.2 kth-order Markov Chain
  5.3 DNA Sequence and Geometric Distribution [2–4]
  5.4 Three Questions in the HMM
  5.5 Evaluation Problem and Forward Algorithm
  5.6 Decoding Problem and Viterbi Algorithm
  5.7 Relative Entropy
  5.8 Probabilistic Approach to Phylogeny
  5.9 Sequence Alignment Using HMMs
  5.10 Protein Families
  5.11 Wheel HMMs to Model Periodicity in DNA
  5.12 Generalized HMM (GHMM)
  5.13 Database Mining
  5.14 Multiple Alignments
  5.15 Classification Using HMMs
  5.16 Signal Peptide and Signal Anchor Prediction by HMMs
  5.17 Markov Model and Chargaff's Parity Rules
  Summary
  References
  Exercises

6 Gene Finding, Protein Secondary Structure
  Objectives
  6.1 Introduction
  6.2 Relative Entropy Site-Selection Problem
    6.2.1 Greedy Approach
    6.2.2 Gibbs Sampler
  6.3 Maximum-Subsequence Problem
    6.3.1 Bates and Constable Algorithm
    6.3.2 Binomial Heap [4–7]
  6.4 Interpolated Markov Model (IMM)
  6.5 Shine Dalgarno SD Sites Finding
  6.6 Gene Annotation Methods
  6.7 Secondary Structures of Proteins
    6.7.1 Neural Networks
    6.7.2 PHD Architecture of Rost and Sander
    6.7.3 Ensemble Method of Riis and Krogh [23]
    6.7.4 Protein Secondary Structure Using HMMs
    6.7.5 DAG RNNs: Directed Acyclic Graphs and Recursive NN Architecture and 3D Protein Structure Prediction
    6.7.6 Annotate Subcellular Localization for Protein Structure
  Summary
  References
  Exercises

Part 3 Measurement Techniques

7 Biochips
  Objectives
  7.1 Introduction
    7.1.1 Microarrays, Biochips, and Disease
    7.1.2 Five Steps and Ten Tips
    7.1.3 Applications of Microarrays
  7.2 Microarray Detection
    7.2.1 Fluorescence Detection and Optical Requirements
    7.2.2 Confocal Scanning Microscope
  7.3 Microarray Surfaces
  7.4 Phosphoramidite Synthesis

Description:
GET FULLY UP-TO-DATE ON BIOINFORMATICS: THE TECHNOLOGY OF THE 21ST CENTURY

Bioinformatics showcases the latest developments in the field along with all the foundational information you'll need. It provides in-depth coverage of a wide range of autoimmune disorders and detailed analyses of suffix trees