
State-of-the-Art Protein Secondary-Structure Prediction Using a Novel Two-Stage Alignment and ... PDF

113 Pages·2008·1.63 MB·English


STATE-OF-THE-ART PROTEIN SECONDARY-STRUCTURE PREDICTION USING A NOVEL TWO-STAGE ALIGNMENT AND MACHINE-LEARNING METHOD

By AMI M. GATES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2008

© 2008 Ami M. Gates

To My Family and Friends

ACKNOWLEDGMENTS

I would like to dedicate this overwhelming moment to my loving and supportive family, and to my wonderful friends. I would like to thank my parents, Eileen and Myke, who always supported my goals and listened endlessly; my brother Josh, who offered continuous encouragement; and my late brother Chad, whose last words to me were “PhD”. I would like to thank my dear friends Amos, Karina, Jesse, Neko, and Nathan for standing by me, and I would like to thank my committee chair, Arunava Banerjee, who always believed in me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
    Introduction
    Proteins
    Protein Secondary Structure
    Machine Learning and Protein Secondary Structure Prediction
    Protein Secondary Structure Prediction Methods
    Dynamic Alignment-Based Protein Window-SVM Integrated Prediction for Three State Protein Secondary Structure
    Overview

2 REVIEW OF THE BIOLOGY OF PROTEINS
    Brief Biology of Proteins
    From DNA to Protein
    Protein and Amino Acids
    Protein Folding
    Secondary Structure
    Protein Evolution and Sequence Conservation

3 LITERATURE REVIEW
    Problem of Secondary Structure Prediction
    Literature Review of Secondary Structure Prediction
        Methods Preceding 1993
        Methods Proceeding 1993
            Neural network methods from 1993 – 2007
            Summary of neural network based methods
        Support Vector Machine Methods from 2001 – 2007
        Summary of SVM Based Methods
        Combined or Meta Methods
        Direct Homology Based Methods

4 MATERIALS AND METHODS
    Introduction
    Protein Data and Databanks
    Datasets
    Protein Identity, Similarity, and Homology
    Multiple Sequence Alignment and PSI-BLAST
        Basic Local Alignment Search Tool (BLAST) Algorithm
            BLAST: step 1
            BLAST: step 2
            BLAST: step 3
        Position-Specific Iterative BLAST (PSI-BLAST) Algorithm
            Creating the PSSM
            Summary of PSI-BLAST
    Input Vectors and Sliding Windows
    Accuracy Measures
    Machine Learning Techniques
        Support Vector Machines
        Using SVMs in Secondary Structure Prediction
    Neural Networks
    Information Theory and Prediction

5 NEW SECONDARY STRUCTURE PREDICTION METHOD DARWIN
    Dynamic Alignment-Based Protein Window-SVM Integrated Prediction for Three State Protein Secondary Structure: A New Prediction Server.
    Introduction and Motivation of DARWIN
    Methods and Algorithms used in DARWIN
        Phases of DARWIN: Stage 1
            Phase 1
            Phase 2a: If at least one viable template is found:
            Phase 2b: If no viable template is found:
            Phase 3
        Phases of DARWIN Stage 2: Fixed-Size Fragment Analysis
            Fragment size selection
            Step 1
            Step 2
            Step 3
    Ensemble of Support Vector Machines in DARWIN
        The SVM Kernel and Equation
        Training the SVM and Using PSI-BLAST Profiles
    Datasets and Measures of Accuracy for DARWIN
    Experiments, Measures, and Results
    Conclusions on DARWIN

6 DARWIN WEB SERVER
    Introduction
    Using the Server
    Design of the DARWIN Web Service

7 DISCUSSION AND CONCLUSION
    Introduction
    Protein Secondary Structure Prediction Progress
    Strength of DARWIN
    Future Work and Improvements

LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
5-1 Detailed average prediction results for DARWIN.
5-2 Average prediction results for dataset EVA5 for DARWIN compared to top published indirect homology method results.
5-3 Average prediction results for dataset EVA6 for DARWIN compared to top published indirect homology method results.

LIST OF FIGURES

Figure
2-1 Simplification of the processes of transcription and translation.
2-2 Once a polypeptide is created through the process of translation, it is released into the cytosol and is known as the primary or linear sequence.
2-3 The 20 known amino acids. Adapted from Voet and Voet, 2005.
2-4 Torsion angles phi and psi that offer rotational flexibility between amino acid peptide bonds. Adapted from Voet and Voet, 2005.
2-5 Ramachandran Plot for a set of three alanine amino acids joined as a tripeptide.
2-6 Example of a helical protein secondary structure. The hydrogen bonds are denoted with dashed lines.
2-7 Sheet protein secondary structure, with hydrogen bonds noted with dashed lines. Adapted from Voet and Voet, 2005.
3-1 Example of a linear sequence of amino acids, each accompanied by a secondary structure label of H, C, or E.
4-1 Protein Data Bank (PDB) website. This area is a repository for known protein structures and related protein information.
4-2 Matrix known as BLOSUM 62, a similarity matrix derived from small local blocks of aligned sequences that share at least 62% identity.
4-3 Example of a PSI-BLAST generated alignment between a query protein and a subject protein.
4-4 Example of a PSI-BLAST generated position specific scoring matrix (PSSM).
4-5 Example of the BLAST algorithm. A given query protein is analyzed by looking at all three amino acid word sets.
4-6 Visual example of the production of input vectors that can be used to train and test machine learning constructs.
4-7 Visual example of decision boundary between two classes and the margin that is maximized.
5-1 The PSI-BLAST example alignment portion. Several areas in a given alignment can result in missing information.
5-2 Histogram for each dataset, EVA5 and EVA6 displays the percentage of proteins predicted by DARWIN with given accuracy.
6-1 Image of the DARWIN Web page that allows Internet based graphical user interface with the DARWIN service.

Description:
5 NEW SECONDARY STRUCTURE PREDICTION METHOD DARWIN ... Barton, 1999; Jones, 1999; Rost and Sander, 1993) or a support vector ... from Voet, D., and Voet, J. (2005) Biochemistry, Third Edition, Wiley Higher.