
Isoelectric point prediction from the amino acid sequence of a protein

74 Pages·2016·8.91 MB·English


Rochester Institute of Technology
RIT Scholar Works
Theses
Summer 2005

Isoelectric point prediction from the amino acid sequence of a protein
Matthew Conte

Follow this and additional works at: https://scholarworks.rit.edu/theses

Recommended Citation
Conte, Matthew, "Isoelectric point prediction from the amino acid sequence of a protein" (2005). Thesis. Rochester Institute of Technology. Accessed from

This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected].

THESIS
ISOELECTRIC POINT PREDICTION FROM THE AMINO ACID SEQUENCE OF A PROTEIN
Submitted by Matthew Conte
Department of Biological Sciences
In partial fulfillment of the requirements for the Master of Science degree in Bioinformatics at Rochester Institute of Technology
Summer 2005

Rochester Institute of Technology
Department of Biological Sciences
Bioinformatics Program

To: Head, Department of Biological Sciences

The undersigned state that [handwritten, illegible] (Student Name), [blank] (Student Number), a candidate for the Master of Science degree in Bioinformatics, has submitted his/her thesis and has satisfactorily defended it. This completes the requirements for the Master of Science degree in Bioinformatics at Rochester Institute of Technology.

Thesis committee members:
Gary R. Skuse (Committee Chair)
Paul A. Craig (Thesis Advisor)
[Name illegible]
Douglas P. Merrill, 475-2532 (voice), [email protected]

Thesis/Dissertation Author Permission Statement

Title of thesis or dissertation: [handwritten, illegible]
Name of author: [handwritten, illegible]
Degree: [handwritten, illegible]
Program: [handwritten, illegible]
College: [handwritten, illegible]

I understand that I must submit a print copy of my thesis or dissertation to the RIT Archives, per current RIT guidelines for the completion of my degree. I hereby grant to the Rochester Institute of Technology and its agents the non-exclusive license to archive and make accessible my thesis or dissertation in whole or in part in all forms of media in perpetuity. I retain all other ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

Print Reproduction Permission Granted: I, [handwritten, illegible], hereby grant permission to the Rochester Institute of Technology to reproduce my print thesis or dissertation in whole or in part. Any reproduction will not be for commercial use or profit.
Signature of Author: Matthew Conte  Date: [handwritten, illegible]

Print Reproduction Permission Denied: I, [blank], hereby deny permission to the RIT Library of the Rochester Institute of Technology to reproduce my print thesis or dissertation in whole or in part.
Signature of Author: [blank]  Date: [blank]

Inclusion in the RIT Digital Media Library Electronic Thesis & Dissertation (ETD) Archive: I, [blank], additionally grant to the Rochester Institute of Technology Digital Media Library (RIT DML) the non-exclusive license to archive and provide electronic access to my thesis or dissertation in whole or in part in all forms of media in perpetuity. I understand that my work, in addition to its bibliographic record and abstract, will be available to the world-wide community of scholars and researchers through the RIT DML. I retain all other ownership rights to the copyright of the thesis or dissertation. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
I am aware that the Rochester Institute of Technology does not require registration of copyright for ETDs. I hereby certify that, if appropriate, I have obtained and attached written permission statements from the owners of each third party copyrighted matter to be included in my thesis or dissertation. I certify that the version I submitted is the same as that approved by my committee.
Signature of Author: [blank]  Date: [blank]

Abstract

Proteins often do not migrate as expected in two-dimensional electrophoresis based on their primary sequence. The predicted isoelectric point (pI) frequently does not coincide with experimental pI values obtained in the laboratory. The reasons for these differences led to this study. Initially, 2DE data from the E. coli proteome were collected and formatted. This dataset was split into three parts, each consisting of a different level of pI discrepancy (ΔpI). The protein sequence data for each ΔpI subset were run through a pipeline. At each stage of the pipeline the data were analyzed by comparing the three ΔpI subsets to one another. The pipeline consisted of a naive approach (considering individual amino acid frequencies), followed by the application of four different alphabets that represent sequences more simply by grouping similar amino acids according to their charge, functional, chemical, and hydrophobic properties. The final step in the pipeline involved investigating the dipeptides of all of these sequences using both the 20 amino acid alphabet and the simplified groupings. An evaluation of the alphabet dipeptide analysis demonstrated the existence of certain dipeptide sequences which correlate well with differences between predicted pI and experimental pI.

Table of Contents

1 Introduction
2 Methods
  2.1 Forming the data set
  2.2 Experimental and predicted pI values
  2.3 Extracting useful information from collected subset sequences
    2.2.1 Amino acid frequency analysis (naive approach)
    2.2.2 Frequency of amino acids (alphabets approach)
    2.2.3 Frequency of amino acids (dipeptide approach)
    2.2.4 Pipeline workflow
3 Results
  3.1 Naive approach
  3.2 Alphabets approach
    3.2.1 Charge
    3.2.2 Chemical
    3.2.3 Functional
    3.2.4 Hydrophobic
  3.3 Dipeptide approach
  3.4 Dipeptide threshold
  3.5 Dipeptide using alphabets
    3.5.1 Charge
    3.5.2 Chemical
    3.5.3 Functional
    3.5.4 Hydrophobic
4 Discussion
5 Conclusions
6 References

Introduction

Two-dimensional gel electrophoresis (2DE) has been an important laboratory technique for the field of proteomics for over two decades. 2DE allows the researcher to separate and identify thousands of proteins from a cellular extract in a single experiment. 2DE is difficult and time consuming, as it is necessary to determine ideal initial conditions, wait for results, and possibly change conditions after that (1). In addition, reproducibility of gels and comparison of 2DE results between separate groups have proved difficult (1).

In 2DE, proteins are separated in the first dimension by their isoelectric points (the pH at which the net charge of the protein is zero) and in the second dimension by their molecular weights. The accurate prediction of protein isoelectric point (pI) and molecular weight (MW) using simply the amino acid sequence of the protein would be extremely valuable to researchers who use two-dimensional gel electrophoresis. Computational procedures for calculating and predicting the pI from the amino acid composition of a protein, based on the dissociation constants of the charged groups within the protein, have been developed (2-8). The accuracy of these algorithms is limited by the certainty of the values for the dissociation constants and by microenvironmental effects such as charge-charge interactions and post-translational modifications.

To systematically explore the relationship between pI, molecular weight and protein sequence, a data set of proteins was collected and organized from a model organism.
The Escherichia coli proteome was chosen since it contains few post-translational modifications such as methylation, acylation, glycosylation, or phosphorylation, which can alter the pI/MW. The presence of these modifications makes pI/MW predictions much more difficult, since modified proteins may migrate to a position on a 2-D gel that is quite different from what is predicted based solely on the amino acid sequence of the protein. E. coli is also one of the best characterized prokaryotes, and much more data beyond simply the protein sequence is widely available for each of its proteins.

At this point it is necessary to consider the basic structural features of proteins and the role of individual amino acids in the structure and function of proteins. Figure 1 below shows the structure of the 20 amino acids with side chain structures shown in red (10). The charge on all proteins arises from some of the amino acid side chains, as well as the carboxy- and amino-termini, some prosthetic groups, and bound ions. Our pI prediction tool (11) is designed to calculate charge based on the side chains and the carboxy- and amino-termini. The charge on amino acid side chains depends on the pH of the solution and the pKa of the side chains. It is also affected by the localized environment around a side chain. Our current calculation model uses the following pKa values for ionizable groups on the protein and does not make any adjustments to the pKa values of the side chains regardless of their environment within the protein (Table 1). We also assume that the separation is based on the total charge on the protein, not the mass-to-charge ratio.
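The kind of calculation described above (fixed pKa values, charge summed over ionizable side chains and the two termini, no environmental corrections) can be sketched as follows. This is only an illustrative sketch, not the tool cited as (11). Table 1 is not reproduced in this excerpt, so the pKa values below are assumed textbook-style values (only the 12.5 for arginine and 4.1 for glutamic acid appear in the text), and the names `ASSUMED_PKA_POSITIVE`, `ASSUMED_PKA_NEGATIVE`, `net_charge`, and `isoelectric_point` are hypothetical.

```python
# ASSUMPTION: illustrative pKa values, not the thesis's Table 1.
ASSUMED_PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
ASSUMED_PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    seq = sequence.upper()
    charge = 0.0
    # Basic groups carry +1 when protonated.
    for group, pka in ASSUMED_PKA_POSITIVE.items():
        count = 1 if group == "N_term" else seq.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    # Acidic groups carry -1 when deprotonated.
    for group, pka in ASSUMED_PKA_NEGATIVE.items():
        count = 1 if group == "C_term" else seq.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection search for the pH at which the net charge is zero.

    Net charge decreases monotonically as pH rises, so the search
    keeps the half-interval in which the sign change occurs.
    """
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "ACDEFGHIKLMNPQRSTVWY"  # arbitrary example sequence
    print(round(isoelectric_point(peptide), 2))
```

Because every pKa is fixed, two proteins with the same composition get the same predicted pI regardless of residue order, which is exactly the simplification the text describes.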
[Figure 1: structural formulas of the 20 amino acids: Phenylalanine (Phe/F), Tyrosine (Tyr/Y), Tryptophan (Trp/W), Arginine (Arg/R), Glutamine (Gln/Q), Glycine (Gly/G), Alanine (Ala/A), Histidine (His/H), Serine (Ser/S), Lysine (Lys/K), Proline (Pro/P), Glutamic Acid (Glu/E), Aspartic Acid (Asp/D), Threonine (Thr/T), Cysteine (Cys/C), Methionine (Met/M), Leucine (Leu/L), Asparagine (Asn/N), Isoleucine (Ile/I), Valine (Val/V).]

Figure 1. Structures of amino acids with side chains shown in red, carboxylate groups in green, and amino groups in blue (10).

The charge on the protein is the sum of the charges on the individual amino acid side chains. However, the charge on individual amino acid side chains can vary when they are near a group of non-polar or highly charged side chains. For example, the normal pKa for glutamic acid is about 4.1. In lysozyme, two acidic residues (a glutamate and an aspartate) are in the active site. The aspartate is in a polar environment and has a normal pKa value. The glutamate side chain is in a hydrophobic environment, where a negative charge is energetically unfavorable. Therefore the pKa value for this glutamate side chain increases, which in turn decreases the extent of deprotonation of that side chain.
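To make the effect of such a pKa shift concrete, the short sketch below compares the deprotonated fraction of a glutamate side chain at one pH for the normal pKa of about 4.1 quoted above and for a raised pKa. The raised value of 6.0 and the pH of 5.0 are assumptions chosen only for illustration, and `fraction_deprotonated` is a hypothetical helper name.

```python
def fraction_deprotonated(ph, pka):
    """Fraction of an acidic group in its charged (deprotonated) form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


PH = 5.0          # ASSUMPTION: illustrative pH
NORMAL_PKA = 4.1  # value quoted in the text for glutamic acid
RAISED_PKA = 6.0  # ASSUMPTION: hypothetical pKa in a hydrophobic pocket

normal = fraction_deprotonated(PH, NORMAL_PKA)  # roughly 0.89
raised = fraction_deprotonated(PH, RAISED_PKA)  # roughly 0.09
print(f"normal pKa: {normal:.2f} charged, raised pKa: {raised:.2f} charged")
```

Under these assumed numbers the same side chain goes from mostly charged to mostly uncharged, which a fixed-pKa prediction cannot capture.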
This shift is very important in the mechanism of lysozyme activity, which requires that one of the side chains be charged (deprotonated) and the other be uncharged (protonated) at the same time.

In a second example, the serine in the active sites of serine proteases has a much different acid-base behavior than other serines normally found in proteins (9). The normal pKa value for the hydroxyl group on the serine side chain is greater than 15, meaning that this group is not found in an ionized state in most proteins. In serine proteases, the interaction of the active site serine with nearby histidine and aspartate side chains (the so-called catalytic triad) leads to the ionization of the serine hydroxyl group. Meanwhile, the pKa value is reduced from about 15 to a value closer to 7 or 8. This example makes it clear that the microenvironment of an individual amino acid side chain can change its ionization behavior.

Other effects on the pKa of an amino acid side chain can be seen when certain amino acids are positioned next to each other. For example, a typical arginine residue, which is basic, will have a pKa of about 12.5 (Table 1 below) and carry a full +1 charge in the physiological pH range. However, when two of these basic arginine residues are adjacent in a protein sequence, the pKa values will decrease due to repulsion between the two positive charges. This reduction in pKa value, in turn, will cause one or both of the
