(NAS Colloquium) Computational Biomolecular Science PDF

COLLOQUIUM ON COMPUTATIONAL BIOMOLECULAR SCIENCE
NATIONAL ACADEMY OF SCIENCES
WASHINGTON, D.C. 1998

About this PDF file: This new digital representation of the original work has been recomposed from XML files created from the original paper book, not from the original typesetting files. Page breaks are true to the original; line lengths, word breaks, heading styles, and other typesetting-specific formatting, however, cannot be retained, and some typographic errors may have been accidentally inserted. Please use the print version of this publication as the authoritative version for attribution.

NATIONAL ACADEMY OF SCIENCES Colloquium Series

In 1991, the National Academy of Sciences inaugurated a series of scientific colloquia, five or six of which are scheduled each year under the guidance of the NAS Council’s Committee on Scientific Programs. Each colloquium addresses a scientific topic of broad and topical interest, cutting across two or more of the traditional disciplines. Typically two days long, colloquia are international in scope and bring together leading scientists in the field. Papers from colloquia are published in

COMPLETED NAS COLLOQUIA (1991 TO PRESENT)

Industrial Ecology
May 20–21, 1991; Washington, D.C.
Organizer: C. Kumar N. Patel
Proceedings: February 4, 1992

Images of Science: Science of Images
January 13–14, 1992; Washington, D.C.
Organizer: Albert Crewe
Proceedings: November 3, 1993

Physical Cosmology
March 27–29, 1992; Irvine, California
Organizer: David Schramm
Proceedings: June 3, 1993

Molecular Recognition
September 10–11, 1992; Washington, D.C.
Organizer: Ronald Breslow
Proceedings: February 16, 1993

Human-Machine Communication by Voice
February 8–9, 1993; Irvine, California
Organizer: Lawrence Rabiner
Proceedings: October 24, 1995

Changing Human Ecology and Behavior: Effects on Infectious Diseases
September 27–28, 1993; Washington, D.C.
Organizer: Bernard Roizman
Proceedings: March 29, 1994

The Tempo and Mode of Evolution
January 27–29, 1994; Irvine, California
Organizers: Francisco Ayala, Walter Fitch
Proceedings: July 19, 1994

Chemical Ecology: The Chemistry of Biotic Interaction
March 25–26, 1994; Washington, D.C.
Organizers: Thomas Eisner, Jerrold Meinwald
Proceedings: January 3, 1995

Physics: The Opening to Complexity
June 25–27, 1994; Irvine, California
Organizer: Philip Anderson
Proceedings: July 18, 1995

Self Defense by Plants: Induction and Signaling Pathways
September 15–17, 1994; Irvine, California
Organizers: André Jagendorf, Clarence Ryan
Proceedings: May 9, 1995

Earthquake Prediction
February 10–11, 1995; Irvine, California
Organizer: Leon Knopoff
Proceedings: April 30, 1996

Quasars and Active Galaxies: High Resolution Radio Imaging
March 24–25, 1995; Irvine, California
Organizers: Marshall Cohen, Kenneth Kellerman
Proceedings: December 5, 1995

Vision: From Photon to Perception
May 21–22, 1995; Irvine, California
Organizers: John Dowling, Lubert Stryer, and Torsten Wiesel
Proceedings: January 23, 1996

Science, Technology, and the Economy
October 20–22, 1995; Irvine, California
Organizers: James Heckman, Ariel Pakes, and Kenneth Sokoloff
Proceedings: November 12, 1996

Developmental Biology of Transcription Control
October 25–28, 1995; Irvine, California
Organizers: Roy Britten, Eric Davidson, and Gary Felsenfeld
Proceedings: September 3, 1996

Carbon Dioxide and Climate Change
November 13–15, 1995; Irvine, California
Organizer: Charles Keeling
Proceedings: August 5, 1997

Memory: Recording Experience in Cells and Circuits
February 17–20, 1996; Irvine, California
Organizer: Patricia Goldman-Rakic
Proceedings: November 26, 1996

Elliptic Curves and Modular Forms
March 15–17, 1996; Washington, D.C.
Organizers: Barry Mazur, Karl Rubin
Proceedings: October 14, 1997

Symmetries Throughout the Sciences
May 10–12, 1996; Irvine, California
Organizer: Ernest Henley
Proceedings: December 15, 1996

Genetic Engineering of Viruses and Viral Vectors
June 9–11, 1996; Irvine, California
Organizers: Peter Palese, Bernard Roizman
Proceedings: October 15, 1996

Genetics and the Origin of Species
January 30–February 1, 1997; Irvine, California
Organizers: Francisco Ayala, Walter Fitch
Proceedings: July 22, 1997

The Age of the Universe: Dark Matter and Structure Formation
March 21–23, 1997; Irvine, California
Organizers: David Schramm, P. J. E. Peebles
Proceedings: January 6, 1998

Neuroimaging and Human Brain Function
May 29–31, 1997; Irvine, California
Organizers: Michael Posner, Marcus Raichle
Proceedings: February 3, 1998

Protecting Our Food Supply: The Value of Plant Genome Initiatives
June 2–4, 1997; Irvine, California
Organizers: Michael Freeling, Ronald Phillips, John Axtell
Proceedings: March 5, 1998

Computational Biomolecular Science
September 11–14, 1997; Irvine, California
Organizers: Peter G. Wolynes, Russell Doolittle, J. A. McCammon
Proceedings: May 26, 1998

A Library Approach to Chemistry
October 19–21, 1997; Irvine, California
Organizers: Peter Schultz, Jonathan Ellman
PROGRAM
Computational Biomolecular Science

Thursday, September 11, 1997
Registration and Welcome Reception

Friday, September 12, 1997

Session I, 8:45 AM-12:30 PM. Chair, Russell Doolittle
Introduction. Peter Wolynes.
Measuring genome evolution. Peer Bork (EMBL, Heidelberg).
Determining biological function from sequence: Building highly specific sequence motifs for genome analysis. Douglas Brutlag (Stanford).
Experimental studies of protein folding dynamics. William Eaton (NIH).
Coupling the folding of homologous proteins. Ron Elber (Hebrew University).

Session II, 2:00 PM-5:30 PM. Chair, Andrew McCammon
Photoactive yellow protein: Prototype for the PAS domains of sensors and clocks. Elizabeth Getzoff (Scripps Research Institute).
Inhomogeneities in genomic sequence composition. Philip Green (Univ. Washington).
New refinement methods for NOE-distance based NMR structure. Angela Gronenborn (NIH).
Estimation of evolutionary distances between DNA sequences. Wen-Hsiung Li (Univ. Texas, Houston).
Comments by Roy Britten.
After-dinner Lecture. From slide rule to super computer. Hans Frauenfelder (Los Alamos).

Saturday, September 13, 1997

Session III, 9:00 AM-12:30 PM. Chair, Andrew McCammon
Comparing sequence comparison with structure comparison. Michael Levitt (Stanford).
Structural classification of proteins and its evolutionary implications. Alexey Murzin (MRC, Cambridge).
Exploring the protein folding funnel landscape-connection to fast folding experiments. Jose Onuchic (UCSD).
Bridged bimetallic enzymes: A challenge for computational chemistry. Gregory Petsko (Brandeis).

Session IV, 2:00 PM-5:30 PM. Chair, Peter Wolynes
Sequence determinants of protein folding and stability. Robert Sauer (MIT).
The evolution of efficient light harvesting in photosynthesis-one goal, many solutions. Klaus Schulten (Illinois).
Electrostatic steering and ionic tethering in simulations of protein-ligand interactions. Rebecca Wade (EMBL, Heidelberg).
Computer simulation of enzymatic reactions and other biological processes; finding out what was optimized by evolution. Arieh Warshel (USC).
After-dinner Lecture. Applications of computers in structural biology. Harold Scheraga (Cornell).

LIST OF ATTENDEES
Computational Biomolecular Science

Robert K. Adair, Yale University
Paul A. Bash, Argonne National Laboratory
R. L. Bernstein, San Francisco State University
Paul Beroza, CombiChem Inc.
Peer Bork, European Molecular Biology Laboratory
David A. Brant, University of California
Roy J. Britten, California Institute of Technology
Thomas C. Bruice, University of California, Santa Barbara
Douglas Brutlag, Stanford University Medical School
Aloke Chatterjee, Lawrence Berkeley National Laboratory
Jiangang Chen, University of California, Los Angeles
Margaret S. Cheung, University of California, San Diego
Julian D. Cole, Rensselaer Polytechnic Institute
Kumari Devulapalle, University of Southern California, School of Dentistry
Russell F. Doolittle, University of California, San Diego
William Eaton, National Institutes of Health
Ron Elber, Hebrew University
Adrien Elcock, University of California, San Diego
Hans Frauenfelder, Los Alamos National Laboratory
Anthony Gamst, University of California, San Diego
Robert Gerber, University of California, Irvine
Elizabeth D. Getzoff, Scripps Research Institute
Raveh Gill-More, Compugen Ltd.
Adam Godzik, The Scripps Research Institute
Jill E. Gready, Australian National University
Phillip Green, University of Washington
Angela M. Gronenborn, National Institutes of Health
William Grundy, University of California, San Diego
Volkhard Helms, University of California, San Diego
Dennis Kibler, University of California, Irvine
Robert Konecny, The Scripps Research Institute
Kristin Korethe, Smith Kline Beecham
Leslie A. Kuhn, Michigan State University
Donald Kyle, Scios Inc.
Peter W. Langhoff, San Diego Supercomputer Center
Michael Levitt, Stanford University, School of Medicine
Jian Li, The Scripps Research Institute
Wen-Hsiung Li, University of Texas
E. N. Lightfoot, University of Wisconsin
Jennifer H. Y. Liu, University of California
Hartmut Luecke, University of California, Irvine
Jia Luo, University of California, Santa Barbara
Zaida Luthey-Schulten, University of Illinois
Jeffry D. Madura, University of South Alabama
J. Andrew McCammon, University of California, San Diego
Gregory Mooser, University of Southern California, School of Dentistry
Victor Munoz, National Institutes of Health
Alexey G. Murzin, Centre for Protein Engineering
Craig Nevill-Manning, Stanford University
Louis Noodleman, The Scripps Research Institute
Hugh Nymeyer, University of California, San Diego
Jose N. Onuchic, University of California, San Diego
Jean-Luc Pellequer, The Scripps Research Institute
Gregory A. Petsko, Brandeis University
Mike Potter, University of California, San Diego
Vijay S. Reddy, The Scripps Research Institute
Carolina M. Reyes, University of California, San Francisco
Roy Riblet, Medical Biology Institute
Andrey Rzhetsky, Columbia University
Suzanne B. Sandmeyer, University of California, Irvine
Robert Sauer, Massachusetts Institute of Technology
Harold Scheraga, Cornell University
Rebecca K. Schmidt, Australian National University
Klaus Schulten, University of Illinois
Soheil Shams, BioDiscovery
Sylvia Spengler, Lawrence Berkeley National Laboratory
Tim Springer, Center for Blood Research
T. P. Straatsma, Pacific Northwest National Laboratory
Ivan Sutherland, Sun Microsystems Laboratories
Mounir Tarek, National Institute of Standards and Technology
Douglas Tobias, University of California, Irvine
Chandra S. Verma, University of York
Rebecca Wade, European Molecular Biology Laboratory
Frederic Y. M. Wan, University of California, Irvine
Arieh Warshel, University of Southern California
Stephen H. White, University of California, Irvine
Peter Wolynes, National Institutes of Health
Willy Wriggers, University of Illinois at Urbana-Champaign
William V. Wright, University of North Carolina
Thomas Wu, Stanford University
Qiang Zhenq, Scios Inc.
TABLE OF CONTENTS

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA

Papers from a National Academy of Sciences Colloquium on Computational Biomolecular Science

Computational biomolecular science. 5848
Peter G. Wolynes

Measuring genome evolution. 5849–5856
Martijn A. Huynen and Peer Bork

SMART, a simple modular architecture research tool: Identification of signaling domains. 5857–5864
Jörg Schultz, Frank Milpetz, Peer Bork, and Chris P. Ponting

Highly specific protein sequence motifs for genome analysis. 5865–5871
Craig G. Nevill-Manning, Thomas D. Wu, and Douglas L. Brutlag

A statistical mechanical model for β-hairpin kinetics. 5872–5879
Victor Munoz, Eric R. Henry, James Hofrichter, and William A. Eaton

Coupling the folding of homologous proteins. 5880–5883
Chen Keasar, Dror Tobi, Ron Elber, and Jeff Skolnick

Photoactive yellow protein: A structural prototype for the three-dimensional fold of the PAS domain superfamily. 5884–5890
Jean-Luc Pellequer, Karen A. Wager-Smith, Steve A. Kay, and Elizabeth D. Getzoff

New methods of structure refinement for macromolecular structure determination by NMR. 5891–5898
G. Marius Clore and Angela M. Gronenborn

Estimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution. 5899–5905
Xun Gu and Wen-Hsiung Li

Precise sequence complementarity between yeast chromosome ends and two classes of just-subtelomeric sequences. 5906–5912
Roy J. Britten

A unified statistical framework for sequence comparison and structure comparison. 5913–5920
Michael Levitt and Mark Gerstein

Folding funnels and frustration in off-lattice minimalist protein landscapes. 5921–5928
Hugh Nymeyer, Angel E. García, and José Nelson Onuchic

Optimizing the stability of single-chain proteins by linker length and composition mutagenesis. 5929–5934
Clifford R. Robinson and Robert T. Sauer

Architecture and mechanism of the light-harvesting apparatus of purple bacteria. 5935–5941
Xiche Hu, Ana Damjanović, Thorsten Ritz, and Klaus Schulten

Electrostatic steering and ionic tethering in enzyme-ligand binding: Insights from simulations. 5942–5949
Rebecca C. Wade, Razif R. Gabdoulline, Susanna K. Lüdemann, and Valère Lounnas

Computer simulations of enzyme catalysis: Finding out what has been optimized by evolution. 5950–5955
Arieh Warshel and Jan Florián

Proc. Natl. Acad. Sci. USA Vol. 95, p.
5848, May 1998
Colloquium Paper

This paper is the introduction to the following papers, which were presented at the colloquium “Computational Biomolecular Science,” organized by Russell Doolittle, J. Andrew McCammon, and Peter G. Wolynes, held September 11–13, 1997, sponsored by the National Academy of Sciences at the Arnold and Mabel Beckman Center in Irvine, CA.

Computational biomolecular science
PETER G. WOLYNES
School of Chemical Sciences, University of Illinois, Urbana-Champaign, Urbana, IL 61801

In this century, the study of the molecules of life has transformed the practice of biology as a whole. Molecular thinking now influences the research agenda for scientists studying both the behavior of individual cells and organisms, and the relationships between organisms as in natural history. Even ecology and anthropology are being influenced by this molecular revolution. It is impressive that this transformation has, to a large extent, been made possible by simply identifying (with very clever strategies!) active biological molecules and cataloging their information content through their sequences. One result of all this activity is that raw data about life at the molecular level have become abundant, but understanding its biological meaning remains, in many if not most respects, perplexing. Fortunately, just at this stage, new approaches to understanding the connection between biomolecular sequence and physiological behavior are coming forward. Computation, theory, and novel experimental approaches that utilize the combinatorial power of the genetic code allow us to begin to understand biomolecular function from both the bottom-up atomistic point-of-view of the physical sciences and the top-down view usually associated with the evolutionary perspective.

The goal of this colloquium was to bring together some of the workers from different scientific disciplines who are approaching these problems by using quantitative methods.
Because computation plays such a large part in exploiting the information content of sequence data, the conference was entitled “Computational Biomolecular Science,” although some of the essential input of new experiments to this emerging discipline was covered too. From the bottom-up perspective, the first event to consider on the road from sequence to the biological behavior of an organism is the folding of a linear polymer into a three-dimensional structure. Once a molecule is properly folded, a variety of motions still go on in the folded state. It is through these motions that the biological molecule can function. These dynamical aspects represent complex problems in chemistry and physics. But it is the aptness with which these functions are carried out that at last determines whether the organism containing that molecule can survive in the struggle with other organisms. Quantitatively understanding molecular behavior sufficiently well for understanding this final biological goal requires much work from both the theoreticians and the experimentalists. The top-down interpretation of molecular data appears to proceed quite differently. Avoiding the complexity of molecular theory, the evolutionary perspective takes inheritance, perhaps the most self-evident aspect of “living” things, as its central concept. Comparing sequences between different organisms then provides clues to their molecular function. In this study, dominant use is made of features of molecules that do not change an organism’s fitness, thus allowing markers of inheritance to be reliably assigned. In a sense then the nonfunctional parts of a molecule’s structure and dynamics are the most useful to the phylogenetically inclined scientist. Convergent evolution is hard to establish by such studies but is critically important to those who wonder whether, from the atomistic perspective, there are indeed general themes to the scheme of life. 
Despite its sometimes “life as a blackbox” character, the top-down viewpoint has achieved a myriad of successes in the practical applications of biomolecular science.

A gap exists between the two different vantage points of looking at biomolecular information, but there are a surprising number of common concepts. In understanding the folding, motions, and function of biological molecules, for example, a powerful new viewpoint that describes the entire energy landscape of a biomolecule in a statistical fashion is proving essential. Understanding and differentiating between those parts of the energetics and dynamics that are biologically significant and those that can be thought of as random noise is the hallmark of this approach. Similarly, in the comparative top-down approach to understanding sequence data, a tremendous amount of statistical thinking must be done to understand whether a perceptible similarity between two sequences really means the molecules have comparable function or structure or whether the similarity is just an accident. Just as in energy landscape theory, extracting signal from noise is the crucial point to understanding molecular evolution. Such frankly statistical viewpoints must also be brought together when planning modern molecular biology experiments that now begin to allow the study of a huge number of variants of a biomolecule in the laboratory simultaneously.

It became apparent in the meeting that, apart from the general common interest in biomolecules and the common but general theoretical concepts based on statistics, there were many specific problems where the top-down and bottom-up viewpoints can profitably be merged. For example, surveys of genomes reveal widespread structural themes that may be clues to folding thermodynamics and kinetic folding routes. For the atomists, several studies show how the structures of specific sequences can be predicted if knowledge of the sequences of many widely different but evolutionarily related molecules is available. On the other hand, for the evolutionist, an a priori knowledge of structural and energetic patterns in molecules leads to refined algorithms for comparing sequences to obtain reliable phylogenies. Also, convergent evolution can be recognized if both comparative and physical studies are available for proteins in the same family. This breaks evolutionary explanation out of the mold of sophisticated Kipling “just-so” stories into the quantitative mode, most prized by natural scientists.

The papers in this colloquium give a partial snapshot of computational biomolecular science today. The organizers of the meeting, J. A. McCammon, R. F. Doolittle, and I, hope these papers give the readers of the Proceedings an idea of what is going on in a branch of science that is destined to grow much larger in the coming years.

© 1998 by The National Academy of Sciences 0027–8424/98/955848–1$2.00/0
PNAS is available online at http://www.pnas.org.

Proc. Natl. Acad. Sci. USA Vol. 95, pp.
5849–5856, May 1998
Colloquium Paper

This paper was presented at the colloquium “Computational Biomolecular Science,” organized by Russell Doolittle, J. Andrew McCammon, and Peter G. Wolynes, held September 11–13, 1997, sponsored by the National Academy of Sciences at the Arnold and Mabel Beckman Center in Irvine, CA.

Measuring genome evolution
(ortholog/synteny/computer analysis/horizontal gene transfer)
MARTIJN A. HUYNEN* AND PEER BORK
European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012 Heidelberg, Germany, and Max-Delbrück-Centrum for Molecular Medicine, 13122 Berlin-Buch, Germany

ABSTRACT The determination of complete genome sequences provides us with an opportunity to describe and analyze evolution at the comprehensive level of genomes. Here we compare nine genomes with respect to their protein coding genes at two levels: (i) we compare genomes as “bags of genes” and measure the fraction of orthologs shared between genomes and (ii) we quantify correlations between genes with respect to their relative positions in genomes. Distances between the genomes are related to their divergence times, measured as the number of amino acid substitutions per site in a set of 34 orthologous genes that are shared among all the genomes compared. We establish a hierarchy of rates at which genomes have changed during evolution. Protein sequence identity is the most conserved, followed by the complement of genes within the genome. Next is the degree of conservation of the order of genes, whereas gene regulation appears to evolve at the highest rate. Finally, we show that some genomes are more highly organized than others: they show a higher degree of the clustering of genes that have orthologs in other genomes.

Molecular evolution usually is studied at the level of single genes. With the determination of genome sequences we have an opportunity to study it at a higher, comprehensive level, that of complete genomes.
This leads to the pertinent question: how can genomic information be used to obtain useful information concerning genome evolution? The goal of this paper is to create baseline expectations for measures of genome distances that are based on gene content. By describing some general patterns one also can identify the exceptions.

Measuring evolution at the level of complete genomes is pertinent as it is, after all, the principal level for natural selection. Furthermore, it is intermediate to levels at which evolution has long been studied: namely, the molecular level in genes and genotypes, and the organismal level in the fossil record. The genome in principle contains all of the information necessary to bridge the gap between genotype and phenotype. For example, by knowing the functions of the genes in a genome of a species we can postulate a model for its complete metabolism. However, we have to be careful not to overstate our expectations. The situation might turn out to be analogous to that of proteins, for which, in principle, all information necessary to determine three-dimensional structures in the form of amino acid sequences is known, yet we remain unable to predict their tertiary structures.

Genomes can be analyzed and compared for various features: e.g., nucleotide content, compositional biases of leading and lagging strands in replication (e.g., in Escherichia coli) (1), dinucleotide frequencies (2), the occurrence of repeats (e.g., in virulence genes of Haemophilus influenzae: ref. 3), RNA structures, coding densities, protein coding genes, operons, the size distribution of gene families (4), etc.
They also can be compared at a variety of levels: a first-order level where we regard the genome as a “bag of genes” without taking account of interactions between the various components, and a second-order level that considers whether properties of genomes are cross-correlated (e.g., the absence of certain polynucleotides together with the presence of restriction enzymes that specifically cut these polynucleotides; ref. 5).

In this paper we focus on first- and second-order patterns in protein coding regions in genomes. Specifically we measure: (i) the fraction of orthologous sequences between genomes, (ii) the conservation of gene order between genomes, and (iii) the spatial clustering of genes in one genome that have an ortholog in another genome. We correlate these measures with the divergence time between the genomes compared. It is not our goal to define new distance measures to construct phylogenetic trees. Rather it is to analyze the conservation and differentiation of patterns between genomes, to show how we can extract useful information from these, and to analyze at what relative time scales they change.

The analyses are done on the first nine sequenced Archaea and Bacteria that were publicly available: H. influenzae (6), Mycoplasma genitalium (7), Synechocystis sp. PCC 6803 (8), Methanococcus jannaschii (9), Mycoplasma pneumoniae (10), E. coli (1), Methanobacterium thermoautotrophicum (11), Helicobacter pylori (12), and Bacillus subtilis (13). Although the total number of publicly available genome sequences is growing rapidly, the trends that we observe should remain largely unchanged with the comparison of new species, given the diverse range of evolutionary distances of the species compared in this paper.

Methodological Issues in Comparisons of Genomes

Identification of Orthologous Genes. Defining orthology.
In comparing the genes of different genomes it is important that we avoid comparisons of “apples and pears”: i.e., that we are able to identify which genes correspond to each other in the various genomes. Fitch (14) introduced the term “orthologs” for genes whose independent evolution reflects a speciation event rather than a gene duplication event. “Where the homology is the result of gene duplication so that both copies have descended side by side during the history of an organism, (for example, alpha and beta hemoglobin) the genes should be called paralogous (para=in parallel). Where the homology is the result of speciation so that the history of the gene reflects the history of the species (for example, alpha hemoglobin in man and mouse) the genes should be called orthologous (ortho=exact)” (14). Note that orthology and paralogy are

*To whom reprint requests should be addressed at: European Molecular Biology Laboratory, Meyerhofstrasse 1, 69012 Heidelberg, Germany. e-mail:
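As an illustration of the first two measures named earlier (the fraction of orthologs shared between two genomes and the conservation of gene order), the following is a minimal sketch rather than the authors' procedure. It assumes that an ortholog mapping has already been determined by some other means. The function names, the toy gene labels, the normalization by the smaller genome, the adjacency-based definition of conserved gene order, and the treatment of genomes as linear gene lists are all assumptions made for this example.

```python
from typing import Dict, List


def shared_ortholog_fraction(genes_a: List[str], genes_b: List[str],
                             orthologs: Dict[str, str]) -> float:
    """Fraction of genes in A that have an ortholog present in B.

    Assumption: the count is normalized by the smaller genome, so that
    a small genome fully contained in a large one scores 1.0.
    """
    set_b = set(genes_b)
    shared = sum(1 for g in genes_a if orthologs.get(g) in set_b)
    return shared / min(len(genes_a), len(genes_b))


def conserved_adjacency_fraction(order_a: List[str], order_b: List[str],
                                 orthologs: Dict[str, str]) -> float:
    """Fraction of neighbouring gene pairs in A, both having orthologs in B,
    whose orthologs are also neighbours in B.

    Assumption: genomes are treated as linear lists (circularity and
    strand orientation are ignored).
    """
    pos_b = {g: i for i, g in enumerate(order_b)}
    pairs = 0
    conserved = 0
    for left, right in zip(order_a, order_a[1:]):
        lb, rb = orthologs.get(left), orthologs.get(right)
        if lb in pos_b and rb in pos_b:
            pairs += 1
            if abs(pos_b[lb] - pos_b[rb]) == 1:
                conserved += 1
    return conserved / pairs if pairs else 0.0


# Hypothetical toy genomes: gene labels are placeholders, not real genes.
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b3", "b1", "b2", "b9"]
# Hypothetical ortholog assignments from genome A to genome B.
ortholog_map = {"a1": "b1", "a2": "b2", "a3": "b3"}

print(shared_ortholog_fraction(genome_a, genome_b, ortholog_map))
print(conserved_adjacency_fraction(genome_a, genome_b, ortholog_map))
```

In this toy case three of the four genes of the smaller genome have orthologs, and one of the two neighbouring ortholog pairs keeps its adjacency. Other normalizations (for example, by the larger genome or by the union of genes) would give different values, so the choice should be stated whenever such fractions are compared.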
