Table Of Content2
Introduction to Bioinformatics
3
INTRODUCTION TO
BIOINFORMATICS
FOURTH EDITION
Arthur M. Lesk
The Pennsylvania State University
In nature’s infinite book of secrecy
A little I can read.
Antony and Cleopatra
4
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research,
scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in
certain other countries
© Arthur M. Lesk 2014
The moral rights of the author have been asserted
First Edition copyright 2002
Second Edition copyright 2005
Third Edition copyright 2008
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any
means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms
agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should
be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this work in any other form and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
ISBN 978–0–19–965156–6
Printed in Italy by L.E.G.O. S.p.A—Lavis TN
Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for
the materials contained in any third party website referenced in this work.
5
Dedicated to Eda, with whom I have merged my genes.
6
CONTENTS
Preface to the first edition
Preface to the second edition
Preface to the third edition
Preface to the fourth edition
Plan of the book
Introduction to bioinformatics on the web
Acknowledgements
1 Introduction
Life in space and time
Phenotype = genotype + environment + life history + epigenetics
Evolution is the change over time in the world of living things
Dogmas: central and peripheral
Statics and dynamics
Networks
Observables and data archives
A database without effective modes of access is merely a data graveyard
Information flow in bioinformatics
Curation, annotation, and quality control
The world-wide web
Electronic publication
Computers and computer science
Programming
Biological classification and nomenclature
Use of sequences to determine phylogenetic relationships
Use of SINES and LINES to derive phylogenetic relationships
Searching for similar sequences in databases: PSI-BLAST
Introduction to protein structure
The hierarchical nature of protein architecture
7
Classification of protein structures
Protein structure prediction and engineering
Critical Assessment of Structure Prediction
Protein engineering
Proteomics and transcriptomics
DNA microarrays
Transcriptomics and RNA sequencing
Mass spectrometry
Systems biology
Clinical implications
The future
Recommended reading
Exercises and problems
2 Genome organization and evolution
Genomes, transcriptomes, and proteomes
Genes
Proteomics and transcriptomics
Eavesdropping on the transmission of genetic information
Identification of genes associated with inherited diseases
Mappings between the maps
High-resolution maps
Genome-wide association studies
Picking out genes in genomes
Genome-sequencing projects
Genomes of prokaryotes
The genome of the bacterium Escherichia coli
The genome of the archaeon Methanococcus jannaschii
The genome of one of the simplest organisms: Mycoplasma genitalium
Metagenomics: the collection of genomes in a coherent environmental sample
The human microbiome
Genomes of eukarya
Gene families
The genome of Saccharomyces cerevisiae (baker's yeast)
The genome of Caenorhabditis elegans
The genome of Drosophila melanogaster
8
The genome of Arabidopsis thaliana
The genome of Homo sapiens (the human genome)
Protein-coding genes
Repeat sequences
RNA
Single-nucleotide polymorphisms and haplotypes
Systematic measurements and collections of single-nucleotide polymorphisms
Ethical, legal, and social issues
Genetic diversity in anthropology
DNA sequences and languages
Genetic diversity and personal identification
Evolution of genomes
Please pass the genes: horizontal gene transfer
Comparative genomics of eukarya
Recommended Reading
Exercises and problems
3 Scientific publications and archives: media, content, and access
The scientific literature
Economic factors governing access to scholarly publications
Open access
The Public Library of Science
Traditional and digital libraries
How to populate a digital library
The information explosion
The web: higher dimensions
New media: video, sound
Searching the literature
Bibliography management
Databases
Database contents
The literature as a database
Database organization
Annotation
Database quality control
Database access
Links
9
Database interoperability
Data mining
Programming languages and tools
Traditional programming languages
Scripting languages
Program libraries specialized for molecular biology
Java: computing over the web
Markup languages
Natural language processing
Natural language processing and mining the biomedical literature
Applications of text mining
Recommended reading
Exercises and problems
4 Archives and information retrieval
Database indexing and specification of search terms
Follow-up questions
Analysis and processing of retrieved data
The archives
Nucleic acid sequence databases
Genome databases and genome browsers
Protein sequence databases
Databases of protein families
Databases of structures
Classifications of protein structures
Accuracy and precision of protein structure determinations
Specialized, or ‘boutique’, databases
Expression and proteomics databases
Bibliographic databases
Surveys of molecular biology databases and servers
Gateways to archives
Access to databases in molecular biology
ENTREZ
The Protein Identification Resource
ExPASy: Expert Protein Analysis System
Where do we go from here?
Recommended reading
10