Bioinformatics Primer
(An Introductory Handbook for Bioinformatics
Practitioners)
Bio-Bio-1 Team
March 26, 2011
Foreword
Foreword described here ...
(by some eminent personality like Prof. Dr. Liaqat Ali...)
Preface
(Team’s introduction to the project)
Contents
I Introduction... 1
1 Introduction to Bioinformatics 5
2 Introduction to Cell Biology 15
2.1 Cell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.1 Cell Structure. . . . . . . . . . . . . . . . . . . . . . . . . 16
2.1.2 Cell Cycle & Cell Division Cycle . . . . . . . . . . . . . . 18
2.2 Chromosome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.3 DNA (DeoxyriboNucleic Acid) . . . . . . . . . . . . . . . . . . . 20
2.4 RNA (RiboNucleic Acid) . . . . . . . . . . . . . . . . . . . . . . 20
2.5 Nucleotide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3 Introduction to Genetics and Genomics 25
3.1 Concept of Gene . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Discovery Chronology Revealing the Concept of Central Dogma
of Life . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Discovery of Gene Sequence . . . . . . . . . . . . . . . . . . . . . 27
3.4 Central Dogma of Biology . . . . . . . . . . . . . . . . . . . . . . 27
3.5 Human Genome Project . . . . . . . . . . . . . . . . . . . . . . . 28
3.6 Genome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.7 Common Terms used in Genetics . . . . . . . . . . . . . . . . . . 31
4 Introduction to Proteomics 35
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 Protein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.1 Amino Acids . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.2 General properties of Amino acids . . . . . . . . . . . . . 36
4.2.2.1 Structure . . . . . . . . . . . . . . . . . . . . . . 36
4.2.2.2 Zwitter Ion . . . . . . . . . . . . . . . . . . . . . 37
4.2.2.3 Isomerism. . . . . . . . . . . . . . . . . . . . . . 37
4.2.2.4 Classification of Amino acids . . . . . . . . . . . 38
4.3 The Structure of Proteins . . . . . . . . . . . . . . . . . . . . . . 40
4.3.1 Primary Structure . . . . . . . . . . . . . . . . . . . . . . 40
4.3.2 Secondary Structure . . . . . . . . . . . . . . . . . . . . . 42
4.4 Amino Acid Classifications . . . . . . . . . . . . . . . . . . . . . 42
4.5 Ramachandran Plot . . . . . . . . . . . . . . . . . . . . . . . . . 42
5 Some Bioinformatics Model Organisms 43
5.1 Origin and Early Evolution . . . . . . . . . . . . . . . . . . . . . 43
5.2 Virus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.2.1 Use of Virus in Life Sciences and Medicine . . . . . . . . 47
5.2.2 Use of Virus in Materials Science and Nanotechnology . . 48
5.3 Bacteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.3.1 Importance of Bacteria in Bioinformatics . . . . . . . . . 51
5.4 Escherichia coli . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.5 Archaea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.6 Fungi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.7 Human Being . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
6 Computing Fundamentals for Bioinformatics 55
6.1 Bioinformatics Problem Solving and Algorithm Development . . 55
6.1.1 Why Do We Need an Algorithm? . . . . . . . . . . . . . . . 56
6.1.2 How to Design an Algorithm? . . . . . . . . . . . . . . . . 57
6.1.3 How to Write Pseudocode . . . . . . . . . . . . . . . . . . 59
6.1.4 Types of Algorithms . . . . . . . . . . . . . . . . . . . . 59
6.2 Data Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3 Concept and Usage of Database . . . . . . . . . . . . . . . . . . . 59
6.4 Computational Model . . . . . . . . . . . . . . . . . . . . . . . . 59
6.5 Programming Concept and Applications . . . . . . . . . . . . . . 59
6.6 World Wide Web (WWW) . . . . . . . . . . . . . . . . . . . . . 59
6.7 Web Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7 Math Primer for Bioinformatics 61
8 Biological Processes, Experimental Methods & Machinery 63
8.1 DNA Cloning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.2 DNA Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.3 Gel electrophoresis . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.4 DNA Cloning in Plasmid Vector . . . . . . . . . . . . . . . . . . 63
8.5 Sanger Method for DNA Sequencing . . . . . . . . . . . . . . . . 63
8.6 DNA Shotgun Sequencing . . . . . . . . . . . . . . . . . . . . . . 63
8.7 DNA Microarray . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
8.8 Recombinant DNA Technology . . . . . . . . . . . . . . . . . . . 63
8.9 Constructing Genomic and cDNA Libraries . . . . . . . . . . . . 63
II Introduction to Bioinformatics Problems 65
9 DNA & Protein Sequencing 69
9.1 DNA Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
9.2 History of DNA Sequencing . . . . . . . . . . . . . . . . . . . . . 70
9.3 Methods of DNA Sequencing . . . . . . . . . . . . . . . . . . . . 70
9.4 DNA Sequencing Process . . . . . . . . . . . . . . . . . . . . . . 71
9.5 DNA Sequencing in Real Time . . . . . . . . . . . . . . . . . . . 73
9.6 Next Generation DNA Sequencing . . . . . . . . . . . . . . . . . 73
9.7 Complete Genome Sequencing . . . . . . . . . . . . . . . . . . . . 73
9.8 Challenges of DNA Sequencing . . . . . . . . . . . . . . . . . . . 74
9.9 Usage of DNA Sequencing . . . . . . . . . . . . . . . . . . . . . . 75
9.10 DNA Sequencing: Where to Next . . . . . . . . . . . . . . . . . . 76
9.11 Case Study: Human Genome Project . . . . . . . . . . . . . . . . 76
9.12 Protein Sequencing . . . . . . . . . . . . . . . . . . . . . . . . . . 76
10 Genome Mapping 79
10.1 Genetic Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
10.1.1 Landmarks of Genetic Maps. . . . . . . . . . . . . . . . . 81
10.1.2 Linkage Analysis . . . . . . . . . . . . . . . . . . . . . . . 81
10.2 Physical Mapping. . . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.3 Restriction Mapping . . . . . . . . . . . . . . . . . . . . . . . . . 82
10.3.1 Historical Background . . . . . . . . . . . . . . . . . . . . 82
10.3.2 Restriction Map . . . . . . . . . . . . . . . . . . . . . . . 83
10.3.3 Restriction Mapping Process . . . . . . . . . . . . . . . . 84
10.3.4 Uses of Restriction Mapping . . . . . . . . . . . . . . . . 86
11 Sequence Alignment 91
11.1 DNA & Protein Sequence Comparison and Alignment . . . . . . 91
11.1.1 Sequence Alignment . . . . . . . . . . . . . . . . . . . . . 92
11.1.2 Motivation for Sequence Alignment . . . . . . . . . . . . . 92
11.1.3 Similarity and Homology of Sequences . . . . . . . . . . . 93
11.1.4 Types of Sequence Alignment . . . . . . . . . . . . . . . 94
11.1.5 Computational Methods & Models for Sequence Alignment 96
11.1.5.1 Dot Matrix . . . . . . . . . . . . . . . . . . . . . 97
11.1.5.2 Dynamic Programming . . . . . . . . . . . . . . 98
11.1.6 Importance of Sequence Alignment . . . . . . . . . . . . . 99
11.1.7 Sequence Alignment Tools . . . . . . . . . . . . . . . . . . 100
11.2 Multiple Sequence Alignment . . . . . . . . . . . . . . . . . . . . 100
11.2.1 Methods for Multiple Sequence Alignment . . . . . . . . . 102
11.2.1.1 Dynamic Programming based Models . . . . . . 102
11.2.1.2 Statistical Methods and Probabilistic Models . . 102
11.2.2 Usage of Multiple Sequence Alignment . . . . . . . . . . . 103
11.2.3 Tools for Multiple Sequence Alignment. . . . . . . . . . . 103
11.3 Regulatory Motif Finding . . . . . . . . . . . . . . . . . . . . . . 103
11.3.1 Gene-Regulation & Regulatory Motif. . . . . . . . . . . . 104
11.3.2 Motif Discovery Methods . . . . . . . . . . . . . . . . . . 104
11.3.3 Tools for Motif Finding . . . . . . . . . . . . . . . . . . . 108
12 Gene Prediction 109
12.1 Introduction to Genome Annotation & Gene Prediction . . . . . 109
12.1.1 Gene Finding Principles and Guidelines . . . . . . . . . . 110
12.1.2 Gene Prediction Approaches . . . . . . . . . . . . . . . . 113
12.1.2.1 Extrinsic approaches. . . . . . . . . . . . . . . . 113
12.1.2.2 Ab-initio Gene Prediction . . . . . . . . . . . . . 114
12.1.2.3 Comparative Gene Prediction . . . . . . . . . . 114
12.1.2.4 Homology-based Methods . . . . . . . . . . . . . 114
12.1.3 Gene Prediction Tools . . . . . . . . . . . . . . . . . . . . 115
13 Genome Analysis 117
14 Phylogenetic Analysis 119
14.1 Introduction to Phylogeny . . . . . . . . . . . . . . . . . . . . . . 119
14.2 Concept of Evolution & Evolutionary Model . . . . . . . . . . . . 120
14.3 Phylogenetic Tree. . . . . . . . . . . . . . . . . . . . . . . . . . . 122
14.4 Types of Phylogenetic Trees . . . . . . . . . . . . . . . . . . . . . 123
14.5 Approaches in Phylogenetic Analysis . . . . . . . . . . . . . . . . 125
14.5.1 Phenetic (or Clustering) Approach . . . . . . . . . . . . . 125
14.5.2 Cladistic Approach . . . . . . . . . . . . . . . . . . . . . . 126
14.5.3 Evolutionary Systematic Approaches . . . . . . . . . . . . 126
14.6 Methods for Phylogenetic Tree-Construction . . . . . . . . . . . . 126
14.6.1 Distance-based Methods . . . . . . . . . . . . . . . . . . . 126
14.6.1.1 Unweighted Pair Group Method with Arithmetic
Mean (UPGMA) . . . . . . . . . . . . . . . . . . 126
14.6.1.2 Neighbor Joining Algorithm (NJ) . . . . . . . . . 129
14.6.1.3 Fitch-Margoliash (FM) Method . . . . . . . . . 129
14.6.1.4 Minimum Evolution (ME) Method . . . . . . . . 130
14.6.2 Character-based Methods . . . . . . . . . . . . . . . . . . 130
14.6.2.1 Maximum Parsimony (MP) Method . . . . . . . 130
14.6.2.2 Maximum Likelihood (ML) Method . . . . . . . 131
14.7 Phylogenetic Analysis Tools . . . . . . . . . . . . . . . . . . . . . 132
15 Protein Folding 133
15.1 Proteins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
15.2 Protein Classification . . . . . . . . . . . . . . . . . . . . . . . . . 134
15.3 Protein Folding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
15.4 Protein Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
15.4.1 Primary Structure . . . . . . . . . . . . . . . . . . . . . . 134
15.4.2 Secondary Structure . . . . . . . . . . . . . . . . . . . . . 134
15.4.2.1 α-helix . . . . . . . . . . . . . . . . . . . . . . . 134
15.4.2.2 β-sheets . . . . . . . . . . . . . . . . . . . . . . . 135
15.4.3 Tertiary Structure . . . . . . . . . . . . . . . . . . . . . . 135
15.4.4 Quaternary Structure . . . . . . . . . . . . . . . . . . . . 135
15.5 Experimental Techniques for Structure Determination . . . . . . 135
15.5.1 X-ray Crystallography . . . . . . . . . . . . . . . . . . . . 135
15.5.2 Nuclear Magnetic Resonance spectroscopy (NMR) . . . . 136
15.5.3 Electron Microscopy/Diffraction . . . . . . . . . . . . . . 136
15.5.4 Free electron lasers . . . . . . . . . . . . . . . . . . . . . . 136
15.6 Protein Structure Classification . . . . . . . . . . . . . . . . . . . 136
15.6.1 Two types of algorithms . . . . . . . . . . . . . . . . . . . 136
15.7 Protein Structure Prediction . . . . . . . . . . . . . . . . . . . . 137
15.7.1 Stages of Protein Structure Prediction . . . . . . . . . . . 137
15.8 Secondary & Tertiary Structure Prediction Methods . . . . . . . 138
15.8.1 Ab-initio Method . . . . . . . . . . . . . . . . . . . . . . . 139
15.8.2 Statistical Method (old fashioned) . . . . . . . . . . . . . 140
15.8.3 Nearest Neighbor Approach . . . . . . . . . . . . . . . . . 140
15.8.4 Neural Network Approach . . . . . . . . . . . . . . . . . . 140
15.8.5 Hidden Markov Model . . . . . . . . . . . . . . . . . . . . 141
15.8.6 Support Vector Machine based methods . . . . . . . . . . 141
15.9 Performance of Structure Prediction Approaches . . . . . . . . . 141
15.10 Protein Databases . . . . . . . . . . . . . . . . . . . . . . . . . . 142
15.10.1 Structural Classification Databases . . . . . . . . . . . . . 142
16 Structural Bioinformatics & Drug Discovery 143
16.1 Traditional Methods of Drug Discovery . . . . . . . . . . . . . . 144
16.2 Modern Methods of Drug Discovery . . . . . . . . . . . . . . . . 144
16.3 Structural Bioinformatics . . . . . . . . . . . . . . . . . . . . . . 145
16.4 Bioinformatics and Drug Discovery Pipeline . . . . . . . . . . . . 145
16.4.1 Target Identification and Selection . . . . . . . . . . . . . 145
16.4.1.1 Types of Targets . . . . . . . . . . . . . . . . . . 146
16.4.2 Target Validation . . . . . . . . . . . . . . . . . . . . . . . 146
16.4.3 Assay Development. . . . . . . . . . . . . . . . . . . . . . 146
16.4.4 Lead Identification . . . . . . . . . . . . . . . . . . . . . . 146
16.4.5 Lead Development . . . . . . . . . . . . . . . . . . . . . . 146
16.4.6 Screening and Hits to Leads . . . . . . . . . . . . . . . . . 146
16.4.7 Lead Optimization . . . . . . . . . . . . . . . . . . . . . . 146
16.4.8 Drug Development . . . . . . . . . . . . . . . . . . . . . . 147
16.4.9 Drug Testing . . . . . . . . . . . . . . . . . . . . . . . . . 147
16.4.10 Preclinical Development . . . . . . . . . . . . . . . . . . . 147
16.4.11 Drug Toxicology . . . . . . . . . . . . . . . . . . . . . . . 147
16.4.12 Clinical Trials . . . . . . . . . . . . . . . . . . . . . . . . . 147
16.4.13 NDA and New Drug to Market . . . . . . . . . . . . . . . 147
16.5 High-Throughput Screening (HTS) . . . . . . . . . . . . . . . . . 147
16.6 Ligand-based Drug Design . . . . . . . . . . . . . . . . . . . . . . 147
16.7 Computer Aided Drug Design (CADD) . . . . . . . . . . . . . . 147
16.8 Quantitative Structure Activity Relationships (QSAR) . . . . . . 148
16.9 Individual Drug Discovery . . . . . . . . . . . . . . . . . . . . . . 148