Identifying prokaryotic consortia that live in close interaction with algae

Assia SALTYKOVA

Master's dissertation submitted to obtain the degree of Master of Science in Biochemistry and Biotechnology
Major Bioinformatics and Systems Biology
Academic year 2014-2015

Promoter: Prof. Dr. Kathleen Marchal
Scientific supervisors: Stephane Rombauts and Sergio Pulido Tamayo

UGent - Department Information Technology
UGent - Department Plant Biotechnology and Bioinformatics
VIB - Department Plant Systems Biology
Research Group Data Integration and Biological Networks

Acknowledgements

It is a great pleasure to thank those who made this work possible. I am grateful to Prof. Dr. Kathleen Marchal for giving me the opportunity to write this thesis, and to Sergio Pulido for his valuable advice and for revising the manuscript. Special thanks go to Stephane Rombauts, who provided the data and who, with endless patience, guided me through the practical part of this work. It was also a delightful experience to join the Biocomp group, with its absolutely unique atmosphere and people.

Table of contents

Acknowledgements
Table of contents
List of abbreviations
Summary (Dutch)
Abstract
1. Introduction
1.1 Algae and the associated bacteria
1.1.1 Beneficial interactions between algae and bacteria
1.1.2 Detrimental interactions and defense
1.1.3 Structure of algal-associated bacterial communities
1.1.4 Future perspectives
1.2 Studying algal-bacterial interactions using whole-genome sequencing data
1.2.1 Illumina sequencing and NGS data assembly
1.2.2 Binning of the data
1.2.3 Using CONCOCT for binning of algal-bacterial assemblies
1.2.4 Estimating cluster quality
2. Aim
3. Results
3.1 Assembly of non-algal reads within O. tauri sequencing data and CONCOCT-assisted binning
3.2 Assessing the possibility to better delineate the eukaryote target genome from contaminants
3.3 Binning of O. mediterraneus data
3.4 Binning of filtered P. crispa assembly
3.5 Binning of C. braunii data
3.5.1 Binning of German C. braunii assembly
3.5.2 Binning of Japanese C. braunii assembly
4. Discussion
4.1 Performance of the binning method
4.2 Biology of the observed bacteria
4.2.1 Proteobacteria and Bacteroidetes
4.2.2 Actinobacteria, Acidobacteria and Planctomycetes
4.3 Origin of contamination
4.4 Future perspectives
5. Discussion (Dutch version)
5.1 Assessment of the method used
5.2 Biology of the observed bacteria
5.2.1 Proteobacteria and Bacteroidetes
5.2.2 Actinobacteria, Acidobacteria and Planctomycetes
5.3 Future perspectives
6. Conclusion
7. Materials and methods
7.1 Sequencing data and assemblies
7.1.1 O. tauri and O. mediterraneus
7.1.2 P. crispa
7.1.3 C. braunii
7.2 Preparation of the data prior to binning
7.2.1 De novo assembly of non-algal contigs from O. tauri genome sequencing data using CLC Assembly Cell
7.2.2 Filtering of P. crispa assembly prior to binning
7.2.3 Combining two German C. braunii draft assemblies using Newbler
7.3 Binning of contigs with CONCOCT
7.4 Binning evaluation using taxonomic labels provided by MEGAN5
7.5 Binning evaluation using single-copy core genes
7.6 Isolation of bacterial genomes and scaffolding with SSPACE
7.7 Aligning isolated genomes to reference using MUMmer
7.8 Evaluation of CONCOCT-assisted binning for separating prokaryotic and eukaryotic sequences
8. References
9. Addendum
9.1 Scripts
9.1.1 CONFPLOT.R
9.1.2 CLUSTERPLOT.R
9.1.3 MEGAN_TO_CONCOCT.PY
9.1.4 MEGAN_CONCAT_TAXON.PY
9.1.5 CUT_FASTA.PY
9.1.6 SCAFFOLD2CONTIGS.PL
9.1.7 COUNT_FRAGMENTS.SH
9.1.8 MFASTA_TOOLS.PL
9.2 Supplementary figures

List of abbreviations

BCC        Banyuls-sur-mer Culture Collection
BIC        Bayesian Information Criterion
bp         base pairs
CCALA      Culture Collection of Autotrophic Organisms
CDD        Conserved Domain Database
COG        Clusters of Orthologous Groups
DGGE       Denaturing Gradient Gel Electrophoresis
ENA        European Nucleotide Archive
Gbp        giga base pairs
kbp        kilo base pairs
LCA        Lowest Common Ancestor
Mbp        mega base pairs
NGS        Next Generation Sequencing
PCA        Principal Component Analysis
PES        Provasoli Enriched Seawater
RCC        Roscoff Culture Collection
RPS-BLAST  Reverse Position-Specific BLAST
SCG        Single Copy Core Gene
T-RFLP     Terminal Restriction Fragment Length Polymorphism