Cyanobacteria: Omics and Manipulation

Edited by Dmitry A. Los
Institute of Plant Physiology, Russian Academy of Sciences, Moscow, Russia

Caister Academic Press

Copyright © 2017 Caister Academic Press, Norfolk, UK
www.caister.com

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-1-910190-55-5 (paperback)
ISBN: 978-1-910190-56-2 (ebook)

Description or mention of instrumentation, software, or other products in this book does not imply endorsement by the author or publisher. The author and publisher do not assume responsibility for the validity of any products or procedures mentioned or described in this book or for the consequences of their use.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher. No claim to original U.S. Government works.

Cover design adapted from photograph of Synechocystis cells kindly provided by Dr Alexander Voronkov and Dr Maria Sinetova (Institute of Plant Physiology, Russian Academy of Sciences, Moscow, Russia).

Ebooks
Ebooks supplied to individuals are single-user only and must not be reproduced, copied, stored in a retrieval system, or distributed by any means, electronic, mechanical, photocopying, email, internet or otherwise. Ebooks supplied to academic libraries, corporations, government organizations, public libraries, and school libraries are subject to the terms and conditions specified by the supplier.
Contents

Preface

1 The Cyanobacterial Core Genome: Global and Specific Features with a Focus on Secondary Metabolites
Stefan Simm, Enrico Schleiff and Rafael Pernil

2 Genome-wide Analysis of Cyanobacterial Evolution: The Example of Synechococcus
Petr Dvořák

3 Genomics of NRPS/PKS Biosynthetic Gene Clusters in Cyanobacteria
Claire Pancrace, Muriel Gugger and Alexandra Calteau

4 RNA-seq Based Transcriptomic Analysis of Single Cyanobacterial Cells
Zixi Chen, Jiangxin Wang, Lei Chen and Weiwen Zhang

5 Transcriptomics of Cyanobacterial Stress Responses: Genes, Sensors and Molecular Triggers
Maria A. Sinetova, Anna A. Zorina, Kirill S. Mironov and Dmitry A. Los

6 Transcriptomic and Proteomic Analysis to Understand Systems-level Properties of Diurnal Cycles in Nitrogen-fixing Cyanobacteria
Uma K. Aryal and Louis A. Sherman

7 Proteomic Analysis of Post Translational Modifications in Cyanobacteria
Qian Xiong, Zhuo Chen and Feng Ge

8 Metabolic Engineering and Systems Biology for Free Fatty Acid Production in Cyanobacteria
Anne M. Ruffing

9 Terpene Hydrocarbons Production in Cyanobacteria
Anastasios Melis

10 Ethanol Production in Cyanobacteria: Impact of Omics of the Model Organism Synechocystis on Yield Enhancement
J. Tony Pembroke, Lorraine Quinn, Helen O’Riordan, Con Sheahan and Patricia Armshaw

11 Engineering of Alkane Production in Cyanobacteria
Xuefeng Lu and Weihua Wang

12 Photoautotrophic Polyhydroxyalkanoate Production in Cyanobacteria
Ka-Kei Sam, Nyok-Sean Lau, Amirul Al-Ashraf Abdullah and Minami Matsui

Index

Preface

Cyanobacteria are a diverse group of microorganisms that, as part of marine and freshwater phytoplankton, contribute significantly to the planetary fixation of atmospheric carbon and the evolution of molecular oxygen via photosynthesis. Ancient cyanobacteria (~2.5 billion years of history) participated in the formation of Earth’s oil deposits. Modern cyanobacteria grow fast; they do not compete for agricultural lands and resources; they efficiently convert excessive amounts of CO2 into biomass, thus participating in both carbon fixation and organic chemical production. Many cyanobacterial species are easily transformable and, thus, may be genetically manipulated to produce photosynthetic carbohydrates, fatty acids, or alcohols as renewable sources of fourth-generation biofuels. Genetic modification of strains is a powerful tool to redirect the biosynthetic pathways of cyanobacteria to desirable end-products, including those that have never been produced by these organisms. In addition, cyanobacteria are studied and used as a rich source of bioactive metabolites with unique structural features and biological activities, including antiviral and antibacterial agents, cytotoxins, antioxidants, and other bioactive compounds.

In this volume, the reader will find a collection of chapters devoted to cyanobacterial omics (genomics, transcriptomics, proteomics, etc.) aimed at understanding the basic principles of cyanobacterial metabolism. This fundamental knowledge is then converted into metabolic engineering in order to produce valuable compounds, e.g. pharmaceuticals, biofuels and bioplastics. Such a systemic approach fits well with the concept of the ‘Green Planet’, which implies sustainable development on the basis of green (photosynthetic bacteria, plants, algae, cyanobacteria) technologies that produce renewable and clean foods, energy, and materials.

The international team of authors would like to draw the attention of readers to the latest achievements in the biology of cyanobacteria, photosynthetic microorganisms with great academic and industrial potential.
1 The Cyanobacterial Core Genome: Global and Specific Features with a Focus on Secondary Metabolites

Stefan Simm1, Enrico Schleiff1,2,3* and Rafael Pernil1

1Department of Biosciences, Molecular Cell Biology of Plants, Goethe University, Frankfurt am Main, Germany.
2Cluster of Excellence Frankfurt, Frankfurt am Main, Germany.
3Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University, Frankfurt am Main, Germany.
*Correspondence: [email protected]

Abstract

Technological progress in next-generation sequencing (NGS) has made it possible to analyse the genomes of whole environments in order to understand microbial communities through metagenomics and has thereby increased the variety of available sequenced genomes. The high number of genome sequencing projects in recent years has created an archive of ~20,000 fully sequenced prokaryotic genomes that has influenced current research. As a result, multi-genome comparison approaches such as pangenome analysis have become of major importance for understanding the complexity of biological systems. Such analyses range from the definition of taxa composition in specific environments by metagenomics, through investigations of evolutionary relations and the assignment of core- and pangenomes to define core gene sets, to the correlation of specific gene sets with biological functions, phenotypes or traits in cyanobacteria. In this regard, the challenge of handling and appropriately using the increasing amount of information is of major importance. For this reason, data handling, analysis and visualization are described in this chapter from a bioinformatics point of view. The following sections explain in detail the general applications of metagenomics and pangenomics and their benefits for the cyanobacteria research field. Metagenomics has been used in the past to analyse the taxonomic composition and functional characteristics of cyanobacterial communities in different habitats such as oceans or environments with extreme temperatures or salt concentrations. Furthermore, differences between cultured and wild-living cyanobacteria and their acquisition and loss of genes in relation to changes in the availability of nutrients or metabolites can be investigated. In addition, new genomic elements or specific pathways for secondary metabolites could also be identified and used biotechnologically in the future, like the biosynthesis pathway of the anticancer agent curacin A (Jones et al., 2011).

In the past, pangenome studies have focused on the identification of intraspecies gene sets. Similarly, pangenomes can be used to obtain evolutionary information by identifying the core-genome of interspecies comparisons in cyanobacteria and to gain insights into the acquisition and loss of genes within a specific clade or phylum. Furthermore, the core-genome plays a role in the identification of so-called signature genes, which occur only in all or most cyanobacteria. Pangenomics for cyanobacteria has helped to define the origin of photosynthesis and to analyse the diversity of metabolism. Furthermore, pan- and core-genome analyses have been used to identify trait-specific gene sets. In this context, cyanobacteria were clustered according to their different traits to analyse subfractions of the pan- and core-genome of a specific environmental adaptation. For the correlation of pangenomics analysis and functional annotation of cyanobacteria, the discrimination between core-genome and clade-specific functional annotation and the determination of specific genes for a subset of cyanobacterial strains with either thermophilic character, growth habitat or the capability to differentiate heterocysts were analysed. In addition, the detection of core- and clade-specific functions in cyanobacteria via multi-genome comparison analyses is explained and exemplified using the gene sets of non-ribosomal peptide synthetases (NRPS) and polyketide synthases (PKS).

Cyanobacteria are a prolific source of natural products and produce a vast array of compounds, including many notorious toxins as well as natural products of huge interest to pharmaceutical and biotechnological industries. Genome mining has enabled the identification and characterization of natural product gene clusters, and mechanisms that are unique to cyanobacteria, or rarely seen in other organisms, have been discovered. Many cyanobacterial secondary metabolites are cyanotoxins, which show a broad range of chemical structures and biological activities, but in addition to toxin production, several NRPS and PKS gene clusters are also devoted to important cellular processes in cyanobacteria such as iron uptake and nitrogen fixation. Most of the biosynthetic clusters identified here have unknown end products, highlighting the power of genome mining for the discovery of new secondary metabolites. These studies show that cyanobacteria encode a huge variety of cryptic gene clusters involved in the production of natural products, and the known chemical diversity to date is likely to be only a fraction of the true biosynthetic capabilities of this fascinating and ancient group of organisms. Furthermore, mechanistic insights obtained from the biochemical studies of cyanobacterial pathways can inspire the development of concepts for the design of bioactive compounds by synthetic-biology approaches in the future. Here, we survey the biosynthetic pathways of the five most researched cyanobacterial species with the most extensive literature, including Microcystis aeruginosa NIES-843, Synechocystis sp. PCC 6803, Anabaena sp. PCC 7120, Arthrospira platensis NIES-39 and Synechococcus elongatus PCC 7942. These analyses have enabled the identification of biosynthetic gene clusters for structurally diverse metabolites, including non-ribosomal peptides, polyketides, ribosomal peptides, terpenes and fatty acids. We highlight the unique enzyme mechanisms that were elucidated or can be anticipated for the individual products, but further include different classes of secondary metabolites from cyanobacteria other than NRPS and PKS.

Introduction

Current research is mainly influenced by the accelerated rate of DNA sequencing projects due to the revolution of NGS technologies (Koboldt et al., 2013; Reuter et al., 2015). Today, sequencing of full genomes, especially of prokaryotes, is achievable for modest research institutes, and ~20,000 complete genomes are available to date (http://www.ncbi.nlm.nih.gov/genome/). On the one hand, handling such an increasing amount of information represents a big challenge. On the other hand, the provided information boosts the understanding of the complexity of biological systems and enables multi-genome comparison (Dai et al., 2011; Hardison, 2003; Miller et al., 2004; Wei et al., 2002). Such analyses range from investigations of evolutionary relations, through assignments of core- and pan-genomes of strains or species to define common gene sets, up to the definition of the taxa composition extracted from a specific environment (Handelsman, 2004; Mira et al., 2010).

The term ‘metagenomics’ (Table 1.1) was introduced by Handelsman and coworkers to analyse the microflora of soil (Handelsman et al., 1998). It began with the characterization of the 16S rRNA sequence of microbes from an environmental sample (Ussery et al., 2009). Today, metagenomics describes the analysis of entire communities of microbes without isolation or culturing of individual community members (Riesenfeld et al., 2004) and is divided into two subcategories called ‘full shotgun metagenomics’ (Xia et al., 2011) and ‘marker gene amplification metagenomics’ (Handelsman, 2009).

Table 1.1 Terminology of metagenome analysis

| Term | Definition |
|---|---|
| Metagenomics | Metagenomics (from the Greek term ‘meta’, meaning ‘after’ or ‘beyond’) is the culture-independent genomics analysis of microbial communities. Metagenomics describes the functional and sequence-based analysis of the collective microbial genomes contained in an environmental sample. Metagenomics transcends the individual organism to the ‘meta level’ of the community (Handelsman, 2004, 2009; Handelsman et al., 1998). |
| Pangenome | The pangenome (from the Greek term ‘pan’, meaning ‘all’) is the ‘union’ of the ‘gene sets’ (both terms are taken from set theory, a branch of mathematical logic) of all the selected genomes. The term pangenome was first used to define the gene sets of all strains of a species. It can be divided into a core-genome and a variable genome (Tettelin et al., 2005). |
| Open pangenome | For open pangenomes an undetermined number of additional genomes is needed to identify the whole gene repertoire of the species within a phylum (Bentley, 2009). |
| Closed pangenome | Closed pangenomes grant the assumption that the entire gene repertoire is covered by the strains used and newly sequenced genomes will not give additional information (Rouli et al., 2015). |
| Core genome | The core genome contains gene families shared by all the selected strains or species (intersection of gene families). |
| Variable genome | The variable genome contains strain-specific genes and gene families shared by two or more organisms, but not by the entirety of all analysed genomes. |
| Dispensable genome | The dispensable genome is the part of the variable genome that exists in 2 to n−1 organisms of the analysed set (Vernikos et al., 2015). |
| Unique genes | Unique genes are genes of the variable genome which only exist in one strain (Medini et al., 2005). |
| CLOG | CLique of Orthologous Genes (CLOG) defines a group of orthologous sequences from at least two different species (Simm et al., 2015). |
| Signature genes | Signature genes are defined to be specific to a taxonomic rank, occurring exclusively in all or most of the members in a taxon (Dutilh et al., 2008). |

The first metagenomics studies were performed early in the past decade, focusing on low-diversity environments like water samples of the Sargasso Sea (Venter et al., 2004), acid mine drainage (Tyson et al., 2004) or the human gut microbiome (Breitbart et al., 2003). Today, this strategy is applied to the description of communities in very diverse environments like oceans (Bohannon, 2007), soils (Daniel, 2005), and the human mouth and gastrointestinal tract (Belda-Ferre et al., 2012; Breitbart et al., 2003; Segata et al., 2012; Wang et al., 2015). Here, analyses of known and unknown organisms by metagenomics lead to a community biodiversity profile (Oulas et al., 2015). In addition, metagenomics has been applied to create fingerprints of specific environments. For instance, the communities in areas of volcanism (Kilias et al., 2013; Urich et al., 2014; Xie et al., 2011), extreme temperature (Bradford et al., 2009; Pearce et al., 2012), alkalinity (Xiong et al., 2012), acidity (Johnson et al., 2015; Mendez-Garcia et al., 2015), low oxygen (Stevens and Ulloa, 2008; Voorhies et al., 2012) and high heavy metal composition (Chodak et al., 2013; Golebiewski et al., 2014) have been studied. Searching in extreme habitats can lead to the discovery of new species, but also of novel enzymes for catalysing reactions of biotechnological commercialization (Segata et al., 2011). At present, technologies and bioinformatics strategies for metagenomics enable the investigation of a multitude of existing microbial communities (Fig. 1.1) to reveal solutions for challenges in human health, alternative energy, environmental remediation, biotechnology and environmental stewardship, to name just a few (Bashir et al., 2014; Handelsman, 2004, 2009; Lorenz and Eck, 2005).

In addition to metagenomics, pangenome (Fig. 1.1) analyses were performed to determine the entirety of the gene sets of strains, species or phyla (Vernikos et al., 2015). The term ‘pangenome’ was introduced by Tettelin and coworkers to describe all sequences shared by genomes of interest (Tettelin et al., 2005). Some recent reports use distinct terms such as supragenome or species-genome, which are all equivalent to the original term pangenome (Broadbent et al., 2012). Pangenome analyses require a set of sequenced genomes with annotated genes and were used to identify the gene set of all strains of a species (Medini et al., 2005; Mongodin et al., 2013; Tettelin et al., 2005; van Schaik et al., 2010) or a genus (Jacobsen et al., 2011; Kettler et al., 2007; Lefebure and Stanhope, 2007), a specific set of species (Collins and Higgs, 2012; Eppinger et al., 2011; Lapidus et al., 2008), a whole phylum (Collingro et al., 2011) and even a super kingdom (Lapierre and Gogarten, 2009). Recently, pangenome analysis was introduced to define gene sets