Plant Physiology Preview. Published on June 9, 2015, as DOI:10.1104/pp.15.00520

Running head: Gene duplications of JmjC-containing histone demethylases in angiosperms

Correspondence:
Hong Ma,
School of Life Sciences, Fudan University, Shanghai 200438, China
Tel: +86 21 65642800,
Email: [email protected];

Liangsheng Zhang,
School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
Tel: +86 21 65988501,
Email: [email protected]

Research area: Genes, Development and Evolution

Downloaded from on March 29, 2019 - Published by www.plantphysiol.org Copyright © 2015 American Society of Plant Biologists. All rights reserved. Copyright 2015 by the American Society of Plant Biologists

Expansion and functional divergence of JmjC-containing histone demethylases: significance of duplications in ancestral angiosperms and vertebrates

Shengzhan Qian1, Yingxiang Wang1, Hong Ma1,2* and Liangsheng Zhang3,4*
1State Key Laboratory of Genetic Engineering and Collaborative Innovation Center of Genetics and Development, Ministry of Education Key Laboratory of Biodiversity Science and Ecological Engineering and Institute of Biodiversity Sciences, Institute of Plant Biology, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200438, China
2Institute of Biomedical Sciences, Fudan University, Shanghai 200032, China
3Department of Bioinformatics, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
4Advanced Institute of Translational Medicine, Tongji University, Shanghai 200092, China

One-sentence summary:
Duplication and sequence divergence of a gene family for histone demethylases likely contributed to the enhancement of chromatin-based regulation in angiosperms and vertebrates.

This work was supported by grants from the National Natural Science Foundation of China (91131007) and the Ministry of Science and Technology of China (2011CB944603).
Address correspondence to [email protected] or [email protected].

ABSTRACT
Histone modifications, such as methylation and demethylation, are crucial mechanisms altering chromatin structure and gene expression. Recent biochemical and molecular studies have uncovered a group of histone demethylases called Jumonji C (JmjC) domain proteins. However, their evolutionary history and patterns have not been examined systematically. Here we report extensive analyses of eukaryotic JmjC genes and define 14 subfamilies, including the KDM3, KDM5, JMJD6, PKDM11 and PKDM13 subfamilies shared by plants, animals and fungi. Other subfamilies are detected in plants and animals but not in fungi (PKDM12) or in animals and fungi but not in plants (KDM2 and KDM4). PKDM7, PKDM8 and PKDM9 are plant-specific groups, whereas JARID2, KDM6 and PKDM10 are animal specific. In addition to known domains, most subfamilies have characteristic conserved amino acid motifs. Whole genome duplication (WGD) was likely an important mechanism for JmjC duplications, with four pairs from an angiosperm-wide WGD and others from subsequent WGDs. Vertebrates also experienced JmjC duplications associated with the vertebrate ancestral WGDs, with additional mammalian paralogs from tandem duplication and possible transposition. The sequences of paralogs have diverged in both known functional domains and other regions, showing evidence of selection pressure.
The increases in JmjC copy number and the divergences in sequence and expression might have contributed to the divergent functions of JmjC genes, allowing the angiosperms and vertebrates to adapt to a great number of ecological niches and contributing to their evolutionary successes.

Chromatin-based regulation is an important mechanism of modulating eukaryotic chromatin structure and gene expression, by altering DNA and histone modifications rather than changing DNA sequences. Eukaryotic chromatin contains a group of highly conserved proteins called histones, including the core histones H2A, H2B, H3, and H4, as well as the linker histone H1. The core histones, with two copies each, form an eight-subunit complex, which is wrapped by 146 base pairs of DNA to form the nucleosome, the basic unit of chromatin. In each nucleosome, the hydrophobic carboxyl-terminal regions of the eight subunits occupy the interior, whereas the hydrophilic amino-terminal regions (the tails) extend outward (Luger et al., 1997). Covalent histone modifications occur on the tails of the core histone proteins and encode epigenetic information that can be passed through mitosis, and sometimes even meiosis, altering chromatin structure and modulating genomic functions. Specifically, histone modifications include methylation, acetylation, phosphorylation, ubiquitination and sumoylation. Chromatin structure can be regulated by three classes of proteins: DNA methylases and demethylases, chromatin remodelers that regulate nucleosome positioning, and enzymes for histone modifications.
Among various histone modifications, the role of methylation varies among species (Feng et al., 2010; Liu et al., 2010), but methylation is relatively stable and suited for the transmission of epigenetic information (Strahl and Allis, 2000), sometimes even transgenerationally. This is supported by a recent study in Arabidopsis (Arabidopsis thaliana) showing that the vernalized state can be partially transgenerationally inherited due to a defect in the reduction of H3K27me3 levels at the FLC locus (Crevillen et al., 2014). Histone methylation occurs on arginine and lysine residues and is involved in a wide range of biological processes including gene expression, chromatin structure, dosage compensation, and epigenetic memory (Martin and Zhang, 2005). A lysine residue can be mono-, di- or tri-methylated; however, an arginine residue can only be mono- or di-methylated. Different histone modifications regulate distinct functional outcomes within an epigenetic marking system: methylation at histone H3 lysine 4 (H3K4), H3K36 and H3K79 is correlated with higher gene expression, whereas methylation at H3K9, H3K27 and H4K20 is associated with lower gene expression. These marks are hence often termed ‘activating’ or ‘repressing’ marks, respectively, although the causal relationship between histone modification and transcriptional activity is still elusive (Martin and Zhang, 2005; Henikoff and Shilatifard, 2011; Dong and Weng, 2013).

Histone methylation was long regarded as an irreversible modification because of the stable nature of the C-N bond; this idea was also supported by the similar half-lives of histone fractions and of methyl lysine and methyl arginine marks (Byvoet et al., 1972).
In 2004, the discovery of the first histone demethylase, known as LSD1 (lysine specific demethylase 1), provided experimental evidence for enzymatic demethylation (Shi et al., 2004). The single-copy LSD1 mediates oxidative demethylation on mono- or di-methylated H3K4 and/or H3K9, but not tri-methylated H3K4 (Shi et al., 2004). Subsequently, a second and larger class of demethylases containing a JmjC domain was identified (Tsukada et al., 2006). Unlike LSD1, the JmjC proteins do not require a protonated nitrogen and can also reverse the tri-methylated state of histone lysines (Shi and Whetstine, 2007). Five amino acid residues within predicted cofactor binding sites in the JmjC domain are conserved and important for enzymatic activity (Klose et al., 2006). Among these, three residues bind to the Fe(II) cofactor and the two other residues are required for α-ketoglutarate (αKG) binding.

Several plant JmjC genes are known to have important functions in regulating development and environmental responses (Liu et al., 2010; Chen et al., 2011; Kooistra and Helin, 2012). For example, Polycomb Repressive Complex 2 (PRC2), a conserved and key transcriptional regulator in animals and plants, has been demonstrated to have H3K27me3 affinity (Cao et al., 2002). In Arabidopsis, REF6/AtPKDM9A and ELF6/AtPKDM9B are closely related JmjC paralogs that have H3K27me3 demethylase activities (Lu et al., 2011; Crevillen et al., 2014) but different roles in the regulation of flowering time (Noh et al., 2004). The elf6 mutant flowers early and exhibits reduced expression of the flowering repressor FLC (Noh et al., 2004), due to an elevated level of H3K27me3 at the FLC locus (Crevillen et al., 2014), whereas the ref6 mutant shows an FLC-dependent late-flowering phenotype (Noh et al., 2004).
In addition, a recent study showed that H3K27me3 was lost after salt priming in Arabidopsis seedlings (Sani et al., 2013), suggesting that some JmjC genes may be induced upon salt stress. To date, many members of the JmjC gene family have not been characterized genetically or biochemically; nevertheless, sequence and evolutionary analyses can help predict their functions in histone demethylation. However, the expression and functional divergence of JmjC duplicates have not been studied extensively, especially in response to abiotic stress treatments, such as ABA, salt and drought. In addition, members of the PKDM7 subfamily exhibited distinct expression patterns correlated with their different enzymatic activities (Lu et al., 2008), but the underlying sequence divergence and dynamic evolution are still unknown.

An early study of the JmjC gene family using sequences from Arabidopsis, rice and human defined 12 subfamilies according to phylogenetic relationships, with support from the presence of other domains (Zhou and Ma, 2008). These domains play potential regulatory roles in the demethylation process, such as the recognition of methylated histone marks (e.g. PHD and Tudor), protein-protein interaction (e.g. F-box) and DNA binding (e.g. C2H2 zinc finger) (Zhou and Ma, 2008). However, previous reports have focused on a single species or a few species (Lu et al., 2008; Zhou and Ma, 2008); thus, there has not been a systematic study of the JmjC family in plants or animals. Recent progress in genome sequencing provides an opportunity to gain additional insights into the evolution of the JmjC family during the histories of angiosperms and vertebrates.
Histone methylation is mainly mediated by SET domain protein methyltransferases, and genome duplication has resulted in an increase of SET copy numbers. In Arabidopsis, five duplicated gene pairs were retained after recent genome duplication events, and nineteen pairs were retained in Populus trichocarpa (poplar) (Lei et al., 2012). It would also be helpful to learn whether the JmjC gene family has an evolutionary pattern similar to that of the SET genes. The evolutionary patterns of these two families could inform our understanding of the balanced influence of demethylation and methylation on species evolution and divergence. In this study, we have characterized the evolution of JmjC genes in major eukaryotic lineages, including land plants, using phylogenetic and domain analyses, and define 14 monophyletic subfamilies: JARID2, JMJD6, KDM2, KDM3, KDM4, KDM5, KDM6, PKDM7, PKDM8, PKDM9, PKDM10, PKDM11, PKDM12 and PKDM13. We show that some families underwent distinct gene duplications during evolution in angiosperms, especially in the ancestor of extant angiosperms.

RESULTS

Identification of JmjC genes in plants, animals and fungi
The complete set of JmjC genes was identified using HMMER software from a comprehensive dataset that contains selected plants, animals and fungi. In all, 434 sequences, each containing a JmjC domain, were retrieved from 35 different organisms including 11 plants, 16 metazoans, 7 fungi and Monosiga brevicollis (a unicellular choanoflagellate, a protist related to animals) (Fig. 1; Supplemental Tables S1 and S2). The identified JmjC proteins range in size from 266 to 2740 aa. Among major lineages of plants, JmjC genes are present in algae, Bryophyta, Pteridophyta, and angiosperms.
The copy number of JmjC genes varies considerably among plants, ranging from 2 in the green algae Chlamydomonas reinhardtii and Volvox carteri to 17 in rice (Oryza sativa, a monocot) and 21 in Arabidopsis (a eudicot), with the highest number, 27, in poplar (Populus trichocarpa, a eudicot). JmjC genes are also widespread in animals, from simple invertebrates, such as the sponge Amphimedon queenslandica, to mammals, such as human, with the gene copy number ranging from 6 to 28, as well as in the unicellular M. brevicollis. Further investigation reveals that the copy number variation in animals is mainly due to gene duplications in vertebrates; in addition, the distribution of subgroup PKDM12 among different organisms is complex. In fungi, there are fewer than 6 genes in each species, such as 1 in Schizosaccharomyces pombe, 3 in Saccharomyces cerevisiae and 5 in Agaricus bisporus var. bisporus, and altogether 27 JmjC gene sequences were retrieved from 7 fungi.

In order to standardize gene names and be consistent with the literature, we adopted a common nomenclature system based on the names of human genes according to the chromatin modifying enzyme activities of animal members and on Arabidopsis genes designated in previous studies, as well as on our phylogenetic analysis (Allis et al., 2007; Zhou and Ma, 2008). First, for human and Arabidopsis genes with known functions or previous research, the published gene names were retained and used as a reference. Second, the orthologs of these genes from plants, animals and fungi, respectively, were named following the established reference. Finally, recent paralogs were distinguished with an upper case letter after the number, using the same letter for orthologs between organisms whenever possible.
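The dataset composition reported in the identification paragraph above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' pipeline; all variable names are illustrative assumptions, and only numbers stated explicitly in the text are used (the full per-species counts are in the Supplemental Tables and are not reproduced here).

```python
# Numbers of organisms per group, as stated in the text.
ORGANISMS_PER_GROUP = {
    "plants": 11,
    "metazoans": 16,
    "fungi": 7,
    "choanoflagellates": 1,  # Monosiga brevicollis
}

TOTAL_SEQUENCES = 434   # JmjC-containing sequences retrieved in all
FUNGAL_SEQUENCES = 27   # JmjC sequences retrieved from the 7 fungi

# Fungal copy numbers that the text names explicitly (three of seven species).
NAMED_FUNGAL_COPY_NUMBERS = {
    "Schizosaccharomyces pombe": 1,
    "Saccharomyces cerevisiae": 3,
    "Agaricus bisporus var. bisporus": 5,
}

total_organisms = sum(ORGANISMS_PER_GROUP.values())

# Genes and species left over once the three named fungi are accounted for.
named_fungal_genes = sum(NAMED_FUNGAL_COPY_NUMBERS.values())
remaining_fungal_genes = FUNGAL_SEQUENCES - named_fungal_genes
remaining_fungi = ORGANISMS_PER_GROUP["fungi"] - len(NAMED_FUNGAL_COPY_NUMBERS)

# Average number of JmjC sequences per organism across the whole dataset.
mean_per_organism = TOTAL_SEQUENCES / total_organisms

print("organisms in dataset:", total_organisms)
print("fungal genes not itemized in the text:", remaining_fungal_genes,
      "across", remaining_fungi, "species")
print(f"mean JmjC sequences per organism: {mean_per_organism:.1f}")
```

The leftover fungal genes are compatible with the statement that each fungal species carries fewer than 6 JmjC genes, since the four species not itemized could hold at most 5 genes each.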
Phylogenetic classification of JmjC genes into fourteen subfamilies
To explore the evolutionary relationships of eukaryotic JmjC genes, we conducted phylogenetic analyses with an alignment of the conserved JmjC domain from representative species using NJ, ML and Bayesian methods. ML and Bayesian analyses showed that proteins from different species cluster together in clades with high support values, with support from NJ analysis for most results. According to the results from phylogenetic and motif analyses, the eukaryotic JmjC genes can be divided into fourteen subfamilies, designated as JARID2, JMJD6, KDM2, KDM3, KDM4, KDM5, KDM6, PKDM7, PKDM8, PKDM9, PKDM10, PKDM11, PKDM12 and PKDM13 (Fig. 2). Among these subfamilies, KDM3, KDM5, JMJD6, PKDM11 and PKDM13 each contain members from plants, animals and fungi. On the other hand, PKDM12 lacks fungal members, and KDM2 and KDM4 lack plant members. In addition, PKDM7, PKDM8 and PKDM9 are plant-specific groups, and JARID2, KDM6 and PKDM10 are animal specific.

The fact that the KDM3, KDM5, JMJD6, PKDM11 and PKDM13 subfamilies have members from major multicellular groups of eukaryotes, including plants, animals and fungi, suggests that these clades originated from five respective ancestral genes in the most recent common ancestor (MRCA) of the three kingdoms. According to our phylogenetic analyses, JMJD6, PKDM11 and PKDM13 cluster together with strong support, but some of the internal relationships of these clades are not clear. The KDM2 and KDM4 subfamilies both contain animal and fungal JmjC genes, indicating that the two clades are derived from ancestral genes that were present before the separation of animals and fungi.
The KDM4 clade is well supported by all phylogenetic methods, but the classification of KDM2 also relied on the protein domain structures. The PKDM7, PKDM8 and PKDM9 subfamilies are plant specific. According to the tree topology, the PKDM8 subfamily forms a sister group to PKDM9, and together they derive from an ancestral gene in the MRCA of plants, animals and fungi, because they are sister to a large clade with genes from all three kingdoms. Both the PKDM7 and PKDM9 subfamilies contain genes from major lineages of land plants, including Bryophyta, Pteridophyta and angiosperms, revealing that they are retained in land plants. In contrast, PKDM8 genes are conserved from Pteridophyta to angiosperms, suggesting a likely loss of this subfamily in non-vascular plants. The JARID2, KDM6 and PKDM10 subfamilies are composed of only animal genes. However, the sisterhood of JARID2 and KDM6 to other clades containing genes from all three kingdoms suggests that these subfamilies originated in the MRCA of the three kingdoms; similarly, the ancestral gene of PKDM10 was probably already present in the MRCA of animals and fungi. The early origins of JARID2 and PKDM10 were further supported by the identification of their homologs in M. brevicollis.

In summary, eukaryotic JmjC genes form fourteen subfamilies. These subfamilies can be grouped into four categories: five shared by all three major eukaryotic kingdoms; one shared by animals and plants; two found in both animals and fungi, with the counterparts lost from plants; and six lineage-specific subfamilies, three plant specific and three animal specific. It is likely that there were at least 12 ancestral JmjC genes present in the MRCA of the three major eukaryotic kingdoms, with subsequent duplications and losses in specific kingdoms (Fig. 2C).
Additionally, within most subfamilies, except for JARID2, PKDM11 and PKDM13, there are two or more gene copies in plants or animals, suggesting further functional divergence of JmjC genes.

Different domain architecture and conserved non-JmjC motifs in subfamilies
According to the phylogenetic tree shown in Fig. 2A, fourteen monophyletic subfamilies were identified in the JmjC gene family. These fourteen subfamilies represent twelve different domain architectures, as three subfamilies (PKDM11, PKDM12 and PKDM13) possess only the JmjC domain. Among these, the plant-specific PKDM7, PKDM8 and PKDM9 subfamilies have similar domain architectures, but PKDM7 proteins possess extra FYRN and FYRC domains. To determine whether motifs outside the JmjC and other known domains were conserved between members of the same subfamily, we searched for motifs in our dataset of JmjC proteins. We found 40 motifs with lengths of greater than 10 amino acids that are specific to individual subfamilies (Supplemental Table S3; Supplemental Fig. S1). The subfamilies exhibited various combinations of motifs, yet the combinations were similar within a subfamily, except for KDM3 and JMJD6, which have different conserved motifs in plant and animal proteins (Fig. 2B). None of the 40 conserved motifs corresponded to known domains in the Pfam database. The highly conserved motifs shared by members of subfamilies of the JmjC domain proteins further support the classification presented here. The conservation of these additional motifs in the respective subfamilies implies that they play important roles in the functions of these subfamilies.
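The kingdom distribution of the fourteen subfamilies described in the classification section can be tabulated with a short script. This is a minimal sketch under stated assumptions rather than the authors' analysis: the dictionary, the helper name `distribution_label` and the label format are illustrative, and the presence or absence values are transcribed from the text, not recomputed from sequence data.

```python
from collections import Counter

ALL_THREE = frozenset({"plants", "animals", "fungi"})

# Kingdoms in which each subfamily was detected, as described in the text.
SUBFAMILY_KINGDOMS = {
    "KDM3": ALL_THREE,
    "KDM5": ALL_THREE,
    "JMJD6": ALL_THREE,
    "PKDM11": ALL_THREE,
    "PKDM13": ALL_THREE,
    "PKDM12": frozenset({"plants", "animals"}),
    "KDM2": frozenset({"animals", "fungi"}),
    "KDM4": frozenset({"animals", "fungi"}),
    "PKDM7": frozenset({"plants"}),
    "PKDM8": frozenset({"plants"}),
    "PKDM9": frozenset({"plants"}),
    "JARID2": frozenset({"animals"}),
    "KDM6": frozenset({"animals"}),
    "PKDM10": frozenset({"animals"}),
}


def distribution_label(kingdoms):
    """Return a stable label such as 'animals+fungi' for a set of kingdoms."""
    return "+".join(sorted(kingdoms))


# Count how many subfamilies share each kingdom distribution.
category_counts = Counter(
    distribution_label(kingdoms) for kingdoms in SUBFAMILY_KINGDOMS.values()
)

for label, count in sorted(category_counts.items()):
    print(f"{label}: {count}")
print("total subfamilies:", sum(category_counts.values()))
```

Keeping the distribution as a set per subfamily, rather than hard-coding category names, makes it straightforward to add a further lineage if the table were extended to other eukaryotic groups.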