Tuominen et al. BMC Genomics 2011, 12:236
http://www.biomedcentral.com/1471-2164/12/236

RESEARCH ARTICLE Open Access

Differential phylogenetic expansions in BAHD acyltransferases across five angiosperm taxa and evidence of divergent expression among Populus paralogues

Lindsey K Tuominen1, Virgil E Johnson1,2 and Chung-Jui Tsai1,2*

* Correspondence: [email protected]
1 Warnell School of Forestry and Natural Resources, University of Georgia, Athens, GA 30602-2152, USA
Full list of author information is available at the end of the article

© 2011 Tuominen et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: BAHD acyltransferases are involved in the synthesis and elaboration of a wide variety of secondary metabolites. Previous research has shown that characterized proteins from this family fall broadly into five major clades and contain two conserved protein motifs. Here, we aimed to expand the understanding of BAHD acyltransferase diversity in plants through genome-wide analysis across five angiosperm taxa. We focus particularly on Populus, a woody perennial known to produce an abundance of secondary metabolites.

Results: Phylogenetic analysis of putative BAHD acyltransferase sequences from Arabidopsis, Medicago, Oryza, Populus, and Vitis, along with previously characterized proteins, supported a refined grouping of eight major clades for this family. Taxon-specific clustering of many BAHD family members appears pervasive in angiosperms. We identified two new multi-clade motifs and numerous clade-specific motifs, several of which have been implicated in BAHD function by previous structural and mutagenesis research. Gene duplication and expression data for Populus-dominated subclades revealed that several paralogous BAHD members in this genus might have already undergone functional divergence.

Conclusions: Differential, taxon-specific BAHD family expansion via gene duplication could be an evolutionary process contributing to metabolic diversity across plant taxa. Gene expression divergence among some Populus paralogues highlights possible distinctions between their biochemical and physiological functions. The newly discovered motifs, especially the clade-specific motifs, should facilitate future functional study of substrate and donor specificity among BAHD enzymes.

Background

BAHD acyltransferases make up a large family of enzymes responsible for acyl-CoA dependent acylation of secondary metabolites, typically resulting in the formation of esters and amides. In a foundational paper, St. Pierre & De Luca [1] named the family after the first four characterized members (BEAT or benzylalcohol O-acetyltransferase from Clarkia breweri; AHCTs or anthocyanin O-hydroxycinnamoyltransferases from Petunia, Senecio, Gentiana, Perilla, and Lavandula; HCBT or anthranilate N-hydroxycinnamoyl/benzoyltransferase from Dianthus caryophyllus; DAT or deacetylvindoline 4-O-acetyltransferase from Catharanthus roseus). Currently, the BAHD family encompasses over sixty biochemically characterized members in plant taxa ranging from gymnosperms to monocots to legumes. Previous work has shown that these enzymes may be involved in synthesis or modification of such diverse metabolites as alkaloids, terpenoids and phenolics, with ecophysiological roles in minimizing cuticular water loss, defending against herbivory, and attracting pollinators (reviewed in [2]).

The BAHD family has been previously organized into five major phylogenetic clades, using 46 biochemically or genetically characterized members [2]. This classification revealed both clade-specific and clade-independent
biochemical activities among family members. For example, benzoyl-CoA donor utilization so far appears to be limited to Clade V, while hydroxycinnamoyl-CoA has been reported as a donor for members in multiple clades [2]. Substrate specificity typically varies among clades, and sometimes within clade as well. For example, Clade I members act mainly upon flavonoids, while Clade V members utilize substrates ranging from terpenoids to medium-chain alcohols to quinic acid, in association with major phylogenetic branches within this clade [2]. Similar diversity of function was also noted for Clade III members, which are involved in formation of alkaloids, esters, and flavonoids, but functional association was less clear due to the smaller size of subclades in this branch. This highlights both the diversity of the BAHD family and the potential challenge of phylogeny-based functional inference with limited sequence and/or species representation.

Most functionally characterized BAHD acyltransferases share two conserved motifs, HXXXD and DFGWG [2]. The conservation of these motifs has facilitated in silico identification of BAHD acyltransferases from available genome sequences [3,4]. The HXXXD motif is also found in other thioester CoA-utilizing acyltransferase families [1] and is absolutely conserved among BAHD acyltransferases. Its importance for catalysis was first established by site-directed mutagenesis [5,6]. Crystallographic analysis of the chrysanthemum (Dendranthema × morifolium) malonyltransferase Dm3MaT3 provided the structural basis for the catalytic role of the His residue in malonyl-CoA binding [7]. The importance of the DFGWG motif, which is highly but not absolutely conserved, for enzyme activity was first shown in a Salvia malonyltransferase [5] and a Rauvolfia vinorine synthase [6] based on mutagenesis studies of the Asp residue. However, structural analysis of Dm3MaT3 suggested that this Asp residue most likely plays a structural, rather than catalytic, role in enzyme function [7]. Coupling the structural analysis with mutagenesis studies of two other malonyltransferases from the same species also revealed a greater structural diversity of acyl acceptor binding sites relative to the acyl-CoA donor binding sites [7]. This is consistent with the known broad range of acceptor molecules and relatively narrow range of acyl-donors utilized by different BAHD acyltransferases [2].

Despite the prevalence of BAHD acyltransferases in plants, cross-genome analysis of this family is lacking. Genome-wide analyses of this family have recently been reported for Arabidopsis and Populus [3,4], but only in a single-taxon context. We sought to explore BAHD acyltransferase diversity from an evolutionary perspective, with a primary focus in Populus due to its ability to synthesize a broad array of secondary metabolites. The most abundant of these metabolites are the phenylpropanoid-derived non-structural phenolics known to play significant roles in biotic and abiotic stress responses in this genus [8,9]. The diversity of Populus phenylpropanoids (e.g., hydroxycinnamate derivatives, flavonoids, condensed tannins and salicylate-containing phenolic glycosides) can be attributed in large part to side-chain modifications, such as glycosylation, methylation, and acylation [8]. We therefore used a phylogenomic approach to develop an updated phylogeny of the Populus BAHD acyltransferase family in reference to four other angiosperm taxa. Together with gene duplication and expression analyses, our data suggest that lineage-specific gene duplication is a key process in BAHD family evolution. The results are consistent with a role of the BAHD acyltransferases in diversifying the secondary metabolite repertoire in plants.

Results

Populus Has More BAHD Acyltransferase Genes Than Arabidopsis, Medicago, Oryza, and Vitis

BLASTP searches against the JGI Populus trichocarpa genome release v1.1 revealed 149 unique loci with high similarity to biochemically characterized BAHD acyltransferases from a previous review [2]. Manual curation and referencing against the recently released genome v2.0 were conducted to exclude loci lacking a conserved motif (HXXXD or DFGWG), loci that represented redundant, possibly allelic copies, and loci resembling spurious gene models (see Methods). The final list of 100 putative Populus BAHD acyltransferases was used for all subsequent analyses and annotation (Additional File 1: SupplementalTable1.xls). In the course of our work, another group also annotated the BAHD family in Populus [3] and reported 94 putative gene models. These models correspond to 74 putative BAHD genes on our list, with one model that matched two v2.0 gene models on our list; the 21 remaining models were either redundant or rejected based on our manual curation criteria (Additional File 1). Similar BLAST search and quality control measures were also performed for the genomes of Arabidopsis, Medicago, Oryza, and Vitis, producing final lists of 55, 50, 84, and 52 putative BAHD genes, respectively (Additional File 2: SupplementalTable2.xls). These lists include ten biochemically characterized Arabidopsis members and one biochemically characterized Medicago member (see Additional File 2 for references).

Phylogenetic Analysis Supports Eight Major Clades of Plant BAHD Acyltransferases

Phylogenetic relationships among the BAHD acyltransferases were reconstructed using a maximum-likelihood algorithm, for a collection of 69 biochemically characterized plant BAHD acyltransferases and the putative members from Populus, Arabidopsis, Oryza, Medicago, and Vitis (Figure 1A). The resulting phylogenetic tree is broadly consistent with that of D’Auria [2], who sorted biochemically characterized BAHD acyltransferases into five major groups. Our expanded analysis suggests that a grouping of eight major clades is now warranted, a finding consistent with previous, single-genome-based, neighbour-joining analyses [3,4,10]. In particular, a strongly-supported clade comprised entirely of BAHD acyltransferases lacking biochemical characterization data was sister to the group of proteins previously designated as Clade I by D’Auria [2]. To maintain consistency, we adopted a similar clade nomenclature, and name the previous and the “new” groups as Clades Ia and Ib, respectively. Clades Ia and Ib correspond respectively to the Populus clades Vb+Vc and Va, and to the Arabidopsis clades IIb and IIa reported by Yu et al. [3]. Another strongly supported clade containing the Petunia acetyl CoA:coniferyl alcohol acetyltransferase (CFAT, [11]) was sister to the group classified by D’Auria as Clade III [2]. We name the previous and the “new” clades as IIIa and IIIb, respectively; these correspond to the Populus clades IV and II and Arabidopsis clades IV and IIIa in Yu et al. [3]. Members of the former Clade V [2] clustered into two well-supported groups in our analysis, renamed hereafter Clades Va and Vb. These clades correspond to Yu et al.’s clades Ia and Ib for both Populus and Arabidopsis [3]. Characterized proteins in Clade Va tend to be involved in volatile ester formation, while those in Clade Vb are closely related to hydroxycinnamoyltransferases (HCTs) responsible for the synthesis of chlorogenic acid and monolignols. Our analysis also placed Clade IV basal to Clades Va and Vb, with good support. The remaining sequences clustered into one strongly supported group corresponding to D’Auria’s Clade II [2].

The distribution of sequences among the five species varied within each clade (Figure 1B). Populus and Oryza have the largest number of BAHD members overall, and collectively these made up the majority of Clades Ia, Va, and Vb. Populus also predominated in the dicot-specific Clade IIIa, while Clade IV was monocot-specific. Taxon bias was also evident in Clades Ib and IIIb, where Medicago and Vitis, respectively, were over-represented. When analyzed by species, Clade Va, the largest clade, remained the largest group in all taxa, except in Medicago where Clade Ib predominated (Figure 1C). Clades II, IIIb, and IV had the lowest representations overall, consistent with their small overall sizes. The major exception to this pattern was Vitis, which showed a relatively higher representation of Clade IIIb, coinciding with a much lower representation of Clade Ia. Other species-biased patterns included high (>20%) representation for Clade IIIa in Populus, Clade Ib in Arabidopsis, and Clade Ia in Oryza.

Closer examination of the phylogeny revealed that BAHD sequences from a single taxon tended to cluster together, especially within the larger clades. In Clade Ia, all sequences from the five taxa formed lineage-specific groups with strong bootstrap support, except for one well-supported subgroup (Figure 2, bracket). Oryza sequences were basal to all eudicot sequences in this clade.
Two strongly-supported subclades consisting of a combined total of sixteen Populus sequences comprised another large, but in this case weakly-supported, group, sister to a group of eight Arabidopsis sequences, including a malonyl CoA:cyanidin 3,5-diglucoside transferase (At5MaT, [4,10]; Figure 2). Similar, but less dramatic patterns were observed for Clade Ib (Additional File 3: SupplementalFigure1.png). While the two most basal subgroups in this clade did not show strong taxon specificity, the two remaining subgroups each comprised five taxon-specific branches with strong support (Additional File 3). In accordance with its overrepresentation overall in Clade Ib, Medicago exhibited substantial taxon-specific expansions within these two branches.

Figure 1 Phylogeny and Distribution of BAHD Acyltransferases. A: Protein phylogeny of biochemically characterized BAHD acyltransferases and putative BAHD proteins from Arabidopsis, Medicago, Oryza, Populus, and Vitis genomes. Phylogeny was constructed using maximum likelihood analysis. B: Percentage representation of putative BAHD acyltransferases across the five taxa within each phylogenetic clade. Colors correspond to the plant taxa as listed in C. C: Percentage representation of clade membership for putative BAHD acyltransferases within each plant genome.

Figure 2 Phylogenetic Relationship of Clade Ia Members. Expanded view of all Clade Ia sequences from Figure 1A. Bracket indicates region lacking taxon-specific clustering. Filled circles represent putative BAHD acyltransferases, while open circles represent characterized BAHD proteins. Colors correspond to taxa as listed in Figure 1, with gray circles indicating sequences from plants within the Asterids. Populus sequence names are provided in Additional File 1. Loci from the other four genomes have been truncated to accommodate text input limitations (e.g., 1g03495 for At1g03495 of Arabidopsis, AC1253891 for AC125389_1 of Medicago, 01.m08401 for 12001.m08401 of Oryza, G29205001 for GSVIVP00029205001 of Vitis). GenBank accession numbers and full names for previously characterized proteins are provided in Additional File 10.

Taxon-specific clustering appeared more scattered in Clade IIIa, perhaps because the larger of the two major branches was poorly resolved (Figure 3). Ten Populus sequences formed a well-supported subclade together with a Clarkia breweri acetyltransferase involved in benzyl acetate formation (CbBEAT, [12]), and with an uncharacterized Vitis sequence. A smaller subclade contained five Populus sequences, and a third taxon-specific subclade containing seven of the nine Arabidopsis sequences in Clade IIIa also had high bootstrap support.

Figure 3 Phylogenetic Relationship of Clade III Members. Expanded view of all Clade III sequences from Figure 1A. Colors and symbols are the same as in Figures 1 and 2. In addition, teal circles indicate sequences from plants within the Rosids, while the pink circle indicates a sequence from a basal eudicot.

As the largest phylogenetic group, Clade Va contained a number of highly-derived branches, some specific to gymnosperms, monocots, or dicots (Figure 4). The largest well-supported branch in this clade contained four taxon-specific clusters of at least seven members (Figure 4, boxed), one each for Vitis (eight members), Populus (seven), Medicago (nine), and Oryza (eleven). Oryza sequences were over-represented in this clade and fell mainly into two large branches with moderate bootstrap support. One was Oryza-specific as mentioned above, and the other contained three eudicot sequences (Figure 4). Taxon-specific clustering was not as evident in Clade Vb, except for a well-supported branch of seven Oryza sequences, sister to a group of hydroxycinnamoyltransferases (HCT/HQT) involved in biosynthesis of lignin, chlorogenic acid, and other phytoalexins (Additional File 3).

Clade II lacked species-specific clustering patterns, as members were more evenly distributed among species (Additional File 3). Clade IIIb was relatively small, and exhibited some degree of taxon-specific clustering. The largest such grouping comprised nine Vitis sequences, consistent with their overrepresentation in this clade (Additional File 3). A four-member subclade of Oryza sequences and a three-member subclade each for Arabidopsis and Medicago were also evident. Clade IV was the smallest clade and was restricted to monocots, as mentioned previously.

With regard to Populus, species-specific expansion was evidenced within Clades Ia, IIIa and Va. Because the Populus-specific subgroup in Clade Ia is most closely related to several biochemically characterized malonyltransferases from Arabidopsis, Medicago, and Glycine, we have named members of this clade as malonyltransferase-like (MATLs). The sequences in the Populus-specific branch are MATL1-14 and 16-17. We designated all Populus sequences in Clade IIIa as alcohol acyltransferase-like (AATLs), after the numerous characterized alcohol acyltransferases within that clade.
The Populus-specific branch includes AATL1, 3, 7-9, 18-19, and 22-24. We refer to the three Populus clusters within the largest branch of Group Va by three names. First, we named the set of four Populus sequences clustering with two Malus sequences and a set of Vitis sequences, including an anthraniloyl-CoA:methanol acyltransferase from Vitis labrusca (VlAMAT, [13]), as AMAT-like (AMATLs). Next, we refer to the six Populus proteins most closely related to the Arabidopsis acetyl CoA:cis-3-hexen-1-ol acetyl transferase [14] as CHAT-like (CHATLs). Finally, the subgroup of seven Populus sequences that fell into a poorly-resolved region of Clade Va, most closely related to a tigloyl-CoA:(-)-13α-hydroxymultiflorine/(+)-13α-hydroxylupanine O-tigloyltransferase from Lupinus (LaHMT/HLT, [15]), were named HMT-like (HMTLs).

New Family-wide and Clade-Specific Motifs are Present in BAHD Acyltransferases

The large number of BAHD genes available from sequenced plant genomes presents an opportunity to expand the analysis of conserved motifs in this family beyond the two known functional domains, HXXXD and DFGWG. We subjected sequences from each clade to motif analysis using MINER v2.0 [16-18].
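As a rough illustration of the two known motifs named above, the following Python sketch locates HXXXD and DFGWG in a protein sequence with regular expressions. It is not MINER and does not reproduce the motif analysis reported in this article; the function name find_motifs and the toy sequence are hypothetical, chosen only for illustration.

```python
import re

# The two motifs named in the text as regular expressions. "X" in HXXXD
# stands for any residue, so it becomes "." here. The lookahead form
# (?=...) also reports overlapping occurrences.
MOTIFS = {
    "HXXXD": re.compile(r"(?=H...D)"),
    "DFGWG": re.compile(r"(?=DFGWG)"),
}


def find_motifs(protein):
    """Return {motif name: [0-based start positions]} for one sequence."""
    seq = protein.upper()
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}


# Hypothetical toy sequence, not a real BAHD protein.
toy = "MKLHAAADGTSYPLAGRLLDFGWGKP"
print(find_motifs(toy))
```

For the toy sequence, the scan reports HXXXD starting at index 3 and DFGWG at index 19. A fixed-pattern scan of this kind can only find motifs that are already known, which is why a discovery tool is needed to propose new ones.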
Clades II and IV were excluded from the analysis due to their small sizes. Using a sequence window of five amino acids and the default z-score threshold, four to nine motifs were predicted for each clade (Figure 5, Additional File 4: SupplementaryFigure2.pdf). MINER identified the DFGWG motif in four of the six tested clades (Ia, Ib, IIIa, and Va). Although it did not meet the MINER threshold, visual inspection revealed high conservation of this motif in Clades IIIb and Vb as well (Figure 5). This supports the validity of our approach towards the identification of conserved motifs. The HXXXD motif escaped detection by MINER, but this was expected since the motif contains a variable core.

Figure 4 Phylogenetic Relationship of Clade Va Members. Expanded view of all Clade Va sequences from Figure 1A. Colors and symbols are the same as in Figures 1-3. In addition, red triangles indicate sequences from gymnosperms. Boxed region indicates a poorly resolved branch based on bootstrap analysis.

Figure 5 Conserved Motifs Within Phylogenetic Clades. WebLogo displays of consensus sequences corresponding to MINER-identified motifs, boxed in yellow. Logos are arranged in rows by phylogenetic clade, named at left (Ia, Ib, IIIa, IIIb, Va, Vb), and in columns by motif, labelled at the bottom (DFGWG, YPLAGR, QVTX(F/L)XCGG, clade-specific). The three leftmost columns represent motifs conserved across multiple clades. The rightmost column provides examples of clade-specific motifs; motifs in this column are not aligned relative to one another.

Two new motifs were identified with multi-clade conservation. The first motif had a consensus of YPLAGR beginning around position 71-78, and was predicted in Clades IIIa, Va, and Vb. Manual inspection of the other clades identified a similar motif in this region, but with notable variability from the consensus, especially for the two flanking residues (Figure 5). The second motif had a consensus of QVTX(F/L)XCGG around position 136-156 and was predicted in Clades Ib, IIIa, and Va. Manual inspection revealed that QVT was highly conserved in the other three clades, but CGG was poorly conserved in Clades Ia and Ib (Figure 5). Clade-specific motifs were also observed, several of which were located near the N-terminus of the protein: the LTFFD motif from Clade Ia was located at positions 33-37, the IKPSSPTP motif of Clade IIIa at positions 11-18, and SNLDL from Clade Vb at positions 25-29 (Figure 5). Because the N-terminus often contains targeting peptide sequences, we examined the predicted protein subcellular localization patterns by clade using three different prediction programs. However, we found no evidence for a link between the observed clade-specific N-terminal motifs and the predicted subcellular targeting of the BAHD proteins (Additional File 5: SupplementaryFigure3.pdf).

Although Clade II was too small for motif analysis, we note that none of its members would have been accepted using our initial search criteria (both HXXXD and DFGWG present). The two original clade members, ZmGlossy2 and AtCER2, are known to participate in cuticular wax biosynthesis based exclusively on genetic characterization studies [19-21]. In the absence of biochemical data, it remains debatable as to whether Clade II members should be considered true BAHD acyltransferases.

Multiple Gene Duplication Types Have Contributed To BAHD Family Expansion in the Populus Genome

Populus has experienced at least two genome-wide duplication events, the salicoid event approximately 60-65 MYA and the older eudicot triplication event, as well as numerous segmental and tandem duplication events [22,23]. We sought to determine whether the various types of gene duplications contributed towards the expansion of the Populus BAHD family, especially with regard to Populus-specific subclades (HMTLs, CHATLs, and subgroups of MATLs and AATLs). Overall, we found sixty BAHD genes were associated with recent (salicoid or local) duplications (Additional File 6: SupplementalTable3.xls), accounting for more than half of the BAHD acyltransferases in Populus (Table 1). This is broadly consistent with previous analysis of chromosomal location of BAHD acyltransferases in Populus, which mapped 25 of 58 genes to homeologous chromosome segments or tandem duplication blocks based on the v1.1 genome release [3]. Events were spread approximately evenly across the two duplication types, with a greater number of local (e.g., tandem) duplications overall.

Duplications were found in all but the two smallest clades (II and IIIb). Salicoid and local duplications were overrepresented in Clades Ib, Va, and Vb relative to the genome overall. Such duplications impacted every member of Clade Ib (three salicoid pairs, one local pair and one local triplet), all but two genes in the largest subclade of Va (Figure 4, boxed; including two salicoid duplications, three local pairs, one local triplet, and one local quadruplet), and all but one member of Clade Vb (including two local pairs and two salicoid pairs; Figure 6, Additional File 6). For two subclades within the large, poorly resolved region in Clade Va, multiple local duplications appear to have followed genome-wide duplication events in one of the two salicoid paralogues (Figure 6, Additional File 6). The first instance is the relationship between HMTL7 on linkage group (LG) XI and the HMTL1-6 cluster on LG I. The second is the relationship between CHATL6 on LG XIX and the CHATL1-3 triplet on LG XIII.

Although Clade IIIa exhibited several duplications, the Populus-dominated AATL subclade had just one tandem pair (AATL23 and AATL24). Clade Ia had the lowest rate of duplications among the larger clades, with two local triplets within the Populus-dominated MATL subclade (Table 1).

The relatively low numbers of local and salicoid duplications in the Populus-dominated AATL and MATL subclades raises the possibility that some of these genes might have originated through other mechanisms, such as transposable elements. We therefore searched for the presence of retrotransposons within the two 10-kb windows flanking either side of each Populus BAHD gene. We found retrotransposon associations in each clade, covering over one third of the family as a whole, although the majority of associated genes were flanked on only one end (Table 1). Retrotransposon associations were frequently observed for recently duplicated genes (Table 1, Additional File 6). Retrotransposon associations were overrepresented in Clade Va, noted for all AMATLs and the majority of CHATLs and HMTLs (Table 1, Additional File 6). However, all of these gene models contained at least one intron (Additional File 1), suggesting that retrotransposition is unlikely to be a direct cause of duplication. Retrotransposon associations were underrepresented in Clade IIIa and absent from the AATL Populus-dominated subclade (Table 1, Additional File 6). Despite its average representation of retrotransposon associations, Clade Ia had the greatest number of genes with retrotransposons flanking both sides (Table 1). Two such genes, MATL12 and 13, formed a strongly supported branch with MATL10. All three are located on LG IV (Figure 6), lack predicted introns (Additional File 1), and share a high degree of nucleotide identity with one another (98%). Although preliminary, our analysis suggests that retrotransposons have contributed to the duplications of some BAHD genes.

Some Recently Duplicated BAHD Acyltransferases are Differentially Expressed

To investigate expression of Populus BAHD genes, we
mined a set of nine Affymetrix microarray datasets encompassing five different genotypes and four different tissue types generated in our laboratory [24]. After excluding probes that had consistently low expression across all samples (see Methods) and annotating probes based on the POParray database [25], we obtained expression data for 41 probes corresponding to 48 BAHD genes (some probe sequences match multiple gene targets, and some gene targets are represented by multiple probes). Pairwise correlations of BAHD gene expression across all microarray experiments were computed and the results organized by duplication type (Additional File 7: SupplementalFigure4.pdf). Median Spearman rank correlations were significantly different among the duplication categories according to one-way ANOVA (p < 0.001). Not surprisingly, median correlations for gene pairs derived from local or salicoid duplications were significantly higher than for other types of (all possible) gene pairs (Additional File 7).

Table 1 Summary of Gene Duplication Events Among Populus BAHD Acyltransferases

Clade                          Ia        Ib         II       IIIa      IIIb     Va         Vb       Genome
Total genes in clade           18        11         5        24        2        31         9        100
Recent duplication             6 (33%)   11 (100%)  0 (0%)   14 (58%)  0 (0%)   21a (68%)  8 (89%)  60 (60%)
  Salicoid duplication         0         6          0        6         0        10         4        26
  Local duplication            6b        5c         0        8         0        13d        4        36
Retrotransposon association    7 (39%)   4 (36%)    1 (20%)  4 (17%)   1 (50%)  16 (52%)   2 (22%)  35 (35%)
  Both 5’ and 3’ of gene       5         1          1        2         0        3          1        13
  Either 5’ or 3’ of gene      2         3          0        2         1        13         1        22

a Two members in Clade Va are associated with both salicoid and local duplications (21 = 10 + 13 - 2); b Includes two triplets; c Includes one triplet; d Includes one triplet and one quadruplet.

Figure 6 Locations of Putative Populus BAHD Acyltransferases on Linkage Groups. Homeologous blocks arising from the salicoid genome duplication event are color-coded across the nineteen linkage groups (chromosomes). BAHD acyltransferases in close proximity to one another are boxed for ease of labelling. Note that proximity on a linkage group does not, by itself, indicate a close phylogenetic relationship.

When the log-transformed microarray data were visualized as a heatmap, expression across the BAHD family as a whole was biased towards leaves, and we did not observe clear differences in expression patterns among the major clades (Figure 7A). Within the major clades, genotype- and/or tissue-dependent expression patterns were evident. For example, root-specific expression dominated in the HMTL subclade, while the majority of other Clade Va genes showed the more typical leaf-biased expression (Figure 7A). In another case, HCT1 and HCT6 were relatively uniformly expressed in all three P. fremontii × angustifolia hybrid genotypes examined, while HCT5 and HCT7 were detected only in genotype 1979 (Figure 7A). HCT2, on the other hand, was most abundant in roots. Expression patterns diverged for closely related genes in several cases, including genes within the Populus-dominated subclades. For example, MATL4 was biased towards P. fremontii × angustifolia genotype 1979 relative to MATL1-3, which were more evenly expressed across genotypes and tissues. The Populus-dominated AATL subclade includes AATL3, which was preferentially expressed in cell suspension cultures, as well as AATL7, 23, and 24, which exhibited different expression patterns by leaf age and genotype. The CHATL cluster includes two members (CHATL3 and 6) that were fairly evenly expressed across sampled tissues, and two (CHATL1 and 2) that were detected only in leaves. The more divergent CHATL4/5 were most strongly expressed in non-photosynthetic tissues, yielding an overall pattern that resembled the HMTLs more than the other CHATLs (Figure 7A).

Figure 7 Expression of BAHD Acyltransferases in Populus Tissues, Organized by Phylogenetic Relationship. A: Expression of BAHD acyltransferase genes across tissues and genotypes. B: Stress responses of BAHD acyltransferase gene expression across tissues and genotypes. (Heatmap values are not reproduced here. Column labels appearing in the image: 1979, 3200, RM5, 271, L4, wound, low N, 1 wk, 90 hr, detop, MJ; sample labels: YL, EL, R, C.)

QPCR was performed to verify the expression patterns of closely related CHATL transcripts observed by
Expressiondataorratios(stressedvs.controlsamples)werelog-transformedandvisualizedinheatmaps(seeMethods).Genesareorganizedby phylogeneticrelationshipandlabelledbythecladecolorinFigure1.Genotypesanalyzedincluded:P.fremontii×angustifoliaclones1979,3200, andRM5,andP.tremuloidesclones271andL4.Tissuesanalyzedincluded:youngleaf(YL),expandingleaf(EL),roottips(R),andsuspensioncell cultures(C).Stresstreatmentsincluded:nitrogenlimitation(lowN),leafwounding(wound,sampledeither1weekor90hoursafterwounding), removalofshootuptoleafplastochronindexthree(detop,90hoursafterremoval),andmethyljasmonateelicitation(MJ)[24].Whitetext indicatesthatrawhybridizationintensityforeithercontrol(forupregulatedgenes)orstressedtreatment(fordownregulatedgenes)sampleswas belowthequantitationlimit(seeMethods). microarray analysis, using an independent set of P. tre- threshold marked by a dotted line in Additional File 8, muloides tissues (Additional File 8: SupplementalFi- Panel A). CHATL3/6 were most strongly expressed in gure5.pdf). Specific primers were designed to distinguish young leaves and roots, followed by apices and mature among the three paralogous pairs with different duplica- leaves, and were much lower in stem and flower tissues. tion history (Supplemental Table 2): CHATL1/2, The transcript levels of CHATL4/5 were very low over- CHATL3/6 and CHATL4/5. CHATL1/2 were expressed all, with the highest levels detected in roots, similar to relatively consistently across all leaf and stem internode the microarray data of P. fremontii × angustifolia. Over- tissues sampled, but were lower in root and flower tis- all, the QPCR data were broadly consistent with the sues (near or below the corresponding microarray microarray results, and support the idea that the three Tuominenetal.BMCGenomics2011,12:236 Page10of17 http://www.biomedcentral.com/1471-2164/12/236 pairs of CHATL genes have diverged in their expression Populus BAHD family relative to the genome average. 
patterns despite their high homology. We speculate that this pattern may be generally applic- We next analyzed the microarray data to examine the able to the other angiosperm genomes surveyed in this responses of BAHD gene expression to four different study. Local duplications might be more likely than stress treatments, including nitrogen limitation, wound- polyploidization events to account for the observed ing, detopping, or methyl jasmonate feeding, across sev- taxon-specific expansions of BAHD acyltransferases. eral tissues and/or genotypes. Again, no clear overall This was indeed the case for the Populus-dominated patterns by clade were observed, and the differential HMTL and CHATL subclades, where the majority of gene expression patterns observed among some paralo- the genes were derived from local duplications, and to a gous genes described above also held for the stress much lesser extent, for the MATL subclade. In contrast, treatments (Figure 7B). Additional evidence of func- only two of the ten members in the Populus-dominated tional divergence was observed. For example, Clade IIIa subclade among the AATLs were implicated in any member AATL7 showed its strongest upregulation in duplication event. Preliminary molecular clock analysis young leaves one week after wounding, while the suggested that the divergence times among members of responses of AATL23 and AATL24 were most drastically the Populus-dominated MATL and AATL subclades affected via down-regulation in expanding leaves follow- were similar and predated the salicoid duplication event. ing detopping. The leaf-expressing CHATL genes were This suggests that other duplications, prior to the sali- generally up-regulated by nitrogen stress in P. fremontii coid duplication event but after the eudicot triplication × angustifolia genotype 1979, except CHATL6 in event, probably contributed to the Populus-specific expanding leaves. However, the trend was more variable expansion as well. 
in genotype 3200 (Figure 7B), despite similar baseline Previous work has shown that genes involved in stress expression of these genes between the two genotypes responses, including secondary metabolic genes, are (Figure 7A). QPCR analysis of the same suite of samples more likely than average to experience lineage-specific confirmed this general discrepancy between the two diversification via tandem duplication [27]. When placed genotypes (Additional File 8, Panel B), although the in a metabolic pathway context, we suggest that taxon- degree of expression changes varied between the two specific, local duplication-derived expansion of gene analytical (microarray vs. QPCR) methods. The data (sub)families may be characteristic of enzymes that hint at differential expression among closely related occupy a terminal or tangential position in a metabolic BAHD genes in response to nitrogen stress between dif- pathway. Conversely, enzymes with an intermediate ferent Populus hybrid genotypes. Future investigation position in a core pathway would likely retain a more would help determine how widespread this pattern is constant number of gene copies across taxa due to evo- across the BAHD family and a broader range of geno- lutionarily constraint for balanced stoichiometry types in this genus. between enzymes acting within the same pathway. In support of this idea, HCTs known to be involved in Discussion intermediate steps of monolignol biosynthesis formed a BAHD Family Expansion as a Factor Enabling Metabolic multi-taxon cluster within Clade Vb, encoded by 1-2 Diversification genes in all sequenced genomes. In Populus, Clade Vb Across the five angiosperm genomes investigated here, diversified about equally via salicoid and local duplica- we observed numerous differential lineage expansions tion events. In contrast, the sister Clade Va exhibited within the BAHD acyltransferase phylogeny. 
Examina- extensive taxon-specific clustering (boxed region, Figure tion of retained gene copies following duplications in 4); Populus genes in the major subclade were associated Populus revealed that the majority of BAHD genes, at with more than three times as many local duplications least in this genus, are associated with recent genome- as retained salicoid duplicates. The biochemically char- wide as well as local duplication events. An estimated acterized enzymes within this branch are all involved in 32% of all v2.0 Populus genes (6655 pairs or 13268 the final step of various volatile ester and alkaloid ester unique gene models) were derived from the salicoid biosynthetic pathways [13-15,28-31]. duplication event [26]. However, only 26% of the Popu- Taxon-specific phylogenetic expansions have also been lus BAHD acyltransferases were associated with the sali- observed within the O-methyltransferase (OMT) coid duplication. Tandem or local duplications, on the [9,32,33] and glycosyltransferase (GT, especially Group other hand, accounted for over one-third (36%) of the 1) ([34,35], Tsai and Johnson, unpublished) families. Populus BAHD genes, much higher than the genome Like BAHD acyltransferases, OMTs and GTs form large average estimated at 16% [23]. It thus appeared that families, and collectively the three are responsible for local duplications were over-represented and genome- the elaboration (acylation, methylation, and glycosyla- wide duplications were under-represented in the tion) of a wide range of secondary metabolites [2,8].