ebook img

The Evolution of Non-Coding Chloroplast DNA and Its Application in Plant Systematics PDF

17 Pages·2000·15.9 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview The Evolution of Non-Coding Chloroplast DNA and Its Application in Plant Systematics

THE EVOLUTION OF Scot Kelchner^ A, NON-CODING CHLOROPLAST DNA AND APPLICATION ITS PLANT SYSTEMATICSi IN AliS'IKACT This arlic'le reviews several |)r<)p<)st'(] mechanisms of molecular evolution operating in non-coding regions of ihe genome cliloroplast and argues that awareness and identification of tliese mechariisms are essential for improving alignment and phylogcnctic analysis of non-coding sequence data. The mechanisms are of five categories: slipped- (1) strand inispairing; insertions and deletions link<Ml with secondary' structure formations; inversions associated (2) (,'?) with hairpins and stem-loop structures; localized or extra-n»gional intramolecular recombination; and nucleotide (4) (5) substitutions. These rruitalions seem to be largely a function of se<|uence structure and palU^rn and may be highly homoplasious In a parsimony topology; therefore, mutations in non-coding regions of die chloroplasl genome are de- scribed here as slructure»d, nonrandom, and non-independent events. Established methodologies are based in large part DNA on a collective understanding of genie evolution and may need modification when non-coding sequence appli(*d to Here dala. I suggest an approach to die phylogenetic study of non-coding cpDN/V thai incoq)orales identification of mutational mechanisms in alignment and homology assessment of indels. also discuss rei)ercussions of non-coding I sequence evolution for such aspects of phylog<Miy estimation as maximum likelihood, distance, and parsimony analysis, the inclusion of indels as phyh)genetic characters, and bootstrapping, jackknifing, and "decay" analysis as measures of clade support. Key wonls: alignment, Inlergenic s[)acers, introns, molecular evolution, mutational biases, mutational mechanisms, phylogenetic secondary analysis, structure. There is growing interest in comparative analysis coding regions in the chloroplast: the IrnL-trnF & & cpDNA) of non-coding chloroplast (non-coding se- spacer (e.g., Gielly Taberlet, 1994; Mes t'Hart, Ham quences for plant systematic studies low taxo- 1994; van 1994; Sang 1997; Cros at et al., et al., & nomic levels. Recognition of the limitations of cod- et al., 1998; Bayer Starr, 1998), the trriT-lrnh DNA ing (genie) for resolving relationships these spacer (Bohle 1994, 1997; Small at et al., et al., levels inspired the probing of chloroplast introns 1998), the rpoA-petD and rpsll-rpoA spacers (Pe- & and intergenic spacers for phylogenetic utility. Un- terson Seberg, 1997), the atpB-rbcL spacer (Go- & was derlying this effort the reasonable premise that lenberg et al., 1993; Hodges Arnold, 1994; Na- non-coding regions experience limited or no selec- 1995; Samuel 1997; Savolainen tali et al., et al., et & and tive pressure are likely to evolve at rates far al., 1997; Setoguchi et al., 1997; Hoot Douglas, & & surpassing those of genie regions Curtis 1998), the rhcL-psal spacer (Morton Clegg, (e.g., Clegg, 1984; Wolfe et al, 1987; Palmer, 1987, 1993), the psbA-trnH spacer (Aldrich 1988; et al., & 1991; Olmstead Palmer, 1994; Bohle Sang et al., et al., 1997), the accD-;>.saI spacer (Small et 1994). There was also an expectation that non-cod- al., 1998), the rp/16-r/?/14 and rp.s8-r/>/14 spacers ing regions should experience random and inde- (Wolfson et 1991), the intron surrounding ma/K al., & mode pendent mutations, both in and distribution. (Johnson Soltis, 1994), the r/>oCl intron (Downie & For these s, a remarkable number of plant et al., 1996a, 1996b; Asmussen Liston, 1998; systematics studies currently in progress include a Downie et al., 1998), the rpll6 intron (Ionian et al., mo & component lar of comparative analysis of 1996; Kelchner, 1996; Kelchner Clark, 1997; 1 cpDNA A & Baum non-coding sequences. considerable Schnabel Wendel, 1998; 1998; et al., amount work of already published has demonstrat- Small et al., 1998), the tniL intron (Sang et al., & ed the potential phylogenetic utility of discrete non- 1997; Bayer Starr, 1998; Kajita et 1998; Bay- al., thank Conny Asmussen, Lyn Cook, Steve Dickie, Jonathan W*Midel, Hich Cronn, and Randy Small many for " I and interesting insightful discussions on the topic of non-coding secjuence evolution; R. G. Olmstead for his valuable L comments on an earlier version of this paper; R. Rayer, M. D. Crisp, C. West, C. Kelchner, and R. Oxelnum J. J. for review of the manuscript; Victoria llollowell for her careful editing and commentar>'; and the anonymous reviewers, whose comments insighthil contributed to the clarity and presentation of ideas in this paper. ^Centre CSIRO for Plant Riodiversily Research. Australian Natioiial Herbarium, Plant Industry, C.PO. Ro\ 16()(), Canberra, A.C.T., 2601, Australia. Second affiliation: Division of Botany and Zoology, The Auslralian National Univer- sity, (^anlx^rra, Australia, scot. kelchner(£f^pi. csiro.au. Ann. Missouri Bot. Gaud. 482^98. 87: 2000. Volume Number 87, 4 Kelchner 483 2000 Non-Coding Chloroplast DNA Evolution er et al., 2000), the rpsl6 intron (Liden et al., 1997; ble for generating sequence diversity in non-coding Oxelman ndhA and et al., 1997), the intron (Small regions of the chloroplast genome. Unfortunately, et al, 1998). these hanisms are often invoked, but rarely in- The literature above not only reveals profound corporated, into the analysis. differences between the evolution of non-genic and Recognition of the potential of structured molec- cpDNA, cpDNA genic but critically contradicts initial as- ular evolution in non-coding regions im- to sumptions of constraint-free evolution in non-cod- prove alignment and assessment of phylogenetic re- ing regions. Recurring difficulties associated with lationships is, I believe, critical for the non-coding sequence data include alternative development of functional molecular systematic re- alignment possibilities of insertions and deletions search based on non-coding sequence Toward data. (indels), regions of length mutation in which ho- this end, endeavor here to illustrate the following: I mology assessment is questionable or impossible, non-coding regions are highly structured and (1) and the occurrence of localized "hot spots" of in- their elements evolve non-randomly and non-in- ferred excessive mutation, frequently to the point dependently; this structure may be used to align (2) How and of saturation loss of phylogenetic signal. the sequence matrix and better assess homology; best to proceed with the phylogenetic analysis of the resulting gaps in the aligned matrix may (3) such regions should be a topic of considerable con- contain phylogenetically important information and cem Golenberg Downie (see et al., 1993; et al., should be used in a phylogenetic analysis; and (4) & 1996a; Kelchner Clark, 1997; Sang et al., 1997; the mode of non-coding sequence evolution de- Downie may et al., 1998). scribed here have potentially serious reper- now It is evident that sequence evolution in non- cussions for the accuracy of genetic-distance, max- imum coding regions of the chloroplast is far more com- likelihood, and parsimony analyses, and for A plex than previously supposed. Both introns and bootstrapping and jackknifing techniques. de- embody intergenic spacers are thought to a consid- scription of proposed mechanisms of non-coding erable degree of sequence structure, sometimes in sequence evolution followed by a discussion of is DNA manner a similar to that of ribosomal (rDNA). the appropriateness of current alignment and anal- may This structure generate either regionalized se- ysis procedures, with the expectation that may it quence conservation or mutational hot spots of both provide a more informed approach the applica- to nucleotide and substitutions insertion/deletion tion of non-coding sequence data in plant system- Sequence-directed events. initiators of mutational atics research. may events persist as "mutational triggers" (Kel- This article is not intended to be a complete re- & chner, 1996; Kelchner Clark, 1997), dramatical- view of literature pertaining to the evolution of in- ly increasing the possibility of reversal or parallel trons and intergenic spacers in genomes of an all gain of mutations, particularly length mutations or organism. Instead, serves as a brief review of it minute Hence, cpDNA inversions. there exist tial vi- current literature on non-coding regions, olations of the assumptions of randomized and in- and summarizes mutational mechanisms suggested dependent embedded much character evolution in to occur in these regions. Discussed are some of of the current phylogenetic methodology for com- the serious implications this manner of molecular — parative sequence analysis methodology that is evolution has for the assumptions underlying mod- based largely on observational comparative study els employed today by plant molecular systematists. of coding sequence data. Considering that these are today's commonly employed phylogeny tools for es- MECHANISMS or NoN-CoDi\G Sequence DNA timalion based on sequences, there has been Evolltion remarkably as yet little controversy in the literature about their application to non-genic sequence data. The strength of any phylogenetic estimation rests There are ways to account for mutational patterns on the accuracy of character homology assessment, DNA. observed in non-coding Comparative studies Thus, the molecular systematist strives maximize to cpDNA of non-coding sequences during the past character homology by the careful alignment of DNA decade in particular (e.g.. Palmer, 1985; Blasko et sequences in a data matrix. Fundamental to & vom cpDNA 1988; Stein Hatchel, 1988; Wolfson any alignment procedure of non-coding al., et se- & al., 1991; Golenberg et al., 1993; Gielly Taberlet, quence data should be a familiarity with mutational 1994; Morton, 1995a; Downie et al., 1996a; Kel- mechanisms directing molecular evolution in non- & & chner Wendel, 1996; Kelchner Clark, 1997; coding regions. Recognition of these mechanisms Sang et al., 1997) have allowed inference of spe- as generators of specific mutations can be a pow- cific underlying mutational mechanisms responsi- erful tool for the placement of gaps and for the 484 Annals the of Garden Missouri Botanical assessment of probable homology of insertions and between the probability of inserting subsequent & deletions (Kelchner, 1996; Kelchner Clark, length mutations and the probability of removing 1997). sequence from the repeat Whether such an string. may equilibrium is present or not, there be a com- phenomenon keeps petitive that the length of tan- SUri»l::D-STKAND MISPAIRING (SSM) dem repeated sequence units continually in flux. A mechanism widely reported of length mutation Representation of long repeat strings in non-coding in non-coding regions of the chloroplast is slipped- sequence alignments would therefore be a "snap- SSM strand mispairing (SSM). is thought to be a shot" of sequences experiencing continual inser- major, even principal, factor in length mutations tions and deletions that at locality. within non-coding regions of the chloroplast, mi- follows that a point substitution within a long It & tochondrial, and nuclear genomes (e.g., Levinson string of mononucleotide repeat units could act as Gutman, 1987; Hancock, 1995; Wolfson et al., a stabilizing factor, disrupting previous unifor- its & 1991; Kelchner Clark, 1997; Sang et al., 1997). mity and lowering the probability of further SSM Length mutations are important components of non- events. Such a substitution would directly influence coding sequence evolution and have been suggest- ensuing mutations in the region and one example is ed to occur at least as frequently as base substi- of a non-independent character mutation non- in some DNA. tutions in chloroplast non-coding regions coding the situation were reversed, with a If & (Curtis Clegg, 1984; Wolfe 1987; Zurawski non-homogeneous sequence becoming et al., a string of & & SSM Clegg, 1987; Clegg Zurawski, 1992; Golen- repeat units, the likelihood of an event would & berg al, 1993; Gielly Taberlet, 1994; Clegg and could induce non-independent et further et al., 1994). mutations by the addition or removal of repeated Slipped-strand mispairing is thought to proceed sequence by slipped-strand mispairing. DNA by a localized mispairing of single-stranded As an aid to alignment, SSM-generated inser- in regions of sequence repeats, as either a string of tions and deletions can be used position and to A mononucleotide repeats or tandemly arranged mul- determine number of gaps. quick study of a re- tibase repeat units (Palmer, 1991; Wolfson et peat unit or the flanking sequence of a gap may be al., Cummings Hancock, 1991; et al., 1994; 1995; re- enough to determine if slipped-strand mispairing i & viewed by Levinson Gutman, Diagrams 1987). of the likely progenitor of an observed length muta- SSM proposed mechanics can be found Levinson SSM may in tion. Occasionally, evidence of an event Gutman and (1987) and Wolfson et al. (1991). Be- not be apparent, particularly a deleted sequence if cause A/T-rich regions of bacterial genomes are not a direct repeat of flanking sequence, or is its if particularly susceptible to slipped-strand mispair- a subsequent length mutation due another mech- to & ing (Levinson Gutman, 1987), one could expect anism obscures an earlier SSM event (Kelchner, a similar effect in the A/T-rich non-coding regions 1996). genome of the chloroplast (Wolfson 1991). et al., SSM A This is not to imply that acts uniquely on STEM-LOOP SECONDARY STRUCTURE T and nucleotides; aligned non-coding sequenr (' G matrices often infer inserted repeats containing Striking both intergenic spacers and introns to C G and nucleotides, sometimes as pure strings of in the chloroplast genome the presence and num- is C mononucleotide or repeats. ber of probable secondary structures referred as to mononucleotide Strings of repeats, particularly of "stem-loops." Stem-loops are believed occur dur- to A cpDNA, or T, appear frequently in non-coding ing single-stranding events when inverted repeats may and slipped-strand mispairing potentially gen- meet to form a region of pairing (the stem) sur- erate length amtations within these strings. The dif- mounted by their interceding sequence (the loop). ficulty in assessing homology of length variation in Such structures have been widely discussed for ri- DNA, rDNA long strings of repeats, whether mononucleotide or bosomal with ITS and 18S regions be- multinut^leotide repeats, derives from the increas- ing of particular interest to the plant systematist ing potential for further length mutation relative to (see Baldwin et al. (1995), Soltis et al. (1997), and & & Owen, string length (Streisinger 1985; Golenberg Soltis Soltis (1998) for discussion of secondary & et al., 1993; Kelchner Clark, 1997; Sang et structures in these regions and their phylogenetic al., SSM may Subsequent 1997). activity either gener- implications). ate additional repeats of the initial sequence or de- Probable stem-loop dary structure com- is 5 lete sequence susceptible to slipped-strand mis- monly reported in non-coding regions of organellar pairing. Perhaps an equilibrium might exist genomes Michel 1989; Buroker (e.g., et al., et al.. Volume Number 4 87, Kelchner 485 2000 Non-Coding Chloroplast DNA Evolution Ham 1990; Golenberg et al., 1993; van et al., 1994; substitutions and length mutation, the inverted re- & Gielly Taberlet, 1994; Natali et al., 1995; Rigaa peated sequence composing the stem is frequently & Downie et al., 1995; et al., 1996b; Kelchner conserved in character (Learn et 1992; Gielly al., & & Wendel, 1996; Kelchner Clark, 1997; Sang et Taberlet, 1994; Downie 1996a, 1996b; et al., & al, 1997; Downie et al, 1998). Gielly and Taberlet Kelchner Clark, 1997), particularly when stems (1994) reported several probable stem-loops in the are long and possess highly favorable energy of for- & trnL-trnF region of the chloroplast genome, includ- mation values (AG values; see Kelchner Wendel, A ing nine highly probable structures within the trnL 1996; Dumolin-Lapfegue sequence et 1998). al., intron itself. All other introns in the chloroplast ge- involved in stem formation is less available for sub- nomcs of land plants are classified as Group II in- stitution and length mutation because paired is it and trons share a diagnostic secondary structure of with its sister repeat; this can engender non- six well-defined stem-loop domains (Kohchi domly and non-independently evolving sequence et al., 1988; Michel 1989; Downie 1996b; et al., et al., units. RNA Downie et al., 1998). Diagrams of putative single- Similar to ribosomal and rDNA secondary & may stranded secondary structure of introns be structure Curtiss Voumakis, 1984; Wheeler (e.g., & & & found in Michel and Dujon (1983), Michel et al. Honeycutt, 1988; Dixon Hillis, 1993; SoUis and Downie (1989), et al. (1998). Soltis, 1998), a nucleotide substitution occurring in Loop cpDNA regions of stem-loop secondary structures a stem sequence of a non-coding region are often associated with hot spots for mutation in could compromise secondary structure formation, non-coding regions, both of nucleotide substitutions Compensatory mutation may then occur pre to & and indel events (vom Stein Hatchel, 1988; Al- the potential for structure formation (Kelchner, & & drich et al, 1988; Golenberg et al., 1993; Gielly 1996; Kelchner Clark, 1997). Although se- Ham Taberlet, 1994; van et al., 1994; Clegg et al., quence conservation may be present merely as a 1994; Ferris et al., 1995; Downie et al., 1996b; function of sequence pattern (perhaps the case in & Kelchner Clark, 1997). Indels located in prob- intergenic spacers), the degree of secondary struc- able loop sequence are frequently inserted or de- ture conservation in a chloroplast Group intron II SSM. How- leted repeat units likely the result of suggests secondary structures are integral to proper ever, length mutations not attributable to functioning of the intron (Clegg 1986; Learn et al., slipped-strand mispairing often occur within loop 1992; Downie 1996a). Experimental et al., et al., may sequences as well and be remnants of recom- evidence has shown some of this structure es- is bination events. mechanisms Group sential for auto-splicing in I common Although indels are most in the termi- and introns (Bonnard 1984; Kohchi II et al., et al., & may nal loop, they occur anywhere along a second- 1988; Dujon, 1989; Cech, 1990; Michel Westhof, ary structure. For example, Kelchner and Clark 1990; Hibbett, 1996). (1997) detected what appeared to be an entire de- Identification of probable secondary structure letion of a small sub-loop positioned partway up the can be valuable when aligning and analyzing non- stem of an rpll6 intron stem-loop in Oryza sativa. coding sequences by improving gap positioning and Such side loops, when present, may be removed in the appraisal of character homology. Gaps flanked some taxa without compromising by and G the favorability of inverted repeats regions relatively rich in a stem formation. Occasionally, small segments of and C content are suspect as possible stems of see- the stem itself will be deleted, decreasing the stem ondary structures. As noted, regions of chaotic though perhaps an would length, not to extent that length mutations are correlated with loops, so the annihilate possible secondary structure formation. boundaries of a chaotic region frequently will cor- Very large loops are often associated with regions respond with inverted repeats that can form a stem, of chaotic or "labile" length variation characteristic even they do not directly neighbor the chaotic if cpDNA many OLIGO of non-coding sequence matrices Computer programs such region. as (Ry- & MULFOLD (e.g., Golenberg et al., 1993; Downie et al„ 1996a; chlik Rhoads, 1989), (Jaeger et al., & Baum Soltis et al., 1996; Kelchner Clark, 1997; 1989; Zuker, 1989), and GCG's Stemloop (Genetics Homology et 1998). assessment here can be Computer Group, Madison, Wisconsin) can al., assist and difficult or impossible, the conservative ap- in the detection of secondary structure in non-cod- A proach of removing these regions from the data ma- ing sequences. search can be conducted by hand, trix before phylogenetic analysis is frequently particularly if a published data set exists for the adopted. region. Free energy of formation values (AG) can In contrast to the loop of stem-loop secondary be calculated with some of the prior software as an structures being highly susceptible to nucleotide appraisal of the likelihood of formation of a partic- 486 Annals the of Garden Missouri Botanical & & & ular secondary structure (see Kelchner Wendel 1987; Zurawski Clegg, 1987; Clegg Zu- al., AG & (1996) lor an example where values were ap- rawski, 1992; Golenberg 1993; Gielly Ta- et al., plied to parallel inversion events in their data). berlet, 1994; however, see Small et 1998). al., AT Percent content quite variable in non-cod- is MIM cpDNA INVERSIONS IK ing regions, though generally higher is it than the average value for the chloroplast genome Minute inversions of four to six base pairs have & (Shimada Downie Sugiura, 1991; 1996a; at al., been linked small stem-loop secondary to struc- AT Small et al., 1998). Because of their high con- tures conmionly referred to as hairpins (Kelchner make non-genic regions must tent, a significant & Wendel, 1996). Hairpins consist of a stem com- A contribution to the high overall frequency of and posed of nearly adjacent inverted repeats producing T in the chloroplast genome. Kajita (1998) et al. stem-loop a structure with a particularly small loop. AT 67% reported an content of the IrnL-tniF in may become This loop inverted by recombination, spacer and IrnL intron, Kelchner and Clark (1997) and the inversion may be so small that either it AT 70.5% reported composition in the intron of & escapes notice during alignment (Kelchner Wen- chloroplast gene rpll6 bamboos, and Small in et al. & del, 1996; Kelchner Clark, 1997), or the inverted AT 77.1% found an (1998) incredible content in sequence matches particular bases of the uninvert- the intergenic spacer trnT-trnL in G<\ssypium. Un- ed sequence, resulting in a confusing array of mi- AT doubtedly, unequal tendency toward this rich- nute gaps Golenberg (see et 1993). al., DNA ness in non-genic chloroplast has several as Identifying minute inversions can require careful undetermined yet implications for phylogenetic when sequence attention aligning data, particularly analysis of non-coding sequence data. At a mini- alternative gap weighting schemes of an align- if mum, introduces a strong base composition bias it ment program have been not rigorously explored. into the analysis. Candidates hidden for a inversion severa ad- 1 may Substitutions demonstrate rather high levels jacent nucleotide substitutions, a series of tiny cpDNA of homoplasy in non-coding regions due to gaps, or a gap that demonstrates no repeat aspect the frequency of inferred multiple-hit sites (nucle- to its sequence structure. Alternatively, one could otide sites experiencing multiple substitution investigate these probable secondary structures by events). Multiple-hit occur even very low sites at hand or with a secondary structure computer pro- estimates of percent sequence divergenc-e (Kel- gram. Failure recognize minute inversions a to in & chner, 1996; Kelchner Clark, 1997), suggesting sequence data set has several repercussions for accepted coding that the region estimates of phylogenetic analysis, discussed fully in Kelchner 10-15%" "around sequence divergence optimal for and Wendel and summarized (1996) here Anal- in may phylogenetic signal be inadequate measures Non-Coding Sequence ysis of Data. for phylogenetic utility of a non-coding region. Finally, small inversions associated with hairpins Precise understanding of mechanisms underlying may be highly susceptible to reversal and parallel- DNA multiple-hit substitutions in non-coding is within a study group, even the interspecific at lacking. However, attributes of the molecular evo- & & level (Kelchner Wendel, 1996; Kelchner non-coding manner lution of regions influence the Clark, 1997; Sang 1997; Dumolin-Lapegue et al., of nucleotide mutation or the distribution of nucle- et al., 1998). This susceptibility to reversal or par- otide substitution events in an intron or intergenic due allelism to the persistence of the mutational is — spacer. Stem sequence and loop regions may dif- & trigger (Kelchner Clark, 1997) the nearly ad- — ferentially permit mutations, resulting in non-ran- jacent inverted repeats after the initial inversion domly and non-independent distributed nucleotide event. substitutions. Statistical significance of differential may mutation rates in loops relative to stems be NUCLKOTIDE SUBSTITUTIONS tested for an adequate distribution model (see 01m- Nucleotide substitutions rally reported stead et al. s (1998) test for stochastic mutation in common as being more in non-coding than in cod- the chloroplast genes ndhF and rhcL), yet has rare- & cpDNA ing regions (Wolfe et al., 1987; Zurawski Clegg, ly, if ever, been performed on non-coding & & 1987; Olmstead Palmer, 1994; Hoot Douglas, data sets. 1998; however, see Sang et al., 1997, for an excep- In addition to secondary structure affecting the number random tion). Surprisingly, a of studies report nu- distribution of nucleotide substitutions, cleotide substitutions as being just equal to or less there may be constraints on the type of mutation frequent than length mutations in closely related an individual experiences. For example, site there & taxonomic groups (Curtis Clegg, 1984; Wolfe at is a correlation between transilion/transversion ra- Volume Number 4 Kelchner 487 87, 2000 Non-Coding Chloroplast DNA Evolution and neighboring base composition in non-cod- certain inserted or deleted tandem-repeat length tios ing regions (Morton, 1995a, b; Morton et aL, 1997; mutations (Palmer, 1985; Blasko et al., 1988). Savolainen et al., 1997). The correlation suggests However, Wolfson et al. (1991), Sang et al. (1997), A T SSM that nucleotides flanked by and/or will dem- and Kelchner and Clark (1997) suggested is more mechanism onstrate a significant tendency toward transversion a likely for length mutation in mutations. Such a tendency limits possible nucle- their studies of chloroplast introns and intergenic otide replacements at these sites, increasing the spacers, chance of parallelism and reversals, particularly if One would the site experiences multiple hits. also ALIGNMENT be more com- expect transversion substitutions to mon AT many in data sets of high content. There are philosophies for sequence align- much ment, and of the literature centers on the proper application of computer software for this INTRAMOLECULAR RECOMBINATION The purpose. structure present in a non-coding cpDNA Intramolecular recombination on an extra-re- sequence makes an excellent example for it gional or genomic scale has been suggested be- discussing what believe to be the fundamental I tween adjacent or nearby repeats in the chloroplast problem of most computer alignment programs: de- genome (Howe, 1985; Palmer et aL, 1985; Palmer fining the nucleotide as a discrete and independent The et al., 1987; Blasko et al., 1988; Ogihara et al., character. identification of secondary structure & Kanno and mechanisms may 1988; Milligan et 1989; Hirai, 1992; mutational in the data greatly al., & & Kanno 1993; Morton Clegg, 1993; Hoot improve on current algorithmic alignments of gaps. et al., Palmer, 1994). In the context of non-coding se- and thus on assessment of character homology. Many quence comparison, such a large-scale recombi- have found software, particularly versions CLUSTAL Thompson nation involving the particular region of study could of (Higgins et 1992; et al., result in indels of surprising size that contain se- 1994), to be of help at least initially with the al., quence content not readily identifiable in origin. alignment of non-coding sequences. The alignment may Recombination events operate on a then subjected an "improvement by hand" finer is to to scale within a discrete non-coding region. Occa- position gaps (e.g., Samuel et al., 1997; Downie et & sionally one infers extensive deleted sequence in al., 1998; Bayer Starr, 1998; Kajita et al., 1998). an alignment with no apparent mechanistic expla- This procedure saves time the sequences are sim- if when become numerous nation, presence of a small or moderately sized in- ilar in length, but indels version, or a large insertion showing congru- in the data matrix the difficulties of alignment dra- little ence with surrounding sequence pattern. Such matically increase. This because most alignment is mutations suggest intramolecular recombination, software initially regards each character in the ma- and they frequently occur in the loop regions of trix as an independent unit, unless otherwise spec- probable secondary structures. Sequences involved ified by particular position or gap weighting in stem-loops may be particularly susceptible to re- schemes defined by the user. The software is in- when combinatlon events due the conserved inverted capable of determining mutations other than to repeats and mutationally flexible loop. Therefore, substitutions have arisen, such as non-independent such structures could experience interactive recom- insertions, deletions, or inversions correlated with SSM bination with other stem-loops, particularly with and secondary structure. Appropriate weight- those existing in complementary sequence position. ing for these mutations that could be incorporated Recombination involving the entire loop of a sec- into an alignment algorithm at present, unde- is, may ondary structure occur, particularly in struc- veloped. minute mod- The method Wheeler tures with long stems, resulting in or Elision of et al. (1995) at- erate-sized inversions in both intron and intergenic tempts to improve gap placement and indel homol- & spacer regions (Natali et 1995; Kelchner ogy by alignment software. The Elision method uses al., & Wendel, 1996; Kelchner Clark, 1997; Sang et standard alignment algorithms to produce a series Such homoplasious competing alignments based on varying gap 1997). incidents are often of al., & & (Kelchner Wendel, 1996; Kelchner Clark, weighting schemes. These competing alignments 1997; Sang et al., 1997; Dumolin-Lapegue et al., are then combined in a single matrix and an anal- 1998) due to the persistence of the mutational trig- ysis is performed, with the effect that support is ger; in this case, the hairpin stem. increased for aligned regions that most frequently among Intramolecular recombination a notable appear the various gap-weighting schemes, is alter- native to slipped-strand mispairing as a source for This method aims at objectivity, but makes no im- 488 Annals of the Garden Missouri Botanical provement on the alignment algorithm's inabihty alignment issues: examples from non-coding to assess mutation types other than independent point cpDNA DATA substitutions. Mutations in non-coding regions are influenced by surrounding sequence structure and & Here present examples (Kelchner Wendel, I frequently occur not as independent base mutations & 1996; Kelchner Clark, 1997; Kelchner, unpub- but as linked multinucleotide mutation events, like lished data) to illustrate the inference of mutational the insertion of a repeat unit (Kelchner, 1996; Kel- cpDNA mechanisms non-coding sequences and in & chner The many Clark, 1997). likelihood that demonstrate the practice of applying mechanistic non-coding mutations are derived from sequence _*jji*j* »*u* _.j explanations alignment and homolocv assess- to r ^ iragments that are m• serted, deleted, mverted, or ... ment. Nuclieoti• dies i• n li ower-case bi oli di pri• nt are in- ol,1herwi.se rearranged1, negat.es xtihe assumptXi* on r ol . . , ferred insertions; underlined leotides indicate discrete, independent nucleotide characters under- probable progenitor sequence of an insertion *^^ or, lying alignment algorithms, as well as any ex- all Examples 4 and in 5, call attention to a particular tension of those algorithms like the Elision method. sequence mterest. of minimum, At a those using sequence alignment A common type of insertion in non-coding programs homology establish putative of char- to ^P^^A a direct repeat of a neighboring sequence is acters in their data matrix should experiment with & 1^" Golenberg Hoot al, 1993; et a wide variety of gap-weighting options. These op- C'^yP^ g^P' may mu- Douglas, 1998). These often take the form of vari- tions, however, not reveal the underlying able-length strings of a mononucleotide repeat unit mechani occasionmg sequence tational rear- ^ (Example rangements in chloroplast non-coding regions. They 1). may, however, facilitate the rapid alignment of seg- ments of the matrix that share consistent sequence EXAMPLE 1. and integrity thus pinpoint regions of variable TTAAAAAAAAA TTGA 1. length that require special consideration. Alternatively, some have avoided alignment pro- TTAAAAAAAAAA--TTGA 2. grams entirely and describe aligning sequences by TTAAAAAAAA TTGA & 3. hand Golenberg 1993; Hodges Ar- (e.g., et al., & nold, 1994; Kelchner Clark, 1997). This ap- TTAAAAAAAAAAAATTGA 4. proach facilitates a careful study of the matrix as forms and increases the researcher's familiarity it Homology can be highly uncertain for these re- with mutations in the sequences. However, align- nucleotides. Therefore, such regions are ei- ment by hand, especially when dealing with con- P*^^*^^ removed from consideration as potential phy- ^^""^ siderably divergent taxa or with the presence of a logenetic characters conservative approach) or (a number great of length mutations, can be tedious included as coded gap characters corresponding to and time consuming. becoming Kelchner and Clark (1997) suggested that aware- ^'^'^^^ ^^ ^he repeat string (often highly homoplasious in the context of a resulting topology). ness of the proposed mutational mechanisms active Uncertainty of homology exacerbated by potential in non-coding regions can be useful for inferring is PCR inaccuracies of enzymatic processes during and and positioning gaps ultimately assessing in amplification and sequencing, which can gen- homology. Golenberg were also (1993) the et al. first to erate variable-length repeat strings independent of detail a criterion for aligning gaps in non-coding When cpDNA sequence matrices. Based on their example, Kel- the template's constitution. strings chner (1996) and Kelchner and Clark (1997) mod- of adjacent mononucleotide repeats are highly var- dth ified the alignment criterion for chloroplast rpllG iable in length in a matrix and h or excee intron sequences. Hoot and Douglas (1998) also re- range demonstrated above, they become more likely SSM vised Golenberg et al.s (1993) method of gap align- to experience further mutation. For this rea- ment, framing the beginnings of a nomenclatural son, is perhaps most reasonable to remove such it procedure for defining gap categories. Although a areas from consideration in a phylogenetic analysis. nomenclatural system is not requisite for gap treat- Insertions can also be muhinucleotide repeat ment in a phylogenetic analysis, may be useful units of a neighboring sequence, as demonstrated it Example in collating information of inferred mutational in 2 by the inserted repeat unit ataaa & ;hanisms if universally applied in non-coding ("Type lb" gap; Golenberg et al., 1993; Hoot DNA studies. Douglas, 1998). Volume Number 4 87, Kelchner 489 2000 Non-Coding Chloroplast DNA Evolution — EXAMPLE common 2. repeat unit a mutation type non-cod- in B C ing regions. alignment options or were used If ATAAAACAAA GAGCG 1. for phylogenetic analysis, the content of the inser- ATAAAATAAAataaaGAGCG 2. tion would be of unexplainable origin (though still and possible) the potential of incorrectly assessing ATAAAATAAA GAGCG 3. nucleotide homology in the region may be consid- ATAAAATAT^ GAGCG erable. 4, Any gap of the positions in this particular ex- An inserted repeat of this nature could be exten- ample would not affect a topology generated from may sive in length and be difficult to recognize as may these four taxa, but gap positioning have a a repeat unit during alignment example, have (for I more significant effect in a larger matrix of distantly 73 bp identified a inserted repeat [unpublished The related taxa. position of the gap in alignment data] in the trnT-trnL intergenic spacer Myopor- in 3A and may detection of the repeat unit also be A by aceae). repeat unit very nature shares nu- its relevant in determining a weighting scheme for and cleotide content order with flanking sequence; these non-independent characters. may therefore, multiple gaps be inferred by pairing may Length mutations overlap with one another segments an of inserted repeat with progenitor its to create a progressive-step indel. In the more ex- sequence. This particularly problematic the is in- if treme homology cases, appraisal of in these regions sertion or progenitor has experienced subse- its can be very or impossible (Palmer difficult, et al., quent nucleotide substitutions. & Downie 1985; 1996b; Kelchner et al., Clark, Even when a single gap inferred, positioning is Example 4 1997). demonstrates a probable pro- may of the gap hide evidence that the insertion is which gressive-step indel in two possible place- Example a repeat unit. 3 reproduced from Kel- is TTGA. ments exist for the repeat Note that the un- chner and Clark and demonstrates how (1997) a derlined sequence a direct repeat of the is may repeat unit be obscured in a sequence matrix. TCGTAATTGA preceding sequence in the matrix. EXAMPLE EXAMPLE 3. 4. A. AATCGTAATTGA AACAGA - 1. - GGTTATGA ATTAACA 1. AATCGTAATTGA AACAGA 2. GGTTATAA ATTAACA AATCGTAATTGA TCGTAATTGA AACAGA 2. 3. GGTTATAA ATTAACA AATCGTAATTGA TCGTAATTGA AACAGA 3. 4. GGTTATAA ATTAACA AATCGTAATTGA TCGTAATTGA ttgaAACAGA 4. 5. TTGA B If part of the underlined in sequences 3 and 4 moved from is its current position to align GGTTAT GA ATTAACA 1. with ttga sequence in the possibility that the 5, sequence ttga a direct repeat of the preceding is GGTTAT AA ATTAACA 2. may sequence be obscured; however, alignment this GGTTATAA ATTAACA choice would not be impossible. As the preceding 3. sequence to the underlined 10 bp repeat does not GGTTATAA ATTAACA we 4. contain this additional ttga repeat, can infer that two separate events have given an rise to initial C 10 bp insertion in sequences 3 and followed by 4, GGTTA TGA ATTAACA an additional 4 bp insertion In sequence Wheth- 5. 1. TTGA er ttga itself or the preceding the sub- is GGTTA TAA ATTAACA 2. sequent inserted mutation is impossible to deter- mine. In this case, either alternative alignment of GGTTATAA ATTAACA 3. TTGA the unit would cause no effect in a phylo- genetic analysis; most important here is to dis- it GGTTATAA ATTAACA 4. cern the two length mutation events. any poten- If Alignment possibilities A, B, and C were equally tially informative nucleotide substitutions were W CLUSTAL probable using (Thompson et al., present in either of the repeat units in Example 4, A Only 1994). alignment reveals the insertion a these substitutions should be excluded from a phy- is 490 Annals the of Garden Missouri Botanical logenetic analysis on the basis that nucleotide ho- sequence Sequences 2 and 3 probably share a 1. mology of the repeats not discernable. similar origin as a repeat of the preceding sequence is The example above homology may TTAAT. The Example suggests that events, aligned as they are in A be indicated by the length of insertions or deletions 6A, are probably non-homologous. re-alignment in a gap, although such an assumption is not with- could be performed to accommodate the two sepa- out risk. Example 5 below demonstrates multiple rate indel events (Example 6B), even though the possible alignments of the gait repeat unit (repre- insertions are of the same length and the alignment & sented individually by sequences and with infers an additional gap (see Hoot Douglas, 2, 3, 4) sequence the insertion in 1998). 1. EXAMPLE EXAMPLE 6B. 5. L AGATTGATTGATTATTATACTGATTATGC GGTTAAT tctat TCTATCT C 1 . GGTTAAT ttaat TCTATCT 2. CAGATT ATGC 2. GGTTAAT ttaat TCTATCT CAGATTgatt ATGC 3. 3. GGTTAAT TCTATCT 4. CAGATT gatt ATGC 4, GGTTAAT TCTATCT CAGATT ATGC 5. 5. There a hazard that minute inversions (Kel- is Again, actual homology impossible assess to is & chner Wendel, 1996) can be completely ob- GATT with confidence, for there exist three repeat scured in a matrix they introduce no gaps during if units in the insertion in sequence 1. In cases like alignment, particularly alternative gap-weighting if homology on often inferred the basis this, is oi schemes have not been rigorously pursued. pre- If minimum number and gaps length of indel of re- sent and unrecognized in a data matrix, minute in- quired to position the repeat. Hence, the gatt re- may versions overweigh a particular mutation by peats in sequences and 4 would be aligned 2, 3, interpreting the single mutation event (an inversion) one above the other and on one side of the gap to apomorphies as multiple of adjacent nucleotide When number reduc^e the of inferred indel events. Example below substitutions. 7 illustrates a situa- coding indels as characters, this would be a rea- tion in which sequences 2 and 3 share the inversion sonable solution in lieu of other evidence for indel TTGG CCAA & (from Kelchner Wendel, 1996). to origin, and the repeat gatt would be treated as ho- EXAMPLE mologous ior those sequences that contain it. 7. may Equal be length of insertions not strong ev- TAATATT TTGG AATATTA 1. homology idence of their (Kelchner, 1996; Kel- & & chner Clark, 1997; Hool Douglas, 1998). Con- T7VATATT CCAA AATATTA 2. Example 6A. sider the insertions in TAATATT CCAA AATATTA 3. EXAMPLE 6A. TAATATT TTGG AATATTA 4. GGTTAAT tctat TCTATCT 1. TAATATT TTGG AATATTA 5. GGTTAAT TCTATCT 2. If the inversion is of sufficient length to introduce multiple gaps in the matrix (see Golenberg et al., GGTTAAT ttaat TCTATCT 3. 1993; Sang el 1997), two possibilities can oc- al., GGTTAAT TCTATCT 4. cur: the gaps will be misaligned to parts of the in- verted sequence sharing spurious sequence simi- GGTTAAT TCTATCT 5. larity with the uninverted sequences; there will or, 6A Alignment of the insertions in Example re- be inference of an inserted sequence of unknown sults in the probably mistaken homology of indels origin the inverted nucleotides), which (in reality, in sequences 2 and 3 with that of sequence The corresponds with a deletion in the homologous un- 1. insertion in sequence 1 likely arose from an in- inverted sequences. Each possibility will lead to may serted repeat of the sequence to the right of the inaccurate assessment of homology and poten- gap, TCTAT. This would be a more parsimonious tially have a considerable effect on phylogeny es- number explanation, in terms of total of mutation timation. many events, than to infer a single inserted repeat fol- Regions in the matrix demonstrating in- lowed by two adjacent nucleotide substitutions in dependent variable-length insertion and deletion Volume Number 4 87. Kelchner 491 2000 Non-Coding Chloroplast DNA Evolution events will likely be associated with secondary mutations, and multiple-hit point substitutions, any may structures, specifically with loop regions of stem- of which obscure evolutionary history. loops (Kelchner, 1996; Downie et al, 1996b; Kel- Inversions may show high levels of parallel- (3) & chner Clark, 1997). Identification of flanking se- ism and reversal, and their phylogenetic may utility form not be particularly robust. Undetected minute in- locate the boundaries for the gion and aid in versions may be buried within a data matrix and aligning the Indels. Discerning probable SSM-sus- consequently treated as multiple base substitution informat synapomorphies instead of a single mutational ence and of parallel reversed insertions or dele- event. may tions. Nucleotide substitutions be under pe- (4) Perhaps methods gap of or character weighting culiar constraints not fully understood. There ev- is and alignment based on mechanisms of mutation id of a bias in non-coding regions involving 1 can be incorporated into software designed for non- transition/transversion substitution ratios due to the A coding sequence alignment, particularly by includ- influence of neighboring bases. particular base AG an may ing evaluation of values for proba})le sec- experience substitution events multiple times ondary However, structures. the diversity of rates in closely related lineages, reaching saturation long and types of molecular evolution in non-coding re- before the expected saturation level for the remain- gions may be profound. As with coding DNA, we ing sequence. A base-composition bias toward A/T are far from understanding all forces directing non- content clearly present in non-coding cpDNA. is coding molecular evolution to a degree that we can. Selective pressures exerted on non-coding re- with any may certainty, assign probabilities to individual gions be largely a function of the physical mutations. sequence and structure of the possible functionality Considering that alignment of sequence data is of introns and intergenic spacers. Reliance on fundamental to the entire phylogeny estimation pro- methodology developed coding sequence, which for cess, authors should more fully describe the steps includes estimates of constraints on coding se- taken to align their sequence data in order to pro- quence evolution, transition/transversion and ratios, vide necessary information for the assessment of mutation probabilities, inappropriate the is for proposed their reconstructions of phylogenies. analysis of non-coding regions. Phylogenetic estimations based on genetic dis- cpDNA Analysis of Non-Coding Sequence Data tance measures of non-coding sequences must be approached with care. Superficial appli- The mechanisms of evolution described above cation of models for maximum likelihood (ML; Fel- number & have a of significant implications for the senstein, 1981) or neighbor-joining (NJ; Saitou phylogenetic analysis of non-coding sequence data. Nei, 1987) could easily produce erroneous phylo- Among these are the following: genetic estimations several key assumptions un- if Shpped-strand mispairing can be (1) the result derlying the methodology are violated. of persistent mutational triggers (especially when For example, most models consider a nucleotide & the trigger sequence is located in the stem of a site as the unit of evolution (Ritland Eckenwald- stem-Ioop secondary structure). This can introduce 1992), a consideration contradicted by er, that is homoplasy from parallelisms and reversals into any the mode of non-coding sequence evolution. Sim- phylogenetic estimations that include gap-coded models based on commonly plistic the calculated characters in the matrix. Multiple indel events in a Kimura estimates (Kimura, 1980) and Jukes-Cantor & may localized region obscure homology of length estimates (Jukes Cantor, 1969) assume an equal 25% Non-independence mutations. of these mutations frequency each for nucleotide type throughout introduces the issue of relative weighting of nucle- the sequence and generate base mutation proba- otide characters linked in a repeat unit, each from assumption. Because non-coding if bilities this cpDNA base is treated as a character in an analysis. Weight regions can demonstrate much higher A/T of the unit taken as a single character is also an content, this assumption clearly contradicted. is issue if the unit is included in the analysis as a Furthermore, transition/transversion ratios in non- coded gap character. coding regions can considerably from coding differ & Secondary structure shows nonrandom mu- ones Hoot Douglas, and may (2) (see 1998), tation in the form of compensatory mutation and vary between discrete non-coding regions of the possible homogenization of sequence necessary for chloroplast genome. Among-site mutation rate het- stem formation. Loop sequence is available for mul- erogeneity is highly probable, especially if regions tiple mutations in the form of inversions, length of conservation and hot spots for mutation exist in

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.