JVI Accepted Manuscript Posted Online 22 June 2016
J. Virol. doi:10.1128/JVI.00832-16
Copyright © 2016, American Society for Microbiology. All Rights Reserved.

Distinct Viral Lineages from Fish and Amphibians Reveal the Complex Evolutionary History of Hepadnaviruses

Jennifer A. Dill,a Alvin C. Camus,a John H. Leary,a Francesca Di Giallonardo,b Edward C. Holmes,b Terry Fei Fan Ng,ac

Department of Pathology, University of Georgia, Athens, GA, USAa, Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australiab, Current address: Division of Viral Diseases, NCIRD, CDC, Atlanta, GA, USAc

Running Head: Fish and Amphibian Hepadnaviruses

Corresponding author:
Terry Fei Fan Ng
College of Veterinary Medicine, University of Georgia
501 D. W. Brooks Drive, Athens, GA 30602

Division of Viral Diseases
Centers for Disease Control and Prevention
1600 Clifton Rd. NE, Mailstop G-10
Atlanta, GA 30329
[email protected], Phone: 404.639.4880, FAX: 404.639.4011

Abstract
Hepadnaviruses (HBVs) are the only animal viruses that replicate their DNA by reverse transcription of an RNA intermediate. Until recently, the host range of hepadnaviruses was limited to mammals and birds. We obtained and analyzed the first amphibian HBV genome, as well as several prototype fish HBVs that allow the first comprehensive comparative genomic analysis of hepadnaviruses from four classes of vertebrates. Bluegill hepadnavirus (BGHBV) was characterized from in-house viral metagenomic sequencing.
The African cichlid hepadnavirus (ACHBV) and the Tibetan frog hepadnavirus (TFHBV) were discovered using in silico analyses of the whole-genome shotgun and transcriptome shotgun assembly databases. Residues in the hydrophobic base of the capsid (core) proteins, designated motifs I, II and III, are highly conserved, suggesting that structural constraints for proper capsid folding are key to capsid protein evolution. Surface proteins in all vertebrate HBVs contain similar predicted membrane topology, characterized by three transmembrane domains. Most striking was that BGHBV, ACHBV, and the previously described white sucker hepadnavirus did not form a fish-specific monophyletic group in the phylogenetic analysis of all three hepadnaviral genes. Notably, BGHBV was more closely related to the mammalian hepadnaviruses, indicating that cross-species transmission events have played a major role in viral evolution. Evidence of cross-species transmission was also observed with TFHBV. Hence, these data indicate that the evolutionary history of the hepadnaviruses is more complex than previously realized and combines both virus-host co-divergence over millions of years and host species jumping.

Importance
Hepadnaviruses are responsible for significant disease in humans (hepatitis B virus) and have been reported from a diverse range of vertebrates as both exogenous and endogenous viruses. We report the full-length genome of a novel hepadnavirus from a fish and the first hepadnavirus genome from an amphibian. The novel fish hepadnavirus, sampled from bluegill, was more closely related to mammalian hepadnaviruses than to other fish viruses.
This phylogenetic pattern reveals that although hepadnaviruses have likely been associated with vertebrates for hundreds of millions of years, they have also been characterized by species jumping across wide phylogenetic distances.

Key Words: Hepatitis B Virus, Hepadnaviridae, fish hepadnavirus, amphibian hepadnavirus, evolution, phylogeny

Introduction

The Hepadnaviridae are characterized by extremely small (3-3.3 kbp), partially double-stranded DNA (dsDNA) genomes. Viral particles are spherical, with a diameter of approximately 42 nm, each containing a single copy of the genome covalently linked to the viral reverse transcriptase (RT) that provides DNA polymerase activity (1, 2, 3). The hepadnaviruses are unique among animal viruses in that they replicate their DNA by reverse transcription of an RNA intermediate and comprise the only Group VII animal viruses (dsDNA-RT viruses) of the Baltimore system, which classifies viruses according to their genome composition and method of replication (1, 4).

At present, the Hepadnaviridae are subdivided into two genera (5-7): the genus Orthohepadnavirus that infects mammals, including humans, and the genus Avihepadnavirus that infects birds (1, 8-13). Within both genera, the circular viral genomes exhibit multiple overlapping open reading frames (ORFs), comprising the polymerase, pre C/C, and pre S/S ORFs that encode the viral polymerase (P), core (C), and surface (S) proteins, respectively. In the Orthohepadnavirus genus, a fourth ORF encodes protein X. Despite these similar genome organizations, nucleotide sequence identity between hepadnavirus genera is limited, with the exception of some highly conserved functional domains (14, 15).

Human hepatitis B virus (HBV) affects more than one third of the human population and infections have the potential to cause both severe chronic liver disease and hepatocellular carcinoma (1-3). Interestingly, chronic infection by woodchuck hepatitis B virus (WHBV) can result in similar pathologic changes in that species (11, 12). Liver pathology is less commonly induced by avihepadnaviruses, although duck hepatitis B virus (DHBV) can cause liver necrosis (9). The first hepadnavirus from a bony fish, the white sucker (Catostomus commersonii), class Actinopterygii, was described in 2015, although no disease association was observed (16). To date, no exogenous reptilian or amphibian hepadnaviruses have been reported.

In addition to exogenous hepadnaviruses, a number of endogenous sequences (eHBV), in the form of endogenous viral elements (EVEs), have been identified in animal genomes. Hepadnaviral EVEs have been documented in turtles, crocodiles, snakes, and birds (14, 17-19), although no mammalian, amphibian or fish endogenous hepadnaviruses have yet been detected. The presence of EVEs has helped provide a time-scale of hepadnavirus evolution, particularly as some of the endogenization events may have occurred as early as 200 million years ago (15). Hence, although there is clear evidence for some cross-species transmission (20), current data suggest that hepadnavirus evolution largely follows a pattern of virus-host co-divergence that extends to at least the origin of the ray-finned fishes.

To better understand the host range and evolution of the hepadnaviruses in vertebrates, particularly the extent of virus-host co-divergence, we investigated new fish and amphibian (exogenous) hepadnaviral homologs that are highly divergent from the hepadnaviruses previously described in mammals and birds.
These include the second fish hepadnavirus, from bluegill sunfish (Lepomis macrochirus), the first amphibian hepadnavirus from a Tibetan frog (Nanorana parkeri), and analysis of a hepadnavirus-like sequence from Lake Tanganyika African cichlid fish (Ophthalmotilapia ventralis).

Materials and Methods

Sample collection
Tissues used in this study were originally part of an investigation into suspected virally induced orocutaneous neoplasms in two populations of bluegill. In total, 46 fish were examined, including 40 bluegill, five related Lepomis spp. and one largemouth bass (Micropterus salmoides) (Table 1). Five bluegill from a mixed species aquarium exhibit were submitted to the Aquatic Pathology Service at the College of Veterinary Medicine, University of Georgia, in 2009. In 2014, similar lesions were observed on bluegill by a pond owner in Waleska, Georgia. Between April 2014 and July 2015, 26 bluegill, 17 with lesions and nine without, were received, along with one largemouth bass. An additional five bluegill, two redbreast sunfish (Lepomis auritus) and two redear sunfish (Lepomis microlophus) were received from a commercial fish hatchery in Hawkinsville, Georgia in January 2015. Four bluegill and one green sunfish (Lepomis cyanellus) were received from local anglers in the Athens, Georgia area in September 2015.

Necropsies were performed and samples of organs and lesions were fixed in 10% neutral buffered formalin and processed routinely for histologic evaluation. Additional samples were fixed in 2% glutaraldehyde, 2% paraformaldehyde and 0.2% picric acid in 0.1M cacodylate-HCl buffer and processed for transmission electron microscopy (TEM). Portions of lesions were collected separately and archived in a -80°C freezer.
In addition, pooled samples of liver, spleen and kidney were collected from a subset of fish and frozen at -80°C. Select histologic sections were later used for in situ hybridization evaluation using probes designed from PCR products.

Fin clip samples from two O. ventralis cichlids were provided by a local hobbyist and archived at -80°C.

Viral metagenomic and bioinformatics analysis of next generation sequencing (NGS) data
In the absence of a definitive diagnosis for the skin lesions, metagenomic sequencing was performed on seven lip lesions and one non-lesioned lip, according to previously described protocols, to further investigate a potential underlying viral etiology (21-24). In brief, a tissue homogenate was centrifuged through a 0.22 µm filter to enrich viral particles by size, then treated with nucleases to deplete host nucleic acids. Nucleic acids from nuclease-resistant viral particles were extracted using the QIAquick viral RNA column purification system, followed by sequence-independent amplification using random priming. First strand synthesis (for both DNA and RNA) was performed using a 28-base oligonucleotide whose 3′ end consisted of eight random nucleotides (primer N1_8N, CCTTGAAGGCGGACTGTGAGNNNNNNNN) using SuperScript III reverse transcriptase (Invitrogen) (21-24). A second strand was synthesized using Klenow fragment DNA polymerase (New England BioLabs). The resulting double-stranded cDNA and DNA were then PCR amplified using AmpliTaq Gold DNA polymerase and a 20-base primer (primer N1, CCTTGAAGGCGGACTGTGAG). A dual-indexed sequencing library was then prepared using the Nextera XT DNA Sample Prep Kit (Illumina).
After pooling, the final library was sequenced using the MiSeq sequencing system, with 2 × 250 bp paired-end sequencing reagents (Illumina MiSeq Reagents V2, 500 cycles).

A total of 11 million reads were generated and analyzed as previously described (23). An in-house analysis pipeline running on a 32-node Linux cluster was used to process the data (University of California, San Francisco). Adaptor and primer sequences were trimmed using VecScreen (25), while duplicate reads and low-sequencing-quality tails were removed using a Phred quality score of 10 as the threshold. The cleaned reads were assembled de novo using an in-house sequence assembler employing an ensemble strategy (26) that consisted of SOAPdenovo2, ABySS, meta-Velvet, and CAP3. The assembled sequence was compared with an in-house viral protein sequence database using BLASTx. Viral contigs were further inspected manually using Geneious (version R6; Biomatters).

Complete genome sequencing of the bluegill hepadnavirus
To obtain the last 1% of the genome that was not covered by NGS, PCR was performed using primers BGHBV-CirF 5′-CAACGCCAACAGCATTTTTA-3′ and BGHBV-CirR 5′-TAATATCGGTCGAGACTGCG-3′, which anchored in the polymerase and core ORFs, bridging the intergenic region. The resulting 373-bp amplicons were sequenced using Sanger methods to confirm the circularity of the genome.

Molecular screening of the fish hepadnavirus
Tissues from 40 bluegill, three related Lepomis species, and one largemouth bass were extracted using Qiagen DNA extraction kits. Screening for BGHBV was accomplished by traditional PCR, targeting the polymerase with primer set BGHBV-PolF 5′-TGTGGACAAAAATCCACGAA-3′ and BGHBV-PolR 5′-CGTAAAGCACCTATGGGCAT-3′ using a previously described touch down protocol (21).
Additional primers targeting the polymerase, capsid and core proteins were also designed and verified (Table 4).

Quantitative (q)PCR was used to assess the presence of viral DNA from the selected tissues as indicated (Table 1). Primers were designed from the polymerase gene to yield a 110 bp amplicon (PolQpcrF and PolNestR, Table 4). The primer set was used in a standard PCR reaction with DNA extracted from bluegill GAI-2 (referred to as the positive control). The DNA was run on a 2% agarose gel, purified (Qiaquick Gel Extraction Kit) and quantitated (NanoDrop 2000, Thermo Fisher). DNA was adjusted to 1 ng/µl. Ten-fold dilutions of this stock were made in water for qPCR standard curve generation. Preliminary analysis indicated that the 10^-1 through 10^-8 dilutions (10^-1 to 10^-8 ng) would cover the dynamic (linear) range of the assay (R^2 ≥ 0.95). qPCR was performed on a Bio-Rad IQ5 iCycler using iQ5 system software for analysis. One µl of extracted DNA was added to each 25 µl reaction mix containing iQ SYBR Green Supermix (Bio-Rad) and 100 nmol each of the indicated primers. A 2-step cycling program was used as follows: an initial 95°C for 3 min followed by 35 cycles of 95°C for 10 seconds and 60°C for 30 seconds. Initial screening of all samples was performed twice using one PCR well/sample. Final assessment of viral DNA presence was made on samples run in triplicate.

Endpoint PCR was performed to test the cichlids for ACHBV. Fin biopsies from two O. ventralis cichlids were extracted using spin columns as described above. Tissue DNA was screened for the presence of cichlid hepadnavirus DNA using primers specific to the cichlid hepadnavirus polymerase sequence (ACHBV-PolF and ACHBV-PolR, Table 4). PCR for cytochrome b was used as a positive control to verify extraction and PCR methods (primers OVCytBF and OVCytBR, Table 4) (27).

In silico screening of public sequence data
The core, polymerase, and surface protein sequences from BGHBV were used as queries in a BLAST analysis against the GenBank whole-genome shotgun (wgs) and transcriptome shotgun assembly (TSA) databases in March 2016 to detect hepadnavirus homologs in amphibians and fish, employing an e-value of 10 e-4. The resulting sequences were then re-analyzed by reverse-BLAST, ORF prediction, sequence comparison and alignment, as well as bioinformatics analysis to validate the initial assembly. Other orthohepadnavirus and avihepadnavirus proteins used as queries detected the same sequences as the BGHBV queries (data not shown).

Sequence comparisons and phylogenetic analysis
Coding sequences of representative hepadnavirus core (C), polymerase (P) and surface (S) genes were downloaded from GenBank and combined with those of BGHBV and TFHBV. To be as broad as possible, the background GenBank data set included exogenous Avihepadnavirus, Orthohepadnavirus, and white sucker hepatitis B virus (WSHBV) sequences, as well as available avian and reptilian (crocodilian) endogenous (e) hepadnavirus sequences that were of sufficient length to conduct phylogenetic analyses, although sequence availability differed by gene. Although a number of snake eHBVs have been documented (14, 15), these are highly fragmentary, contain multiple stop codons, and hence were of insufficient length to be included in our phylogenetic analysis that was based on amino acid sequences (see below). A full list of the sequences utilized is available (Table 5).
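
The e-value filtering step of the in silico screening described above can be illustrated with a minimal sketch. It assumes that BLAST results were exported in the standard 12-column tab-separated layout, which the manuscript does not state. The query and contig identifiers, the scores, and the helper names (parse_tabular, filter_hits, best_hit_per_subject) are hypothetical and exist only for illustration. The cutoff is passed as a parameter because the manuscript's "10 e-4" can be read in more than one way; 1e-4 below is one reading, not a confirmed value.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    query: str      # query protein identifier
    subject: str    # database contig identifier
    evalue: float   # expectation value reported by BLAST
    bitscore: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output (assumed layout)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Columns 11 and 12 (0-based 10 and 11) hold e-value and bit score.
        hits.append(Hit(fields[0], fields[1], float(fields[10]), float(fields[11])))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float) -> List[Hit]:
    """Keep hits whose e-value does not exceed the chosen cutoff."""
    return [h for h in hits if h.evalue <= max_evalue]


def best_hit_per_subject(hits: List[Hit]) -> Dict[str, Hit]:
    """For each database contig, keep the hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.subject not in best or h.bitscore > best[h.subject].bitscore:
            best[h.subject] = h
    return best


# Hypothetical rows; identifiers and scores are invented for illustration.
ROWS = [
    ["BGHBV_pol", "wgs_contig_A", "41.2", "310", "170", "5", "20", "329", "100", "1029", "3e-52", "180"],
    ["BGHBV_core", "wgs_contig_A", "35.0", "90", "55", "2", "10", "99", "1200", "1469", "2e-07", "55.1"],
    ["BGHBV_surface", "tsa_contig_B", "28.0", "40", "28", "1", "5", "44", "30", "149", "0.8", "28.5"],
    ["BGHBV_pol", "tsa_contig_C", "30.1", "120", "80", "3", "50", "169", "10", "369", "5e-05", "40.2"],
]
EXAMPLE = "\n".join("\t".join(row) for row in ROWS)

kept = filter_hits(parse_tabular(EXAMPLE), max_evalue=1e-4)  # assumed cutoff
for subject, hit in sorted(best_hit_per_subject(kept).items()):
    print(subject, hit.query, hit.evalue)
```

Candidate contigs retained this way would still need the reverse-BLAST and ORF checks described in the text before being treated as hepadnavirus homologs.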

Amino acid sequence alignments of the C, P and S data sets were inferred using multiple cycles of the MUSCLE algorithm (28). Because the highly divergent nature of some sequences could compromise phylogenetic accuracy, alignment gaps and ambiguously aligned sequences were removed using the Gblocks program with relatively relaxed settings (i.e. allowing smaller final blocks and less strict flanking regions) (29). This resulted in final multiple sequence alignment lengths of P = 35 taxa, 272 amino acids; C = 34 taxa, 110 amino acids; S = 24 taxa, 187 amino acids. Based on these alignments, maximum likelihood (ML) phylogenetic trees were estimated using PhyML (30), employing the LG+Γ model of amino acid substitution and