Worm 1:1, 42–50; January/February/March 2012; © 2012 Landes Bioscience

Toward 959 nematode genomes

Sujai Kumar, Georgios Koutsovoulos, Gaganjot Kaur and Mark Blaxter*

Institute of Evolutionary Biology; University of Edinburgh; Edinburgh, UK

Keywords: nematode, genome, next-generation sequencing, second-generation sequencing, wiki

The sequencing of the complete genome of the nematode Caenorhabditis elegans was a landmark achievement and ushered in a new era of whole-organism, systems analyses of the biology of this powerful model organism. The success of the C. elegans genome sequencing project also inspired communities working on other organisms to approach genome sequencing of their species. The phylum Nematoda is rich and diverse and of interest to a wide range of research fields from basic biology through ecology and parasitic disease. For all these communities, it is now clear that access to genome scale data will be key to advancing understanding, and in the case of parasites, developing new ways to control or cure diseases. The advent of second-generation sequencing technologies, improvements in computing algorithms and infrastructure and growth in bioinformatics and genomics literacy is making the addition of genome sequencing to the research goals of any nematode research program a less daunting prospect. To inspire, promote and coordinate genomic sequencing across the diversity of the phylum, we have launched a community wiki and the 959 Nematode Genomes initiative (www.nematodegenomes.org/). Just as the deciphering of the developmental lineage of the 959 cells of the adult hermaphrodite C. elegans was the gateway to broad advances in biomedical science, we hope that a nematode phylogeny with (at least) 959 sequenced species will underpin further advances in understanding the origins of parasitism, the dynamics of genomic change and the adaptations that have made Nematoda one of the most successful animal phyla.

Introduction

The phylum Nematoda is fascinating because it is the most ubiquitous, numerous and diverse of all animal phyla, present in just about every ecological niche on our small planet. Nematodes have been indispensable for research programs in developmental biology,1 genome biology,2,3 evolutionary genomics,4 neurobiology,5 aging,6 health7 and parasitology.8

In the last two decades, DNA sequencing technology has evolved dramatically and allowed us to create genome resources for many of these nematodes, which have transformed our understanding of the biology of not just this phylum, but of all organisms.9-12

Raw sequencing costs have dropped five orders of magnitude13 in the past ten years, which means that it is now a viable research goal to obtain genome sequences for all nematodes of interest, rather than just a few model organisms. Inspired by large-scale genome initiatives for other major taxa14-17 we have initiated a push to sequence, in the first instance, 959 nematode genomes. Why (only) 959 genomes? The adult hermaphrodite C. elegans has 959 somatic cells, and one of the first major projects that turned C. elegans from a local curiosity into a key global research organism was the deciphering of the near-invariant developmental cell lineage that gives rise to these adult cells, starting from the fertilised zygote. In an analogous way we hope that a nematode phylogeny (the evolutionary lineage of the extant species) with 959 or more species will be similarly catalytic in driving nematode research programs across the spectrum of basic and applied science. Obviously, as sequencing technologies improve and become more accessible, we will move beyond this initial goal of 959, especially with over 23,000 described species and an estimated one to two million undescribed species in the phylum.

The goal of this article is to describe the current status of nematode genome research, to encourage everyone to sequence their favorite nematode and to share genome sequencing experiences and data. We show how inexpensive it has become to obtain high quality draft genomes and introduce the 959 Nematode Genomes wiki18 as a way to collate and track sequencing projects worldwide.

The Genomes We Have

The worm community has been at the forefront of animal genome sequencing since 1998, when Caenorhabditis elegans was the first metazoan to be fully sequenced.2 The C. elegans genome and its extensive annotation is accessible through the WormBase portal.19 WormBase was one of the first databases to integrate genomic, genetic and phenotypic data, and its curators aim to catalog and link all C. elegans literature and research, including large scale analyses such as modENCODE.20

Since the release of the C. elegans genome, nine other nematode genomes have been published, including six species parasitic in plants and animals (Table 1). Only C. elegans and C. briggsae have been sequenced to “finished” status2 with all sequence data organized into chromosome-sized pieces. The remaining eight are high-quality draft genomes and all ten can be accessed at WormBase19 through graphical genome browsers and via bulk-data downloads.
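The published assemblies differ greatly in contiguity, which Table 1 reports as scaffold N50: half of the assembly lies in scaffolds of that size or larger. As a minimal illustrative sketch (this is not the procedure used to compute the Table 1 values, and the function name and toy lengths are hypothetical), the statistic can be computed from a list of scaffold lengths like this:

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembly length.
    The result is in the same unit as the input (bp, kbp, ...).
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy assembly (hypothetical lengths in kbp): total 150, half 75.
    # 50 + 40 = 90 >= 75, so the N50 is 40.
    print(scaffold_n50([50, 40, 30, 20, 10]))
```

Because the statistic depends only on the sorted lengths, two assemblies with very different scaffold counts can be compared directly, which is how the finished chromosomes and the draft assemblies sit in the same table column.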
*Correspondence to: Mark Blaxter; Email: [email protected]
Submitted: 11/23/11; Accepted: 11/24/11
http://dx.doi.org/10.4161/worm.19046

Table 1. Published nematode genomes

Species | Systematic position (Blaxter Clade, Helder Clade*) | Year published (reference) | Technology | Genome size (Mbp)† | Number of chromosomes or scaffolds in assembly† | Scaffold N50 (kbp)†,‡ | AT content (%)† | Number of genes/proteins†
Caenorhabditis elegans | V, 9E | 1998 (2) | Sanger | 100 | 6 chromosomes | 17,494 | 64.6 | 20,461/25,244
Caenorhabditis briggsae | V, 9E | 2003 (3) | Sanger | 108 | 6 chromosomes + 5 fragments | 17,485 | 62.6 | NA/21,986
Brugia malayi | III, 8 | 2007 (52) | Sanger | 96 | 27,210 | 38 | 69.4 | 18,348/21,332
Meloidogyne hapla | IV, 11 | 2008 (24) | Sanger | 53 | 3,452 | 38 | 72.6 | NA/13,072
Meloidogyne incognita | IV, 11 | 2008 (27) | Sanger | 82 | 9,538 | 13 | 68.6 | NA/21,232
Pristionchus pacificus | V, 9B | 2008 (53) | Sanger | 172 | 18,083 | 1,245 | 57.2 | NA/24,217
Caenorhabditis angaria | V, 9E | 2010 (33) | Illumina | 80 | 33,559 | 9 | 63.7 | 22,662/26,265
Trichinella spiralis | I, 2A | 2011 (36) | Sanger | 64 | 6,863 | 6,373 | 66.1 | 16,380/16,380
Bursaphelenchus xylophilus | IV, 10D | 2011 (37) | Roche 454, Illumina | 75 | 5,527 | 950 | 59.6 | 18,074/18,074
Ascaris suum | III, 8 | 2011 (25) | Illumina | 273 | 29,831 | 408 | 62 | 18,542/18,542

*Nematoda systematic clades as defined by Blaxter et al.21 and Holterman et al.29
†Nuclear genome only, not including mitochondria or endosymbionts, computed from WormBase release WS227 where available or from data URLs in Table 2.
‡Scaffold N50: Half the assembly is in scaffolds of this size or larger in the nuclear genome.

On the 959 Nematode Genomes wiki, 26 additional genome sequencing projects are listed with publicly available draft assemblies and, in some cases, annotations (Table 2). Seven of these are hosted at WormBase and the rest are available either through the 959 Nematode Genomes website or at sequencing center websites. These draft genomes are expected to have at least 95% of the genes present in multi-gene sized contigs, but the exact ordering and chromosomal location of the contigs is usually not known. Despite these shortcomings, draft data are very useful for comparative and evolutionary genomics or simply for identifying single genes of interest. Early access to these data not only allows researchers to test hypotheses, but, equally importantly, to identify potential problems early in the assembly process. Researchers wishing to publish analyses using pre-publication draft data should contact the sequencing center or lead researchers for permissions (and also to see if better versions of these data are or will soon be available).

Why We Need More: One Nematode Genome Does Not a Phylum Make

C. elegans is an excellent model nematode and its genome, with its wealth of annotation, is an excellent model genome. However C. elegans cannot be taken to represent all nematode genomes (Fig. 1). We know that C. elegans is quite derived within Nematoda21 and that it lacks many genes shared between other nematodes and other Metazoa.22 Nematode genomes have been sized from 20 Mb to 500 Mb (i.e., one fifth to five times that of C. elegans).23 Sequenced nematode genomes range from Meloidogyne hapla24 at 54 Mb to Ascaris suum25 at 273 Mb. Interesting genomic features in other species include chromatin diminution in Ascaris suum and other ascaridids (i.e., the germline has a larger genome than the soma26), aneuploid triploidy in the Meloidogyne incognita genome27 and the presence of obligate, vertically-transmitted symbiont alphaproteobacterial Wolbachia and their genomes inside the cells of many filarial nematodes.28

Apart from understanding genome organization and origins, richer sampling of sequenced genomes would allow a better understanding of the phylogeny of Nematoda and the evolutionary dynamics of important traits (such as parasitism of plants and animals) and developmental modes. The most comprehensive molecular phylogenies of Nematoda have been based on a single gene, the ~1600 bp nuclear small subunit rRNA locus,21,29,30 but this single locus is insufficient for robust resolution of the deep divergences in the phylum. Methods for generating large-scale multi-gene phylogenies now exist and can be applied even to draft genomes.

When large scale expressed sequence tag (EST) sequencing was first performed,31 new insights into nematode gene evolution became possible from the partial catalogs of expressed genes.22 More nematode genomes, even draft-quality ones, take those insights several steps further, as they allow analysis of complete gene catalogs. Additionally, whole-genome resources include non-genic regions, such as the regulatory regions upstream of genes, which are often even more conserved than coding regions and may function in developmental regulation.32,33

How to Make More

C. elegans was sequenced over a decade ago using Sanger sequencing. At that time, sequencing the genome to ten-fold depth took a decade and cost roughly $10 M. Once the sequencing was completed, similar resources were required to finish the genome.
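The scale of a sequencing project follows from simple arithmetic: the number of bases to read is genome size times sequencing depth, the cost is that number times the cost per base, and the instrument time is that number divided by daily throughput. The sketch below works this through for a 100 Mb genome using the order-of-magnitude figures listed in Table 3 of this article. The function names are hypothetical, and the single depth chosen for each platform (taken from within the recommended ranges) is an assumption made for illustration only.

```python
GENOME_SIZE_BP = 100_000_000  # a 100 Mb genome, about the size of C. elegans

# Order-of-magnitude values from Table 3. Picking one depth per platform
# (rather than the recommended range) is an assumption of this sketch.
PLATFORMS = {
    "Sanger dideoxy": {"depth": 10, "cost_per_base": 1e-3, "bases_per_day": 1e6},
    "Roche 454 FLX": {"depth": 20, "cost_per_base": 1e-5, "bases_per_day": 5e8},
    "Illumina HiSeq2000": {"depth": 100, "cost_per_base": 1e-7, "bases_per_day": 1e10},
}


def bases_needed(genome_size_bp, depth):
    """Total bases to sequence for a given fold-coverage."""
    return genome_size_bp * depth


def project_cost(genome_size_bp, depth, cost_per_base):
    """Raw sequencing cost, in the currency unit of cost_per_base."""
    return bases_needed(genome_size_bp, depth) * cost_per_base


def instrument_days(genome_size_bp, depth, bases_per_day):
    """Days of sequencing on a single instrument."""
    return bases_needed(genome_size_bp, depth) / bases_per_day


if __name__ == "__main__":
    for name, p in PLATFORMS.items():
        cost = project_cost(GENOME_SIZE_BP, p["depth"], p["cost_per_base"])
        days = instrument_days(GENOME_SIZE_BP, p["depth"], p["bases_per_day"])
        print(f"{name}: ~{cost:,.0f} currency units, ~{days:,.0f} instrument-days")
```

With these inputs the Sanger row comes to about 10^6 currency units and 10^3 instrument-days, against about 10^3 currency units and one day for Illumina HiSeq2000, the same orders of magnitude as Table 3. The estimate covers raw data generation only, not library construction, assembly or annotation.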
Sanger sequencing is still considered the gold standard in terms of quality, but because of the high cost and time investment, it is unlikely that there will be any more Sanger-sequenced nematode genomes.

Table 2. Nematode species for which published or draft genome data are publicly available

Species (Strain) | Status | Genome data and browser URLs
Ascaris suum (Davis) | ongoing | www.ncbi.nlm.nih.gov/nuccore/320321071
Ascaris suum (Victoria/Ghent) | published | ftp://ftp.wormbase.org/pub/wormbase/species/a_suum/
Ascaris suum (WTSI) | ongoing | www.sanger.ac.uk/resources/downloads/helminths/ascaris-suum.html
Brugia malayi (TRS) | published | www.wormbase.org/db/gb2/gbrowse/b_malayi; ftp://ftp.wormbase.org/pub/wormbase/species/b_malayi/
Bursaphelenchus xylophilus (Ka4C1) | published | www.genedb.org/Homepage/Bxylophilus
Caenorhabditis angaria (PS1010) | published | www.wormbase.org/db/gb2/gbrowse/c_angaria/; ftp://ftp.wormbase.org/pub/wormbase/species/c_angaria/
Caenorhabditis brenneri (PB2801) | complete | www.wormbase.org/db/gb2/gbrowse/c_brenneri/; ftp://ftp.wormbase.org/pub/wormbase/species/c_brenneri/
Caenorhabditis briggsae (AF16) | published | www.wormbase.org/db/gb2/gbrowse/c_briggsae/; ftp://ftp.wormbase.org/pub/wormbase/species/c_briggsae/
Caenorhabditis elegans (N2) | published | www.wormbase.org/db/gb2/gbrowse/c_elegans/; ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/
Caenorhabditis japonica (DF5081) | complete | www.wormbase.org/db/gb2/gbrowse/c_japonica/; ftp://ftp.wormbase.org/pub/wormbase/species/c_japonica/
Caenorhabditis remanei (PB4641) | complete | www.wormbase.org/db/gb2/gbrowse/c_remanei/; ftp://ftp.wormbase.org/pub/wormbase/species/c_remanei/
Caenorhabditis sp11 (JU1373) | ongoing | genome.wustl.edu/pub/organism/Invertebrates/Caenorhabditis_sp11_JU1373/
Caenorhabditis sp5 DRD-2008 (JU800) | ongoing | nematodes.org/downloads/959nematodegenomes/blast
Caenorhabditis sp7 (JU1286) | ongoing | ftp://ftp.wormbase.org/pub/wormbase/species/c_sp7/
Caenorhabditis sp9 (AC-2009 JU1422) | in annotation | ftp://ftp.wormbase.org/pub/wormbase/species/c_sp9/
Dictyocaulus viviparus (Not specified) | ongoing | www.nematode.net/
Dirofilaria immitis (Edinburgh/TRS/Basel) | in annotation | nematodes.org/downloads/959nematodegenomes/blast
Globodera pallida (Not specified) | ongoing | www.sanger.ac.uk/sequencing/Globodera/pallida/
Haemonchus contortus (Moredun) | ongoing | www.sanger.ac.uk/Projects/H_contortus/; ftp://ftp.wormbase.org/pub/wormbase/species/h_contortus/
Heterorhabditis bacteriophora (M31e) | in annotation | genome.wustl.edu/genome.cgi?GENOME=Heterorhabditis%20%20bacteriophora
Howardula aoronymphium (Jaenike) | ongoing | nematodes.org/downloads/959nematodegenomes/blast
Litomosoides sigmodontis (lab strain established from Cameroon by Odile Bain) | ongoing | nematodes.org/downloads/959nematodegenomes/blast
Loa loa (Nutman/Broad) | in annotation | www.broadinstitute.org/annotation/genome/filarial_worms/MultiHome.html
Meloidogyne hapla (VW9) | published | www.hapla.org/; www.wormbase.org/db/gb2/gbrowse/m_hapla/; ftp://ftp.wormbase.org/pub/wormbase/species/m_hapla/
Meloidogyne incognita (Morelos) | published | www.inra.fr/meloidogyne_incognita; www.wormbase.org/db/gb2/gbrowse/m_incognita/; ftp://ftp.wormbase.org/pub/wormbase/species/m_incognita/
Nippostrongylus brasiliensis (lab strain) | ongoing | www.sanger.ac.uk/sequencing/Nippostrongylus/brasiliensis/
Onchocerca ochengi (Cameroon/wild) | in annotation | nematodes.org/downloads/959nematodegenomes/blast
Onchocerca volvulus (Nutman/Broad) | ongoing | www.broadinstitute.org/annotation/genome/filarial_worms/MultiHome.html
Onchocerca volvulus (WTSI/wild Liberia) | ongoing | www.sanger.ac.uk/resources/downloads/helminths/onchocerca-volvulus.html
Oscheius tipulae (CEW1) | ongoing | nematodes.org/downloads/959nematodegenomes/blast
Pristionchus pacificus (California) | published | www.pristionchus.org/; www.wormbase.org/db/gb2/gbrowse/p_pacificus/; ftp://ftp.wormbase.org/pub/wormbase/species/p_pacificus/
Strongyloides ratti (ED321) | in annotation | www.sanger.ac.uk/resources/downloads/helminths/strongyloides-ratti.html
Teladorsagia circumcincta (Not specified) | ongoing | www.sanger.ac.uk/resources/downloads/helminths/teladorsagia-circumcincta.html
Trichinella spiralis (Not specified) | published | www.nematode.net/; www.wormbase.org/db/gb2/gbrowse/t_spiralis/; ftp://ftp.wormbase.org/pub/wormbase/species/t_spiralis/
Trichuris muris (E isolate) | ongoing | www.sanger.ac.uk/Projects/T_muris/
Wuchereria bancrofti (Nutman/Broad) | in annotation | www.broadinstitute.org/annotation/genome/filarial_worms/MultiHome.html

Note: See www.nematodes.org/nematodegenomes/index.php/Strains_with_Data for an up-to-date list.

Figure 1. Systematic tree of Nematoda indicating current sequenced, in progress or proposed genome sequencing projects. The systematic arrangement of Nematoda is based on De Ley and Blaxter;51 the clades defined by Blaxter et al.21 and van Megen et al.54 are indicated. For each major group we summarize the trophic ecology (microbivore, predator, fungivore, plant parasite, non-vertebrate parasite or associate, vertebrate parasite) and the number of species for which genome projects are reported in the 959 Nematode Genomes wiki. Figure developed from Blaxter.55

Sequencing the C. elegans genome was based on an array of mapped and ordered large-insert genomic clones, which greatly facilitated assembly. Most genome sequencing today avoids this time-consuming step and uses only whole-genome shotgun sequencing. As a result, current genome projects typically result in draft genomes with multi-gene sized contigs rather than chromosome-sized sequences. The substantial additional effort required to finish a genome is necessary if the goal is to study chromosome organization or long-range regulation. However, many questions about phylogenetics, gene evolution and shared or novel gene functions can be approached using high-quality draft genomes generated at a tiny fraction of the time and cost of a finished genome.

Second-generation sequencing platforms have dramatically reduced costs and increased throughput, with the trade-off of reduced read length compared with Sanger dideoxy reads (Table 3). Shorter reads mean that most genomic repeats are longer than a read, and the only way to attempt to resolve them in a genome assembly is to use pairs of reads sequenced from opposite ends of fragments that are longer than the repeats. Sophisticated assembly programs that use high sequencing depth and multiple insert libraries to get around the problems of sequencing errors and repeats have been developed specifically for second-generation data.34

Table 3. Current sequencing costs, throughput and read lengths

Technology | Read length (bases) | Error model | Recommended sequencing depth | Cost per base (£/€/$) | Cost per 100 Mb genome (£/€/$) | Throughput (bases/day/instrument) | Time per 100 Mb genome per instrument (days)
Sanger dideoxy | 1000–1500 | Gold standard, accurate base quality, typical error probability 0.0001 | 10X | 10^-3 | 10^6 | 10^6 | 10^3
Roche 454 FLX/FLX+ | 400–1000 | Homopolymer errors | 20–30X | 10^-5 | 2 × 10^4 | 5 × 10^8 | 5
Illumina HiSeq2000 | 100–150 | Typical error probability 0.01, lower quality toward end of read | 50–100X | 10^-7 | 10^3 | 10^10 | 1

Each platform has different read lengths and error profiles that affect their suitability for de novo genome sequencing projects. The Illumina platforms generate reads up to 150 bases and are the workhorses of sequencing projects. Illumina sequencing errors are usually miscalled bases and higher read-depths are recommended to consensus-correct such errors. Roche 454 reads can extend to 750 bases but are more expensive than the shorter-read technologies. In Roche 454 data, sequencing depths higher than 30-fold are not recommended35 because homopolymer errors accumulate and confound assembly algorithms. Life Tech’s SOLiD technology generates short (~75 base) reads but is not suitable for de novo genome sequencing because each base is represented by two “colors” (readings) and sequencing errors are difficult to identify in the absence of a reference sequence.

Different combinations of technologies, insert lengths and depths of coverage can be employed to exploit the best characteristics of each and minimise known classes of errors. In particular, paired-end sequencing from a mix of library insert sizes appears to be optimal for de novo assembly, using short-insert (200–700 bp) paired end (PE) libraries complemented by long-insert (1–20 kb) mate pair (MP) libraries. While PE data derive from directly captured genome fragments and are thus largely free of chimaeras, construction of MP libraries involves additional manipulations, including circularisation of long DNAs, that can result in high proportions of chimaeric or aberrantly short virtual inserts. MP data are typically used for scaffolding contigs generated from PE data, which are generated in higher coverage. Deep sequencing of the transcriptome can also yield scaffolding information,33 linking genome sequence contigs that contain exons for a gene that cannot be joined by genome sequence data because of repeats.

In the last year, genome sequences have been published for four nematode species. Each project used different sequencing strategies. The genome of Trichinella spiralis was determined using traditional Sanger dideoxy sequencing,36 with a 33-fold base coverage in the final assembly. Bacterial artificial chromosome clones and multiple-size insert clone libraries were used to scaffold the 64 Mb genome. The Bursaphelenchus xylophilus37 genome was sequenced using Illumina PE and Roche 454 single-end reads for basic contig generation and Roche 454 MP for scaffolding contigs. For Caenorhabditis angaria,33 Illumina PE reads (from libraries with multiple insert sizes from 200–450 bp) totalling 170-fold coverage were used, and then deep transcriptome data (Illumina RNA-Seq) were used to improve this assembly. This was the first genome project to use RNA-Seq reads to scaffold genomic contigs. Two versions of the A. suum genome have been released. Wang et al.38 generated an assembly using Roche 454 and Illumina data from short insert libraries and mate-pair data from 5.5 kb libraries sequenced using Sanger dideoxy technology as part of an extensive transcriptome sequencing project. Jex et al.25 used a mix of Illumina PE reads from 170 bp and 500 bp libraries, scaffolded with Illumina MP data from 800 bp, 2 kb, 5 kb and 10 kb libraries. Interestingly, these long-insert MP libraries were generated from DNA that was whole-genome amplified using strand-displacing isothermal amplification, a technology that holds great promise for additional nematode genome projects where starting materials may be limiting.

So which strategy should you use? If you are on a bargain-basement budget and want the most value for money, a single lane of Illumina HiSeq2000 PE (100 bases plus 100 bases) sequencing with multiplexed 300 bp and 600 bp PE libraries can provide a highly usable draft genome. For example, in our laboratory, Caenorhabditis species 5 was recently sequenced using this strategy and resulted in a draft assembly spanning 131 Mb in only 16,384 scaffolds, with more than half the assembly in scaffolds larger than 31 kb (S. Kumar, A. Cutter, M-A. Felix, M. Blaxter, unpublished; see www.nematodes.org/nematodegenomes/index.php/Caenorhabditis_sp._5_DRD-2008_JU800). Roche 454 data are more expensive base for base than Illumina, but usually assemble into longer contigs at the same effective coverage. Mate pair data serve to scaffold the primary contigs generated from single-end Roche 454 or PE Illumina sequencing and significantly improve the assembled fragment lengths. Construction of MP libraries for Illumina or Roche 454 sequencing requires much more and higher quality starting DNA than do PE libraries and MP libraries are more costly to produce. In addition to genomic sequencing, a single lane of Illumina HiSeq2000 RNA-Seq data (100 base PE reads from 300 bp libraries made from RNA pooled from many stages) is highly recommended for aiding assembly and annotation.

The Costly As: Assembly, Annotation and Analysis

The generation of the raw sequence data is rapidly becoming a marginal cost in a genome sequencing program: the relative cost and time taken for assembly, annotation and analysis post-sequencing is much greater. Raw reads need to be quality checked, checked for contaminants and assembled. The assemblies need to be verified and possibly repeated in turn and then annotated to identify genes and other genomic features of interest such as regulatory regions, repeats and transposons. The intricacies of assembly algorithms, assembly strategies and annotation options are beyond the scope of this article, but a wide variety of excellent methods and tools have been published.35,39-46 The most comprehensive recent analysis of assembly strategies for complex eukaryotic genomes was the Assemblathon.34 If all goes well, a nematode genome can, in theory, be shepherded from DNA extraction to an annotated assembly and be ready for further analyses in as little as a month.

Both bioinformatics and sequencing technologies are changing so rapidly that recommendations on strategies may quickly become obsolete. For bioinformatics solutions, the most up-to-date tips and recommendations will probably come from low-latency sources such as conference presentations, blog posts, forums, crowdsourced Q&A sites and collaborative wikis such as 959 Nematode Genomes (as described below). For sequencing, the two emerging wet laboratory technologies that could dramatically change how we sequence nematodes are whole genome amplification and single-molecule sequencing.

Whole-genome amplification (WGA) has been used to generate sufficient quantities of DNA from tissues of single A. suum for MP libraries.25 This opens the prospect of using WGA on single nematode specimens, though the mass of DNA input from A. suum used by the BGI team (200 ng) is much more than is present in most individual nematodes (one C. elegans adult contains ~200 pg). Proof that amplification does not overly bias sequencing coverage or generate chimaeras that mislead assembly algorithms would be a major advance. Sequencing from single nematodes will reduce the assembly issues arising from extremes of heterozygosity observed in wild populations and will allow researchers to select specimens directly from environmental samples.

The promise of single-molecule sequencing is the generation of ultra-long reads (several kilobases) from templates that have undergone a minimum of in vitro manipulation. It is well recognized in second-generation sequencing that the several PCR steps involved can exclude some regions of a genome from sequencing and positively bias sequencing to regions that have GC content closer to 50%. Ultra-long reads could span repetitive regions and thus ease assembly. PacBio SMRT47 is the first single-molecule technology to be released commercially and can produce reads over 2 kb, but has relatively low throughput and an accuracy far lower than second-generation sequencers. Another single-molecule technology is from Oxford Nanopore,48 which promises high-quality, high-throughput reads with no theoretical length limit. However, the company has not yet released any data or metrics on error rates, read lengths, throughput or costs, so all we can say is that the technology will change genome sequencing if it works.

Keeping Track Using the 959 Nematode Genomes Wiki

We set up the 959 Nematode Genomes wiki (959NG wiki) at www.nematodegenomes.org to keep track of genomes being sequenced and published.18 As second-generation sequencing becomes more accessible, we anticipate that several hundred nematodes will be sequenced in the next few years. Although the INSDC databases (GenBank/ENA/DDBJ) are the first sources that most of us turn to when looking for sequences from or related to our organism of interest, genomes are often deposited there only at the time of publication, and this can yield the impression that no project is underway. We hope the 959NG wiki will enable genomic resources to be shared pre-publication, avoid duplication of effort, allow new genomes to be proposed and forge collaborations between researchers interested in the same species or clade.

Web-based databases for tracking genome sequencing projects are not a new idea and we know of at least four: diArk,49 Genomes OnLine Database (GOLD),50 The International Sequencing Consortium (www.intlgenome.org) and Genome News Network (www.genomenewsnetwork.org). However, only the first two are currently maintained and all four rely on centrally updating the database whenever a new genome is proposed or released. As 959NG is a wiki where anyone can sign in to add or edit information, we anticipate that the site will stay up to date, and because it is specific to nematodes, it is more likely to be of use to the nematode community.

The homepage of the wiki has links to all the important parts of the site (Fig. 2). The 959NG wiki is organized taxonomically using the systematics proposed by De Ley and Blaxter51 (Fig. 1). We also use the clades defined by Blaxter et al.21 and Holterman et al.29 and derive other systematic information from the NCBI taxonomy. The tree is editable, so if new evidence is found for resolution of any paraphyletic nodes or rearrangements, additional nodes can be added or taxa reassigned simply by changing the parent taxon for that set of taxa. For any taxon (class, order, family, genus, etc.), the wiki lists all the species and strains that have active or proposed genome projects. For published projects we encourage addition of PubMed IDs for publications and links to genome browsers and data repositories. For species where a genome project is “ongoing” we associate the project with a strain of the species, to permit more than one independent project to be registered. Again, genome project leaders are encouraged to add links to project web pages and data access portals.

Figure 2. The 959 Nematode Genomes wiki homepage.

One goal of the 959NG wiki is to reduce the “activation energy” for starting a new genome project. Embarking on a genome scale endeavor can be daunting, but we hope that the 959NG wiki will promote collaboration on genomes of interest. Individual researchers can “propose” a species (and strain) for genome sequencing and register interest in species that have been proposed. By making interests known, it is more likely that fruitful collaborations will ensue. We know of two multi-center projects, both now mature, where the proponents first met on the 959NG wiki.

Finding current genome data can be frustrating. We therefore provide a list of available data portals for genomes sequenced and in sequencing. These include genome browsers (such as those provided by WormBase) as well as data download sources. The 959NG wiki also includes a standard BLAST search portal that allows researchers with specific (gene-centered) interests in one or many genomes to query available published and pre-publication draft genomes.

The 959NG wiki is built on the MediaWiki platform (the same tools that run Wikipedia) and uses the Semantic MediaWiki (SMW) extension. Each page about a strain, species, taxon, researcher or sequencing center has semantic properties associated with it that can be queried in new ways to extract inferred relationships, and new properties can be added to any page without changing any database schemas. For example, we plan to add lifecycle strategies (as shown in Fig. 1) to taxon pages to enable queries such as “List all ongoing genome projects for plant parasitic nematodes with genomes smaller than 100 Mb.” Other query examples and details of how SMW is an appropriate technology for such sites can be found in Kumar, Schiffer and Blaxter.18

Join the 959 Nematode Genomes Initiative

The 959 Nematode Genomes initiative (and the 959NG wiki) is open to all and we encourage all interested to join. Anyone can view the wiki (and free registration gives editing rights). The 959NG wiki will only be as good as (and as up to date as) the information we, collectively, enter. In particular, we would encourage registration of interest in ongoing and proposed genomes and the active proposal of additional nematode genomes for sequencing. As the community of researchers producing and consuming new nematode genomes grows, the synergy of combining skills and discoveries in data generation, assembly and annotation will become more evident and will facilitate the generation of new genomes. The availability of large numbers of phylogenetically diverse genomes will also, we hope, inspire a new breed of nematode genomics researchers not wedded to any one species but hungry for data across the phylum and thus eager to collaborate in the analyses of new genomes.

The 959NG wiki will evolve as the community evolves. The snapshot presented here (Tables 1 and 2) will soon be out of date. The open architecture of the SMW system will allow us to add additional concepts and linking data between genomes and thus the wiki should also be able to nucleate and serve special interest groups where the core themes are not simply systematic, but rather other shared phenotypes (reproductive mode, parasitism) or specific gene sets or systems. By identifying colleagues with shared interests, joint funding to generate nematode genome data will be more easily sourced. The collective experience embodied in the 959NG wiki will also mean that the costs (in both consumables and human effort) of de novo sequencing a genome will continue to drop and multi-genome projects will become even more attractive to funding agencies and more rewarding for the nematode genomes community.

Acknowledgments

We acknowledge the input of Philipp Schiffer in the initiation of the 959NG wiki. We thank all our colleagues around the world who are sequencing nematode genomes and adding information to the wiki. At the University of Edinburgh, we especially want to thank Karim Gharbi and colleagues at the GenePool Genomics and Bioinformatics Facility for their expertise and advice on de novo genome sequencing and the members of the Blaxter Lab for their constant support and advice on bioinformatics analyses. We would also like to thank Dan Lawson at the European Bioinformatics Institute for inspiring us by setting up arthropodgenomes.org using the SMW platform. The SMW community was very helpful in answering questions about customising the platform. S. K. is funded by a School of Biological Sciences PhD studentship and the Overseas Research Student Awards Scheme at the University of Edinburgh; G. Kaur is funded by MRC funding to the GenePool, Edinburgh (G00900740) and the EU Project “Enhanced protective Immunity Against Filariasis” focused research project (SICA; contract number 242131); G. Koutsovoulos is funded by a BBSRC School of Biological Sciences University of Edinburgh PhD studentship.

References

1. Sulston JE, Schierenberg E, White JG, Thomson JN. The embryonic cell lineage of the nematode Caenorhabditis elegans. Dev Biol 1983; 100:64-119; PMID:6684600; http://dx.doi.org/10.1016/0012-1606(83)90201-4
2. The C elegans Genome Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 1998; 282:2012-8; PMID:9851916; http://dx.doi.org/10.1126/science.282.5396.2012
3. Stein LD, Bao Z, Blasiar D, Blumenthal T, Brent MR, Chen N, et al. The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics. PLoS Biol 2003; 1:e45; PMID:14624247; http://dx.doi.org/10.1371/journal.pbio.0000045
4. Cutter AD, Dey A, Murray R. Evolution of the Caenorhabditis elegans Genome. Mol Biol Evol 2009; 26:1199-234; PMID:19289596; http://dx.doi.org/10.1093/molbev/msp048
5. White JG, Southgate E, Thomson JN, Brenner S. The Structure of the Nervous System of the Nematode Caenorhabditis elegans. Philos Trans R Soc Lond B Biol Sci 1986; 314:1-340; http://dx.doi.org/10.1098/rstb.1986.0056
6. Crittenden SL, Eckmann C, Wang L, Bernstein D, Wickens M, Kimble J. Regulation of the mitosis/meiosis decision in the Caenorhabditis elegans germline. Philos Trans R Soc Lond B Biol Sci 2003; 358:1359-62; PMID:14511482; http://dx.doi.org/10.1098/rstb.2003.1333
7. Wang MC, O’Rourke E, Ruvkun G. Fat Metabolism Links Germline Stem Cells and Longevity in C. elegans. Science 2008; 322:957-60; PMID:18988854; http://dx.doi.org/10.1126/science.1162011
8. Brooker S. Estimating the global distribution and disease burden of intestinal nematode infections: adding up the numbers–a review. Int J Parasitol 2010; 40:1137-44; PMID:20430032; http://dx.doi.org/10.1016/j.ijpara.2010.04.004
13. National Human Genome Research Institute. DNA Sequencing Costs. genome.gov, 2011.
14. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 2010; 467:1061-73; PMID:20981092; http://dx.doi.org/10.1038/nature09534
15. Genome 10K Community of Scientists. Genome 10K: A Proposal to Obtain Whole-Genome Sequence for 10 000 Vertebrate Species. J Hered 2009; 100:659-74; PMID:19892720; http://dx.doi.org/10.1093/jhered/esp086
16. Robinson G, Hackett K, Purcell-Miramontes M, Brown S, Evans J, Goldsmith M, et al. Creating a buzz about insect genomes. Science 2011; 331.
17. Cao J, Schneeberger K, Ossowski S, Gunther T, Bender S, Fitz J, et al. Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet 2011; 43:956-63; PMID:21874002; http://dx.doi.org/10.1038/ng.911
18. Kumar S, Schiffer P, Blaxter M. 959 Nematode Genomes: a semantic wiki for coordinating sequencing projects. Nucleic Acids Research 2011.
19. Harris TW, Antoshechkin I, Bieri T, Blasiar D, Chan J, Chen W, et al. WormBase: a comprehensive resource for nematode research. Nucleic Acids Res 2010; 38:D463-7; PMID:19910365; http://dx.doi.org/10.1093/nar/gkp952
20. Gerstein MB, Lu Z, Van Nostrand E, Cheng C, Arshinoff B, Liu T, et al. Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science 2010; 330:1775-87; PMID:21177976; http://dx.doi.org/10.1126/science.1196914
21. Blaxter ML, De Ley P, Garey J, Liu L, Scheldeman P, Vierstraete A, et al. A molecular evolutionary framework for the phylum Nematoda. Nature 1998; 392:71-5; PMID:9510248; http://dx.doi.org/10.1038/32160
22. Wasmuth J, Schmid R, Hedley A, Blaxter M. On the extent and origins of genic novelty in the phylum Nematoda. PLoS Negl Trop Dis 2008; 2:e258; PMID:18596977; http://dx.doi.org/10.1371/journal.pntd.0000258
27. Abad P, Gouzy J, Aury J-M, Castagnone-Sereno P, Danchin E, Deleury E, et al. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat Biotechnol 2008; 26:909-15; PMID:18660804; http://dx.doi.org/10.1038/nbt.1482
28. Fenn K. Are filarial nematode Wolbachia obligate mutualist symbionts? Trends Ecol Evol 2004; 19:163-6; PMID:16701248; http://dx.doi.org/10.1016/j.tree.2004.01.002
29. Holterman M, van der Wurff A, van den Elsen S, van Megen H, Bongers T, Holovachov O, et al. Phylum-Wide Analysis of SSU rDNA Reveals Deep Phylogenetic Relationships among Nematodes and Accelerated Evolution toward Crown Clades. Mol Biol Evol 2006; 23:1792-800; PMID:16790472; http://dx.doi.org/10.1093/molbev/msl044
30. van Megen H, van den Elsen S, Holterman M, Karssen G, Mooyman P, Bongers T, et al. A phylogenetic tree of nematodes based on about 1200 full-length small subunit ribosomal DNA sequences. Nematology 2009; 11:927-50; http://dx.doi.org/10.1163/156854109X456862
31. Parkinson J, Mitreva M, Whitton C, Thomson M, Daub J, Martin J, et al. A transcriptomic analysis of the phylum Nematoda. Nat Genet 2004; 36:1259-67; PMID:15543149; http://dx.doi.org/10.1038/ng1472
32. Vavouri T, Walter K, Gilks W, Lehner B, Elgar G. Parallel evolution of conserved non-coding elements that target a common set of developmental regulatory genes from worms to humans. Genome Biol 2007; 8:R15; PMID:17274809; http://dx.doi.org/10.1186/gb-2007-8-2-r15
33. Mortazavi A, Schwarz E, Williams B, Schaeffer L, Antoshechkin I, Wold B, et al. Scaffolding a Caenorhabditis nematode genome with RNA-seq. Genome Res 2010; 20:1740-7; PMID:20980554; http://dx.doi.org/10.1101/gr.111021.110
34. Earl D, Bradnam K, St John J, Darling A, Lin D, Faas J, et al. Assemblathon 1: A competitive assessment of de novo short read assembly methods. Genome Research 2011.
9. Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE,
23.
GregoryTR,NicolJ,TammH,KullmanB,KullmanK, 35. FinotelloF,LavezzoE,FontanaP,PeruzzoD,Albiero MelloCC.Potentandspecificgeneticinterferenceby LeitchI,etal.Eukaryoticgenomesizedatabases.Nucleic A,BarzonL,etalComparativeanalysisofalgorithms double-stranded RNA in Caenorhabditis elegans.. AcidsRes2007;35:D332-8;PMID:17090588;http:// for whole-genome assembly of pyrosequencing data. Nature1998;391:806-11;PMID:9486653;http://dx. dx.doi.org/10.1093/nar/gkl828 BriefingsinBioinformatics2011. doi.org/10.1038/35888 24. Opperman CH, Bird D, Williamson V, Rokhsar D, 36. Mitreva M, Jasmer D, Zarlenga D, Wang Z, 10. Hengartner MO, Ellis R, Horvitz R. Caenorhabditis BurkeM,CohnJ,etal.Sequenceandgeneticmapof Abubucker S, Martin J, et al. The draft genome of elegansgeneced-9protectscellsfromprogrammedcell Meloidogynehapla:Acompactnematodegenomefor theparasiticnematodeTrichinellaspiralis.NatGenet death. Nature 1992; 356:494-9; PMID:1560823; plant parasitism. Proc Natl Acad Sci USA 2008; 2011;43:228-35;PMID:21336279;http://dx.doi.org/ http://dx.doi.org/10.1038/356494a0 105:14802-7; PMID:18809916; http://dx.doi.org/10. 10.1038/ng.769 11. HorvitzHR,SternbergP.Multipleintercellularsignalling 1073/pnas.0805946105 37. KikuchiT,CottonJ,DalzellJ,HasegawaK,Kanzaki systems controlthe developmentof theCaenorhabditis 25. JexA,LiuS,LiB,YoungN,HallR,LiY,etalAscaris N, McVeigh P, et al. Genomic Insights into the elegansvulva.Nature1991;351:535-41;PMID:1646401; suumdraftgenome.Nature2011. OriginofParasitismintheEmergingPlantPathogen Bursaphelenchus xylophilus. PLoS Pathog 2011; 7: http://dx.doi.org/10.1038/351535a0 26. MüllerF,BernardV,ToblerH.Chromatindiminutionin e1002219; PMID:21909270; http://dx.doi.org/10. 12. BlaxterM.Nematoda:Genes,GenomesandtheEvolu- nematodes.Bioessays1996;18:133-8;PMID:8851046; 1371/journal.ppat.1002219 tion of Parasitism. Advances in Parasitology 2003; http://dx.doi.org/10.1002/bies.950180209 54:101-95. www.landesbioscience.com Worm 49 38. 
WangJ,CzechB,CrunkA,Wallace A,Mitreva M, 45. LinY,LiJ,ShenH,ZhangL,PapasianC,DengHW. 51. De Ley P, Blaxter M. Systematic position and HannonG,etal.DeepsmallRNAsequencingfrom Comparativestudiesofdenovoassemblytoolsfornext- phylogeny.In:LeeDL,ed.Thebiologyofnematodes: the nematode Ascaris reveals conservation, functional generation sequencing technologies. Bioinformatics Taylor&Francis,2002:1-30. diversification, and novel developmental profiles. 2011;27:2031-7;PMID:21636596;http://dx.doi.org/ 52. Ghedin E, Wang S, Spiro D, Caler E, Zhao Q, Genome Res 2011; 21:1462-77; PMID:21685128; 10.1093/bioinformatics/btr319 CrabtreeJ,etal.Draftgenomeofthefilarialnematode http://dx.doi.org/10.1101/gr.121426.111 46. Martin JA, Wang Z. Next-generation transcriptome parasite Brugia malayi. Science 2007; 317:1756-60; 39. ChaissonMJ,BrinzaD,PevznerP.Denovofragment assembly. Nat Rev Genet 2011; 12:671-82; PMID: PMID:17885136; http://dx.doi.org/10.1126/science. assemblywithshortmate-pairedreads:Doestheread 21897427;http://dx.doi.org/10.1038/nrg3068 1145406 lengthmatter?GenomeRes2009;19:336-46;PMID: 47. Korlach J, Bjornson K, Chaudhuri B, Cicero R, 53. Dieterich C, Clifton S, Schuster L, Chinwalla A, 19056694;http://dx.doi.org/10.1101/gr.079053.108 FlusbergB,GrayJ,etal.Real-timeDNAsequencing DelehauntyK,DinkelackerI,etal.ThePristionchus 40. FlicekP,BirneyE.Sensefromsequencereads:methods from single polymerase molecules. Methods Enzymol pacificus genome provides a unique perspective on for alignment and assembly. Nat Methods 2009; 6: 2010; 472:431-55; PMID:20580975; http://dx.doi. nematode lifestyle and parasitism. Nat Genet 2008; S6-12; PMID:19844229; http://dx.doi.org/10.1038/ org/10.1016/S0076-6879(10)72001-2 40:1193-8; PMID:18806794; http://dx.doi.org/10. nmeth.1376 48. MagliaG,HeronA,StoddartD,JaprungD,BayleyH. 1038/ng.227 41. Pop M. Genome assembly reborn: recent computa- Analysisofsinglenucleicacidmoleculeswithprotein 54. 
van Megen H, van den Elsen S, Holterman M, tional challenges. Brief Bioinform 2009; 10:354-66; nanopores. Methods Enzymol 2010; 475:591-623; Karssen G, Mooyman P, Bongers T, et al. A PMID:19482960;http://dx.doi.org/10.1093/bib/bbp026 PMID:20627172; http://dx.doi.org/10.1016/S0076- phylogenetictreeofnematodesbasedonabout1200 42. AlkanC,SajjadianS,EichlerE.Limitationsofnext- 6879(10)75022-9 full-length small subunit ribosomal DNA sequences. generation genome sequence assembly. Nat Methods 49. HammesfahrB,OdronitzF,HellkampM,KollmarM. Nematology 2009; 11:927-50; http://dx.doi.org/10. 2011;8:61-5;PMID:21102452;http://dx.doi.org/10. diArk 2.0 provides detailed analyses of the ever 1163/156854109X456862 1038/nmeth.1527 increasing eukaryotic genome sequencing data. BMC 55. Blaxter M. Nematodes: the worm and its relatives. 43. Picardi E, Pesole G. Computational methods for ab ResearchNotes2011;4:338;PMID:21906294;http:// PLoS Biol 2011; 9:e1001050; PMID:21526226; initioandcomparativegenefinding.MethodsMolBiol dx.doi.org/10.1186/1756-0500-4-338 http://dx.doi.org/10.1371/journal.pbio.1001050 2010; 609:269-84; PMID:20221925; http://dx.doi. 50. Liolios K, Chen IM, Mavromatis K, Tavernarakis N, org/10.1007/978-1-60327-241-4_16 HugenholtzP,MarkowitzV,etal.TheGenomesOnLine 44. Miller JR, Koren S, Sutton G. Assembly algorithms Database (GOLD) in 2009: status of genomic and fornext-generationsequencingdata.Genomics2010; metagenomic projects and their associated metadata. ©95:315 -27;2PMID0:202112412; http:2//dx.doi.o rg/1L0. aNuncleicAciddsRes201e0;38:D3s46-54; PMIBD:199149i34;oscience. 1016/j.ygeno.2010.03.001 http://dx.doi.org/10.1093/nar/gkp848 Do not distribute. 50 Worm Volume1Issue1