St. Petersburg Academic University: Anton Bankevich, Sergey Nurk, Dmitry Antipov, Alexey Gurevich, Mikhail Dvorkin, Alexander Kulikov, Valery Lesin, Sergey Nikolenko, Andrey Prjibelski, Alexey Pyshkin, Alexander Sirotkin, Nikolay Vyahhi
University of California, San Diego: Pavel Pevzner, Son Pham, Glenn Tesler
University of South Carolina: Max Alekseyev
Funding: Russian Federation grant 11.G34.31.0018; NIH 3P41RR024851-02S1

Outline
Genome sequencing
  ◦ Conventional
  ◦ Metagenomics
  ◦ Single Cell
De Bruijn graphs and SPAdes
Results on E. coli and an uncultivated marine genome

[Figure: Multiple (unsequenced) genome copies → read generation → reads → fragment assembly → sequenced genome: GGCATGCGTCAGAAACTATCATAGCTAGATCGTACGTAGCC … …]

Traditional microbial genome sequencing requires isolating a pure strain and reproducing it in a ‘culture’ under controlled laboratory conditions. But >99% of bacteria cannot be cultured.

Metagenomics enables studies of organisms not easily cultured in a laboratory. It uses collective sequencing of non-identical cells. Until recently, metagenomics was the only option for studies of microbial communities. However, metagenomics provides information about only a few genes (across many species).

[Figure labels: gene 1, gene 2, gene 3]
Single Cell Bacterial Genomics: complementing gene-centric metagenomics data with whole-genome assembly of uncultivated organisms. 1000s of genes are sequenced from a single cell.

Multiple Displacement Amplification (MDA)
1. Random hexamer primers
2. Phi29 DNA polymerase (strand displacing)
3. Isothermal reaction (30°C)
[Figure label: Genomic DNA]

Roger Lasken’s lab developed Multiple Displacement Amplification (MDA). It is more effective than PCR for amplification of a single cell. Commercial kits: TempliPhi and GenomiPhi (GE Healthcare) and REPLI-g (Qiagen). REPLI-g: fragments ~2–100 kb; usually > 10 kb on average.
F.B. Dean, J.R. Nelson, T.L. Giesler, R.S. Lasken (2001). Genome Res. 11:1095-9
F.B. Dean, S. Hosono, L. Fang, et al. (2002). PNAS 99:5261-6

The Lander-Waterman model predicts that ~15x coverage is needed for a complete E. coli assembly. It assumes uniform coverage, error-free reads, and no repeats in the genome. For our single cell E. coli assembly, 600x average coverage still leaves some gaps, since there are positions with no reads.

A cutoff threshold will eliminate about 25% of valid data in the single cell case, whereas it eliminates noise in the normal multicell case. Chitsaz, et al., Nat. Biotechnol. (2011).

[Figure: Insert length histograms for E. coli, one panel per lane. Panel group labels: Normal, Single Cell. Panel titles: E. coli, Lane 1; Lane 2; Lane 3; Lane 4; Lane 6; Lane 7; Lane 8; Lane normal. x-axis: insert length (0 to 400); y-axis: number of read pairs. Chitsaz, et al., Nat. Biotechnol. (2011).]
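The Lander-Waterman estimate quoted above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration and not the authors' calculation: it uses the idealized form of the model (uniform coverage, error-free reads, no repeats, minimum overlap ignored), in which the expected uncovered fraction of the genome is e^(-c) at coverage c and the expected number of gaps is about N·e^(-c) for N reads. The genome length (about 4.6 Mbp for E. coli) and the read length of 100 bp are assumptions chosen for illustration; the slides do not state them.

```python
import math

def expected_uncovered_fraction(coverage):
    """Expected fraction of genome positions with no reads (Poisson model)."""
    return math.exp(-coverage)

def expected_gaps(genome_length, read_length, coverage):
    """Approximate expected number of gaps: N * e^(-c), with N = c * G / L reads.

    Idealized Lander-Waterman form; the minimum detectable overlap is ignored.
    """
    n_reads = coverage * genome_length / read_length
    return n_reads * math.exp(-coverage)

# Assumed values for illustration only (not taken from the slides).
GENOME_LENGTH = 4_600_000  # approximate E. coli genome size in bp
READ_LENGTH = 100          # hypothetical read length in bp

if __name__ == "__main__":
    for c in (5, 10, 15, 20):
        gaps = expected_gaps(GENOME_LENGTH, READ_LENGTH, c)
        uncovered_bp = GENOME_LENGTH * expected_uncovered_fraction(c)
        print(f"coverage {c:>2}x: expected gaps {gaps:.3g}, "
              f"expected uncovered bases {uncovered_bp:.3g}")
```

Under these assumptions the expected number of gaps drops below one at around 15x, which is consistent with the figure on the slide. The model's uniform-coverage assumption is exactly what fails for MDA-amplified single cell data, which is why 600x average coverage can still leave positions with no reads.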