AMOS Assembly Validation and Visualization

Michael Schatz
Center for Bioinformatics and Computational Biology
University of Maryland
April 7, 2006

Outline

- AMOS Introduction
- Getting Data into AMOS
- AMOS Validation Pipeline
  - Mate-Based Validation
  - C/E Statistic
  - Read Alignment Validation
  - Read Depth Validation
- AMOS Assembly Investigator
  - Contigs, Inserts, Histograms, SNP Barcode, Features
- Misassembly Walkthrough
- Demo

Slides available at: http://www.cbcb.umd.edu/~mschatz/

AMOS Goals

- Open Source Assembly Package
  - http://amos.sourceforge.net
- Modular design
  - Flexibility in building “pipelines”
  - Well defined input/output formats
- General use: does not depend on databases, proprietary data formats, specialized hardware, etc.

Modular Design

[Figure: modules (overlapper, error corrector, contigger, scaffolder, viewer, etc.) read from and write to a central Bank (reads, inserts, overlaps, contigs, scaffolds, etc.) through the AMOS API.]

- Converters: Celera Assembler, .ACE, TIGR Assembler, Trace Archive
- Overlapper
- Contigger (Minimus)
- Consensus caller
- Comparative assembler (AMOScmp)
- Mate-pair based QC tool
- Viewer (Assembly Investigator)
- Pipeline executor

Assembly Data Conversions

[Figure: conversion diagram. toAmos converts assembly files into an AMOS Message File (.afg); bank-transact loads the message file into an AMOS Bank (.bnk/); bank-report, bank2fasta, bank2contig, and bank2scaff export data from the bank. File types shown: .seq, .qual, .mates, .asm, .frg, .contig, .ace, .afg, .fasta, .scaffolds.fasta.]

CA Assembly w/ Surrogates to AMOS Message File (.asm, .frg)
    $ toAmos -a prefix.asm -f prefix.frg -o prefix.afg -S

Finished Assembly to AMOS Message File (.contig, .frg)
    $ toAmos -f prefix.frg -c prefix.contig -o prefix.afg

AMOS Message File to Bank
    $ bank-transact -m prefix.afg -b prefix.bnk -c

AMOS Validation Pipeline

- Automatically scan an assembly to locate misassembly signatures for further analysis and correction
- cavalidate prefix (.frg, .asm)
  1. Load CA Assembly Data into Bank
  2. Evaluate Mate Pairs & Libraries
  3. Evaluate Read Alignments
  4. Analyze Depth of Coverage
  5. List Surrogates
  6. Load Misassembly Signatures into Bank
- amosvalidate prefix (.afg)
  - Same as cavalidate, except skips surrogates

Mate-Happiness: asmQC

- Evaluate mate “happiness” across assembly
  - Happy = Correct orientation and distance
- Finds regions with multiple:
  - Compressed Mates
  - Expanded Mates
  - Invalid same orientation (→ →)
  - Invalid outie orientation (← →)
  - Missing Mates
    - Linking mates (mate in a different scaffold)
    - Singleton mates (mate is not in any contig)
- Regions with high C/E statistic

Mate-Happiness: asmQC

- Excision: Skip reads between flanking repeats
- [Figure: Truth]
- [Figure: Misassembly: Compressed Mates, Missing Mates]

Mate-Happiness: asmQC

- Insertion: Additional reads between flanking repeats
- [Figure: Truth]
- [Figure: Misassembly: Expanded Mates, Missing Mates]
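
The mate categories listed on the asmQC slides can be illustrated with a small Python sketch. This is not asmQC's code or the AMOS API: the `PlacedRead` record, the `classify_mate_pair` function, and the cutoff of 3 standard deviations around the library mean are all assumptions made for illustration. It only shows how one mate pair could be labeled from its orientation and distance; the tool itself looks for regions where several unhappy mates cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlacedRead:
    """Hypothetical record for a read placed in an assembly (not an AMOS type)."""
    scaffold: str   # scaffold the read was placed in
    start: int      # leftmost coordinate of the read
    end: int        # rightmost coordinate of the read
    forward: bool   # True if the read points right, False if it points left


def classify_mate_pair(read: PlacedRead, mate: Optional[PlacedRead],
                       lib_mean: float, lib_sd: float,
                       n_sd: float = 3.0) -> str:
    """Label one mate pair with a category from the slides.

    n_sd is an assumed tolerance: the pair counts as happy when its span is
    within lib_mean +/- n_sd * lib_sd and the reads point toward each other.
    """
    if mate is None:
        return "singleton"            # mate is not in any contig
    if read.scaffold != mate.scaffold:
        return "linking"              # mate is in a different scaffold

    # Order the two reads by position so orientation checks are unambiguous.
    left, right = sorted((read, mate), key=lambda r: r.start)
    if left.forward == right.forward:
        return "same_orientation"     # both reads point the same way
    if not left.forward:
        return "outie"                # reads point away from each other

    # Remaining case: reads point toward each other, so check the distance.
    span = right.end - left.start
    if span < lib_mean - n_sd * lib_sd:
        return "compressed"
    if span > lib_mean + n_sd * lib_sd:
        return "expanded"
    return "happy"


# Example with an assumed library of mean 2000 bp and standard deviation 200 bp.
a = PlacedRead("scf1", 1000, 1800, True)
b = PlacedRead("scf1", 2200, 3000, False)
print(classify_mate_pair(a, b, 2000.0, 200.0))  # span is 2000 -> happy
```

Read against the Excision and Insertion slides: several pairs labeled "compressed" spanning the same region would be consistent with an excision, and several labeled "expanded" with an insertion.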