Next Generation Sequencing and Whole Genome Selection in Aquaculture

Edited by Zhanjiang (John) Liu
Auburn University

A John Wiley & Sons, Ltd., Publication

Edition first published 2011
© 2011 Blackwell Publishing Ltd.

Blackwell Publishing was acquired by John Wiley & Sons in February 2007. Blackwell's publishing program has been merged with Wiley's global Scientific, Technical, and Medical business to form Wiley-Blackwell.

Editorial Office
2121 State Avenue, Ames, Iowa 50014-8300, USA

For details of our global editorial offices, for customer services, and for information about how to apply for permission to reuse the copyright material in this book, please see our Website at www.wiley.com/wiley-blackwell.

Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by Blackwell Publishing, provided that the base fee is paid directly to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923. For those organizations that have been granted a photocopy license by CCC, a separate system of payments has been arranged. The fee code for users of the Transactional Reporting Service is ISBN-13: 978-0-8138-0637-2/2011.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services.
If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data
Next generation sequencing and whole genome selection in aquaculture / [edited by] Zhanjiang (John) Liu.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-8138-0637-2 (hardcover : alk. paper)
1. Gene mapping. 2. Fishes–Breeding. 3. Shellfish–Breeding. I. Liu, Zhanjiang.
QH445.2.N49 2011
639.8–dc22
2010030977

A catalog record for this book is available from the U.S. Library of Congress.

Set in 10 on 12 pt Dutch 801 BT by Toppan Best-set Premedia Limited
Printed in ••

Disclaimer
The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read.

Contents

Preface
List of Contributors
Chapter 1. Genomic Variations and Marker Technologies for Genome-based Selection
  Zhanjiang (John) Liu
Chapter 2. Copy Number Variations
  Jianguo Lu and Zhanjiang (John) Liu
Chapter 3. Next Generation DNA Sequencing Technologies and Applications
  Qingshu Meng and Jun Yu
Chapter 4. Library Construction for Next Generation Sequencing
  Huseyin Kucuktas and Zhanjiang (John) Liu
Chapter 5. SNP Discovery through De Novo Deep Sequencing Using the Next Generation of DNA Sequencers
  Geoffrey C. Waldbieser
Chapter 6. SNP Discovery through EST Data Mining
  Shaolin Wang and Zhanjiang (John) Liu
Chapter 7. SNP Quality Assessment
  Shaolin Wang, Hong Liu, and Zhanjiang (John) Liu
Chapter 8. SNP Genotyping Platforms
  Eric Peatman
Chapter 9. SNP Analysis with Duplicated Fish Genomes: Differentiation of SNPs, Paralogous Sequence Variants, and Multisite Variants
  Cecilia Castaño Sánchez, Yniv Palti, and Caird Rexroad
Chapter 10. Genomic Selection for Aquaculture: Principles and Procedures
  Anna K. Sonesson
Chapter 11. Genomic Selection in Aquaculture: Methods and Practical Considerations
  Ashok Ragavendran and William M. Muir
Chapter 12. Comparison of Index Selection, BLUP, MAS, and Whole Genome Selection
  Zhenmin Bao
Index

Color plates appear between pages 108 and 109.

Preface

Over the last 25 years of genomics development, molecular markers have been a major limiting factor. That was true for human genomics, animal genomics, as well as for aquaculture genomics. As a result, the goals of genomic research have been a moving target based on the availability of molecular markers. Scientists celebrated at each stage of marker development, from the classical restriction fragment length polymorphism (RFLP), microsatellites, random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), to the most recent marker type of single-nucleotide polymorphisms (SNPs).
The demand for molecular markers keeps increasing, from thousands to tens of thousands, and now to hundreds of thousands or millions of polymorphic markers per species to fully mark and map the genomes. Such limitations were imposed mostly by the lack of whole genome sequences in many species, especially aquaculture species. Finally, in the last few years, this bottleneck is being relieved by advances in next generation sequencing technologies. Now, with the powerful second generation and third generation sequencing technologies, many gigabases of nucleotide sequences can be generated in just a few hours, and thousands upon thousands of SNPs, among other types of polymorphisms, can be discovered.

Since the start of this book project, sequencing technologies have evolved and matured to such a level that they are now widely used, even with aquaculture species. Huge numbers of SNPs are being discovered, validated, and applied to aquaculture genome research. This brings aquaculture genome research to the same level as terrestrial livestock genomics, where whole genome-based selection can be conducted. As a result, this book focuses on providing a basic description of next generation sequencing technologies, genomic copy number variations, SNP discovery, validation, and applications to whole genome-based selection.

It can be said that whole genome selection is a direct result of genome research, and it perhaps represents the most powerful of the genome-based technologies. Since its proposal in 2001 by Meuwissen et al. (Genetics 157:1819–1829), whole genome-based selection has become the center and future direction of animal breeding. It will certainly find its way into application in aquaculture.
This book has 12 chapters: genome variations and traits; copy number variations; next generation sequencing technologies; methods and protocols for library construction for next generation sequencing; SNP discovery through sequencing reduced representation libraries; SNP mining from expressed sequence tag (EST) databases; SNP quality assessment; SNP genotyping platforms; complexities of SNP analysis in duplicated teleost fish genomes; whole genome-based selection: principles and procedures; whole genome-based selection: methods and practical considerations; and comparative analysis of conventional index selection, best linear unbiased prediction (BLUP) selection, marker-assisted selection, and whole genome-based selection.

The last three chapters each address the theory and principles of whole genome-based selection, but from different perspectives. These chapters were intentionally included from authors with different experiences. As genome selection is still in its infancy, its theories are still evolving, and its practical effectiveness still needs to be validated by future experimentation. The inclusion of chapters written by experts with different perspectives should provide readers some comfort as to where genome selection is going in aquaculture. Chapter 10 was written by Anna Sonesson, who is a member of the group in Norway that proposed the theory of whole genome selection; Chapter 11 was written by Ashok Ragavendran and Bill Muir, the latter of whom has worked on a whole genome selection project in poultry in the United States and also has a good knowledge of aquaculture; and Chapter 12 was written by Zhenmin Bao, who is an expert in aquaculture and aquaculture breeding programs in China.

This book was written to bridge genome-based technologies with aquaculture breeding programs.
It should be useful to academic professionals, research scientists, graduate students and college students in agriculture, as well as to students of aquaculture and fisheries.

I am grateful to all the contributors of this book. It is their great experience and efforts that made this book possible. I am grateful to postdoctoral fellows and graduate students in my laboratory and in the Aquatic Genomics Unit at Auburn University for their proofreading and technical assistance. I have had a year of pleasant experience interacting with Susan Engelken, Editorial Program Coordinator, and with Justin Jeffryes, Commissioning Editor for Plant Science, Agriculture, and Aquaculture with Wiley-Blackwell of John Wiley & Sons.

During the course of writing and editing this book, I have worked extremely hard as the Associate Dean for Research while also fulfilling my duty and passion as a professor and graduate adviser. As a consequence, I could not possibly work as hard as I wished to fulfill my responsibility as a father of my three lovely daughters: Elise, Lisa, and Lena Liu. I wish to express my appreciation for their independence and great progress. Finally, this book is a product of the encouragement of my lovely wife, Dongya Gao. As I always say, my mother always expects a lot of me, and my wife always makes sure that I deliver on the high expectations. This book, therefore, is dedicated to my extremely supportive wife.
Zhanjiang (John) Liu

List of Contributors

Zhenmin Bao
Key Lab of Marine Genetics and Breeding
Ministry of Education
College of Marine Life Science
Ocean University of China
Qingdao, China

Cecilia Castaño Sánchez
United States Department of Agriculture/Agricultural Research Service
National Center for Cool and Cold Water Aquaculture
Kearneysville, WV 25430 USA

Huseyin Kucuktas
The Fish Molecular Genetics and Biotechnology Laboratory
Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences
Aquatic Genomics Unit
Auburn University
Auburn, AL 36849 USA

Hong Liu
The Fish Molecular Genetics and Biotechnology Laboratory
Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences
Aquatic Genomics Unit
Auburn University
Auburn, AL 36849 USA

Zhanjiang (John) Liu
The Fish Molecular Genetics and Biotechnology Laboratory
Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences
Aquatic Genomics Unit
Auburn University
Auburn, AL 36849 USA

Jianguo Lu
The Fish Molecular Genetics and Biotechnology Laboratory
Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences
Aquatic Genomics Unit
Auburn University
Auburn, AL 36849 USA

Qingshu Meng
CAS Key Laboratory of Genome Science and Information
Beijing Institute of Genomics
Chinese Academy of Sciences
Beijing 100029, China

William M. Muir
Pulse Molecular Evolutionary Genetics Program and Department of Animal Sciences
Room G406 Lilly Hall
915 West State Street
Purdue University
West Lafayette, IN 47907 USA

Yniv Palti
USDA/ARS
National Center for Cool and Cold Water Aquaculture
Kearneysville, WV 25430 USA

Eric Peatman
The Fish Molecular Genetics and Biotechnology Laboratory
Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences
Aquatic Genomics Unit
Auburn University
Auburn, AL 36849 USA

Ashok Ragavendran
Pulse Molecular Evolutionary Genetics Program and Department of Animal Sciences
Room G406 Lilly Hall
915 West State Street
Purdue University
West Lafayette, IN 47907 USA

Caird Rexroad III
United States Department of Agriculture/Agricultural Research Service
National Center for Cool and Cold Water Aquaculture
Kearneysville, WV 25430 USA

Anna K. Sonesson
Nofima Marine AS
PO Box 5010, 1432 Ås
Norway

Geoffrey C. Waldbieser
USDA, Agricultural Research Service
Catfish Genetics Research Unit
141 Experiment Station Road
Stoneville, MS 38776 USA

Shaolin Wang
The Fish Molecular Genetics and Biotechnology Laboratory
Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences
Aquatic Genomics Unit
Auburn University
Auburn, AL 36849 USA

Jun Yu
CAS Key Laboratory of Genome Science and Information
Beijing Institute of Genomics
Chinese Academy of Sciences
Beijing 100029, China

[Figure 2.1 labels: genomic DNA; evenly spaced features; array with features designed from genome sequences; reference DNA (Cy3 label); test DNA (Cy5 label); hybridization; detection of CNV by Cy3 and Cy5 ratio]

Figure 2.1 Principles of array comparative genome hybridization (array CGH). A large number of evenly spaced features are designed from the reference genome sequence and placed on an array. Equal amounts of reference genome (normal genome) DNA and test genome DNA are labeled with different fluorescent dyes, for example, Cy3 and Cy5, and hybridized to the array. The ratios of Cy3 and Cy5 define CNV. If red (Cy5) fluorescence is observed, the feature on the array has a higher copy number in the test genome than in the normal genome.

[Figure 2.2 labels: reference DNA + cancer DNA; hybridization; array CGH]

Figure 2.2 An example of using array CGH for the detection of chromosomal segment duplications in cancer.
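The ratio logic described in the Figure 2.1 caption can be illustrated with a short sketch. It assumes, as the figure labels suggest, that the test genome is labeled with Cy5 and the reference genome with Cy3. The function name `call_cnv`, the log2 threshold of 0.3, and the intensity values are hypothetical choices made only for illustration; they are not taken from the book.

```python
import math


def call_cnv(cy5_test, cy3_ref, threshold=0.3):
    """Classify one array feature from its test (Cy5) and reference (Cy3) signal.

    The threshold on the log2 ratio is an assumed, illustrative cutoff.
    """
    if cy5_test <= 0 or cy3_ref <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy5_test / cy3_ref)
    if log_ratio > threshold:
        return "gain"      # more copies in the test genome (red signal dominates)
    if log_ratio < -threshold:
        return "loss"      # fewer copies in the test genome
    return "neutral"       # no evidence of a copy number difference


# Hypothetical (test, reference) intensities for three array features.
features = {
    "feat_1": (1500.0, 1000.0),
    "feat_2": (1000.0, 1000.0),
    "feat_3": (500.0, 1000.0),
}
calls = {name: call_cnv(test, ref) for name, (test, ref) in features.items()}
print(calls)
```

A real analysis would normalize the two channels and smooth calls across neighboring features before deciding on a CNV; the sketch deliberately leaves those steps out.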
[Figure 2.3 labels: genome DNA; sheared DNA fragments; ligation of biotinylated hairpin adaptor; circularized DNA; randomly sheared DNA fragments; isolation of linker (+) library; 454 sequencing; paired ends (Pair1, End A and End B); data analysis; paired-end span; SV mapping; histogram of count versus span of paired ends (0 to 8000)]

Figure 2.3 Principles of paired-end mapping-based CNV detection. Genomic DNA is sheared into approximately 3-kb fragments. The genomic fragments are then ligated to biotinylated adaptors to mark the orientation. The segments are circularized, followed by linearization at random sites. Next generation sequencing is used to massively sequence the segments. Bioinformatic mapping by in silico positioning of the sequences on the reference genome detects any difference in size or orientation, either of which suggests genome structural variations, including CNVs.
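The mapping step in the Figure 2.3 caption can be sketched in a few lines, under stated assumptions. The 3-kb expected span comes from the caption; the 1-kb tolerance, the `MappedPair` record, and the example coordinates are hypothetical. The labels follow the usual reading of paired-end mapping: a span on the reference that is longer than the library fragment suggests a deletion in the sequenced genome, and a shorter span suggests an insertion.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    """Reference positions of the two ends of one sequenced fragment (hypothetical record)."""
    pos_a: int
    pos_b: int
    orientation_ok: bool  # True if the ends map in the expected relative orientation


def classify_pair(pair, expected_span=3000, tolerance=1000):
    """Label one paired end by comparing its mapped span with the library fragment size.

    expected_span follows the ~3-kb fragments in the caption; tolerance is an assumed cutoff.
    """
    if not pair.orientation_ok:
        return "inversion candidate"
    span = abs(pair.pos_b - pair.pos_a)
    if span > expected_span + tolerance:
        return "deletion candidate"   # reference holds sequence the sample lacks
    if span < expected_span - tolerance:
        return "insertion candidate"  # sample holds sequence the reference lacks
    return "concordant"


# Hypothetical mapped pairs with spans of 3100, 7500, and 900 bp, plus one flipped pair.
pairs = [
    MappedPair(10_000, 13_100, True),
    MappedPair(50_000, 57_500, True),
    MappedPair(80_000, 80_900, True),
    MappedPair(120_000, 123_000, False),
]
labels = [classify_pair(p) for p in pairs]
print(labels)
```

In practice the cutoffs would be derived from the observed distribution of spans, and a structural variant would be called only when several independent pairs support it.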