
Cap-Analysis Gene Expression (CAGE): The Science of Decoding Gene Transcription

269 Pages·2010·6.109 MB·English

Preview: Cap-Analysis Gene Expression (CAGE): The Science of Decoding Gene Transcription

CAP-ANALYSIS GENE EXPRESSION (CAGE)
The Science of Decoding Gene Transcription

editor
Piero Carninci
RIKEN, Japan

Published by
Pan Stanford Publishing Pte. Ltd.
Penthouse Level, Suntec Tower 3
8 Temasek Boulevard
Singapore 038988
Email: [email protected]
Web: www.panstanford.com

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

CAP-ANALYSIS GENE EXPRESSION (CAGE)
The Science of Decoding Gene Transcription
Copyright © 2010 by Pan Stanford Publishing Pte. Ltd.
All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 978-981-4241-34-2 (Hardcover)
ISBN 978-981-4241-35-9 (eBook)

Printed in Singapore.

September 4, 2009 9:6 RPS: Pan Stanford Publishing Book - 6in x 9in preface

Preface

About the time when the draft of the human genome sequence first appeared, I kept asking myself whether there was a systematic, scalable approach to decoding regulatory elements. I also wondered: "How can we now understand the network between transcription factors and the genes that they activate, and can we have a technology that can be applied to any biological question and sample?" Even then it was possible to map expressed sequence tags (ESTs) to the genome, to locate promoter regions and somehow correlate this with expression, but it was too expensive to do routinely.

I had a crazy idea one Sunday afternoon after hours of drawing in a notebook: what if I could chop and concatenate the 5' ends derived from full-length cDNA and sequence more than 10 or 20 of these short tags in a single sequencing run? Despite the technical challenges it seemed to make sense: by aligning short tags to the genome, it would become possible to detect all the active promoters and then the expression of transcription factors. I concluded that, if more libraries than the number of existing transcription factors and regulators are made, we could theoretically detect the network between these proteins and the regulated genes. Thus cap-analysis gene expression (CAGE) was conceived and I thought it was worth giving it a try.

After a couple of years of overcoming technical challenges, my team was able to create a protocol for CAGE, a method of profiling gene expression by producing and sequencing 20-nt short sequencing tags corresponding to the beginning of the RNAs. CAGE technology was readily employed in the FANTOM3 project. This revolutionized our understanding of the genome, because we unexpectedly found that the genome produces a much larger variety of RNAs than earlier reported. The genome sequence alone did not give us enough information to explain its functions, but with CAGE it became possible to promptly identify novel mRNAs, non-coding RNAs and their promoters, as we did in FANTOM, ENCODE and a growing number of other projects.

These projects have challenged the dogma: contrary to expectations, most genes produce multiple mRNAs and non-coding RNAs starting from multiple promoters. Ultimately, CAGE analysis can elucidate the relationship between the mRNAs and the promoters that control their expression in order to decipher the networks that regulate gene expression and the transcription factors. Moreover, with CAGE we can comprehensively identify the exact locations in the genome from which the mRNAs originate, identify core promoters, and simultaneously quantify RNA expression levels.
Therefore, CAGE has been broadly adopted to infer transcriptional networks because it provides the tools to understand the molecular mechanisms underlying gene expression. CAGE becomes even more valuable when it is used in conjunction with next-generation sequencing technologies, which make CAGE cheaper and more informative than current microarrays.

This book is a guide for current and potential users of CAGE technology who wish to reveal molecular mechanisms in CAGE experiments. The book includes protocols and a guide to the bioinformatics analysis of CAGE datasets, including the design of software and tools for constructing web resources or using existing genome browsers to customize data. I hope that the chapters will be particularly useful to those who are not yet specialists in the field, and provide them with a guide for setting up CAGE technology and/or analysis in their laboratories. This book also provides examples of applications written by the first group of scientists to use CAGE technology for promoter identification, genome annotation, identification of novel RNAs and reconstruction of models of transcriptional control and networks, which may help readers extract additional biological insights from the published data.

In conclusion, CAGE technology offers a revolutionary approach for a growing number of scientists beyond the early users at genome sequencing centers. This book introduces CAGE technology and its analysis to a broad readership with interests in expression analysis, transcriptional control, marker identification, molecular diagnostics, analysis of networks and RNA biogenesis. I hope that scientists, postdocs, students, technicians and all other readers will expand these approaches to a variety of biological problems using different models and bring forth exciting results to enrich our knowledge of biological systems. Exciting times for scientific discoveries are ahead.

My final thoughts are for the people who have been working with me over the years at RIKEN and in the FANTOM consortium, and for all other collaborators. There are many technicians who have diligently developed experimental conditions and others who have carefully analyzed larger and larger datasets. I am most grateful to the scientists who have inspired the analysis and interpretation of the CAGE data: their contributions have been essential and the process of discovery has been exciting and fun. Nothing could have been done in isolation, and more excitement lies ahead from the analysis of the rich datasets inherent in CAGE libraries.

Piero Carninci
Omics Science Center, RIKEN, Yokohama Institute

Contents

Preface

1. Cap Analysis Gene Expression (CAGE)

2. Tagging Transcription Starting Sites with CAGE
   2.1 The Output of the Genome is Complex
   2.2 Mapping 5' Ends: From ESTs to Tagging Technologies
   2.3 Linking Core Promoters to Genomic Elements
   2.4 cDNA Ends or the Whole Sequence?
   2.5 Identification of Functional Elements in the Genome
   2.6 Technology Evolution, Same Lessons?

3. Construction of CAGE Libraries
   3.1 Introduction
   3.2 Stage 1: Synthesis of First-Strand cDNA
       3.2.1 Synthesis of First-Strand cDNA
       3.2.2 CTAB/urea Purification
   3.3 Stage 2: Oxidation/Biotinylation
       3.3.1 Oxidation
       3.3.2 Biotinylation
       3.3.3 RNase I Treatment Removal of Biotinylated Cap when cDNAs do not Reach the 5' end
   3.4 Stage 3: Capture-Release
       3.4.1 Capture and Subsequent Release of 5'-Completed cDNAs
   3.5 Stage 4: Single Strand Linker Ligation
       3.5.1 Single Strand Linker Ligation
       3.5.2 S400 Spin Column
   3.6 Stage 5: The Second Strand cDNA Synthesis
   3.7 Stage 6: Preparing CAGE Tags
       3.7.1 MmeI Digestion
       3.7.2 2nd Linker Ligation
       3.7.3 Purification with Magnetic Beads
       3.7.4 Purification by G50 Column
   3.8 Stage 7: Amplification of CAGE Tags
       3.8.1 1st PCR Amplification
       3.8.2 PAGE Purification
       3.8.3 2nd PCR Amplification
       3.8.4 Purification with QIAGEN MinElute Column
   3.9 Stage 8: Restriction
       3.9.1 Restriction with XmaJI
       3.9.2 Removal of the Linkers Tips
       3.9.3 PAGE Purification
   3.10 Stage 9: Concatenation
       3.10.1 Appendix

4. Transcriptome and Genome Characterization Using Massively Parallel Paired End Tag (PET) Sequencing Analysis
   4.1 Introduction
   4.2 The Development of Pair End diTag (PET) Analysis
   4.3 GIS-PET for Transcriptome Analysis
   4.4 ChIP-PET for Whole Genome Mapping of Transcription Factor Binding Sites and Epigenetic Modifications
   4.5 ChIA-PET for Whole Genome Identification of Long Range Interactions
   4.6 Perspective

5. New Era of Genome-Wide Gene Expression Analysis
   5.1 Introduction
   5.2 Tagging Technologies for Genome-Wide Analysis
   5.3 Principles of Next Generation Sequencing Technologies
       5.3.1 Genome Sequencer 20/FLX System (Roche Diagnostics/454 Life Sciences)
   5.4 Genome Analyzer (Illumina/Solexa)
   5.5 SOLiD System (Applied Biosystems)
   5.6 Advantages of Next Generation Sequencing Technologies over Conventional Sequencing Technology on Tagging Technologies
   5.7 From Static Analysis to Dynamic Analysis
   5.8 CAGE Method and Next Generation Sequencing Technologies
   5.9 Conclusions and Outlook

6. Computational Tools to Analyze CAGE: Introduction to PART II

7. Extraction and Quality Control of CAGE Tags
   7.1 Overview
   7.2 Using Read Qualities and Read Properties, Pre- and Post-Extraction
       7.2.1 Background: Types of Sequencing Errors
   7.3 Procedures Before Tag Extraction
       7.3.1 Ambiguous Base Calls
       7.3.2 Unusual Read Lengths
       7.3.3 Low Average Read Quality Score
       7.3.4 Presence of Sequencing Errors
   7.4 Using QC Values After Tag Extraction
   7.5 Origin of Sequence Errors
   7.6 Using Sequence Errors to Estimate CAGE Quality
   7.7 A Simple CAGE Tag Extraction Method

8. Setting CAGE Tags in a Genomic Context
   8.1 Mapping Pipelines for Sequence Tag Technologies
   8.2 A Mapping Pipeline for CAGE
   8.3 Benchmarking with a Sample Dataset

9. Using CAGE Data for Quantitative Expression
   9.1 High Throughput Expression Platforms
   9.2 Comparing CAGE to Other Measures of Gene Expression
   9.3 Platform Normalization
       9.3.1 Microarray Normalization
       9.3.2 qRT-PCR Normalization
       9.3.3 CAGE Normalization
   9.4 Replication
   9.5 Gene Models and Complex Loci
       9.5.1 Complex Transcription
       9.5.2 Gene Expression vs. Transcript Expression
   9.6 Construction of CAGE Promoters and Calculation of Gene Expression Levels
   9.7 Comparison of CAGE Expression between Technical Replicates
   9.8 Comparison of CAGE Expression from Biological Replicates
   9.9 Comparison of CAGE Expression Between Different Time Points Within a Single Time-Course
   9.10 Comparison of CAGE Expression Profiling to qRT-PCR Expression Measurements
   9.11 Comparison of CAGE Expression Profiling to Microarray Measurements
   9.12 Present/Absent Calls
   9.13 Discussion

10. Databases for CAGE Visualization and Analysis
   10.1 Introduction
   10.2 Transcription Maps and Activity
   10.3 Public Databases
   10.4 Genomic View of In-House Data
   10.5 For Expression Analyses
   10.6 Discussion

11. Computational Methods to Identify Transcription Factor Binding Sites Using CAGE Information
   11.1 Introduction
   11.2 Schema of the Methodology Process
   11.3 Initial Links of TF with the Affected Genes
       11.3.1 Mapping of TFBSs to Promoters
       11.3.2 Determining Enriched TFBSs
       11.3.3 Score for Confidence of the Predicted TF→TFBS→TSS/Promoter→Gene Association
   11.4 Correlation of CAGE Tag Counts of Genes and TFs
   11.5 Ranking TF→TFBS→TSS/Promoter→GENE Association: The Effective Use of CAGE Tags
   11.6 Verification of Results
   11.7 Reconstruction of TRNs
