ARTICLE doi:10.1038/nature12787 An atlas of active enhancers across human cell types and tissues RobinAndersson1*,ClaudiaGebhard2,3*,IreneMiguel-Escalada4,IlkaHoof1,JetteBornholdt1,MetteBoyd1,YunChen1, XiaobeiZhao1,5,ChristianSchmidl2,TakahiroSuzuki6,7,EvgeniaNtini8,ErikArner6,7,EivindValen1,9,KangLi1, LuciaSchwarzfischer2,DagmarGlatz2,JohannaRaithel2,BeritLilje1,NicolasRapin1,10,FrederikOtzenBagger1,10, MetteJørgensen1,PeterRefsingAndersen8,NicolasBertin6,7,OwenRackham6,7,A.MaxwellBurroughs6,7,J.KennethBaillie11, YuriIshizu6,7,YuriShimizu6,7,ErinaFuruhata6,7,ShioriMaeda6,7,YutakaNegishi6,7,ChristopherJ.Mungall12, TerrenceF.Meehan13,TimoLassmann6,7,MasayoshiItoh6,7,14,HideyaKawaji6,14,NaotoKondo6,14,JunKawai6,14, AndreasLennartsson15,CarstenO.Daub6,7,15,PeterHeutink16,DavidA.Hume11,TorbenHeickJensen8,HarukazuSuzuki6,7, YoshihideHayashizaki6,14,FerencMu¨ller4,TheFANTOMConsortium{,AlistairR.R.Forrest6,7,PieroCarninci6,7,MichaelRehli2,3 &AlbinSandelin1 Enhancerscontrolthecorrecttemporalandcell-type-specificactivationofgeneexpressioninmulticellulareukaryotes. Knowingtheirproperties,regulatoryactivityandtargetsiscrucialtounderstandtheregulationofdifferentiationand homeostasis.HereweusetheFANTOM5panelofsamples,coveringthemajorityofhumantissuesandcelltypes,toproduce anatlasofactive,invivo-transcribedenhancers.WeshowthatenhancerssharepropertieswithCpG-poormessengerRNA promotersbutproducebidirectional,exosome-sensitive,relativelyshortunsplicedRNAs,thegenerationofwhichisstrongly relatedtoenhanceractivity.Theatlasisusedtocompareregulatoryprogramsbetweendifferentcellsatunprecedented depth,toidentifydisease-associatedregulatorysinglenucleotidepolymorphisms,andtoclassifycell-type-specificand ubiquitousenhancers.Wefurtherexploretheutilityofenhancerredundancy,whichexplainsgeneexpressionstrength ratherthanexpressionpatterns.TheonlineFANTOM5enhanceratlasrepresentsauniqueresourceforstudiesoncell- type-specificenhancersandgeneregulation. Preciseregulationofgeneexpressionintimeandspaceisrequiredfor samplesfromhuman6,weidentify43,011enhancercandidatesandchar- development,differentiationandhomeostasis1.Sequenceelementswithin acterizetheiractivityacrossthemajorityofhumancelltypesandtissues. ornearcorepromoterregionscontributetoregulation2,butpromoter- Theresultingcatalogueoftranscribedenhancersenablesclassification distalregulatoryregionslikeenhancersareessentialinthecontrolof ofubiquitousandcell-type-specificenhancers,modellingofphysical cell-typespecificity1.Enhancerswereoriginallydefinedasremoteele- interactionsbetweenmultipleenhancersandTSSs,andidentificationof mentsthatincreasetranscriptionindependentlyoftheirorientation, potential disease-associated regulatory single nucleotide polymorph- positionanddistancetoapromoter3.Theywereonlyrecentlyfoundto isms(SNPs). initiateRNApolymeraseII(RNAPII)transcription,producingso-called eRNAs4.Genomiclocationsofenhancerscanbedetectedbymapping BidirectionalcappedRNAsidentifyactiveenhancers ofchromatinmarksandtranscriptionfactorbindingsitesfromchro- TheFANTOM5projecthasgeneratedaCAGE-basedtranscription matinimmunoprecipitation(ChIP)assaysandDNaseIhypersensitive startsite(TSS)atlasacrossabroadpanelofprimarycells,tissuesand sites (DHSs) (reviewed in ref. 1), but there has been no systematic celllinescoveringthevastmajorityofhumancelltypes6.Withinthat analysisofenhancerusageinthelargevarietyofcelltypesandtissues dataset,well-studiedenhancersoftenhaveCAGEpeaksdelineating presentinthehumanbody. nucleosome-deficient regions (NDRs) (Supplementary Fig. 1). To Usingcapanalysisofgeneexpression5(CAGE),weshowthatenhan- determinewhetherthisisageneralenhancerfeature,FANTOM5CAGE ceractivitycanbedetectedthroughthepresenceofbalancedbidirec- (SupplementaryTable1)wassuperimposedonactive(H3K27ac-marked) tionalcappedtranscripts,enablingtheidentificationofenhancersfrom enhancersdefinedbyHeLa-S3ENCODEChIP-seqdata7.CAGEtags smallprimarycellpopulations.BasedupontheFANTOM5CAGEexpress- showeda bimodal distributionflankingthe centralP300 peak,with ionatlasencompassing432primarycell,135tissueand241cellline divergenttranscriptionfromtheenhancer(Fig.1a).Similarpatterns 1TheBioinformaticsCentre,DepartmentofBiology&BiotechResearchandInnovationCentre,UniversityofCopenhagen,OleMaaloesVej5,DK-2200Copenhagen,Denmark.2DepartmentofInternal MedicineIII,UniversityHospitalRegensburg,Franz-Josef-Strauss-Allee11,93042Regensburg,Germany.3RegensburgCentreforInterventionalImmunology(RCI),D-93042Regensburg,Germany. 4SchoolofClinicalandExperimentalMedicine,CollegeofMedicalandDentalSciences,UniversityofBirmingham,Edgbaston,BirminghamB152TT,UK.5LinebergerComprehensiveCancerCenter, UniversityofNorthCarolina,ChapelHill,NorthCarolina27599,USA.6RIKENOMICSScienceCentre,RIKENYokohamaInstitute,1-7-22Suehiro-cho,Tsurumi-ku,YokohamaCity,Kanagawa230-0045, Japan.7RIKENCenterforLifeScienceTechnologies(DivisionofGenomicTechnologies),RIKENYokohamaInstitute,1-7-22Suehiro-cho,Tsurumi-ku,YokohamaCity,Kanagawa230-0045,Japan.8Centre formRNPBiogenesisandMetabolism,DepartmentofMolecularBiologyandGenetics,C.F.MøllersAlle3,Building1130,DK-8000Aarhus,Denmark.9DepartmentofMolecularandCellularBiology, HarvardUniversity,Cambridge,Massachusetts02138,USA.10TheFinsenLaboratory,RigshospitaletandDanishStemCellCentre(DanStem),UniversityofCopenhagen,OleMaaloesVej5,DK-2200, Denmark.11RoslinInstitute,EdinburghUniversity,EasterBush,Midlothian,EdinburghEH259RG,UK.12GenomicsDivision,LawrenceBerkeleyNationalLaboratory,1CyclotronRoadMS64-121,Berkeley, California94720,USA.13EMBLOutstation-Hinxton,EuropeanBioinformaticsInstitute,WellcomeTrustGenomeCampus,Hinxton,CambridgeCB101SD,UK.14RIKENPreventiveMedicineandDiagnosis InnovationProgram,RIKENYokohamaInstitute,1-7-22Suehiro-cho,Tsurumi-ku,YokohamaCity,Kanagawa230-0045,Japan.15DepartmentofBiosciencesandNutrition,KarolinskaInstitutet, Ha¨lsova¨gen7,SE-4183Huddinge,Stockholm,Sweden.16DepartmentofClinicalGenetics,VUUniversityMedicalCenter,vanderBoechorststraat7,1081BTAmsterdam,Netherlands. *Theseauthorscontributedequallytothiswork. {AlistofauthorsandaffiliationsappearsintheSupplementaryInformation. 27 MARCH 2014 | VOL 507 | NATURE | 455 ©2014Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE Fraction of enhancersa 0000....00001234–500CAGE –Position 0relativeC toAGE + 500 Densityb 024– 1st0r0a%nd CdeRTBnSeehahfifSrlSaonanemnecqdcaeetdrin - + s1t0r0an%d P cFraction success (< 0.05) 000...257505 3High CAGE4 2Mid CAGE9 2Low CAGE5 Untranscribed DHS5 Untranscribed ENCODE8strong enhancer Untranscribed chromatin-4defined enhancer 0.00 P300 peak centre (bp) Directionality Prediction category Figure1|BidirectionalcappedRNAsisasignaturefeatureofactive accordingtoFANTOM5-pooledCAGEtagsmappedwithin6300bpof enhancers. a,Enhancersidentifiedbyco-occurrenceofH3K27acand 22,486TSSsofRefSeqprotein-codinggenesandcentrepositionsof10,138 H3K4me1ChIP-seqdata7,centredonP300bindingsites,inHeLacellswere HeLaenhancersdefinedasabove.c,Successratesof184invitroenhancer overlaidwithHeLaCAGEdata(uniquepositionsofCAGEtag59ends, assaysinHeLacells.Verticalaxisshowsthefractionofactiveenhancers smoothedbya5-bpwindow),revealingabidirectionaltranscriptionpattern. (successdefinedbyStudent’st-test,P,0.05versusrandomregions;alsosee Horizontalaxisshowsthe6500bpregionaroundenhancermidpoints. SupplementaryFig.9).Numbersofsuccessfulassaysareshownonthe b,Densityplotillustratingthedifferenceindirectionalityoftranscription respectivebar.Seemaintextfordetails. wereobservedinothercelllines(SupplementaryFig.2a).Enhancer- (asdefinedbyCAGEtagfrequencyinHeLacells)andexaminedtheir associatedreverseandforwardstrandtranscriptioninitiationevents activityusingenhancerreporterassayscomparedtorandomlyselected were,onaverage,separatedby180basepairs(bp)andcorrespondedto untranscribedlociwithregulatorypotentialinHeLa-S3cells:15DHSs10, nucleosomeboundaries(SupplementaryFigs3and4).Asaclass,active 26ENCODE-predicted‘strongenhancers’7and20enhancersdefined HeLa-S3 enhancers had 231-fold more CAGE tags than polycomb- asinFig.1a(SupplementaryTables2and3).Whereas67.4–73.9%of repressedenhancers,indicatingthattranscriptionisamarkerforactive theCAGE-definedenhancersshowedsignificantreporteractivity,only usage.Indeed,ENCODE-predictedenhancers7withsignificantreporter 20–33.3%oftheuntranscribedcandidateregulatoryregionswereactive activity8hadgreaterCAGEexpressionlevelsthanthoselackingreporter (Fig.1candSupplementaryFig.9a).Thesametrendwasobservedin activity(P,4310222,Mann–WhitneyUtest).Alenientthreshold HepG2cells(SupplementaryFig.10a,b).Correspondingpromoter-less on enhancer expression increased the validation rate of ENCODE constructs showed that the enhancer transcription read-through is enhancersfrom27%to57%(SupplementaryFig.5). negligible(SupplementaryFig.9b,c).ManyCAGE-definedenhancers Although capped RNAs of protein-coding gene promoters were overlappedpredictedENCODE‘strongenhancers’or‘TSS’states(25% stronglybiasedtowardsthesensedirection,similarlevelsofcapped and62%,respectively,forHeLa-S3),buttherewasnosubstantialdiffer- RNAinbothdirectionsweredetectedatenhancers(Fig.1bandSup- enceinvalidationratesbetweentheseclasses(SupplementaryFig.10c,d). plementaryFig.2b,c).Thus,bidirectionalcappedRNAsisasignature Insummary,activeCAGE-definedenhancersweremuchmorelikelyto featureofactiveenhancers.Onthisbasis,weidentified43,011enhan- bevalidatedinfunctionalassaysthanuntranscribedcandidateenhan- cercandidatesacross808humanCAGElibraries(seeSupplementary cersdefinedbyhistonemodificationsorDHSs. TextandSupplementaryFigs6–8).Interestingly,thecandidateswere depleted of CpG islands (CGI) and repeats (with the exception of InitiationandfateofenhancerRNAsversusmRNAs neuralstemcells,seeref.9). RNA-seqdatafrommatchingprimarycellsandtissuesshowedthat Toconfirmtheactivityofnewlyidentifiedcandidateenhancers,we ,95%ofRNAsoriginatingfromenhancerswereunsplicedandtyp- randomlyselected46strong,41moderateand36lowactivityenhancers icallyshort(median346nucleotides)—astrikingdifferencetomRNAs a b c d e Density0001....0482EEGGnnEEhhNN1aaCCnn0OOcc0LeeDDerr EEnRtr 1gamtNrktnahARsn N(cslorcA1igpr0i1tpk0, t gs, ecg1nae0olne0mo)kmicic Average predicted sites per kb 00110011........05050505 0 5T′TSSDSista5n0c0e (bp) 1,000FrequencyEnhancer0000′ CAGE 5 ends....6420 5TA0T Abp INR AACCCGGTT De novo motifsSRF (0.90)OCT4 (0.98)MEF2A (0.95) GATA2 (0.96)TEAD1 (0.87) Motif.0662 MITF (0.91)CEBPAP1: (0.94)AP1ETS: CEBPA (0.93) SPI1 (0.95) JUN (0.99) RUNX1 (0.97) SIX4 (0.82) JUND (0.95)GATA2 (0.73)ATF3 (0.88) RELA (0.95) GC-Box (0.95) NRF1 (0.97)ELF1 (0.96)Motif.0497 (0.76)YY1 (0.98) RFX (0.94)nCMotif enrichmentGabove backgroundI-prom22222o210––12terAverage fold changeSKIV2L2–/control123231......–000555R7o5erl0a etinvhea dnics0etar nmceid tpoo 7Tin5St0S nnCCCGGGI TII TTSSSSSS s easnennstiesseense CEGnGehnIa oTnmScSiec r aenxtpiseecntasetion HigDhN2a4s be pI cleavagLeow nECCCnGGhGIIa--IpTn-TSrcoSSemrSoter ERERnneehhffSSaaeennqqcc eeTTrrSS –+SS ss sattrrenaantnnissddeense (per nucleotide) Figure2|FeaturesdistinguishingenhancerTSSsfrommRNATSSs. peaks(arrowat11indicatespositionofthemainenhancerCAGEpeaks; a,DensitiesofthegenomicandprocessedRNAlengthsoftranscriptsstarting directionoftranscriptiongoeslefttoright)revealdistinctcleavagepatterns fromenhancerTSSsandmRNATSSsusingassembledRNA-seqreadsfrom atsequencesresemblingtheINRandTATAelements.d,Denovomotif 13pooledFANTOM5libraries.b,FrequenciesofRNAprocessingmotifs enrichmentanalysesaroundenhancersandnon-enhancerFANTOM5 (59splicemotif(59SS,upperpanel)andthetranscriptionterminationsite CAGE-definedTSSs(CAGETSSsmatchingannotatedTSSsarereferredto hexamer(TTS,lowerpanel))aroundenhancerandmRNATSSs.Verticalaxis as‘promoters’),contingentonCGIoverlap.Topenriched/depletedmotifsare showstheaveragenumberofpredictedsitesperkbwithinacertainwindow shownalongwiththeirbest-knownmotifmatchname.Enrichmentversus sizefromtheTSS(horizontalaxis)inwhichthemotifsearchwasdone.Dashed randombackgroundispresentedasaheatmap.e,Verticalaxisshowsaverage linesindicateexpectedhitdensityfromrandomgenomicbackground.The HeLaCAGEexpressionfoldchangeversuscontrolatenhancersandRefSeq windowalwaysstartsatthegeneorenhancerCAGEsummitsandexpands TSSsafterexosomedepletion.Horizontalaxisshowspositionrelativetothe inthesensedirection.nCGI,non-CGI.c,Averagenucleotidefrequencies TSSorthecentreoftheenhancer.Translucentcoloursindicatethe95% (toppanel)andDNaseIcleavagepatterns(lowerpanel)ofenhancerCAGE confidenceintervalofthemean. 456 | NATURE | VOL 507 | 27 MARCH 2014 ©2014Macmillan Publishers Limited. All rights reserved ARTICLE RESEARCH (19%unspliced,median1,256nucleotides)(Fig.2aandSupplementary Anatlasoftranscribedenhancersacrosshumancells Fig.11a–c).UnlikeTSSsofmRNAs,whichareenrichedforpredicted The FANTOM5 CAGE library collection6 enables the dissection of 59splicesitesbutdepletedofdownstreampolyadenylationsignals11,12, enhancerusageacrosscelltypesandtissuescomprehensivelysampled enhancersshowednoevidenceofassociateddownstreamRNApro- acrossthehumanbody.Clusteringbasedonenhancerexpressionclearly cessingmotifs,andthusresembleantisensepromoterupstreamtran- groupedfunctionallyrelatedsamplestogether(Fig.3candSupplemen- scripts(PROMPTs)11(Fig.2bandSupplementaryFig.11d).MostCAGE- tary Figs 18 and 19).Although fetal and adult tissue often grouped definedenhancersgaverisetonuclear(.80%)andnon-polyadenylated together, two large fetal-specific clusters were identified: one brain- (,90%) RNAs13 (Supplementary Fig. 11e). Based on RNA-seq, few specific (pink) and one with diverse tissues (green). The fetal-brain enhancerRNAsoverlapexonsofknownprotein-codinggenesorlarge cluster isassociatedwith enhancersthatare located close to known intergenicnoncodingRNAs(9and1outof4,208enhancersdetected, neuraldevelopmentalgenes,includingNEUROG2,SCRT2,POU3F2 respectively),indicatingthattheyarenotasubstantialsourceofalter- andMEF2C(SupplementaryFig.18b),forwhichgeneexpressionpat- nativepromotersforknowngenes(asinref.14). ternscorrelatewithenhancerRNAabundanceacrosslibraries,suggest- TSS-associated,uncappedsmallRNAs(TSSa-RNAs),attributedto ingregulatoryinteraction(seebelow).Theresultscorroboratethefunc- RNAPII protection and found immediately downstream of mRNA tionalrelevanceoftheseenhancersfortissue-specificgeneexpression TSSs15,16,weredetectableinthesamepositionsdownstreamofenhan- andindicatethattheyareanimportantpartoftheregulatoryprograms cerTSSs(SupplementaryFig.12),indicatingthatRNAPIIinitiationat ofcellulardifferentiationandorganogenesis. enhancerandmRNATSSsissimilar.Indeed,CAGE-definedenhancer Toconfirmthatcandidateenhancerscandrivetissue-specificgene TSSs resembled the proximal position-specific sequence patterns of expressioninvivo,fiveevolutionarilyconservedCAGE-definedhuman non-CGIRefSeqTSSs(Fig.2candSupplementaryFig.13a).Furthermore, enhancers (including the POU3F2 and MEF2C-proximal enhancers denovomotifanalysisrevealedsequencesignaturesinCAGE-defined enhancerscloselyresemblingnon-CGIpromoters(Fig.2dandSup- plementaryFig.13b). a CD19 CD4 CD8 CD56 CD14 BecauseofthesimilaritywithPROMPTs,wereasonedthatcapped enhancerRNAsmightberapidlydegradedbytheexosome.Indeed, ers c smallinterferingRNA-mediateddepletionoftheSKIV2L2(alsoknown n a h asMTR4)co-factoroftheexosomecomplexresultedinamedian3.14- en d foldincreaseofcappedenhancer-RNAabundance(Fig.2eandSup- e n fi plementaryFig.14a,b),butonlyanegligibleincreaseatmRNATSSs. de ThisincreasingtrendissimilartothatofPROMPTregionsupstream GE- A of TSSs, although the increase of enhancer RNAs was significantly C higher(P,4.6310267,Mann–WhitneyUtest;Fig.2eandSupplemen- tabtahtureteynCiFnhAigapG.nr1cEo4ember,xsocpit)ser.erTasslhssoiuoonsnlp,ytrohetfehseeebnniahdtnaiartnteicscpeetirnroossnmeianRlocttNroearnAnst,sricaosrslidpasetnugiodgrngaSdeaKsletaIdeVc.dt2FivpLuir2tryet-vhodieboerspumelsrelovytree1ed7d,, -1 Crep 1reACpGlAiErep 2cGa-1kbEterep 3s+1kb-1kb-+1kb4-1kb+1kb-10–1 kbpRDoeslN0aitatiiosv+1 kbnee -6 HLiogwh CAGE DNase I H3K4me1 H3K27ac cellswasproportional(SupplementaryFig.14d),indicatingthatvir- tnuuamllybearlloidfednettifeicetdabelnehbaindcierresctpiorondaulcCeAeGxoEsopmeaek-sseninscitrievaesRedN1A.7s.-Tfohlde b CCeAlGl-tEy peen-hsapnecceifirsc c C AGE tissuelibraries uponSKIV2L2depletionandnovelenhancercandidateshadonaverage CD19 CD4 CD56 CD14 sHimeLilaarc,ebllust(wSeuapkpelre,mchernotmaraytiFnimg.o1d4iefi)c.ationsignalscomparedtocontrol gnal 0.01 CD19 CAGEidentifiescell-specificenhancerusage DNase I si 00..0002 CD4C tTcieoorntuesasatngwdehitenrtihvpielvircoCa,tACehGCIEPA-eGsxepEqre(asHnsai3olKyns2ce7asancwiadenreendtpHifey3rcKfoe4lrlmm-tyee1pd)e,i-nDspNfeicvAiefimcpeernitmhhyaalnray-- ChIP-seq or 000...000001 CD56ell type broloaoddmcaeplelptyigpeenso,manidcsc.oormg/p,aSruedpptloempuenbltiasrhyedTaDbHleS4d)a.tCaA(hGttEp-:d//ewfiwnewd. Mean -10H003..00K034me1HH3D3KKN42amH7s3eae1c KI27ac CD14 Whole blood Heart enhancerswerestronglysupportedbyproximalH3K4me1/H3K27ac –1 0 1 –1 0 1 Cerebellum Lymphatic organs peaks (71%) and DHSs (87%) from the same cell type. Conversely, enhancDeisr tmanidcpeo tion t (kb) FBeratainl brain FReetparlo tdisuscuteivse organs H3K4me1andH3K27acsupportedonly24%ofDHSsdistaltopro- Figure3|CAGEexpressionidentifiescell-type-specificenhancerusage. moters and exons andonly 4% ofDHSs overlappedCAGE-defined a,RelationshipbetweenCAGEandhistonemodificationsinbloodcells.Rows enhancers(SupplementaryFig.15),indicatingthataminorityofpro- representCAGE-definedenhancersthatareorderedbasedonhierarchical moter-distalDHSsidentifyenhancers.Fromtheoppositeperspective, clusteringofCAGEexpression.ColumnsfortheCAGEtags(pink)represent only11%ofH3K4me1/H3K27aclocioverlappedCAGE-definedenhan- theexpressionintensityforthreebiologicalreplicates.DNaseIhypersensitivity cers and untranscribed loci showed weaker ChIP-seq signals than andH3K27acandH3K4me1ChIP-seqsignals61kbaroundtheenhancer transcribedones(SupplementaryFig.16).Moreover,therewasaclear midpointsareshowningreen,blueandorange,respectively.b,Meansignalof correlationbetweenCAGE,DNaseIhypersensitivity,H3K4me1and DNase-seqaswellasChIP-seqforH3K27acandH3K4me1(verticalaxes)per H3K27acforCAGE-definedenhancersexpressedinbloodcells(Fig.3a). celltype(rows)in61kbregions(horizontalaxes)aroundenhancermidpoints, Accordingly,cell-type-specificenhancerexpressioncorrespondstocell- forenhancerswithblood-celltype-specificCAGEexpression(columns). c,Dendrogramresultingfromagglomerativehierarchicalclusteringoftissue type-specifichistonemodifications(Fig.3b).Themajorityofselected samplesbasedontheirenhancerexpression:eachleafofthetreerepresentsone cell-type-specificenhancerscouldbevalidatedincorrespondingcell CAGEtissuesample(foralabelledtreeandthecorrespondingresultson linesandwereassociatedwithcell-type-specificDNAdemethylation primarycellsamples,seeSupplementaryFigs18and19).Sub-treesdominated (Supplementary Text, Supplementary Fig. 17 and Supplementary byonetissue/organtypeormorphologyarehighlighted.Someoftheenhancers Tables5–8,seealsoref.18).Thus,bidirectionalCAGEpairsarerobust responsibleforthefetal-specificsubgroupinthelargerbrainsub-treeare predictorsforcell-type-specificenhanceractivity. validatedinvivo(Fig.4). 27 MARCH 2014 | VOL 507 | NATURE | 457 ©2014Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE identifiedabove)weretestedviaTol2-mediatedtransgenesisinzebra- withthecut-offsused.Noteworthyexceptionsincludebloodandimmune fishembryos.Weobservedtissue-specificenhanceractivitywith3of5 cells,testis,thymusandspleen,whichhavehighenhancer/generatios. fragments,whichcorrespondedtothehumanenhancertissueexpress- Conversely,smoothandskeletalmuscleandskin,boneandepithelia- ion (Fig. 4). None of three control fragments without CAGE signal relatedcellshavelowratios.Differentialexosomeactivitybetweencell activated the GATA2 promoter (Supplementary Table 9). Although typesmightaffecttheseresults,buttherewasnocorrelationbetween thesamplesizeisnothighenoughtoreliablyestimatethevalidation SKIV2L2 mRNA expression and the number of enhancers detected ratesinzebrafish,thecorrelationbetweentheenhancerusageprofiles (SupplementaryFig.22e,f). inzebrafishtothosedefinedinhumanbyCAGEisnotable. Asexpected,consensusmotifsofknownkeyregulatorsareover- Wegroupedtheprimarycellandtissuesamplesintolarger,mutu- represented in corresponding facet-specific enhancers, for instance allyexclusivecelltypeandorgan/tissuegroups(referredtoasfacets), ETS, C/EBP and NF-kB in monocyte-specific enhancers, RFX and respectively, with similar function or morphology (Supplementary SOXinneurons,andHNF1andHNF4ainhepatocytes(Supplemen- Tables10and11).Figure5summarizeshowmanyenhancerswere taryFig.23).Notably,theAP1motifappearstobeenrichedacrossall detectedineachfacetandthedegreeoffacet-specificCAGEexpress- facets, perhaps associated with a general role for AP1 in regulating ion (see also Supplementary Fig. 21). From the data we can draw openchromatin19. severalconclusions: First,themajorityofdetectedenhancerswithinanyfacetarenot Expressionclusteringrevealsubiquitousenhancers restrictedtothatfacet.Exceptions,wherefacetsuseahigherfraction Hierarchical clustering of enhancers by facet expression revealed a of specific enhancers, include immune cells, neurons, neural stem smallsubsetofenhancers(200or247,definedbyprimarycellortissue cellsandhepatocytesamongstthecell-typefacets,andbrain,blood, facets, respectively) expressed in the large majority of facets (Sup- liverandtestisamongsttheorgan/tissuefacets. plementary Text, Supplementary Figs 24 and 25, and Supplemen- Second, despite their apparent promiscuity, enhancers are more taryTables12and13).Comparedtootherenhancers,theseubiquitous generallydetectedinamuchsmallersubsetofsamplesthanmRNA (u-)enhancersare8timesmorelikelytooverlapCGIsandtheyare transcripts(SupplementaryFigs21and22a,b),consistentwithcell- twiceasconserved(SupplementaryFig.26a–c).U-enhancersoverlap linestudies7andthehigherspecificityofncRNAsingeneral13.Facets typicalchromatinenhancermarksbuthavehigherH3K4me3signal inwhichwedetectmanyenhancerstypicallyalsohaveahigherfrac- (SupplementaryFig.26d).Althoughtheyproducesignificantlylonger tionoffacet-specificenhancers(SupplementaryFig.22c,d). ncRNAsthanotherenhancers(median530nucleotides,P,1.531028, Third, the number of detected expressed enhancers and mRNA Mann–WhitneyUtest),thetranscriptsremainpredominantly(,78%) transcripts is correlated (Supplementary Fig. 21b), but the number unsplicedandsignificantlyshorter(P,4.2310218,Mann–Whitney ofdetectedexpressedgenetranscripts(.1tagpermillionmappedreads Utest)thanmRNAs(Supplementary Figs27and28),donotshare (TPM))is19–34foldlargerthanthenumberofdetectedenhancers exonswithknowngenes,andareexosome-sensitive(Supplementary Fig.14b).Therefore,itisunlikelythatthesearenovelmRNApromo- a mid Human enhancer ters. They are also highly enriched for P300 and cohesin ChIP-seq fb expression (TPM) 0 1 peaks20andRNAPII-mediatedChIA-PETsignal21comparedtoother hin sp E1 100 μm Neuron enhancers (Supplementary Fig. 26d). These results indicate that CR u-enhancerscompriseasmallbutdistinctsubsetofenhancers,which Astrocyte probably has specific regulatory functions used by virtually every Brain nt 100 μm Brain humancell. b 0 3 Neural stem cell LinkingenhancerusagewithTSSexpression fp Neuron E2 100 μm Astrocyte A major challenge is to link enhancers to their target genes21,22. CR fp Spinal cord Uniquely,FANTOM5CAGEallowsfordirectcomparisonbetween Brain transcriptional activity of the enhancer and of putative target gene 100 μm TSSsacrossadiversesetofhumancells.Basedonpairwiseexpression c iv iv dv 0 8 correlation,nearlyhalf(40%)oftheinferredTSS-associatedenhan- Salivary gland Blood vessel of cers(Methods)werelinkedwiththenearestTSS,and64%ofenhan- epithelial cell CRE3 100 μm da KepidEhitennhpedeyaolitatihcl e csleiianlllu cseolild of caessroschiaatvioenast(l1e0a,s2t6o0n;e15c.o3r%re)laatreedsuTpSpSowrteitdhibny5C0h0IAki-lPobEaTse(sR.NSeAvPerIIa-l ysl Urothelial cell mediated) interaction data21, and the supported fraction increases ysl mu 100 μm eRpeisthpeirlaiatol crye ll withthecorrelationthreshold(SupplementaryFig.29a).Thefraction Figure4|Invivovalidationinzebrafishoftissue-specificenhancers. ofsupportedassociationsis4.8-foldhigherthanthatofassociations ValidationsofinvivoactivityofCAGE-definedhumanenhancersCRE1,CRE2 predictedfromDNaseIhypersensitivitycorrelations10(20.6%versus andCRE3inzebrafishembryosatlong-pecstage.Eachpanelshows,fromleftto 4.3%,atthesamecorrelationthreshold),indicatingthattranscription right,representativeyellowfluorescentprotein(YFP)andbrightfieldimagesof isabetterpredictorofregulatorytargetsthanchromatinaccessibility. embryosinjectedwiththehumanenhancergata2promoterreportergene Conserved sequence motifs and ChIP-seq peaks also co-occurred construct(left),YFPzoom-ins(middle)andCAGEexpressioninTPMin significantly in associated enhancer-promoter pairs (Benjamini– humantissues/celltypesfortheenhancer(right).Muscle(mu)andyolk syncytiallayer(ysl)activitiesarebackgroundexpressioncomingfromthegata2 Hochbergfalsediscoveryrate(FDR),0.05,binomialtest),suggest- promoter-containingreporterconstruct.Allimagesarelateral,headtotheleft. ing an additive or synergistic cooperation between enhancers and Notethecorrespondencebetweenzebrafishandhumanenhancerusage/ promotersatRNAPIIfoci. expression.SupplementaryFig.20showsUCSCbrowserimagesofeach Onaverage,aRefSeqTSSwasassociatedwith4.9enhancersandan selectedenhancer.a,CRE1,,230kbupstreamoftheMEFC2gene,drives enhancerwith2.4TSSsandweobserveddifferentregulatoryarchi- highlyrobustexpressioninthebrain(brain)andneuraltube(nt).Rightpanel tecturesaroundgenes(SupplementaryFig.30).Forexample,atthe giveszoom-inoverlayimageshowingexpressionintheforebrain(fb),midbrain beta-globinlocustheCAGEexpressionpatternsoffourlocuscontrol (mid),hindbrain(hin)andspinalcord(sp).b,CRE2,5kbupstreamofthe POU3F2gene,isactiveinthefloorplate(fp).c,CRE3,10kbupstreamofthe regionhypersensitivesitesarehighlycorrelated(Pearson’srbetween SOX7geneTSS,showsspecificexpressioninthevasculature(including 0.88and0.98)withtheexpressionofknowntargetgenes23,24HBG2 intersegmentalvessels(iv),dorsalvein(dv)anddorsalaorta(da). andHBD,andtosomeextentHBG1. 458 | NATURE | VOL 507 | 27 MARCH 2014 ©2014Macmillan Publishers Limited. All rights reserved ARTICLE RESEARCH 8 Figure5|Enhancerusageand ncers perAGE tags) 6 8 sudppeetpececirftiepcdiatenyneilhngaingvrceoesrustphpseeonrfumcmeillbllieso.rnoTfhe og(enhamillion C 4 CofArGelEatetadgcsewllitthypineeliabcrhargireos.uTph(feacet) l 2 expressionspecificityofthe Expression specificitybiquitousExclusive00001.....24680 0F.0raction o0f. 3e5nhancer0s.7 etfiecnrhnaoaecrhechraatpeicnassohppcnneoeefrcoalnsicfbdfeiieeisctnxlios(tpghycwropoer.llwsaCuosnnmteosgdlaenoosesun(n)arrohosthhwarseganhsaatc)otn.eawmF/rrtseoitashripsneuien U0.0 facetsandgenes,seeSupplementary KeratocyteTrabecular meshwork cellEnteric smooth muscle cellSmooth muscle cell of tracheaSmooth muscle cell of prostateCiliated epithelial cellFibroblast of periodontiumTendon cellFibroblast of choroid plexusPericyte cellCorneal epithelial cellMyoblastEpithelial cell of oesophagusFibroblast of gingivaFibroblast of lymphatic vesselHair follicle cellCardiac myocyteOsteoblastHepatic stellate cellLens epithelial cellCardiac fibroblastFibroblast of tunica adventitia of arteryAmniotic epithelial cellSkeletal muscle cellAstrocyteChondrocyteRetinal pigment epithelial cellFat cellKeratinocytePlacental epithelial cellMesothelial cellAcinar cellEpithelial cell of MalassezStromal cellSkin fibroblastUrothelial cellPre-adipocyteMammary epithelial cellGingival epithelial cellEndothelial cell of hepatic sinusoidEpithelial cell of prostateEndothelial cell of lymphatic vesselRespiratory epithelial cellSensory epithelial cellMelanocyteBlood vessel endothelial cellKidney epithelial cellVascular-associated smooth muscle cellMesenchymal cellMacrophageNeuronReticulocyteHepatocyteDendritic cellMast cellT cellMonocyteLymphocyte of B lineageNatural killer cellNeuronal stem cellNeutrophil Fig.21. Theseobservationscallforcomputationalmodelsofenhancerregu- Modelshrinkageshowedthatinmostcases,onlyonetothreeenhancers lation,inwhichmultipleenhancersmayworkinconcerttoenhance arenecessarytoexplaintheexpressionvarianceobservedinthelinked theexpressionofagene.Tothisend,wefocusedon2,206RefSeqTSSs gene,andgenerallyproximalenhancersaremorepredictivethandistal for which the joint expression of nearby enhancers (the closest ten ones(Fig.6a,SupplementaryFig.29b–dandSupplementaryText).One enhancerswithin500kb)ishighlypredictiveofthegeneexpression. hypothesisexplainingthefunctionofmultipleenhancersdrivingthe a b d 0.8 Basophil, NK cell, cardiac NK cell, B cell, whole Tonsil, B cell, Proportional contribution Rr22/)to full model (0.4 log(gene expression)2(max TPM)167890 mnceyeuo sbltclr,mIoyo nmwtpaflelahae,l smiliwlC nGd,td mthrhiciresosoaaeaesehvlilttaerlanoe,i ns sf’sbrToseeey’l l,ocl icoeldlel, bamsmooapnshoti lcc, yeTtl elc,, e ll sebdnleosnooddryr,i tseicpp licetheeelnl,l,i a Thyroid glanddwenhsdoplreilte ibcel noc,oe ldl, 0.0 disease 1 3 5 7 9 0 2–3 4–5 6–7 >7 lympChhorcoyntiicc Enhancer Number of highly leukemia (ranked by proximity) correlated enhancers 22 23 24 25 26 27 Enhancer over-representation (odds ratio) Odds ratio c 1120505 6 (0.308) 12 (0.859) 9 (0.713) 4 (0.381) 9 (0.989) 11 (1.418) 6 (0.891) 4 (0.616) 6 (0.964) 13 (2.374)SNmP17 (3.192)asi no6 (1.248)lyv einr6 (1.329)- reenp20 (4.489)hr7 (2.308)aens40 (11.092)ce13 (5.704)enrt6 (1.758)sed 75 (22.37) 15 (4.837) 8 (2.698) 61 (21.325)24 (10.965)11 (4.084) 19 (7.965) 12 (5.121) 11 (4.805) 16 (8.135) 19 (9.675) 64 (32.637)32 (16.781)24 (12.916) 5 (1.193)S9 (1.109)NP4 (0.618)ms oa5 (0.919)ivnelyr-4 (0.824) rine pe5 (1.284)rxeosn6 (1.727)esnte6 (1.791)d 14 (4.183) 12 (3.835) 17 (6.972) 13 (6.101) 26 (15.149) 4 (0.404) 4 (0.437) 4 (0.45) 7 (0.825)S4 (0.587)NmPa5 (0.754)sin olyv8 (1.304) ei6 (1.418)nr -4 (0.729)prerop9 (1.692)mreoste5 (0.996)enr16 (3.977)tse5 (1.046)d 13 (3.066) 17 (4.175)36 (17.826)37 (9.165) 11 (3.296)27 (13.168)11 (3.462)33 (14.423)19 (7.416) 11 (4.35)Enhancers50 (29.853)Exons19 (7.849)17 (7.483)Promoters37 (17.372)20 (8.932)Random67 (34.835) 0 Thyrotoxic hypokalaemic periodic paralysisBirth weightHodgkin’s lymphomaEpirubicin-induced leukopenia Prostate-specific antigen levelsSystemic sclerosisCorneal astigmatismNodular sclerosis Hodgkin lymphomaPsoriatic arthritisInflammatory bowel diseaseLeprosyBasal cell carcinomaChronic lymphocytic leukaemiaEndometriosisCoeliac diseaseSystemic lupus erythematosusIgA nephropathyCoeliac disease and rheumatoid arthritisHair morphologyMultiple sclerosisCutaneous neviPrimary biliary cirrhosisAortic root sizeC-reactive proteinSystolic blood pressure Mean platelet volume Crohn’s diseasePlatelet countsSoluble levels of adhesion moleculesEwing sarcomaSoluble ICAM-1HaemoglobinRed blood cell traits Primary sclerosing cholangitisHaematological parametersOther erythrocyte phenotypesFibrinogenCreatinine levelsWarfarin maintenance doseAge-related macular degene rationChronic kidney diseaseInflammatory bowel disease (early onset)Cystatin CPhosphorus levelsDehydroepiandrosterone sulphate levelsVitamin D insufficiencyKeloidEye colour traitsGraves’ diseaseTelomere lengthAlopecia areataMajor mood disordersType 1 diabetes autoantibodiesLiver enzyme levels (alkaline phosphatase)AsthmaType 1 diabetesHaematological and biochemical traitsPsoriasisRheumatoid arthritisAttention deficit hyperactivity disorder Metabolic traitsUlcerative colitisHeight Figure6|LinkingenhancerstoTSSsanddisease-associatedSNPs. a,The withinenhancers,exonsandmRNApromoters.Observedandexpected proportionalcontribution(seeMethods)ofthe10mostproximalenhancers overlapsareshownabovebars.Theverticalaxisgivesenrichmentoddsratios. within500kbofaTSSinamodelexplaininggeneexpressionvariance(vertical ThehorizontalaxisshowsGWAStraitsordiseases.d,Diseaseswith axis)asafunctionofenhancerexpression.xaxisindicatesthepositionofthe GWAS-associatedSNPsover-representedinenhancersofcertainexpression enhancerrelativetotheTSS:1theclosest,etc.Barsindicateinterquartileranges facets.Thehorizontalaxisgivestheoddsratioasinpanelc,brokenupby anddotsmedians.b,Relationshipbetweenthenumberofhighlycorrelated expressionfacets:eachpointrepresentstheoddsratioofGWASSNP (‘redundant’)enhancersperlocus(horizontalaxis)andthemaximalexpression enrichmentforadisease(verticalaxis)inaspecificexpressionfacet.Summary (TPM)oftheassociatedTSSinthesamemodeloverallCAGElibraries(vertical annotationsofpointcloudsareshown.SeealsoSupplementaryFig.31. axis).Errorbarsasina.c,GWASSNPsetspreferentiallyoverrepresented 27 MARCH 2014 | VOL 507 | NATURE | 459 ©2014Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE sameexpressionpatternisthattheymightconferhighertranscriptional METHODSSUMMARY outputofagene25,26.Indeed,thenumberofhighlycorrelated(redund- Single-molecule HeliScopeCAGE data was generated as described elsewhere6. ant)enhancersclosetoTSSs(SupplementaryMethods)increasedwith SequencingandprocessingofribosomalRNA-depletedRNAs,shortRNAsand theobservedmaximalTSSexpressionoveralllibraries(Fig.6b),imply- H3K27ac or H3K4me1 ChIPs as well as the processing of publicly available ingthattheseenhancersareredundantintermsoftranscriptionpatterns DNase-seqdataaredescribedintheMethods. butadditiveintermsofexpressionstrength.Expressionredundancyis Putativeenhancerswereidentifiedfrombidirectionallytranscribedlocihaving alsocommoningenomicclustersofcloselyspacedenhancers(24%of divergent CAGE tag clusters separated by at most 400bp (described in Sup- plementaryFig.6a).Werequiredlocitobedivergentlytranscribedinatleast 815identifiedgenomicclusters,SupplementaryTable15).Theseare oneFANTOM5sample,definedbyCAGEtag59endswithin200bpdivergent associatedwithTSSsofgenesinvolvedinimmuneanddefenceresponses strand-specificwindowsimmediatelyflankingthelocimidpoints.Theexpression and,assuggestedbyapreviousstudy27,haveahigherexpressionthan ofeachenhancerineachFANTOM5samplewasquantifiedasthenormalized otherenhancer-associatedgenes(eightfoldincreaseonaverage). sumofstrand-specificsumsofCAGEtagsinthesewindows.Asample-setwide directionalityscore,D,foreachlocusoveraggregatednormalizedreverse,R,and Disease-associatedSNPsareenrichedinenhancers forward,F,strandwindow-expressionvaluesacrossallsamples,D5(F2R)/ Manydisease-associatedSNPsarelocatedoutsideofprotein-coding (F1R),werethenusedtofilterputativeenhancerstohavelow,non-promoter- exons and a large proportion of human genes display expression like,directionalityscores(jDj,0.8).Furtherfilteringensuredenhancerstobe polymorphism28.UsingtheNHGRIgenome-wideassociationstudies locateddistanttoTSSsandexonsofprotein-andnoncodinggenes. (GWAS)catalogue29andextendingthecompilationofleadSNPswith MotifenrichmentanalysesweredoneusingHOMER36.Regulatorytargetsof enhancerswerepredictedbycorrelationtestsusingthesample-setwideexpress- proxySNPsinstronglinkagedisequilibrium(similartorefs30,31), ionprofilesofallenhancer-promoterpairswithin500kb.Theregulatoryeffects weidentifieddiseases/traitswhoseassociatedSNPsoverlappedenhan- ofmultipleenhancersweremodelledusinglinearregressionfollowedbylasso- cers, promoters, exons and random regions significantly more than basedmodel-shrinkage37. expectedbychance(Fisher’sexacttestP,0.01,SupplementaryTable16). EnhanceractivitywastestedinvivoinzebrafishembryosusingTol2-mediated Disease-associatedSNPswereover-representedinregulatoryregions transgenesis38.Expressionpatternsweredocumentedat48hpostfertilization toagreaterextentthaninexons(Fig.6c).Formanytraitswhereenriched using .200 eggs per construct. Large-scale in vitro validations on randomly disease-associatedSNPswerewithinenhancers,enhanceractivitywas selectedenhancerswereperformedusingfirefly/Renillaluciferasereporterplas- detectedinpathologicallyrelevantcelltypes(Fig.6dandSupplementary mids with enhancer sequences cloned upstream of an EF1a basal promoter Figs 31 and 32). Examples include Graves’ disease-associated SNPs separatedbyasyntheticpolyAsignal/transcriptionalpausesiteinamodified enriched in enhancers that are expressed predominantly in thyroid pGL4.10(Promega)vector(SupplementaryFig.9d).Fulldetailsareprovided tissue,andsimilarlylymphocytesforchroniclymphocyticleukaemia. intheMethods. Asaproofofconcept,wevalidatedtheimpactoftwodisease-associated OnlineContentAnyadditionalMethods,ExtendedDatadisplayitemsandSource regulatorySNPswithinenhancers(SupplementaryFig.33). Dataareavailableintheonlineversionofthepaper;referencesuniquetothese sectionsappearonlyintheonlinepaper. Conclusions Received4January;accepted16October2013. ThedatapresentedheredemonstratethatbidirectionalcappedRNAs, asmeasuredbyCAGE,arerobustpredictorsofenhanceractivityina 1. Bulger,M.&Groudine,M.Enhancers:theabundanceandfunctionofregulatory cell.Transcriptionisonlymeasuredatafractionofchromatin-defined sequencesbeyondpromoters.Dev.Biol.339,250–257(2010). 2. Lenhard,B.,Sandelin,A.&Carninci,P.Metazoanpromoters:emerging enhancersandfewuntranscribedenhancersshowpotentialenhancer characteristicsandinsightsintotranscriptionalregulation.NatureRev.Genet.13, activity.Thisimpliesthatmanychromatin-definedenhancersarenot 233–245(2012). regulatoryactiveinthatparticularcellularstate,butmaybeactivein 3. Banerji,J.,Rusconi,S.&Schaffner,W.Expressionofab-globingeneisenhancedby othercellsofthesamelineage32orarepre-markedforfastregulatory remoteSV40DNAsequences.Cell27,299–308(1981). 4. Kim,T.-K.etal.Widespreadtranscriptionatneuronalactivity-regulatedenhancers. activityuponstimulation33.Ofcourse,giventherelativeinstabilityof Nature465,182–187(2010). enhancerRNAssomechromatin-definedsitesmaybeactivebutfall 5. Kodzius,R.etal.CAGE:capanalysisofgeneexpression.NatureMethods3, belowthelimitsofdetectionofCAGE. 211–222(2006). 6. TheFANTOMConsortiumandtheRIKENPMIandCLST(DGT).Apromoter-level Ourresultsshowthatposition-specificsequencesignalsupstream mammalianexpressionatlas.Naturehttp://dx.doi.org/10.1038/nature13182 of the transcription initiation sites and the production of small, (thisissue). uncappedRNAsimmediatelydownstreamispresentatbothenhan- 7. TheENCODEProjectConsortium.AnintegratedencyclopediaofDNAelementsin cers and mRNA promoters, suggesting similar mechanisms of ini- thehumangenome.Nature489,57–74(2012). 8. Kheradpour,P.etal.Systematicdissectionofregulatorymotifsin2000predicted tiation.Previousstudies(forexamplerefs10,34,35)suggestedthat humanenhancersusingamassivelyparallelreporterassay.GenomeRes.23, promotersandenhancersdifferinmotifcomposition.Thisviewisnot 800–811(2013). supportedbythelargerFANTOM5dataset.Instead,thedifferences 9. Fort,A.etal.Deeptranscriptomeprofilingofmammalianstemcellssupportsa reflectthelocalG1Ccontentbecausetranscribedenhancerstendto regulatoryroleforretrotransposonsinpluripotencymaintenance.NatureGenet. (inthepress). harbourlowG1Ccontentmotifslikenon-CGIpromoters.Features 10. Thurman,R.E.etal.Theaccessiblechromatinlandscapeofthehumangenome. distinguishingenhancersfrommRNApromotersare(1)enhancerRNAs Nature489,75–82(2012). areexosome-sensitiveregardlessofdirectionwhereas(sense)mRNAs 11. Ntini,E.etal.Polyadenylationsite–induceddecayofupstreamtranscriptsenforces promoterdirectionality.NatureStruct.Mol.Biol.20,923–928(2013). havealongerhalf-lifethantheirantisensecounterpart;(2)enhancer 12. Almada,A.E.,Wu,X.,Kriz,A.J.,Burge,C.B.&Sharp,P.A.Promoterdirectionalityis RNAsareshort,unspliced,nuclearandnon-polyadenylatedand(3) controlledbyU1snRNPandpolyadenylationsignals.Nature499,360–363(2013). enhancershavedownstreampolyadenylationand59splicemotiffre- 13. Djebali,S.etal.Landscapeoftranscriptioninhumancells.Nature489,101–108 quenciesatgenomicbackgroundlevelsimilartoantisensePROMPTs, (2012). 14. Kowalczyk,M.S.etal.Intragenicenhancersactasalternativepromoters.Mol.Cell whereasmRNAsaredepletedofterminationsignalsandenrichedfor 45,447–458(2012). 59splicesites11,12. 15. Valen,E.etal.BiogenicmechanismsandutilizationofsmallRNAsderivedfrom Thecollectionofactiveenhancerspresentedhereprovidesaresource humanprotein-codinggenes.NatureStruct.Mol.Biol.18,1075–1082(2011). thatcomplementstheactivityoftheENCODEconsortium7acrossa 16. Taft,R.J.etal.TinyRNAsassociatedwithtranscriptionstartsitesinanimals.Nature Genet.41,572–578(2009). muchgreaterdiversityoftissuesandcellularstates.Ithasclearappli- 17. Core,L.J.,Waterfall,J.J.&Lis,J.T.NascentRNAsequencingrevealswidespread cationsin human genetics, to narrowthe searchwindows forfunc- pausinganddivergentinitiationathumanpromoters.Science322,1845–1848 tionalassociation,andforthedefinitionofregulatorynetworksthat (2008). 18. Ro¨nnerblad,M.etal.AnalysisoftheDNAmethylomeandtranscriptomein underpintheprocessesofcellulardifferentiationandorganogenesisin granulopoiesisrevealtimedchangesanddynamicenhancermethylation.Blood humandevelopment. http://dx.doi.org/10.1182/blood-2013-02-482893(inthepress). 460 | NATURE | VOL 507 | 27 MARCH 2014 ©2014Macmillan Publishers Limited. All rights reserved ARTICLE RESEARCH 19. Biddie,S.C.etal.TranscriptionfactorAP1potentiateschromatinaccessibilityand 37. Friedman,J.,Hastie,T.&Tibshirani,R.Regularizationpathsforgeneralizedlinear glucocorticoidreceptorbinding.Mol.Cell43,145–155(2011). modelsviacoordinatedescent.J.Stat.Softw.33,1–22(2010). 20. Schmidt,D.etal.ACTCF-independentroleforcohesinintissue-specific 38. Gehrig,J.etal.Automatedhigh-throughputmappingofpromoter-enhancer transcription.GenomeRes.20,578–588(2010). interactionsinzebrafishembryos.NatureMethods6,911–916(2009). 21. Li,G.etal.Extensivepromoter-centeredchromatininteractionsprovidea topologicalbasisfortranscriptionregulation.Cell148,84–98(2012). SupplementaryInformationisavailableintheonlineversionofthepaper. 22. Chepelev,I.,Wei,G.,Wangsa,D.,Tang,Q.&Zhao,K.Characterizationof AcknowledgementsFANTOM5wasmadepossiblebyaResearchGrantforRIKEN genome-wideenhancer-promoterinteractionsrevealsco-expressionof OmicsScienceCenterfromMEXTtoY.H.andaGrantoftheInnovativeCellBiologyby interactinggenesandmodesofhigherorderchromatinorganization.CellRes.22, 490–503(2012). InnovativeTechnology(CellInnovationProgram)fromtheMEXT,Japan,toY.H.TheA.S. 23. Fraser,P.,Pruzina,S.,Antoniou,M.&Grosveld,F.Eachhypersensitivesiteofthe groupwassupportedbyfundsfromtheEuropeanResearchCouncilFP7/2007-2013/ humanbeta-globinlocuscontrolregionconfersadifferentdevelopmentalpattern ERCno.204135,theNovoNordiskandLundbeckfoundations.WorkintheM.R.group ofexpressionontheglobingenes.GenesDev.7,106–113(1993). wasfundedbygrantsfromtheDeutscheForschungsgemeinschaft(RE1310/7,11,13) 24. Dostie,J.etal.ChromosomeConformationCaptureCarbonCopy(5C):A andRudolfBartlingStiftung.F.M.andI.M.E.weresupportedby‘‘BOLD’’MarieCurieITN massivelyparallelsolutionformappinginteractionsbetweengenomicelements. and‘‘ZF-Health’’IntegratedprojectoftheEuropeanCommission.WethankS.Noma, GenomeRes.16,1299–1309(2006). M.SakaiandH.TaruiforRNA-seqandsRNA-seqpreparation,RIKENGeNASfor 25. Barolo,S.Shadowenhancers:frequentlyaskedquestionsaboutdistributed generationandsequencingoftheHeliscopeCAGElibraries,IlluminaRNA-seqand cis-regulatoryinformationandenhancerredundancy.Bioessays34,135–141 sRNA-seq,theCopenhagenNationalHigh-throughputDNASequencingCenterfor (2012). IlluminaCAGE-seq,M.Edinger,P.HoffmannandR.Ederforcellsorting,A.Albrechtsen, 26. Schaffner,G.,Schirm,S.,Mu¨ller-Baden,B.,Weber,F.&Schaffner,W.Redundancy I.Moltke,W.Wassermanforadvice,andtheNetherlandsBrainBankforpost-mortem ofinformationinenhancersasaprincipleofmammaliantranscriptioncontrol. humanbrainmaterial. J.Mol.Biol.201,81–90(1988). 27. Whyte,W.A.etal.Mastertranscriptionfactorsandmediatorestablish AuthorContributionsR.A.,I.H.,E.A.,E.V.,K.L.,Y.C.,B.L.,X.Z.,M.J.,H.K.,T.F.M.,T.L.,N.B., super-enhancersatkeycellidentitygenes.Cell153,307–319(2013). O.R.,A.M.B.,J.K.B,C.J.M,N.R.,F.O.B.,M.R.,A.S.madethecomputationalanalysis.J.B., 28. Go¨ring,H.H.H.etal.DiscoveryofexpressionQTLsusinglarge-scaletranscriptional M.B.,T.L.,H.K.,N.K.,J.K.,H.S.,M.I.,C.O.D,A.R.R.F.,P.C.,Y.H.preparedandpre-processed profilinginhumanlymphocytes.NatureGenet.39,1208–1216(2007). CAGEand/orRNA-seqlibraries.E.N.,P.R.A.,T.H.J.,J.B.,M.B.madetheknockdown 29. Hindorff,L.A.etal.Potentialetiologicandfunctionalimplicationsofgenome-wide experimentsfollowedbyCAGE.C.G.,C.S.,L.S.,J.R.,D.G.,M.R.madethebloodcellChIP associationlociforhumandiseasesandtraits.Proc.NatlAcad.Sci.USA106, experiments,methylationassaysandinvitrobloodcellvalidations.T.S.,C.G.,Y.I.,Y.S., 9362–9367(2009). E.F.,S.M.,Y.N.,A.R.R.F.,P.C.andH.S.madetheHeLa/HepG2invitrovalidations.I.M.E., 30. Ward,L.D.&Kellis,M.HaploReg:aresourceforexploringchromatinstates, R.A.,A.S.,F.M.designedandcarriedoutzebrafishinvivotests.R.A.,C.G.,I.H.,C.S.,E.A., conservation,andregulatorymotifalterationswithinsetsofgeneticallylinked E.V.,F.M.,I.M.E.,P.C.,A.R.R.F,M.B.,J.B.,A.L.,C.D.,D.A.H.,P.H.,M.R.,A.S.interpreted variants.NucleicAcidsRes.40,D930–D934(2012). results.R.A.,C.G.,I.H.,E.V.,I.M.E.,J.B.,F.M.,D.A.H.,M.R.,A.S.wrotethepaperwithinput 31. Maurano,M.T.,Wang,H.,Kutyavin,T.&Stamatoyannopoulos,J.A.Widespread fromallauthors.M.R.andA.S.coordinatedandsupervisedtheproject. site-dependentbufferingofhumanregulatorypolymorphism.PLoSGenet.8, e1002599(2012). AuthorInformationTheFANTOM5atlasisaccessiblefromhttp://fantom.gsc.riken. 32. Mercer,E.M.etal.Multilineageprimingofenhancerrepertoiresprecedes jp/5.FANTOM5CAGE,RNA-seqandsRNAdatahavebeendepositedinDDBJ/EMBL/ commitmenttotheBandmyeloidcelllineagesinhematopoieticprogenitors. GenBank(accessioncodesDRA000991,DRA001101).Genomebrowsertracksfor Immunity35,413–425(2011). enhancerswithuser-definableexpressionspecificity-constraintscanbegenerated 33. Ostuni,R.etal.Latentenhancersactivatedbystimulationindifferentiatedcells. athttp://enhancer.binf.ku.dk.Here,processedenhancerexpressiondata,predefined Cell152,157–171(2013). enhancertracksandmotiffindingresultsarealsodeposited.Blood-cellChIP-seqdata 34. Rada-Iglesias,A.etal.Auniquechromatinsignatureuncoversearlydevelopmental andCAGEdataonexosome-depletedHeLacellshavebeendepositedintheNCBIGEO enhancersinhumans.Nature470,279–283(2011). database(accessioncodesGSE40668,GSE49834).Reprintsandpermissions 35. Shen,Y.etal.Amapofthecis-regulatorysequencesinthemousegenome.Nature informationisavailableatwww.nature.com/reprints.Theauthorsdeclareno 488,116–120(2012). competingfinancialinterests.Readersarewelcometocommentontheonlineversionof 36. Heinz,S.etal.Simplecombinationsoflineage-determiningtranscriptionfactors thepaper.CorrespondenceandrequestsformaterialsshouldbeaddressedtoA.R.R.F. primecis-regulatoryelementsrequiredformacrophageandBcellidentities.Mol. ([email protected]),P.C.([email protected]),M.R.([email protected]) Cell38,576–589(2010). orA.S.([email protected]). 27 MARCH 2014 | VOL 507 | NATURE | 461 ©2014Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE METHODS definedforeachbidirectionallocusasthemidpositionbetweentherightmost CAGEdata.SinglemoleculeHeliScopeCAGE39datawasgeneratedasdescribed reversestrandTCandleftmostforwardstrandTCincludedinthemergedbidir- elsewhere6.Weusedasetof432primarycell,135tissueand241celllinesamples ectionalpair.Eachbidirectionallocuswasfurtherassociatedwithtwo200bp thatpassedqualitycontrolmeasuresof.500,000Q20Delve(T.L.etal.,manu- regionsimmediatelyflankingthecentreposition,one(left)forreversestrand transcriptionandone (right)forforwardstrandtranscription,ina divergent scriptinpreparation)mappedCAGEtags,RNAintegrityandreproducibility(for manner.Themergedbidirectionalpairswerefurtherrequiredtobebidirection- furtherdetails,seeref.6). allytranscribed(CAGEtagssupportingbothwindowsflankingthecentre)inat Proofofconceptanalysis.WedefinedsilentandactiveenhancersfromENCODE leastoneindividualsample,andtohaveagreateraggregateofreverseCAGEtags HeLa-S3,GM12878andK562broadpeaks,downloadedfromtheUCSCENCODE (overall FANTOM5samples) thanforwardCAGEtagsin the200bp region repository, according to the co-existence of histone modifications H3K4me1, associatedwithreversestrandtranscription,andviceversa.Thesefilteringsteps H3K27acandH3K27me3.Activeenhancersweredefinedasco-localizedH3K4me1 resultedin78,555bidirectionallytranscribedloci. andH3K27acpeakswith noH3K27me3peak,whereas silentenhancerswere considered loci with H3K4me1 andH3K27me3 peaks but no H3K27ac peak. Expressionquantificationofbidirectionallytranscribedlociandpredictionof LociwerefilteredtobelocateddistanttoTSSs(500bp)andexons(200bp)ofprotein- enhancers.Wequantifiedtheexpressionofbidirectionallociforeachstrandand coding genes, multi-exonic noncoding genes and mRNAs (from ENSEMBL, 200bpflankingwindowineachofthe432primarycell,135tissueand241cellline GENCODE(v10),RefSeqandUCSC,downloadedJanuary12,2012),andother samplesseparatelybycountingtheCAGEtagswhose59endswerelocatedwithin lncRNAsfromagene-centricsetderivedfromliterature40aswellasmanually thesewindows.Theexpressionvaluesofbothflankingwindowswerenormalized annotated sense-antisensepairs(coding-noncoding andnoncoding-noncoding byconvertingtagcountstotagspermillionmappedreads(TPM)andfurther sense-antisensepairs)with59expressedsequencetag(EST)andcomplementary normalizationbetweensampleswasdoneusingtherelativelogexpression(RLE) DNAsupport,and59ESTswithnolocuswithprotein-codingcapacity.Trans- normalizationprocedureinedgeR41.ThenumberofCAGEtagsalignedonChrM criptionaldifferencesbetweenactiveandsilentenhancersetsweredeterminedby wassubtractedfromthetotalnumberofalignedCAGEtagsineachlibrarybefore comparingtheaveragenumberofFANTOM5CAGEtag59endsfromthesame normalization.Thenormalizedexpressionvaluesfrombothwindowswereused ENCODEcelllines(pooledtriplicates)inawindow6300bparoundtheH3K4me1 tocalculateasample-setwidedirectionalityscore,D,foreachenhancerover peakmidpoints. aggregatednormalizedreverse,R,andforward,F,strandexpressionvaluesacross The active enhancer sets of HeLa-S3, GM12878 and K562 cells were then allsamples(SupplementaryFig.6a);D5(F2R)/(F1R).Drangesbetween21 centredonproximal(within200bp)P300ENCODEbindingsitepeaks(joint and1andspecifiesthebiasinexpressiontoreverseandforwardstrand,respect- P300andGATA1peaksforK562)toderivecentrepositions.FANTOM5CAGE ively(D50means50%reverseand50%forwardstrandexpression,whilejDj datafromthesameENCODEcelllines(pooledtriplicates)werethenoverlaid closeto1indicatesunidirectionaltranscription).Adirectionalityscorecalculated overthesecentredenhancerregionsandtheabsence(0)andpresence(1)of(one frompooleddataisagoodestimateofsampledirectionality(Supplementary ormore)CAGEtag59endsin10bpnon-overlappingwindowsweredetermined Fig.6b).Eachbidirectionallocuswasassignedoneexpressionvalueforeach andanaverageprofilewascalculatedtoassesstheaveragebidirectionalpatternof samplebysummingthenormalizedexpressionofthetwoflankingwindows. transcriptionatchromatin-derivedenhancers. Bidirectionallociwerefurtherfilteredtohavelow,non-promoter-like,direc- PooledCAGEdatafromallFANTOM5libraries(describedabove)werefur- tionalityscores(jDj,0.8)andtobelocateddistanttoTSSsandexonsofprotein- theroverlaidwiththeseregionsandadirectionalityscorebasedontheaggregate and noncodinggenes (see‘Proofof conceptanalysis’above fordetails).This ofCAGEtagsfallingwithin6300bpfromthecentrepositionswerecalculatedto resultedinafinalsetof43,011putativeenhancers. determinepotentialstrandbias.Forcomparison,werepeatedthesamecalcula- Wefurthertestedwhethertheexpressionlevelforeachsampleandcandidate tionsforgenomicregions6300bparoundTSSsofRefSeqproteincodinggenes. enhancerwassignificantlygreaterthanthegenomicbackground(seeconstruc- Directionalitywascalculatedas(F-R)/(F1R),whereFandRisthesumofCAGE tionofrandomgenomicbackgroundregionsbelow).APvaluewascalculatedfor tagsalignedontheforwardandreversestrand,respectively.Directionalitycloseto eachenhancerexpressionvalueforeachprimarycell,tissueandcelllinesample -1or1indicatesaunidirectionalbehaviourwhile0indicatesperfectlybalanced bycountingthefractionofrandomgenomicregionswithgreaterexpressionlevel bidirectionaltranscription. inthesamesample.EnhancerswithPvalueslessthan0.001andBenjamini– Positional cross correlations were calculated between reverse and forward HochbergadjustedFDR,0.05wasconsideredtranscribedinthatsample.This CAGEtag59endsatChIP-seq-derivedactiveHeLa-S3andGM12878enhancer analysis yielded binary expression values, which were used for constructing centrepositions(asdeterminedbyP300peaks)6300bp(maximumlag300)to enhancersetsassociatedwitheachsample.Intotal,38,554enhancersweretran- identifytheirmostlikelyseparation.Crosscorrelationswerealsocalculatedin300bp scribed at a significant expression level in at least one primary cell or tissue windows(maximumlag150)flankingtheenhancercentresbetweenCAGE59ends sample.Below,werefertothissetasthe‘robustset’ofenhancersandindicate andENCODEH2A.Zsignals(fromthesamecellline)forHeLa-S3andGM12878 wheneveritwasused.Forallanalyses,weusethewhole(‘permissive’)setof aswellasbetweenCAGE59endsand59endsofENCODEGM12878micrococcal 43,011enhancersifnototherwisementioned. nuclease-digested nucleosome sequencing (MNase-seq) reads (9 pooled repli- Constructionofrandomgenomicbackgroundregions.Werandomlysampled cates).Inthelatteranalysis,correlationsweremadeusingreadsonthesamestrand. 100,000genomicregionsof401bpthatweredistaltoTSSsandexonsofknown Pooled,uniqueCAGEtags(inwhichonlyoneCAGEtagperbpwascounted)were genes(sameasthefilteringproceduredescribedaboveforbidirectionallytran- consideredinallcorrelationanalysesandenhancerswereweighedaccordingtothe scribedloci).Thesewerefurtherfilteredtonotoverlapwithoursetof43,011 aggregatedsignalbeforesubsequentaveragingoverlagsnottomakeanylibraryor predictedenhancers,whichyielded98,942randomgenomicregionswhoseexpress- enhancerhaveanundueinfluence. ionlevelswerequantifiedandnormalizedinthesamemannerasdescribedfor ReporteractivityofENCODEenhancersinrelationtotranscriptionalstatus. bidirectionalloci(above). Weusedpublished8resultsonamassivelyparallelreporterassaymeasuringthe CorrelationbetweenENCODEepigenomicdataandCAGE-definedenhan- activityofENCODE-predictedenhancersinHepG2andK562cells.Allresultson cers.UsingtheUCSCENCODErepositorydata(downloadedandpooled26 non-scrambledsequenceswereconsidered,regardlessofthelevelofconservation. March2012),weassessedthesignalofRNAPolymeraseII(RNAPII),thepooled 198outof738testedK562enhancersand307outof1,136testedHepG2enhancers transcriptionfactorsupertrack(allTFs),CCCTC-bindingfactor(CTCF),E1A hadsignificantenhancerreporteractivity(asdeterminedbytheoriginalpublica- binding protein P300, DNase I hypersensitive sites (DHSs) and two histone tion).Wedeterminedtheexpressionin401bpwindowscentredonmidpointsof marks:H3K4me1andH3K27acaroundenhancers,TSSsandrandomgenomic ENCODE-predictedenhancersusingFANTOM5CAGEfromthesamecelllines. sites. Wefurthercalculatedthefalsediscoveryrateafteraminimumexpressionthresh- Largescaleenhancerreportervalidations.Werandomlyselected125CAGE- oldintheinterval[0,0.5]TPM,asthefractionofnon-significantenhancersamong definedenhancerswithsignificantlyhigherexpressionthanrandomgenomic thosefulfillingtheexpressioncutoff. regions in at least two out of three HeLa-S3 replicates. These were grouped Identification of bidirectionally transcribedloci. Bidirectionally transcribed according to Hela-S3 expression tertiles: low (36), mid-level (41) and strong lociweredefinedfromasetof1,714,047forwardand1,597,186reversestrand (46).Thesecouldbesplitupfurtheraccordingtooverlap(midposition)with CAGEtagclusters(TCs)supportedbyatleasttwoCAGEtagsinatleastone combined ENCODE (release January 2011) segmentations of Segway42 and sample(TCsdefinedinref.6).OnlyTCsnotoverlappingantisenseTCswere ChromHMM43chromatinstateprediction:25,27and14strongly,mid-leveland used.Weidentified1,261,036divergent(reverse-forward)TCpairsseparatedby lowlyexpressedCAGEenhanceroverlappedENCODEstate‘E’(‘strongenhancer’) atmost400bpandmergedallsuchpairscontainingthesameTC,whileatthe whereas21,16and22strongly,mid-levelandlowlyexpressedCAGEenhancer sametimeavoidingoverlappingforwardandreversestrandtranscribedregions overlappedENCODEstate‘TSS’. (prioritizationbyexpressionranking),whichresultedin200,171bidirectional Wefurtherrandomlyselected26and15untranscribed(negligibleamountof loci (procedure illustrated in Supplementary Fig. 6a). A centre position was overlappingFANTOM5HeLa-S3CAGEtags)500bpregionscentredonmid ©2014Macmillan Publishers Limited. All rights reserved ARTICLE RESEARCH positions of HeLa-S3 E states and HeLa-S3 ENCODEDHSs. Two literature- differingbetweenthestructurallyrelatedbidirectionallytranscribedenhancer derived44HeLa-S3positiveenhancersand4randomregions(see‘Construction TSSsandnonCGI-associatedpromoterTSSs,weextracted600bpregionsdown- ofrandomgenomicbackgroundregions’)wereusedforcomparison.Forcom- streamofeachTSSandperformedcomparativedenovomotifsearchesusing parison,wealsorandomlyselected20manuallydefineduntranscribedHeLa-S3 HOMER.Here,weanalysedonesetusingtheothersetasbackground(corrected chromatin-definedactiveenhancers(see‘Proofofconceptanalysis’). forregionsize,matchedforGCcontentandauto-normalized)tocalculatemotif PCR primers for the amplification of enhancer and control regions were enrichmentonlyonthegivenstrand.Thetopmotifenricheddownstreamof designedusingthePerlPrimertool45,andpurchasedfromOperonLtdPrimers nCGIpromoterswasthe59-splicesitemotif.Genomicdistributionsoftheenriched includedBamHIorSalIrestrictionsitesforcloningandsequencesarelistedinSup- splicesitemotif,aswellastheAATAAAterminationsignalweregeneratedusing plementaryTables2and3.Controlfragmentsrangedbetween420and1,452bp. HOMER. Enhancerfragmentsusuallyincludeda500bpwindowaroundthemid-pointof RNA-seqsamplesandlibrarypreparation.Priortopreparationofsequencing our predicted enhancers and depending on the availability of unique primer libraries, rRNA was removed by poly(A)1 selection (CD191 B-cells, CD81 sequences,enhancerfragmentsrangedbetween470and840bp. T-cells,500ng)orrRNAdepletion(fetalheart,1mg).Poly(A)1selectionwas WeinsertedanEF1abasalpromoterfragmentintoHindIIIandNheIsitesof donetwicebyusingDynabeadsOligo(dT) (LifeTechnologies)accordingtothe 25 themultiplecloningsiteinthepromoter-lesspGL4.10(Promega)toconstructthe manufacturer’smanual.rRNAdepletionwasdonebyusingRibo-ZerorRNA pGL4.10EF1avector.WenextremovedtheBamHIandSalIcontainingfragment removalkit (Epicentre, Illumina)accordingto themanual.ThetreatedRNA locateddownstreamoftheSV40latepoly(A)signal,andre-insertedthefragment wasdissolvedin20mlwater. attheSpeIsitethatislocatedupstreamofthesyntheticpoly(A)signal/transcrip- ThepretreatedRNAwasthenfragmentedbyheatingat70uCfor3.5minin tionalpausesitetogeneratemodifiedversionsofpGL4.10EF1aandpGL4.10(see fragmentationbuffer(Ambion),followedbyimmediatechillingoniceandaddi- SupplementaryFig.9d). tionof1mlofStopsolution.FragmentedRNAwaspurifiedwiththeRNeasy EnhancerandcontrolregionswerePCR-amplifiedusingKODpluspolymer- MinElute kit (Qiagen) following the instructions of the manufacturer except ase(Toyobo)fromHEK-293TgDNA,digestedwithBamHIandSalI(Takara 675mlof100%ethanolisusedinsteptwo,insteadof500ml.PurifiedRNAwas Bio),andpurifiedusingtheE-GelSizeSelectsystem(LifeTechnologies).Fivemlof dephosphorylated in phosphatase buffer (New England Biolabs) with 5 U of purifiedPCRproductswereligatedwith100ngoftheBamHI-andSalI-digested Antarctic phosphatase (New England Biolabs) and 40 U of RNaseOut (Life modifiedpGL4.10EF1aandpGL4.10plasmidsusingLigation-high(Toyobo), Technologies)at37uCfor30minfollowedby5minat65uC.Afterchillingon andtransformedintoDH5acompetentcells(Toyobo).Correctinsertionofthe ice RNA was phosphorylated by addition of the following reagents; 5ml of PCR products into the plasmids was checked by colony PCR. Vectors were 103PNK buffer, 20 U of T4 polynucleotide kinase (New England Biolabs), purifiedusingtheQIAGENPlasmidPlus96MiniprepKit(Qiagen). 5mlof10mMATP(Epicentre,Illumina), 40UofRNaseOut,17mlofwater. HeLa-S3cells(JCRBCellBank)wereculturedinMEM(WAKO)supplemen- Thereactionwasincubatedat37uCfor60min.PhosphorylatedRNAwaspuri- tedwith10%FBS(NichireiBioscienceInc.,lotno. 7G0031),100Unitsml21 fiedwiththeRNeasyMinElutekit(Qiagen)asdescribedabove.PurifiedRNAwas penicillinand100mgml21streptomycin(bothLifeTechnologies).HepG2Cells concentrated to 6ml by vacuum centrifugation on a SpeedVac (Eppendorf). (RIKENBRC)wereculturedinDMEM(LifeTechnologies)supplementedwith Onemlof2mMpre-adenylated39DNAadaptor,59-App/ATCTCGTATGCCGT 10%FBS(NichireiBioscienceInc.,LotNo.7G0031),andMEM(WAKO)sup- CTTCTGCTTG-39 was added to the concentrated RNA. After incubation at plementedwith10%FBS(NichireiBioscienceInc.,lotno.7G0031),100Units 70uCfor2minfollowedbychillingonicefor2min,thefollowingreagentswere penicillin and 100mgml21 streptomycin (Life Technologies). Cell lines were addedtoligatetheadaptoratthe39endoftheRNA;1mlof103T4RNAligase2 seededinto96wellplatesatadensityof7.53103cellsperwellonedaybefore truncatedbuffer,0.8mlof100mMMgCl,20UofRNaseOUTand200UofRNA 2 transfection.Fireflyluciferasereporterplasmids(190ng)and10ngofpGL4.73 ligase2truncated(NewEnglandBiolabs).Aftertheincubationat20uCfor60min, Renillaluciferaseplasmid(Promega)wereco-transfectedintoHepG2orHeLa-S3 1mlofheat-denatured5mM59RNAadaptor,59-GUUCAGAGUUCUACAGU cellsusingLipofectamine(LifeTechnologies)accordingtothemanufacturer’s CCGACGAUCGAAA-39wasligatedwith39adaptorligationproductswith20U instruction.Eachtransfectionwasindependentlyperformedthreetimes.After24h, ofT4RNAligase1(NewEnglandBiolabs)and1mlof10mMATP(NewEngland theluciferaseactivitiesweremeasuredbyGloMax96Microplateluminometer Biolabs)at20uCfor60min.4mlofadaptorligatedRNAwasmixedwith1mlof (Promega)usingtheDual-gloluciferaseassaysystem(Promega)accordingtothe 20mMRTPrimer,59-CAAGCAGAAGACGGCATACGA-39,followedbyincuba- manufacturer’sinstruction. tionat70uCfor2min,andimmediatelykeptonice.Thereversetranscription Sequencemotif analysis on global CAGE enhancer and promoter sets. To reactionwasdonewith2ml53PrimeScriptbuffer,1mlof10mMdNTP,20Uof comparemotifsignaturescharacterizingbidirectionallytranscribedenhancers RNaseOUTand200UofPrimeScriptReverseTranscriptase(TakaraBIO)at44uC (permissive set) with those of CAGE-defined promoters, we used the set of for30min.ThecDNAproductwasamplifiedbyPCRwith10mlof53HFbuffer, 184,827robusthumanCAGEclustersdefinedbyref.6separatedinto61,322 1.25mlof10mMeachdNTPmix,2mlof10mMFWDprimer,59-AATGATACG CGIand123,505nonCGI-associatedclusters.Wemadefurthersubsetsofthese GCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA-39,2mlofRTprimer CAGEclusters,contingentontheiroverlapwithannotatedTSSsfromRefseqand and1UofPhusionHigh-FidelityDNAPolymerase(NewEnglandBiolabs).PCR Gencode.WemergedoverlappingextendedCAGEclusters(2300,150;based wascarriedoutinatotalvolumeof50mlwiththefollowingthermalprogram; ontherobustclusterset;averagesizenonCGI:422bp;averagesizeCGI:544bp) 98uCfor30s,12PCRcyclesof10sat98uC,30sat60uC,and15sat72uC, contingentonCGIstatusandsubtractedCAGEclusterregionsthatoverlapped followedbyat72uCfor5minandthenkeptat4uC.RemainingPCRprimerswere withextendedenhancers(midposition6200bp). removedtwicebyusing1.2volumesofAMPureXPbeads(BeckmanCoulter).The Thiscreatedfivesetsofregionsrepresentingnon-overlappingbidirectional resultinglibrarieswerecheckedforsizeandconcentrationbyBioAnalyzer(Agilent) enhancers,nonCGIpromotersandCGIpromoters(annotatedandfullsetsfor usingtheHigh-SensitivityDNAKit(Agilent).Qualifiedsequencinglibrarieswere thetwolatterones).MotifenrichmentwasanalysedusingHOMER36version3,a loadedontheHiSeq2000(Illumina)usingthecustomsequencingprimer,59-CG suiteoftoolsformotifdiscoveryandnext-generationsequencinganalysis(http:// ACAGGTTCAGAGTTCTACAGTCCGACGATCGAAA-39. biowhat.ucsd.edu/homer/).Sequencesofthethreeregionsets(enhancers,nonCGI AllRNA-seqsamplesprofiledinthisstudywerealsoprofiledintheFANTOM5 andCGIpromoters)werecomparedtoequalnumbersofrandomlyselectedgeno- promoteromemanuscriptandaredescribedindetailthere6.Brieflyallhuman micfragments of the average region size, matched for GC content andauto- samplesusedintheprojectwereeitherexemptedmaterial(availableinpublic normalized to remove bias from lower-order oligo sequences. After masking collectionsorcommerciallyavailable),orprovidedunderinformedconsent.All repeats,motifenrichmentwascalculatedusingthecumulativebinomialdistri- non-exempt material is covered under RIKEN Yokohama Ethics applications butionbyconsideringthetotalnumberoftargetandbackgroundsequenceregions (H17-34andH21-14).ForthesamplesprofiledbyRNA-seq,thehumanfetalheart containingatleastoneinstanceofthemotif.Onehundredmotifsweresearchedfor RNAwaspurchasedfromClontech(Catalogueno.636583).CD191B-cellsand arangeofmotiflengths(7–14bp)resultinginasetof800denovomotifsperset. CD81T-cellswereisolatedusingthepluriBeadsystem(huCD4/CD8cascadeand Afterfilteringredundantmotifs,thetop50motifsresultingfromeachsearchwere huCD19single;PluriSelect).RNAwasthenextractedusingthemiRNeasykit(Qiagen). combined, remapped and ranked according to enrichment (depletion) in the RNA-seqmappingandtranscriptassembly.Single-end100bplongreadsfrom enhancerset.Inparallel,wealsousedHOMERtocalculatetheenrichmentof librariesoriginatingfromthesimilarcellsources(allsix‘‘CD191Bcells’’librar- ChIP-seqderivedknowntranscriptionfactormotifs.Motifcollectionsincluding ies,allsix‘‘CD81Tcells’’librariesandone‘‘Fetalheart’’library)wereprocessed searchparametersaredepositedinawebdatabaseathttp://enhancer.binf.ku.dk. togetherviatheMoiraipipeline(Hasegawa,Y.etal.,manuscriptinpreparation). Histograms of PhastCons scores were generated using the annotation tool in TheprocessingstepsimplementedwithintheMoiraipipelineincluded(1)raw HOMER. sequencedreadsPolyAtailand‘‘CTGTAGGCACCATCAAT’’adaptorclipping AnalysisofsplicesiteandterminationsignalsdownstreamofCAGEenhancer usingFASTQ/AClipperfromFASTX-Toolkit(http://hannonlab.cshl.edu/fastx_ TSSsandpromoterTSSs.ToidentifymotifsdownstreamofTSSspotentially toolkit/),(2)removalofsequencedreadscontaining‘‘N’’andsequencessimilarto ©2014Macmillan Publishers Limited. All rights reserved RESEARCH ARTICLE ribosomalRNAusingrRNAdustversion1.02(T.L.etal.,manuscriptinprepara- AssessmentofdegradationratesofRNAsemanatingfromCAGEenhancers tion),and(3)mappingtheresultingreadsagainstthehg19humangenomeusing andpromoters.Bidirectionallytranscribedlociwereidentifiedinthesameway TopHat46(version1.4.1)usingbothTopHatdenovojunctionfindingmodeand aswithpooledFANTOM5CAGElibraries(see‘Identificationofbidirectionally knownexon-exonjunctionsextractedfromGENCODEV10,withalltheother transcribedloci’and‘Expressionquantificationofbidirectionallytranscribedloci parameterssettotheirdefaultvalues.MappedreadsflaggedasPCRduplicates andpredictionofenhancers’above)fromtagclusters(asdefinedin51)derived wereremovedandtheremainingTopHatalignedreadswerethenassembledusing frompooledHeLaCAGE(mocktreatedcontrol,SKIV2L2)libraries.From5,892 Cufflinks47(version1.3.0)withCufflinksparameterssettotheirdefaultvalues. bidirectionallocidistanttoTSSsandexons,4,196werepredictedtobeenhancers AssessmentoflengthsofRNAsemanatingfromenhancersandpromoters.All basedonbalanceddirectionalityoftranscriptionofwhich3,896hadsignificantly Cufflinksassembledtranscripts,whose59ends,regardlessofstrand,werelocated greaterexpressionthanrandomgenomicregionsinatleastonelibrary.These withintheouterboundariesofCAGEenhancersor,onthesamestrand,within were then overlapped with the whole set of FANTOM5 CAGE enhancers to 200bp(upstreamordownstream)ofaGENCODE(v10)protein-codingTSS estimatethefractionofunseentranscribedenhancers. wereconsideredforfurtheranalysis.FortheseCufflinkstranscriptswecalculated TheexpressionfoldchangeofHeLadepletedofSKIV2L2comparedtomock- their(intron-less)RNAlength,(possiblyintron-containing)genomiclengthas treated control were assessed and compared between expressed HeLa CAGE thegenomicdistancebetweentheir59and39ends,aswellastheirnumberofexons. enhancers,promotersofRefSeqprotein-codinggenesingeneralandbrokenup ExonsofCufflinkstranscriptswith59endsinenhancerswerefurthercheckedfor intoCpGandnon-CpGpromoters,andubiquitousFANTOM5CAGEenhancers. atleast50%(reciprocal)overlapwithexonsofGENCODE(v10)known,level1, We further calculated the average footprints of H3K4me1, H3K4me2 and protein-codinggenesandlincRNAs.Werepeatedthesameanalysisspecificallyfor H3K27ac from ENCODE (Broad, Bernstein) signal files in 601bp windows u-enhancers. centredonmidpointsofenhancersidentifiedinHeLacellsandthosethatwere SmallRNAlibrarypreparationandmapping.ShortRNA-seqsequencinglib- novelinSKIV2L22. rarieswerepreparedas24-plexusingtheTruSeqSmallRNASamplePrepKit Purificationofbloodcelltypes.Peripheralbloodmononuclearcellswereiso- (Illumina)followingthemanufacturer’smanual.Allstartingsourceswere1mg latedfromleukapheresisproductsofhealthyvolunteersbydensitygradientcent- oftotalRNA.ThepreparedsequencinglibrarieswereloadedonaHiSeq2000 rifugationoverFicoll/Hypaque(BiochromAG).Collectionofbloodcellsfrom (Illumina).AllsamplesprofiledinthisstudywerealsoprofiledintheFANTOM5 healthydonorswasperformedincompliancewiththeHelsinkiDeclaration.All promoteromepaper6andaredescribedindetailthere.Briefly,allhumansamples donorssignedaninformedconsent.Theleukapheresisprocedureandsubsequent usedintheprojectwereeitherexemptedmaterial(availableinpubliccollections purificationofperipheralbloodcellswasapprovedbythelocalethicalcommittee orcommerciallyavailable),orprovidedunderinformedconsent.Allnon-exempt (referencenumber92-1782and09/066c).CD41cellswereenrichedusingmag- materialiscoveredunderRIKENYokohamaEthicsapplications(H17-34and neticallylabelledhumanCD4MicroBeads(MiltenyiBiotec)andtheMidi-MACS H21-14).ForthesamplesprofiledbysRNA-seq,thehumanfetalheartRNAwas system(MiltenyiBiotec).TheCD41fractionwasstainedwithCD4FITC(fluor- purchasedfromClontech(catalogueno.636583).CD191B-cellsandCD81T-cells esceinisothiocyanate;BectonDickinson,catalogueno.345768),CD25phycoery- were isolated using the pluriBead system (huCD4/CD8 cascade and huCD19 thrin(PE;BectonDickinson,catalogueno.341011)andCD45RAallophycocyanin single;PluriSelect).RNAwasthenextractedusingthemiRNeasykit(Qiagen). (APC)andCD31CD41CD252TcellsweresortedonaFACS-Ariahigh-speed ShortRNAswereprofiledusingtheTruseqprotocolfromIllumina,usingan cellsorter(BDBiosciences).CD81cellswereenrichedusingmagneticallylabelled 8-plex.The8-plexwasfirstsplitbybarcodeandtheresultingFASTQsequences humanCD8MicroBeads(Miltenyi).TheCD81fractionwasstainedwithCD3 trimmedofthe39adaptorsequence.SequenceswithlowqualitybaseNwere FITC(BectonDickinson,catalogueno345763)andCD8APC(BectonDickinson, removed. Ribsomal RNA sequences were then removed using the rRNAdust catalogueno.345775)andsortedforCD31CD81Tcells.CD191andCD561 program.RemainingreadswerethenmappedusingBWAversionis0.5.9(r16) cellswereenrichedfromtheCD82fractionusingmagneticallylabelledhuman andmultimapperswererandomlyassigned. CD19andCD56MicroBeads(Miltenyi).EnrichedcellswerestainedwithCD3 AnalysisofsmallRNAsatenhancerTSSsandpromoterTSSs.59and39endsof FITC(BectonDickinson,catalogueno.345763),CD19PE(BectonDickinson, mappedsRNAsaswellaspooledCAGE59endswereoverlaidwindowsof601bp catalogueno.345777)andCD56APC(BectonDickinson,catalogueno.341027) centredonforwardstrandsummitsofenhancer-definingCAGEtagclustersand andsortedintoCD31CD191BcellsandCD31CD561naturalkiller(NK)cells. sensestrandsummitsinpromotersofRefSeqprotein-codinggenes.Theaverage Purificationofbloodmonocytesisdescribedelsewhere52. cross-correlationbetweenCAGE59endsandsRNA39endswerecalculatedin Generation of ChIP data for blood cells. Chromatin was obtained from thesewindowsallowingamaxlagof300.Forfootprintplots,readsmappingto CD41CD252Tcells,CD81Tcells,CD191Bcells,andCD561NKcellsof thesamegenomiclocationswereonlycountedoncenottomakeanylibraryor genomicregionhaveanundueinfluence. two healthy male donors each. Chromatin immunoprecipitation (ChIP) for HeLacellsculturingandSKIV2L2depletion.HeLacellsweregrowninDMEM H3K4me1 and H3K27ac and library construction were done essentially as msiRedNiAum-mseudpiaptleedmkennotecdkdwoiwthn1o0f%eitfheetarlEbGovFiPn(ecosnetrruoml),aatn3d7SuKCIVan2Ld25(%MTCRO42). edreesncrciebseedquelesnecweh(eGreR52C.hS3e7q/uhegn1c9e)tuasginsgwBeorewmtiea5p0paneddotonltyhuenciuqrureelnytmhaupmpaendtraegfs- wereperformedusing22nMofsiRNAandLipofectamin2000(Invitrogen)as wereusedfordownstreamanalyses.H3K4me1andH3K27acChIP-seqdatafor transfectingagent.Asecondhitof22nMsiRNAwasgivenafter48h.Cellswere CD141monocyteswasgeneratedelsewhere52.ComplementaryDNasehyper- collected an additional 48 h after the second hit, and protein depletion was sensitivitysequencingdatawasobtainedfromtheEpigeneticsRoadmapproject verified by western blotting analysis as described elsewhere48. The following (http://www.roadmapepigenomics.org)andmappedasabove.BloodcellChIP- siRNAsequenceswereused: seq data have been deposited with the NCBI GEO database (accession code egfpGACGUAAACGGCCACAAGU[dT][dT] GSE40668)andUCSCGenomeBrowsertrackhubdataoftheentirebloodcell egfp_asACUUGUGGCCGUUUACGUC[dT][dT] data set can be found at http://www.ag-rehli.de/NGSdata.htm. Also see Sup- SKIV2L2CAAUUAAGGCUCUGAGUAA[dT][dT] plementaryTable4. SKIV2L2_asUUACUCAGAGCCUUAAUUG[dT][dT] ClusteringofbloodcellCAGEandepigeneticsdata.CAGEsamplescorres- HeLaCAGElibrarypreparationsanddataprocessing.CAGElibrarieswere pondingto CD41, CD81, B cells,NK cellsand monocytes wereselectedin preparedfrom5mgoftotalRNApurifiedfrom23106HeLacellsusingthe triplicatesfromamongthesetofprimarycellsamples.Basedonthetotalsetof Purelinkminikit(Ambion)with1%2-mercaptoethanol(Sigma)andon-column 43,011permissiveenhancers,asubsetof6,609blood-expressedenhancerswas DNaseItreatment(Ambion)asrecommendedbymanufacturer.CAGElibraries definedasbeingsignificantlyexpressedabovegenomicbackground(described werepreparedasdescribedpreviously49.Priortosequencingfourlibrarieswith above)inatleasttwoofthetriplicatesamplesforatleastonebloodcelltype.This differentbarcodeswerepooledandappliedtothesamesequencinglane.The subset of enhancers was clustered for heat map visualization using complete librariesweresequencedonaHiSeq2000instrument(Illumina).Tocompensate linkageagglomerativehierarchicalclusteringbasedonenhancerusagepercell forthelowcomplexityin59endoftheCAGElibraries30%Phi-Xspike-inwere type(binarymatrix)andManhattandistance. addedtoeachsequencinglaneasrecommendedbyIllumina.CAGEreadswere Enhancersweredefinedasbeingspecificallyexpressedinonebloodcelltypeif assignedtotheirrespectiveoriginatingsampleaccordingtoidenticallymatching havingapairwiselog2foldchange.1.5withrespecttotheotherfourbloodcell barcodes.Assignedreadsweretrimmedtoremovelinkersequencesandsubse- types.Thefoldchangewascalculatedbasedonthemeanexpressionovertrip- quentlyfilteredforaminimumsequencingqualityof30in50%ofthebasesusing licate samples per cell type. Footprints for DNase I hypersensitivity (DHS), theFASTX-Toolkit(http://hannonlab.cshl.edu/fastx_toolkit/).Mappingtothe H3K4me1andH3K27acwerecalculatedpercell-type-specificenhancersetand humangenome(hg19)wasperformedusingBowtie50(version0.12.7),allowing celltypebyextensionofreadsto200bpandoverlapaggregationforawindowof formultiplegoodalignmentsandsubsequentlyfilteringforuniquelymapping 61kbaroundenhancermidpointasthemeanTPMsignaloverallenhancersin reads.ReadsthatmappedtounplacedchromosomepatchesorchrMwerediscarded. thatspecificsubset. ©2014Macmillan Publishers Limited. All rights reserved
Description: