ebook img

Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models PDF

5.1 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models

Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models Chris A Brackley,1 Jill M Brown,2 Dominic Waithe,3 Christian Babbs,2 James Davies,2 Jim R Hughes,2 Veronica J Buckle,2 and Davide Marenduzzo1 1SUPA, School of Physics and Astronomy, University of Edinburgh 2MRC Molecular Haematology Unit, Weatherall Institute Of Molecular Medicine, Oxford University 3Wolfson Imaging Centre Oxford, Weatherall Institute Of Molecular Medicine, Oxford University The three-dimensional organisation of chromosomes can be probed using methods such as Capture-C. However it is unclear how such population level data relates to the organisation within a single cell, and the mechanisms leading to the observed interactions are still largely obscure. We present a polymer modelling scheme based on the assumption that chromosome architecture is 6 maintained by protein bridges which form chromatin loops. To test the model we perform FISH 1 experiments and also compare with Capture-C data. Starting merely from the locations of protein 0 binding sites, our model accurately predicts the experimentally observed chromatin interactions, 2 revealing a population of 3D conformations. n Keywords: chromosomeconformation;polymermodel;fluorescenceinsituhybridization;cis-regulation a J 2 Background experimentsisanaverageoveralargepopulationofcells, 1 yet it has become clear that there is a remarkable vari- The three-dimensional spatial organisation of mam- abilityinbothchromosomalconformationandchromatin ] M malianchromosomesinvivoisatopicoffundamentalim- interactions between different cells [19, 20]. Thus it is portance in cell biology [1–5]. Understanding how chro- an important challenge to understand how the chromo- Q matin conformation becomes modified on a local scale some conformation in single cells leads to the observed o. in order to up-regulate transcription from genes during population average, and to decipher the mechanism un- i differentiation or development is critical not only to de- derlying such arrangements. To address this issue, here b cipher a fundamental biological process, but also to de- we present an in silico investigation of the local fold- - q lineate the role this process may play in human disease ing and resulting interaction maps of important active [ andpotentialtherapies. Thehigherscaleorganisationof gene loci in mouse erythroblasts. We concentrate on the chromatininthenucleusalsohasimportantrolestoplay well studied α and β globin loci which have long been 1 v in this regard [5–9] as the spatial structure of chromo- model systems for understanding cis-regulatory interac- 2 somes is tightly linked to transcription. For instance, tions [14, 21–30]. These loci are known to have tissue- 2 active genes can cluster at nuclear speckles [10, 11]; specificorganisation,andexpressionofthedifferentgenes 8 converselyperipherallamina-associateddomains(LADs) within the loci varies through development and erythro- 2 comprise of regions of the DNA that are not generically poiesis. As a comparison, we also study embryonic stem 0 transcriptionally active [12, 13]. The three dimensional cells where these genes are not active. Our main result . 1 structure of the genome is, therefore, intimately related isthatourmodelpredictspatternsofcontactswhichare 0 to its function. close to that found by high-resolution Capture-C exper- 6 iments, reproduces the changes in such patterns follow- Thanks to the development of high-throughput ex- 1 ing differentiation, and explains existing observations on perimental techniques based on chromosome conforma- : v tion capture (3C) [1], such as Hi-C and Capture-C [2– the biology of the globin loci in mouse. Our predictions i also compare favourably with new fluorescence in situ X 4, 14, 15], it is now possible to probe experimentally hybridisation (FISH) experiments that give spatial sepa- which regions of the genome of a given cell type are r ration measurements between specific genomic locations a spatially proximate in vivo. A major result obtained in individual cells. This level of agreement is especially with these methods has been the discovery that chromo- remarkable because it essentially involves no fitting. somesareorganisedinaseriesoftopologically-associated domains (TADs) [2–4], which are separated by bound- Our model builds on the minimal assumption that the aries,butwhosebiologicalnatureremainselusive. While spatial organisation of eukaryotic chromosomes is main- the TAD boundaries are thought to be largely con- tained largely through the action of proteins or pro- served across cell types, the arrangement of the chro- teincomplexeswhichcanformbridgesbysimultaneously matin within a TAD is not [16]. This internal organisa- binding to more than one site in the genome, and form- tiondependsontheactivityofthegeneswithinadomain, ing loops from the intervening chromatin [4, 31–36]. We and is likely related to the action of cis-regulatory ele- treat the chromatin fibre as a simple bead-and-spring ments(DNAregionswherethebindingofatranscription polymer (Additional file 1: Figure S1), and coarse-grain factor(TF)canregulatetheexpressionofagenewhichis the bridge forming protein complexes into single units. tens or hundreds of kilo-base-pairs (kbp) away) [17, 18]. We then “paint” the polymer according to bioinformatic Thepatternofinteractionsrevealedbymost3Cbased data characterising protein binding and chromatin state 2 in the relevant cell type, and use molecular dynamics to tive sites at regulatory elements are often surrounded by simulate the motion of the region of interest (see Ad- H3K4me1 modified regions, these act as a funnel which ditional file 1: Figure S1 for a schematic diagram, and effectively directs proteins to their high affinity binding Additional file 15: Supplementary Methods for the full sites [47]. details of the model). The chromatin fibre and proteins diffuse as though subject to the thermal fluctuations of thenucleoplasm;theproteincomplexescanbindanddis- Results sociate from the chromatin and form bridges, and the fi- bre adopts conformations which are consistent with the entropic and energetic constraints of the system. By re- Chromatin folding in the mouse α globin locus peatedly running the simulation with different random thermal motions, we can generate a population of equi- First, we use our model to predict the folding of librium conformations representing a population of cells. a 400 kbp region around the mouse α globin locus Some examples of other studies where polymer models (chr11:31960000-32360000, mm9 build; each polymer havebeenappliedtostudychromatinare[20,31–34,37– bead represents 400 bp, or two nucleosomes, see Fig- 40]. ure 1A and Methods). This well studied cluster contains To keep our model as simple as possible we use the lo- five globin related genes: the ζ globin gene (Hba-x, ex- cationsofDNase1hypersensitivesites(DHSs)asaproxy pressed in embryonic erythroid cells, but silent in adult forbindingsitesofagenerictypeofproteinbridge,which cells), two copies of the α globin gene (Hba, expressed in we imagine is made up from complexes of transcription foetal and adult erythroblasts) and two θ globin genes factors and other DNA-binding proteins. The choice of (Hbq1 and Hbq2, only weakly expressed in adult tissue). DHSs as binding sites is justified due to their well doc- Expression of the genes in the cluster is controlled by umented tendency to correlate with open chromatin, or several regulatory elements: the multi-species conserved euchromatin, and with peaks in ChIP-seq data for many elements R1-4 and the mouse specific R(m). Some of transcriptionfactors[41], suchasGATA1, Nfe2Scl/Tal1 these are contained within the introns of Nprl3, one of and Klf1, all of which are known to be important for severalwidelyexpressedgeneswhichsurroundthelocus; globin regulation (see Additional file 2: Figure S2). The the R2 element (known as HS-26 in mouse and equiv- interactions between the many transcription factors and alent to HS-40 in human) is thought to be particularly co-factors which might form the bridging complexes in- important for globin regulation [21, 23, 27]. Figure 1A volved in cis-regulatory binding are not well character- shows the binding sites for CTCF and DHS across the ized, and the DHS approximation avoids the need to region considered (informed by ChIP-seq and DNase-seq make any assumptions. One factor that most certainly dataforadulterythroidcells–seeAdditionalfile2: Fig- hasachromatinarchitecturalroleistheCCCTC-binding ureS2);thepositionsoftheH3K4me1methylationmarks factor(CTCF)[4,35,40,42–44]. Thisproteinisthought are also indicated (from ChIP-seq data for the same cell to form dimers which drive looping between some of its type, see Additional file 2: Figure S2). In our simula- specificbindingsitesscatteredalongthechromosomesof tions, proteins bind strongly to the CTCF or DHS la- eukaryotic organisms. In particular, convergent CTCF belled beads, and also weakly to the H3K4me1 marks. bindingsiteshavebeenproposedtodelimittheextentof Some typical snapshots from our simulations are shown chromatin domains, which might be extruded through a in Figure 1B and Additional file 16: Video S1 (CTCF loopingcomplex,possiblycomprisingcohesin[40,44,45]. and DHS binding proteins are shown as red and green CTCF is therefore a bridge with an architectural role, spheres respectively), while the average contact map is and has indeed been dubbed a “global genome orga- shown in Figure 1C. nizer”[4,35,42]. Interestingly,chromatinhasbeenfound As anticipated, one of the main strengths of our ap- to compact on depletion of RAD21 and CTCF [37]. To proach is that it naturally outputs information on each reflectitsperceivedimportance,wetreatCTCFproteins member of the population of chromatin conformations as separate bridges in the simulations; in this case the (these can be thought of as representing different cells, binding sites are placed at peaks in the ChIP-seq data orthesamecellatdifferenttimes),whichwecanthenfur- for CTCF binding (see Additional file 2: Figure S2). therinterrogate. Aclusteringanalysis(i.e., groupingthe Ourmodelthereforeincludestwospeciesofputativepro- conformations by similarity; see Additional file 15: Sup- tein bridges, which we denote CTCF and DHS bind- plementary Methods for details) of 1000 simulated con- ing proteins (or bridges) respectively. Furthermore, we formationsrevealsthatthelocusfoldsintofourmainrep- consider the hypothesis that some histone modifications resentative structures (Figure 2). The main distinction (e.g. H3K4 monomethylation at enhancers or trimethy- between these structures is whether a single bridging- lation at active promoters) act to recruit bridging pro- induced globular domain forms (of size ∼70 kbp), or teins [46]. We include this in the model by introducing whether it breaks into two smaller microdomains, one a weaker, non-specific interaction between the bridges containing around 40 kbp, and the other one around and H3K4me1 modified regions (which are not already 25 kbp. The size of these globular microdomains does labelledasCTCForDHSbridges); sincethehypersensi- notexceed100kbp,sothesearemuchsmallerthanTADs 3 A Mouseα-globinlocusinerythroid(Ter119+)cells(chromosome11,mm9build) RefSeq Genes Il9r Rhbdf1 Npr13 Hba Hba Snrnp25 Mpg R1 R2 R3 R(m) R4 Hba-x Hbq1 Hbq2 32.10 Mbp 32.20 Mbp unmarked bead CTCF binding site DHS site both CTCF and DHS H3K4me1 mark B k C 360 100 2 3 10-1 10-2 k 960 10-3 1 3 31960k 32360k Figure 1: Simulating the α globin locus. (A)Browserviewshowinggenesinthevicinityoftheαglobinlocus,alongside a schematic indicating the coarse-graining used in the simulations. A 110 kbp section of the 400 kbp chromatin fragment which was simulated is shown. As described in the text, simulation chromatin beads were designated as CTCF binding sites, DHS binding, H3K4me1 modified sites, and combinations of these. The positions of the set of five regulatory elements are indicated with blue triangles, and promoters with green squares. (B) Example simulated configurations of the locus. CTCF proteins(green)andDHSbindingproteins(red)areshown;thechromosomefragmentiscolouredasinA.SeealsoAdditional file16: VideoS1fora3-Dviewoftheconfigurations. Parametersforthepolymermodelandthebridge–chromatinaffinityare given in full in Additional file 15: Supplementary Methods. (C) Contact map showing the frequency of contacts between each chromatinbeadin1000simulatedconfigurations. Notethatthecolourbarshowsalogarithmicscale. Thebluelinetotheleft indicatestheregionwhichisshowninA.Thegreenlinetotheleftindicatestheregionwhichisusedfortheclusteringanalysis (Figure 2 and text). (the median size of a TAD is 1 Mbp [3]); interestingly, arrangement within the domains can be further probed though, their size is comparable to that of the sub-TAD by looking at which promoters are directly interacting domains observed within active regions [4], and also to with the different regulatory elements in each conforma- thatoftheso-calledsupercoilingdomainsrecentlyfound tion (see Additional file 3: Figure S3). We find, for ex- in mammalian cells [48]. ample, that one or more of the α promoters interacts In the most common representative structure, which with one or more of the elements in 65% of conforma- accounts for 53% of the total observed conformations for tions, and that Hba-a1 interacts with the elements in thelocus,thereisasingleglobulardomaincontainingthe 53% of conformations whereas Hba-a2 interacts in only promoters of the globin genes, the promoters of the two 41%. This is qualitatively consistent with experiments neighbouring genes Mpg and Nprl3, and all five known in which mRNA expression from the two α globin par- regulatory elements. A similar representative structure, alogues was measured independently (on the basis of 3(cid:48) whichaccountsfor6%ofconformations,alsohasasingle sequence divergence), which showed that the gene situ- globulardomain,buttheregionwhichcontainstheNprl3 ated linearly closer to the enhancer elements, Hba-a1, is promoter is in a loop outside the globule. A third rep- always expressed at a higher level [26]. resentative structure accounts for 14% of the conforma- Importantly,wecanalsocomparetheinteractionspre- tions: heretwoglobularmicrodomainsform,wheretheα dicted by our simulations with recent high-resolution genesinteractwithonlythetwogenomicallyclosestregu- Capture-C data [14] which mapped the chromosomal latory elements. The fourth structure, which is adopted contacts within a number of cis-regulatory landscapes by about 25% of the conformations, has again two mi- in mouse erythroblasts (see Additional file 15: Supple- crodomains, but their composition is different: now the mentary Methods). Specifically, Figure 3A compares α genes are no longer in the same microdomain as the Capture-C and in silico patterns of contacts with the regulatory elements. We expect that these genes should promoters of the two α globin paralogues (which can- be transcriptionally inactive when the locus adopts this not be separated in the experimental data as they share structure. Finally, there are a small number (∼ 1%) of the same sequence). Figure 3B shows a similar plot for conformations which do not fit into any of these four the Mpg promoter. The results show that, remarkably, clusters. It is also interesting to note that the ζ gene with the sole input of the ChIP-seq and DNase-seq data and Mpg seldom interact with the elements (these genes giving the locations of the protein binding sites, we can are not widely expressed in adult erythroid cells). The reproduce to a good accuracy the Capture-C profiles. In 4 R1R2R3R(m)R4α-1α-2 100 Nprl act 1 A 6% 10-1 α1α2 ulation contprobability 0.5 m si 0 32.1 32.2 32.3 10-2 genomic position (Mbp) 100 B 53% 1100--21 Nprlα1α2 ulation contactprobability 0 .15 m 10-3 si 0 32.1 32.2 32.3 genomic position (Mbp) 100 C D 10-1 experiment experiment 14% α1 simulation simulation 1100--32 Nprl α2 normalisedoccurences 0.1 240000 normalisedoccurences 0.1 240000 100 0 0 0 300 600 900 0 300 600 900 separation (nm) separation (nm) 10-1 25% Nprl Figure 3: Simulations compare favourably with ex- 10-2 α1 perimental data. (A)Plotshowingthecontactsmadewith α2 the promoters of the two α globin genes (locations indicated 10-3 byredasterisks;thepositionsoftheregulatoryelementsand other gene promoters are also indicated). Simulation results Figure 2: Conformationsoftheαglobinlocuscanbe (red) are shown alongside Capture-C data (grey); in both grouped by similarity. A clustering analysis gives a den- cases the plots show the contacts to both genes combined drogram (left) which indicates how similar or different the (sinceeachcopyofthegenehasthesamesequenceitisimpos- conformations are. Conformations fall into four main rep- sibletoseparatetheseintheexperiment). Blackbarsindicate resentative structures depending on the pattern of contacts regionswherethereisnocontactdata(i.e. betweencaptured they exhibit (see Additional file 15: Supplementary Meth- regions; see Additional file 15: Supplementary Methods and ods). Contact maps for each representative structure are Ref. [14]). Since Capture-C data only gives relative contact shown (centre; the region shown is indicated by the green strength,theheightoftheexperimentaldatahasbeenscaled line in Figure 1C), as is a schematic of each representative soastobestfitthesimulationresults(seeAdditionalfile15: structure(right). Theproportionofsimulatedconformations SupplementaryMethods). (B)AsinA,butnowshowingthe adoptingagivenstructuregivesapredictionofthefrequency contactsmadewiththeMpg promoter(positionindicatedby with which that structure will occur in a population of cells. red asterisk). Although Mpg is roughly the same genomic distance away from the regulatory elements as the α globin genes, it interacts with them less frequently. (C) Plot show- particular,wereproducethecontactsbetweentheαpro- ingthedistributionofthe3-Dseparationoftheαglobinpro- moters and the five known regulatory elements; we also moters and the probe pE located at the regulatory elements R1-3. Simulations are compared with FISH measurements reproducethefactthatthereissomeinteractionbetween (see Methods and Additional file 5: Figure S5) performed on theregulatoryelementsandtheNprl3 promoter(seeAd- mature erythroblasts 30 hours after differentiation, when the ditionalfile4: FigureS4),butfarfewerinteractionswith globin genes are maximally expressed. The inset shows the the Mpg promoter, despite the fact that this gene is a meanandstandarddeviationforeachcase. (D)AsinH,but similar genomic distance away from the elements as the the separation of the α promoters and a downstream control α genes. probe p58 located within the Sh3pxd2b gene. To further assess the level to which the population of locusconformationspredictedbyourmodelgivesafaith- fulrepresentationoftheorganisationoftheαglobinlocus tribution(seeMethodsandAdditionalfile5: FigureS5); in real cells, we performed FISH experiments (see Meth- this is the only fitted parameter in our model, and the ods) to obtain distributions of the separations of probes fit yields a size of 15.8 nm, which is reasonable given at different positions across the locus. These measure- that 400 bp corresponds to two nucleosomes. Plotting mentsalsoallowustoparametrisethephysicalsizeofthe theexperimentalandsimulationseparationdistributions 400bpsimulationbeadsbyfittingthemeansofeachdis- on the same axes (Figures 3C-D, and Additional file 5: 5 Figures S5D-G) reveals that once more the simulations and βh1 genes are predominantly expressed in embryos, give an accurate prediction of the structure of the locus; while the β genes take over in adults), and is controlled forexampletheseparationoftheαpromotersandpEat by interactions with a series of DHSs in a region known theregulatoryelementsR1-3showsanarrowdistribution as the locus control region (LCR) [21, 24]. Unlike the α peaked about a mean value of ∼ 200 nm, whereas the globinlocus,theβ globingenesaresurroundedoneither separation of the promoters and a probe p58 at roughly side by a condensed chromatin region, containing genes the same genomic distance, but telomeric to the locus, whicharenotexpressedinerythroidcells. Aswiththeα showsamuchbroaderdistributionwithameancloserto globin case, we use ChIP-seq and DNase-seq data to la- 300 nm. belabead-and-springpolymerwhichrepresentsthegene We can also define a quantitative score Q, taking val- locus (see Figure 4A, and Additional file 7: Figure S7). ues between 0 and 1, which indicates how well our simu- A clustering analysis of a population of 500 simulated lations predict the experimental Capture-C interaction conformations reveals that the most abundant represen- profiles (see Additional file 15: Supplementary Meth- tative structure of the β globin locus (43% of the total ods for details). By combining Capture-C data from a conformations, see schematics in Figure 4C and dendro- number of promoters across the locus, we can obtain a gram in Additional file 8: Figure S8), features a single mean Q value along with a standard error (Additional globular domain, where the β Major and Minor promot- file6: FigureS6). Thisallowsustocompareresultsfrom ers co-localised with the five regulatory elements in the different model set-ups. Specifically, we examined the LCR, and with a CTCF site on the telomeric side near effect on the experiment-simulation comparison scores the Olfr65 gene. A further 16% of conformations adopt of changes in: (i) chromatin stiffness; (ii) number of a similar representative structure, but the promoters in- bridges; and (iii) level of coarse-graining (see Additional teract only with the LCR. We also note that when the file 15: Supplementary Methods and Additional file 6: locus adopts these structures, there is an interaction be- Figure S6). For the first two cases we find only a mod- tween the CTCF sites in the LCR and the one on the esteffectontheQ-scoreforthesimulatedconfigurations centromeric side of the β genes near the Olfr67 gene (Additional file 6: Figure S6); if we decrease the res- (these contacts are just visible on the left and bottom olution of our model by changing the coarse-graining, edges of the top two contact maps in Additional file 8A: then this performs less well. Interestingly the repre- Figure S8A) which has previously been observed in both sentative structures found from the clustering analysis definitive erythroblasts and erythroid progenitors, but is of the population of conformations found in silico are absent in non-erythroid tissue [22, 24]. This is consis- always the same. What changes in some cases is the tent with the hypothesis that CTCF mediated loops in proportion of conformations which adopt each represen- progenitors hold the locus in a structure poised to facili- tative structure. In the model where the chromatin was tateβ globinexpressionupondifferentiation[24](though stiffer,theglobularmicrodomainstructurecontainingall see below). A third representative structure, which ac- of the regulatory elements occurred less often, whereas counts for 9% of the simulated conformations, has the the structure where the Nprl3 promoter loops out was β promoters interacting only with the DHS near Olfr65. more likely; this is because holding the Nprl3 promoter The Capture-C data, along with previous work [22, 24], in the microdomain requires bending of the chromatin confirms the prediction that this site (usually denoted fibre, which is disfavoured when this is stiff. Also, when HS-60) interacts with the β globin promoters; indeed it we examined the effect of changing the number of pro- has been previously shown that there are interactions tein complexes in the simulations, we found that, as between all hypersensitive sites in the locus [22] and the more proteins are introduced, there is a greater likeli- pair of sites HS-60/-62 are normally taken to demarcate hood that the locus adopts a structure with two globu- the boundary of the locus. Whether this particular DHS lar microdomains; this is because forming more protein (HS-60) has enhancer properties remains unclear, how- bridges between chromatin binding regions, while being ever it binds Scl/Tal1 (a transcription factor thought to energetically favourable, leads to the formation of more play a key role in hematopoietic differentiation [50]), is loopswhoseentropiccostincreasesnon-linearlywiththe near to a CTCF binding site (HS-62), and is within a number of loops [49]. region marked by monomethylation of histone H3 Lys4, which is normally associated with enhancers. In the re- maining32%oftheconformations(bottomtwoschemat- Chromatin folding of the β globin locus ics in Figure 4D), the β globin promoters are still to- gether, but do not interact with the hypersensitive sites (Additional file 8A: Figure S8A). We have also applied our chromosome-and-bridges model to the mouse β globin locus (chr7:110800000- We note that the microdomains which form in each 111200000, mm9 build; Figure 4, Additional file 7: Fig- type of the five representative structures have more ure S7, and Additional file 8: Figure S8). This locus “looped out” regions (consistent with conclusions from contains five globin genes: the (cid:15)y gene, βh1 and 2, and 3C experiments in Ref. [22]) than in the α globin locus twoβglobingenesβ-Majorandβ-Minor. Theexpression (compare contact maps in Figures 1C and 2 with Fig- ofeachgenedependsonthestageofdevelopment(the(cid:15)y ure 4C and Additional file 8A: Figure S8A – more gaps 6 A Mouseβ-globinlocusinerythroid(Ter119+)cells,(chromosome7,mm9build) RefSeq Genes Olfr67 Hbb-b1 Hbb-bh1 Olfr66 Olfr65 Hbb-b2 Hbb-bh2 Hbb-y Olfr64 Olfr631 HS1HS2 HS3 HS4HS5 110.94 Mbp 111.06 Mbp unmarked bead CTCF binding site DHS site both CTCF and DHS H3K4me1 mark B C M 1.2 100 1 1 10-1 10-2 M 0.8 10-3 1 1110.8M 111.2M D E ct 1 β-Min LC4R3% β-Min β-MajLCR β-Min β-Maj LCR mulation contaprobability 0.5 9% si 0 16% 110.9 111.0 111.1 genomic position (Mbp) F LCR ct 1 β-Min β-Ma6j % β-Min26β-%Maj LCR mulation contaprobability 0.5 si 0 110.9 111.0 111.1 genomic position (Mbp) Figure 4: Cis-interactions of the β globin locus. (A) Browser view showing genes in the vicinity of the β globin locus, alongside a schematic indicating the coarse-graining used in the simulations. A 130 kbp section of the 400 kbp chromatin fragment which was simulated is shown. The positions of the known regulatory elements within the LCR are indicated with blue triangles, and promoters with green squares. (B) Example simulated configurations of the locus. CTCF proteins (green) and DHS binding proteins (red) are shown; the chromosome fragment is coloured as in A. (C) Contact map showing the frequencyofcontactsbetweeneachchromatinbeadin500simulatedconfigurations. Thecolourbarshowsalogarithmicscale. The blue line to the left indicates the region which is shown in A; the green line indicates the region which is used in the clustering analysis. (D) As in Figure 2, clustering analysis allows conformations to be grouped by their structural features. Schematics of the representative structures are shown, with the % of conformations in which they occur; a dendrogram, and contact maps for each representative structure are shown in Additional file 8: Figure S8. (E) Plot showing the contacts made with the promoters of the two β genes (locations indicated by red asterisks; the positions of the regulatory elements and gene promotersareindicated). Simulationresults(red)areshownalongsideCapture-Cdata(grey);bothcasesshowthecontactsto both genes combined (since each copy of the gene has the same sequence it is impossible to separate these in the experiment). Black bars indicate regions where there is no contact data (see Ref. [14] and Additional file 15: Supplementary Methods). (F) Similar plot showing the contacts made with the Hbb-y gene (position indicated by red asterisk). are seen between the blocks of highly probable interac- predict contact patterns which are in good agreement tions in the β globin case). This indication that the β with Capture-C data, both for the β Major and Minor globin locus is less compact than the α globin case is gene promoters (Figure 4E) and for the Hbb-y promoter borne out in measurements of the overall 3-D size of the (Figure 4F). This demonstrates that our model is not simulatedloci(seedistributionsoftheradiusofgyration gene-specific, but can be applied, in principle, genome- ofthepolymerinAdditionalfile9: FigureS9Gcompared wide, at least to active regions; the two bridges which to the α globin case in Figure 7G). we model, CTCF and DHS binding proteins, are indeed found in most euchromatic, open chromatin, regions. As in the case of the α globin locus, our simulations 7 Given its relatively low computational cost (harvesting moter interacts strongly across the Slc25a37 gene, but 500 conformations for a 400 kbp chromosome region at also with two distinct regions between the nearby Synb a 400 bp resolution can be done in about a day with a and Gm16677 genes (Figure 6E, top panel). These are multi-core machine, see Additional file 15: Supplemen- enrichedformono-methylationofLysine4ofHistoneH3 tary Methods), we expect this modelling to be useful (see Additional file 11D: Figure S11D), suggesting that in predicting the overall folding of previously uncharac- sites within these regions have enhancer activity (as was terised active chromosomal loci – the knowledge of the also proposed in Ref. [51]). In order to test these predic- predicted population of 3-D structures can then direct tionswecomparewithnewCaptureCexperiments(per- further high-resolution Hi-C, Capture-C or fluorescence formed as detailed in Ref. [14]). As before, our very sim- hybridisation experiments (as in Figures 3 and 4E-F) to ple model gives a remarkable agreement with the data: characterise that region more accurately. strong interaction with the putative enhancer regions is observed in the erythroid, but not the stem cells. Some longer distance interactions which are predicted in both The model accurately reproduces differences in cell types are not found in the experimental data; these locus folding across cell types errorsareduetoourapproximationthatbridgescanform between any DNase hypersensitive sites, and the agree- ment would likely be improved with a different choice of Importantly, because data showing protein binding, input data (e.g. using TFs involved in regulation of this hypersensitive sites and histone modifications are avail- gene). ablefordifferentcelltypes,wecanalsopredictchangesin the three-dimensional organisation of a chromosomal re- gionacrosscelltypesoratdifferenttimesindevelopment. We show in Figure 5 how the folding of the globin loci The typical 3-D structures of the globin loci are differs in mouse embryonic stem cells (where the globin preserved in CTCF or other TF knock-outs genes are inactive) with respect to the organisation pre- dicted for erythroblasts. The bioinformatic data used to Another strength of our approach is that it is easy to inform our modelling for stem cells are given in Addi- alter the protein binding profiles in our simulations to tional file 10: Figure S10. investigate e.g. genome modifications or protein knock- Figure 5A shows the contact map predicted from sim- outs etc., and predict the consequences of these for the ulations of the α globin locus. Our model predicts that 3-D organisation in vivo. For example, we can switch in ES cells the contacts are much sparser than in ery- off interactions with the hypersensitive sites, and only throblasts,thatthebridging-induceddomainaroundthe include the CTCF bridges in the simulation, or simu- α globin gene is lost (Figure 5B), and that no interac- lateaCTCFknock-outbyswitchingoffinteractionswith tionswiththeregulatoryelementsareobserved;thesame the CTCF sites and any hypersensitive sites where only is true of the neighbouring Mpg promoter. Once again, CTCF binds (i.e. DHSs which bind CTCF, but none of the contacts observed in silico reproduce the experimen- the other TFs implicated in globin regulation). tal ones (Figures 5C), with some minor inaccuracies for In the case of the α globin locus we find that, surpris- Mpg (whichlikelyoriginatefromourapproximationthat ingly, for both the CTCF and DHS knock-outs the same all DHSs are the same in regards to bridge formation, folded structures can still form (Figures 7A-D). For the but nevertheless highlight the principle that the locus CTCF knock-out, the relative proportions of each struc- can adopt a completely different shape in a different cell ture found in the clustering analysis remain largely un- type). When repeating the analysis for the β globin lo- changed (Figure 7E): the most common one is again the cus we find that the loss of non-local contacts is even single globular domain containing the α promoters and more dramatic (Figures 5D-E), and the agreement with all regulatory elements. If we assume that the level of the data even more remarkable (Figures 5F), with all α globin expression correlates with the fraction of con- non-local (i.e. off diagonal) interactions being absent. formations in which one or more of the α promoters is To further demonstrate the wide applicability of the interacting with one or more of the regulatory elements, model, we also perform a set of simulations for a re- thenthisexpressionlevelalsoremainslargelyunchanged gionsurroundingtheSlc25a37 (Mitoferrin1)geneinboth (thegenesareactivein65-70%ofconformations,seeFig- mouseerythroblastsandembryonicstemcells. Thisgene ure 7F). For the DHS knock-out on the other hand, the encodesamitochondrialproteinessentialforironimport number of conformations showing regulatory element in- into mitochondria, however much less is known about teractionsdropstolessthan20%. Thereisalsoachange this locus than about the α or β globin, and so our re- in the proportions of the different groups found by the sults represent a true prediction of its folding. The in- clusteringanalysis,withthestructureinwhichtheNprl3 put data used was similar to that of the globin loci, and promoter loops out of a single domain becoming most are given in Additional file 11: Figure S11. As shown common. Nevertheless, it is remarkable that despite loss in Figure 6 the simulations predict that in the erythroid ofbindingattheregulatoryelements(whichpresumably cells(wherethegeneisactive)thelocusformsacompact reduces α globin expression), the CTCF sites near the domain around Slc25a37 and Entpd4; the Slc25a37 pro- Hbq1 and Hbq2 promoters, and within the introns of the 8 α-globinlocus A B C k k 1 2360 100 2360 0.5 actbility 3 10-1 3 contproba 0 32.1 32.2 32.3 0 1 y 10-2 actbilit 960k 10-3 960k -0.5 contproba 0 32.1 32.2 32.3 1 1 331960k 32360k 331960k 32360k genomic position (Mbp) β-globinlocus D E F k k 1 2360 100 2360 0.5 actbility 3 10-1 3 contproba 0 110.9 111.0 111.1 0 1 y 10-2 actbilit 960k 10-3 960k -0.5 contproba11 00.9 111.0 111.1 1 1 331960k 32360k 331960k 32360k genomic position (Mbp) Figure 5: Simulations show changes in locus organisation across cell types. (A)Contactmapfor500conformations fortheαglobinlocusinmouseembryonicstemcells(mES).SimulationsareperformedasinFigure1,butusingmESChIP-seq and DNase-seq data, as shown in Additional file 9: Figure S9. (B) Difference between the contact maps in panel A and Figure 1C. Blue regions indicate contacts which were present in erythroblasts, but not mES, and yellow indicates contacts present in mES but not erythroblasts. (C) Plots comparing simulations and Capture-C data for mouse embryonic stem cells (data from Ref. [14]). (D)-(F) Similar plots but for the β globin locus. Nprl3 gene(greenandyellowinFigure1A)aresufficient due to the β globin locus being less compact initially. to allow the locus to fold into the same representative Given the suggestion that CTCF proteins play a key structures. We canalso measuretheeffect ontheoverall roleingenomeorganisation,itmightseemsurprisingthat size of the domain by calculating the radius of gyration theknock-outsimulationshowsarelativelyminorchange ofthepolymer;Figure7Gshowsthedistributionforeach in the folding structures and promoter-enhancer inter- of the in silico knock-outs. We see that loss of protein action in both globin loci. However, CTCF is known bindinggenerallyleadstoanexpansionofthelocus,with to have a variety of different functions, for instance it the DHS knock-out having more effect than the CTCF acts as a barrier against the spreading of repressive het- case. erochromatin, or as an insulator, preventing interactions A similar scenario applies to CTCF and DHS knock- with other nearby chromosome regions [42]. A recent outs in the β globin locus (Additional file 9: Figure S9). study suggested that a depletion of CTCF has only a Here, however the contact map for each of the groups mildeffectonthedomainorganisationofchromosomesas identified by the clustering analysis (Additional file 9: found via Hi-C experiments [52], and a ChIA-PET anal- Figures S9A-C) shows some subtle differences between ysis of the contacts made between CTCF-bound regions the knock-outs. Again the CTCF knock-out appears to found that, only a fraction of the 40,000 CTCF binding havelittleeffect,leadingtoonlysmallchangesinthefrac- sites are involved in these [53]: presumably, this fact is tion of simulations adopting each structure or the con- related to the recently discovered importance of CTCF tacts between the β promoters and the LCR. The DHS binding site directionality in loop formation [4, 43, 45]. knock-out leads to a notable reduction in the promoter- In the specific case of the β globin locus, another recent LCR interactions, and a reduction in the number of con- study found that reducing the abundance of CTCF pro- formationsadoptingthestructurewheretheβ promoters tein, or disrupting a specific CTCF binding site within interactwiththehypersensitivesiteneartheOlfr65 gene. the locus in erythroid progenitor cells leads to a loss of This locus also expands upon protein knock-outs, albeit chromosomelooping;howeverupondifferentiationtoma- toalesserextentthantheαglobincase; thisisprobably ture erythroblasts, these cells are still able to express β 9 Mitoferrinlocus k AErythroblastcells k BmEScells D 6 6 100 kb mm9 009 100 009 100 chr14: 69,750,000 70,050,000 7 7 RefSeq Genes Gm16677 10-1 10-1 1700092C10Rik Slc25a37 Loxl2 Nkx2-6 Synb Entpd4 Mir6950 Nkx3-1 E 10-2 10-2 1 6k 6k ontactprob Erythroblast cells 69 10-3 69 10-3 c 0 9 9 669696k 70096k 669696k 70096k s d a 6k C re 9 0.4 0 69.7 69.9 70.09 0 7 1 0 ontactprob mES cells c 0 s d 6k rea 9 -0.4 6 9 69.7 69.9 70.09 669696k 70096k genomic position (Mbp) Figure 6: Simulations also correctly predict looping for a less studied locus. Simulations of the Slc25a37 gene (Mitoferrin1) were performed for mouse erythroblasts and embryonic stem cell, using a similar input data as for the globin loci (DNase-seq, and ChIP-seq for CTCF and the H3K4me1 histone modification). (A) Contact map from the simulations of erythroblasts showing the frequency of contacts between each chromatin bead in 500 simulated configurations. (B) Similar contactmapforthesamelocusinmouseembryonicstemcells. (C)DifferencebetweenthecontactmapsinpanelsAandB.Blue regions indicate contacts which were present in erythroblasts, but not mES, and yellow indicates contacts present in mES but not erythroblasts. (D) Browser view showing the genes across the 400 kb simulated region. (E) Plots showing the interaction profiles for the Slc25a37 promoter in each cell type, comparing simulation results (upper panels) with new Capture-C data (lower panels). Note that the genomic coordinates are aligned with the browser view in D. globin, and fruitful interactions between the promoters generic bridges made up by complexes of transcription and the LCR can still form [25] (i.e. setting up loops in factors and other DNA-binding proteins. The only in- progenitor cells appears not to be necessary). Together puts we require are ChIP-seq data for CTCF binding, thissuggeststhattheglobinlocimaybeexampleswhere and the map of DNase1 hypersensitive sites, which we CTCF-mediatedchromosomeloopsarenotcrucialinde- take as a proxy for the location of the binding sites for termining the 3-D organisation, though of course CTCF the generic protein bridges (DHS bridges). Importantly, is likely to have some other function (e.g. protecting ourapproachdiffersfromotherrecentpolymermodelling other nearby genes from activation) and may still play studies which also have predictive power [20, 29, 36], in an important organisational role at a larger scale [28]. thatitdoesnotrelyonfittingtopre-existing5CorHi-C In our simulations the CTCF bridges certainly do form data. Due to this feature, it can be applied to relatively loops, but in their absence the overall folding patterns poorlycharacterisedloci(e.g.,Mitoferrin1,seeFigure6), can be maintained by the other bridges. for which only few data exist (e.g., DNase tracks); the model can then be developed when needed as more ex- perimental data become available. Our model generates a population of conformations, Discussion hence we can predict, for instance, the distribution of distances between selected targets on the globin locus. In this work we have shown that a minimal polymer These results compare very favourably with our FISH model informed by large bioinformatic datasets on pro- measurements, which allow us to estimate the physical tein binding can successfully reproduce the pattern of sizeofthebeadsinourcoarse-grainedpolymer(orequiv- Capture-Ccontactsobservedinthewellstudiedαandβ alently, the DNA packing density in the chromatin fibre globinlociwithinmouseerythroblasts(acelltypewhere in the globin locus; this is the only fitting parameter in these genes are highly active), and also within the less ourmodel). Thepackingweobtain(15.8nmfor400bp) understood Slc25a37 (Mitoferrin1) locus. Our model is is consistent with open chromatin, which is reasonable built on the hypothesis that there exists architectural since the region we focus on is highly active. protein bridges, which we assume are either CTCF, or Thefactthatourmodelgeneratesapopulationofcon- 10 1 1 1 1 1 A.DHSand 10-1 10-1 10-1 10-1 CTCFbinders 10-1 asFig.1 10-2 10-2 10-2 10-2 10-3 10-3 10-3 10-3 10-2 1 1 1 1 1 B.CTCF 10-1 10-1 knockout 10-1 10-1 10-2 10-2 10-3 10-2 10-2 10-3 10-1 1 1 1 1 1 C.DHS 10-1 10-1 knockout 10-1 10-1 10-2 10-2 10-3 10-2 10-2 10-3 10-1 D Nprl Nprl Nprlα1α2 Nprlα1α2 α1α2 Osttrhuecrture α1 α2 E % of conformations which have F % of conformations withα-promoter G DHS and CTCF each type of structure regulatory element interactions 0.2 CTCF knockout DHS and CTCF DHS and CTCF bility DHS knockout CTCF knockout CTCF knockout ba 0.1 o DHS knockout DHS knockout pr 0 25 50 75 100 0 25 50 75 100 0 40 80 120 160 200 Hba1 Hba1 &Hba2 Hba2 None radius of gyration (nm) Figure 7: Simulations predict the effect of protein knock-outs in the α globin locus. Plots showing the effect of a CTCF knock-out, and a “DHS knock-out” (equivalent to knocking out all protein complexes involved in looping the α globin locusexcept CTCF).(A)-(C)Contactmapsshowingtheinteractionsbetweendifferentchromosomallocationsforconformations within each group identified by clustering analysis. Maps from three sets of simulations are shown; the positions of the known regulatory elements and gene promoters are indicated above each plot. (D) Schematics showing the structure of the locus within each group. (E) Plot showing the percentage of conformations which belong to each group identified by the clustering analysis. ThecolourkeyisgiveninD.(F)Plotshowinginwhatpercentageofconformationsthetwoαglobingenepromoters areinteractingwithoneormoreoftheknownregulatoryelements. (G)Plotshowingthedistributionoftheradiusofgyration of the locus across the simulated conformations. The radius of gyration is defined as R2 = (1/N)(cid:80)N (r −¯r)2, where r is g i=1 i i the position of the ith chromatin bead in the polymer, and¯r is the mean position of all N chromatin beads. formations, ratherthanasingleaverageconformation, is and costs less entropy. (This is because there are more important because it gives an estimate of the stochastic- ways to place two microdomains in space than there are ity and fluctuations in in vivo 3-D organisation. A key forasingleone,andalsobecausetheentropyofformingn result of our model is that the conformations of the loci loops in the same place scales non-linearly with n [49]). we studied can be grouped into a handful of representa- There is a subtle balance between these contributions, tive structures, which account for different fractions of which are both of the order of a few k T, therefore both B the whole population. In both the α and β globin loci, structures coexist in the population. A consequence of the analysis suggests that there is a split in these struc- this is that the globin loci are naturally poised close to a tures between two main types: those in which there is transition between two different 3-D folding phenotypes; a single globular domain which includes the active genes becausethecompetitionbetweenbridgingandentropyis togetherwiththeirregulatoryelements,andthosewhere likely to be a generic feature, we suggest that the plas- the globule splits into two microdomains. The single ticityassociatedwiththisbalancebetweencompetingef- globule structures are favoured by bridging, while the fects may be an underlying principle in the organisation competing structure requires less bending and looping, ofactiveregionsgenome-wide. Thissuggeststhatthecell

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.