Zhouetal.BMCGenomics2013,14:39 http://www.biomedcentral.com/1471-2164/14/39 RESEARCH ARTICLE Open Access Use of somatic mutations to quantify random contributions to mouse development Wenyu Zhou1, Yunbing Tan2, Donovan J Anderson3, Eva M Crist3, Hannele Ruohola-Baker4, Stephen J Salipante5 and Marshall S Horwitz3* Abstract Background: The C. elegans cell fatemap,inwhich thelineage ofits approximately 1000 cells is visibly charted beginning from thezygote, represents a developmental biology milestone.Nematode development is invariant from one specimen to thenext,whereas inmammals, aspects ofdevelopment are probabilistic, and development exhibits variationbetween even genetically identical individuals. Consequently, a single defined cell fate map applicable to allindividuals cannot exist. Results: Todetermine theextent to which patterns of cell lineage are conserved between different mice, wehave employed the recently developed method of“phylogenetic fate mapping”to compare cell fate maps in siblings. In this approach, somatic mutations arising inindividualcells are used to retrospectively deduce lineage relationships through phylogenetic and—as newly investigated here—related analytical approaches based ongenetic distance. We have catalogedgenomic mutationsat anaverage of 110mutation-prone polyguanine (polyG)tracts for about 100cells clonally isolated from various corresponding tissues ofeach of two littermates ofa hypermutable mouse strain. Conclusions: We find that during mouse development, muscle and fat arise from a mixedprogenitor cell pool in thegerm layer,but, contrastingly, vascular endothelium inbrain derives from a smaller source ofprogenitor cells. Additionally, formation of tissue primordia is marked by establishment ofleft and right lateral compartments, with restricted cell migration between divisions.We quantitatively demonstrate that development represents a combination of stochastic and deterministic events, offering insight into how chance influences normal development and may give rise to birth defects. Keywords: Fate map, Cell lineage,Differentiation Background nearer to just1 g [1]. However, each of the two daughter Mouse gestation takes approximately 20 days [1], and, cells may experience different fates; both daughter cells although cell cycle length is variable, embryonic cells donotalwaysdivide,nordotheydosoatthesametime. divide about twice per day [2]. It can therefore be sur- Along with the effects of apoptosis, this accounts for the mised that about 40 or so mitotic generations transpire fact that a newborn mouse has fewer cells than antici- between fertilization and birth—a value similar to other pated if embryonic cell proliferation were to proceed estimates derived from different assumptions [3]. If all exponentially. embryonic cell divisions produced two daughter cells In fact, asymmetric cell divisions are evident in the C. that both subsequently divided, then a newborn mouse elegans‘ cell fate map, in which the lineage of every cell should be composed of 240 (≈1011) cells. Given that the in the worm, beginning from the zygote, is charted [5]. mass of a cell is about 10-12 kg [4], a newborn mouse Based on the cell fate map, it becomes apparent that would weigh about 10 g—close to actual measurements sometimes one daughter cell continues to proliferate while the other ceases to divide and undergoes terminal *Correspondence:[email protected] differentiation or death. There are then only two types 3DepartmentofPathology,UniversityofWashington,Box358056,Seattle, ofproliferative celldivisions,distinguishablebyhowthey WA98195,USA Fulllistofauthorinformationisavailableattheendofthearticle are graphed on the lineage tree: one type in which both ©2013Zhouetal.;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsoftheCreative CommonsAttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedtheoriginalworkisproperlycited. Zhouetal.BMCGenomics2013,14:39 Page2of16 http://www.biomedcentral.com/1471-2164/14/39 daughter cells divide and the other where only one sufficient quantities of DNA to perform mutational ana- daughter cell continues to divide. If only the first of lysis, cataloged length-altering mutations at dozens of these two possibilities were to hold constant—that polyguanine (polyG) repeat mutational hotspots dis- daughter cells constitutively divide—then there would persed throughout the genome, and determined the only be one possible cell lineage tree, a symmetric one order in which mutations havearisen, toward the goal of with each node bifurcating at every branch. However, reconstructing cellular lineages. For the purpose of max- the addition of the second possible type of cell division imally extracting somatic genetic information, we have —in which one of the two daughter cells ceases to fur- additionally introduced a technical refinement in which therdivide—addssignificantcomplexitytotherepertoire studies are conducted in DNA repair-deficient hyper- of potential cell lineage trees and consequently to the mutable mouse strains and have also evaluated new different types of tissue and body plans that can be cre- methods of inferring cellular ancestry based on genetic ated duringembryogenesis. distance, inadditiontothose basedonphylogenetics. For any given number of n cells in an embryo there are a surprisingly large possible number ((2n-3)!/2n-2(n- Results 2)!)) of potential cell lineage histories [6]. For an embryo Mutationprofilesofsinglecells with 4 cells there are 15 different possible fate maps, for We have previously carried out phylogenetic fate map- 8 cells there are 135,135, and for 16 cells the number ping studies utilizing the developmentally normal exceeds 1015. For the thousand or so cells of the adult “Immortomouse” strain, which expresses a conditional worm [5], the number of potential different lineage his- SV40 T-antigen oncogene and conveniently allows for tories is immeasurably large. Yet, all individual worms derivation of conditionally-immortalized cell lines invariantly develop identically; the cell fate map remains [14,22] from clonally expanded single cells. To obtain constantfrom oneC.elegans specimentothenext[5]. larger numbers of informative mutations, we took the For many animals, however, including mice and other additional step of breeding the Immortomouse’s condi- mammals, there does not exist a single, defined fate map tional T-antigen into hypermutable strains, deficient in which the same developmental plan is followed by all both in the lagging-strand DNA polymerase delta proof- individualsofthat species. Instead,developmentispartly reading [23,24] and MLH1 DNA mismatch repair [25] stochastic [7]. In contrast to C. elegans, any given cell activities. from an early embryo is totipotent and can adopt any of We successfully isolated and cultured as conditionally a number of different possible cell fates. Commitment to immortalized clonal cell lines about 100 single cells dis- any particular lineage is probabilistic (as reviewed [6]). A sected from various tissues at similar locations from striking illustration of the variable development occur- each of two adolescent (5 week) female mouse litter- ring between even genetically identical individuals of the mates (here identified as “mouse 1” and “mouse 2”). We same species is evident in cloned animals, where size, harvested cells representing vascular endothelial tissue blood cell indices and serum markers, skin type, hair from the brain, preadipocytes from abdominal fat, and growth patterns, blood vessel branching and even the fibroblasts from hindlimb muscles (Additional file 1: number of teats all show considerable heterogeneity, Table S1). In addition to mutations developing somatic- even among constitutionally genetically identical indivi- ally during the lifetime of the mouse, mutations can also duals [8]. Similar examples include variable heart valve arise during ex vivo clonal expansion; however, they are morphology [9], craniofacial structure [10], and numbers expectedtorandomly populateonlyafewcellsperclone of neurons [11,12] and cortical brain patterning [12] and because they are unique to each isolate are unlikely amongisogenic strainsofrodents.Thesestudies indicate to confound inferences of lineage, even if they are de- that while genetic background and environment con- tectable. We therefore assume that the most frequent tribute to variation, at least some differences are not alleles in a clone represent genotypes of the original sin- genetically determined but are rather inescapable gle cell from which the clone is derived [14,15]. As an consequences of developmental noise. additional measure to control for mutations arising dur- Here we attempt to measure the extent to which ran- ing ex vivo clonal expansion, for several isolates, we split dom versus deterministic factors shape development. each clone after just a few passages into two separate We employ an approach that we have dubbed “phylo- culturesandindependentlygenotyped andanalyzed each genetic fate mapping”, previously developed by our member of the pair to insure that separately they pro- group [13-16] and similar to methods developed by duced equivalent results(seebelow). others [3,17-21], in which cell lineage histories are in- To ascertainsomatically-acquired mutations in each of ferred from somatic mutations. We have dissected single the single cell clones, we extracted DNA from the cells from analogous tissues of two mouse littermates, expanded clones and genotyped an average of 110 polyG expanded the cells clonally ex vivo in order to obtain loci per clone and identified somatic mutations that Zhouetal.BMCGenomics2013,14:39 Page3of16 http://www.biomedcentral.com/1471-2164/14/39 either shortened orlengthened thepolyG tract(genotyp- of the cells examined. Because mutations arise with ing data for mouse 1 and 2 shown in Additional file 2: regular frequency during mitosis, a measure of the gen- Table S2 and Additional file 3: Table S3, respectively). etic distance separating individual cells from the zygote Figure 1a shows how many different mutant alleles are is expected to be proportional to the number of mitoses identified for each marker across all of the approxi- those cells have undergone since conception [19]. We mately 100 cells genotyped for each mouse. Combining calculatedgeneticdistancefortissues basedonthemean data from all cells harvested, each mouse exhibits an number of pairwise allelic differences for the polyG mar- average of 0.5 mutant alleles/polyG locus/cell, which is kers, adjusting for missing data (data for mouse 1 and 2 more than one hundred-fold greater than we previously in Additional file 6: Table S6 and Additional file 7: Table observed (0.003 mutations/locus/cell) using mice with S7, respectively). Measuring this distance from the zyg- intact DNA repair machinery [14]. Figure 1b shows the ote forcellsineach mouse suggeststhatfibroblastsfrom number of polyG marker mutations detected per cell for hindlimb muscle and preadipocytes from abdominal fat each mouse (from among all approximately 110 mar- have undergone a similar number of mitoses, yet it is kers). On average, for each cell, more than one third of significantly fewer than those of vascular endothelial the 110 polyG markers (mouse 1: 36.7%, mouse 2: cells derived from the brain (Figure 1c). One potential 34.4%) exhibited a somatic mutation. It is worth noting explanation for this observation is that it simply takes that the SV40 T-antigen originates from a strain (mix- fewer cell divisions from the point at which muscle and ture of CBA/Ca and C57BL/10) different from the one fat differentiation begins until their development is (C57BL/6J) than it is crossed into and that contains the complete, compared to what is required for the forma- MLH1 and DNA polymerase delta deficient alleles. Lit- tion of blood vessels in the brain. Alternatively, it is pos- termates therefore carry differing amounts of strain- sible that these tissues all arise at a similar point during specific DNA from each parent, most likely including at development, but that muscle and fat originate from a lociencodingotherDNAfidelityfactorsaswellaspolyG larger group of progenitor cells than vascular endothe- markers. The similarity in mutation profiles between the lium. In the latter scenario, endothelial cells of blood two individuals suggests that the genetic effects induced vessels would require relatively more cell divisions be- by the deficiency in polymerase proof-reading domain fore committing to specified lineages in order to pro- and mismatch repair genes are unlikely to be influenced duce the large numbers of cells required during the bydifferences betweenmouse strains. tissue maturationprocess. We next experimentally assayed the mutation fre- To distinguish between these two possibilities, we quency at polyG loci. From each mouse we selected one compared the pairwise genetic distance among single muscle fibroblast and one preadipocyte cell line and iso- cell clones within each tissue as well as to the zygote. lated 12 single cells that were each passaged for a For the progenitor cell pool which gives rise to any tis- defined number (20) of doublings. For each of the 48 sue, this comparison yields an estimation of the relative subclones, we genotyped 110 polyG loci and identified number of mitoses which cells have undergone prior to mutations that were not found in the parental cell line tissue commitment, in contrast to how many mitoses from which the subclones were derived. We calculate those cells have undergone after commitment. For that mouse 1 muscle fibroblasts and preadipocytes ex- muscle and fat the distance between cells within each hibit equal mutation rates, with a mean of 0.010 muta- tissue is greater than their distance to the zygote tions/division/polyG locus, while mouse 2 displays (Table 1). In contrast, vascular endothelial cells demon- similar values (p=0.248), with an average of 0.012 and strate that they are about as distant from each other as 0.013 mutations/division/locus for muscle fibroblasts they are to the zygote. Since we isolated a similar num- and preadipocytes, respectively (Additional file 4: Table ber of cells from those tissues (Additional file 1: Table S4, with the genotyping data from which it is derived S1), we minimized possible bias introduced by unequal shown in Additional file 5: Table S5). These results indi- sampling. The pattern observed in muscle fibroblasts cate that mutation rates do not vary with cell type or be- and preadipocytes may be interpreted as showing that tween individuals and support the notion that mutations during organogenesis, these cells form a population of can be used as a “molecular clock” [19] to unbiasedly mixed lineages bearing various genotypes, instead of infer cell lineage histories in different tissues from differ- from a few closely related progenitors. Following entmice. organogenesis,mutationscontinuetoaccumulateindes- cendant cells derived from the mixed founder popula- Quantifyingmitotichistoryoftissues tion, with the result that cells within an organ are more Cells within the body all originate from the zygote. We dissimilar to each other than they are to the zygote. approximated the genotype of the zygote as being the Contrastingly, for brain vascular endothelial cells, or- most commonly observed allele for each locus, across all ganogenesis appears to initiate from a limited number of Zhouetal.BMCGenomics2013,14:39 Page4of16 http://www.biomedcentral.com/1471-2164/14/39 A 50 ci 40 M1 M2 o G l oly 30 p of er 20 b m u N 10 0 2 3 4 5 6 7 8 10 Mutant alleles per marker B 12 10 s M1 M2 e pl 8 m a s of 6 er b m u 4 N 2 0 10 13 20 23 26 29 32 35 38 41 44 47 50 53 58 Mutations per sample C M1 M2 0.4 ** e c n 0.3 a st di ge 0.2 a er v A 0.1 0 L eft fibroRbilga hstt fsibroLbelfta stprse a diRpiog chty tpeLrseefta dviap soccuyltRiaerg shet nvdaots ch uelliaarl e n d oth elial Figure1(Seelegendonnextpage.) Zhouetal.BMCGenomics2013,14:39 Page5of16 http://www.biomedcentral.com/1471-2164/14/39 (Seefigureonpreviouspage.) Figure1SomaticpolyGmutationprofilesoftwomouselittermates.(A)Histogramshowinghowmanydifferentmutantallelesare identifiedforeachmarkeracrossalloftheapproximately100cellsgenotypedforeachmouse.Anaverageof5mutantalleleswasobservedon eachpolyGmarkerinbothmice,yieldinganaveragemutationrateas0.5mutantalleles/polyGlocus/cell.(B)Histogramshowingthenumberof polyGmarkermutations(fromallapproximately110markers)detectedpercellforeachmouse.Formouse1,somaticmutationswereobserved in36.7%ofapproximately110polyGmarkersforeachsinglecellcloneonaverage,whileinmouse2,anaverageof34.4%wereobserved.(C) Averagegeneticdistanceofdifferenttypesoftissuestothezygoteforeachmouse. progenitors, and cells within the tissue appear to To avoid these problems, our modified eBURST algo- undergo a large number of cell divisions in order to fully rithm calculates relative genetic distances from pairwise commit to the specific lineage. In this case, the genetic comparisons of genotypes, connects isolates with related distance of cells from the zygote is much greater and is genotypes into groups and clonal complexes, and identi- comparable to the average distance of single cells within fies the founding genotype of each clonal complex. thesame type oftissue. Analysis using the modified eBURSTalgorithm suggests Notably, in both mice, we observed that relationships that muscle fibroblasts and fat preadipocytes are clon- are in general much closer for cells in the same type of ally related (mouse 1 shown in Figure 2a, mouse 2 in tissue than they are for cells in different types of tissue Additional file 8: Figure S1), in agreement with the (Table 1). An interpretation of this observation is that above findings indicating that muscle fibroblasts and the fate of progenitor cells are specified early in embryo- preadipocytes share a common population of progenitor genesis and remain committed during the remainder of cells. Only under such circumstances, is it possible for development. It appears that cell migration between dif- descendants of closely related lineages to localize and ferent primordial tissues is rare; otherwise, genetic dis- develop in physically separated tissues. For most clones, tances within tissues would be similar to those between modified eBURST analysis does not detect meaningful differenttypesoftissues. relationships between other cell types. Nevertheless, This notion also applies when examining the related- given the fact that we examined only a small proportion ness of left-sided tissues to their right-sidedcounterparts of the cells present in any tissue, we are largely limited (Table 1). Interestingly, we found that the distance be- to detecting relationships between cells that are only tween contralateral tissues of the same type is generally separated by a few cell divisions. (Based on assumptions larger than it is for the distance between the same types described in the Materials and Methods section, we esti- of ipsilateral tissues; however, the genetic distance for mate that the modified eBURST algorithm is limited contralateral tissues of the same type is still smaller than to detecting clonal relationships of cells separated by the average distance between unrelated types of tissues. ≤12 mitoses.) Intriguingly, modified eBURST analysis This finding suggests that establishment of left and right revealed in both mice several connections between sin- polaritytakes place after specification oflineages to indi- gle cell clones derived from distant tissues (such as from vidual tissues, and, subsequently, cells largely develop contralateral tissues), suggesting that cell migration constrainedtoeitherside. may occur during development, such that spatially sepa- rated cells share similar mutation profiles. Overall, how- Reconstructionoflineagerelationshipsbydistance-based ever, most cells from spatially isolated tissues did not methods exhibitsucharelationship,suggestingthatcellmigration We next evaluated whether genetic distance information appears to be rare during development, at least across could be used to infer lineage relationships between tis- the physical distances separating cells within the tissues sues. We used two approaches (one based on the wesampled. eBURST algorithm and another utilizing network ana- Wethenexaminedforsimilaritiesamongcellsthrough lysis) for deriving clonal relationships between tissues use of network analysis (Figure 2b), which offers a com- andcells from genetic distance calculations. plementary approach for identifying ancestral relation- We first adapted the eBURST algorithm [26], which ships based on genetic distance [27]. In mouse 1, muscle was originally developed to display clonal relationships fibroblasts and preadipocytes are most genetically simi- among bacterial populations. An advantage of eBURST lar, consistent with the findings reported above. The analysis is that it may more sensitively detect clonal same close relationship between fibroblasts and preadi- relationships in cases where there is insufficient genetic pocytes appears in mouse 2, at least on the right side of information to establish phylogeny. However, the algo- the body; however, not all relationships in mouse 1 are rithm is designed to interpret genotypes arising in hap- preserved in mouse 2. To compare the overall similarity loid genomes. An additional limitation is its inability to of tissue relationships between the two mice, we mea- analyze datasets as large as those generated in our study. sureddistancesbetweenthesamepairsoftissuesinboth Zhouetal.BMCGenomics2013,14:39 Page6of16 http://www.biomedcentral.com/1471-2164/14/39 Table1Averagegeneticdistanceandthesampleerrorofmean(SEM)forcomparisonsamongsinglecellclones groupedbytheirtissueorigins Mouse1 Mouse2 Averagedistance SEM Averagedistance SEM Leftfibroblasts Intra-‐tissue 0.289 0.003 0.334 0.009 Inter-‐tissue 0.316 0.001 0.349 0.002 Tozygote 0.213 0.007 0.260 0.010 Lefttoright 0.304 0.006 0.332 0.007 Rightfibroblasts Intra-‐tissue 0.315 0.004 0.305 0.005 Inter-‐tissue 0.324 0.001 0.334 0.002 ToZygote 0.238 0.008 0.237 0.007 Leftpreadipocytes Intra-‐tissue 0.282 0.016 0.360 0.010 Inter-‐tissue 0.313 0.002 0.358 0.002 Tozygote 0.251 0.010 0.271 0.0019 Lefttoright 0.293 0.003 0.336 0.011 Rightpreadipocytes Intra-‐tissue 0.287 0.008 0.292 0.008 Inter-‐tissue 0.307 0.002 0.330 0.002 Tozygote 0.235 0.015 0.225 0.014 Leftvascularendothelial Intra-‐tissue 0.285 0.007 0.334 0.008 Inter-‐tissue 0.329 0.002 0.365 0.002 Tozygote 0.343 0.0111 0.333 0.014 Lefttoright 0.317 0.011 0.0348 0.015 Rightvascularendothelial Intra-‐tissue 0.331 0.007 0.321 0.008 Inter-‐tissue 0.339 0.002 0.354 0.002 Tozygote 0.342 0.011 0.316 0.015 mice and calculated Pearson correlation coefficients across tissues, to infer phylogenetic trees. We first com- (Figure 2c, based on data in Additional file 9: Table S8). puted genotypes for particular tissues based on the most This analysis demonstrates that the relatedness of differ- frequent alleles found in all cells from the same type of ent tissues to the zygote is largely the same in both mice tissue. Phylogenetic reconstruction of the different tis- (Pearson correlation coefficient=0.789, R2=0.622, and sues (Figure 3a) demonstrates that, among all the types p=0.0067), but the relatedness between any two different of tissue investigated in this study, vascular endothelial tissues in the pair of mice follows no discernible pattern cells from the left and right sides of the brain share the (Additional file 10: Table S9). We reconcile these obser- most recent common progenitor and are therefore most vations by proposing that in different individuals, tissues closely related (as was found above in analyses based on develop at similar times with similar sizes of progenitor mitotic distances). Fibroblasts from the left and right cell populations, but that the genetic composition of kidney are also closely related. Notably, these relation- those progenitor cells is randomly assigned. Although ships are conserved in both individual littermates. Other the overall coefficient index for all pairs of tissues tissues demonstrate more variable relationships, how- demonstrates that tissue relationships between these ever. In comparing the two mice, for instance, kidney individuals are far from perfectly correlated, it is none- podocytes are more similar to preadipocytes in mouse 1 theless non-random; in other words, the overall pattern while they are closest to vascular endothelial cells in represented in two mouse littermates reflects a combin- mouse 2. Despite the findings from the distance-based ation of deterministic and stochastic developmental analysis, we failed to discern any relationship between events. preadipocytes and muscle fibroblasts using phylogenetic inference. This may be a consequence of using the zyg- Phylogeneticreconstructionoflineagerelationships ote genotype as an outgroup in the phylogenetic recon- In order to more specifically infer lineage relationships struction.Itispossible thatthesimilarity ofgenotypesof among cells from each mouse, we used the genotypes of muscle, preadipocytes and zygote, demonstrated by individualcells,aswellascollectivelythemeangenotype distance-based clustering and network analyses, pose Zhouetal.BMCGenomics2013,14:39 Page7of16 http://www.biomedcentral.com/1471-2164/14/39 A D=0.2 Right endothelium Left endothelium Right preadipocytes Left preadipocytes Right kidney Left kidney Right muscle fibroblasts Left muscle fibroblasts B Anterior Left Right Posterior Left preadipocytes Right preadipocytes Left muscle fibroblasts Right muscle fibroblasts Right preadipocytes C 0.48 2 0.43 e s u o 0.38 m of e 0.33 c n a st 0.28 di e s 0.23 wi air P 0.18 0.13 0.15 0.2 0.25 0.3 0.35 0.4 Pairwisedistanceofmouse1 Figure2(Seelegendonnextpage.) Zhouetal.BMCGenomics2013,14:39 Page8of16 http://www.biomedcentral.com/1471-2164/14/39 (Seefigureonpreviouspage.) Figure2Lineagerelationshipsinferredfrommethodsbasedongeneticdistance.(A)ModifiedeBURSTanalysis,showing“population snapshot”ofsinglecellclonesinmouse1.Clustersofrelatedsinglecellclonesandindividualunlinkedclonesaredisplayedasasinglemodified eBURSTdiagrambyusingthedistancevalueD=0.2ascut-off.Clustersoflinkedsinglecellclonescorrespondtocomplexesthatsharehighly similarmutationalprofiles.Eachsinglecellcloneisrepresentedasadotwithcolorindicativeofitstissueorigin.(Mouse2showninFigureS1.)(B) Networkrepresentationdepictingmutationalsimilaritiesamongsinglecellclonesbetweenbothmice.Significantsimilaritiesbetweensinglecell clonesformouse1areshownwithgreyconnectinglines.Eachsinglecellcloneisdepictedasadotwithdifferentcolorsindicativeoftissue originwhilethelayoutonthegraphreflectsrelativeanatomicallocationontheanteroposterioraxis.Thediameterofthecirclescorrelateswith theaveragedistancewithintissues.Orangelinesshowrelationshipsthatareconservedinmouse2.(C)Scatterplotofdistancebetween equivalentpairsoftissue,comparingmouse1tomouse2.Distancesofspecifictissuestothezygotearecoloredorange;atrendlineindicates theircorrelation.Amongthesecomparisons,thedistancesbetweenindividualtissuestothezygotearelargelyconservedbetweenthetwomice. difficulties in resolving those groups from one another migration occurring during embryogenesis. Yet, cell when employing phylogenetic analysis and, conse- mixing and migration appear restricted to certain devel- quently,does notproduceaninformativetree structure. opmental stages and/or certain types of tissue, because, Whenapplyingphylogeneticanalysistoindividualcells by and large, cells develop in a constrained space that is (as opposed to the composite genotype produced from likely defined by interactions with neighboring cells and cells of the same tissue type, as shown in Figure 3a), the surrounding tissuearchitecture. number of somatic mutations identified was insufficient to produce well-supported bifurcating trees through Patternsofcellgrowthinferredfromtheshapeofthe phylogenetic reconstruction (mouse 1 shown in tree Figure 3b and mouse 2 in Additional file 8: Figure S2); The topology of a phylogenetic tree is shaped by the half of terminal branches cannot be fully resolved and process through which it has grown [28,29]. For ex- appearaspolytomies.Employingeven alow thresholdof ample, if a lineage bifurcates, but only one of the subse- 50% Bayesian posterior probability yielded a tree in quent two cell lines persists, then the shape of the tree which all branches correspond to terminal bifurcations will be asymmetric at that branch. For a tree produced of pairs of cells, without revealing complex internal from composite genotypes representing cells of the same branching structures. Although this topology is limiting, tissue type (as in Figure 3a), these properties translate to there are nevertheless several noteworthy findings con- the probability that progenitor cells will give rise to dis- tained in the phylogeny. First, internal control clones tinct tissue types. We therefore examined the topology that were split from the same parental clone in culture of phylogenetic reconstructions for nonrandom shapes. are largely paired together with high confidence (mouse We first generated a comparison set of trees based on 1: 16/18 paired with an average of 0.99 posterior prob- randomization of genotypes. Assuming the same total ability; mouse 2: 26/28 paired with an average of 0.97 amount of genetic information, we generated random posterior probability), indicating neither that mutations genotypes with the same number of samples from our occurring during ex vivo expansion nor that errors in experimentally observed genotypes by sorting alleles of determining marker genotypes are of sufficient magni- each locus into arbitrary orders. We used Bayesian tude to influence phylogenetic reconstructions. Second, phylogenetic analysis, collected the 5×104 highest-scored pairs of single cell clones from different tissue origins trees and measured their degree of asymmetry. The occur frequently (mouse 1: 9/14; mouse 2: 8/11). Com- resultsareshowninthehistograminFigure3c,inwhich pared to pairs of phylogenetically related cells derived asymmetry is measured by the N-bar statistic [30]. (We from the same tissue, pairs of phylogenetically related also measured asymmetry using a different statistic, Col- cells from dissimilar types of tissues exhibit longer less’ imbalance statistic I [31], which produced similar c branches connecting them to their most recent common results, Figure S3.) Although the trees shown in progenitor. This finding indicates that such cell pairs di- Figure 3a are symmetric, they correspond to a Bayesian verge from their common ancestors substantially earlier consensus estimating the single best tree. To get a sense in development than for related cells from the same tis- of the range of the shapes of trees that are compatible sue, confirming observations from our earlier studies with theexperimentaldata for mouse 1,we collected the [14]. Reassuringly, phylogenetically related pairs of cells 5×104 highest-scored trees (of 2.5×105 total) produced from different tissues also had higher degrees of genetic by the phylogenetic analysis, measured their asymmetry, similarity in our distance-based analyses and similarly and superimposed the result on the values for the trees formed statistically significant connections in the modi- generated from random genotypes (Figure 3c, which fied eBURST and network analyses. Altogether, the shows symmetry measured by N-bar, and Figure S3, paired patterns of single cell clones in the phylogenetic which shows symmetry measured by I). Compared c reconstruction are consistent with cell mixing and to trees based on randomized genotypes, possible Zhouetal.BMCGenomics2013,14:39 Page9of16 http://www.biomedcentral.com/1471-2164/14/39 phylogenies best fitting the experimental data are much each have a similar probability of descending directly more symmetric. We reject a trivial explanation that the from the zygote, at the root of the tree. Overall, this ob- symmetry arises from polytomies, where the branching servation suggests that a population of pluripotent cells order cannot be resolved, because the posterior prob- in the early embryo contributes to different lineages abilities support the inferred structures. The most obvi- without bias and that the determination of lineage com- ous biological explanation for a symmetric tree is that mitmentduringdevelopmentisitself a stochasticevent. thereisnovariationinspeciationand/orextinction rates for different branches of the tree. With respect to em- Discussion bryogenesis, this implies that distinct types of tissue, In our previous studies [13-16] employing phylogenetic represented by individual clades in the phylogenetic tree, analysis of somatic mutations accumulating during A B Zygote LReifgtL hmet fumts pRucrsilegeca hlfdeitb ikfpriiobodbrcnolyeabtysela tspssotsdocytes 1MR my 121BL1 B5L1 4BL 17BL1 B9L 11B5R 2A1BR 51BR 61BR 91BR 101BR 111FL 41FL 51FL 91FR 31FR 41FR 51FR 9 1FR10 1FR 111FR 131KL 41KL 51KL 61KL 91KR 71KR 81ML F 11AML F 11ML F 13ML F1 5ML 1F M1L5 1F M1L1 6MmyL1 M4m1Ly M1m8LM y1 mLM11 y0Mm L 1Lym1 1my2 y1 314 Right preadipocytes 11KBLR 3 7 11MMLL m myy 1 156 0.60 Left kidney podocytes *1MR my1 F1L3 B7 11MMRR FF 11B *1MR my 13A 1MR F 3 0.54 1MR my 8 1MR F 7 Left vascular endothelial 1BL 14 1MR F 9 Muta00ti0..0o11..n669s2/SitMMe0.9oo2uussee 120.61 RiLgehfRtt kikgiidhdntn eveyay s fficibburrlooabbr lleaansstdtssothelial 1M1LM FL1 1KF31R 71K M4L1 LT8 mai*1ly H51a*KnLd 11K1BL M1R Amy 16BL 1111BLM R2 BF 118A1KL 71BL 171BL 31KR 61KR 51BR 1A1BL 81BR 81BR 1B1ML F 11 1FL 12 1Chest1BR 31MR F 81MR F 41FR 81FR 71BL 1B*1BL 1A1*FR 1B1F*R 1A1M*R m1yM 9RB m1*Fy L9 A11BF*L* 11AML*1 mMyL1 4MmB1LyM *4F1AL 5M *1FBRM 1*5 mRAM1 1My*Rm MS Rym2R Smy m 1y1 1y1 05 C Right endothelium Left endothelium 10000 Right preadipocytes 9000 Random Trees Left preadipocytes Trees sorted by posterior probability 8000 Right kidney s 7000 Left kidney ee Right muscle fibroblasts of tr 6000 Left muscle fibroblasts er 5000 b m u 4000 N 3000 2000 1000 0 3.05 3.3 3.55 3.8 4.05 4.3 4.55 4.8 5.05 5.3 5.55 5.8 6.05 6.3 6.55 6.8 7.05 7.3 N-bar symmetry statistic 4 7.43 Figure3Phylogeneticreconstructionoftissuesandsinglecellclonesinbothmice.(A)Phylogenetictreeoftissues,withmouse1inblack andmouse2inorange,overlaid.NumbersatbifurcationsindicateBayesianposteriorprobabilities.(B)Phylogenetictreeofsinglecellclonesin mouse1.Onlybranchstructureswithlargerthan50%posteriorprobabilityareshown.Pairsofsinglecellclonesfromthesameparentalcellare markedwithasterisks.(Mouse2showninAdditionalfile8:FigureS2)(C)DistributionofN-barsymmetrystatisticformouse1tissuetreeswith highestposteriorprobabilitiescomparedtorandomtreeswiththesamenumberofbranches,showingasymmetricnatureoftheactualtissue trees.Examplesofsymmetric(left,N-bar=4)andasymmetrictrees(right,N-bar=7.43)areshown.(EquivalentdistributionfortheI symmetry c statisticareshowninAdditionalfile8:FigureS3). Zhouetal.BMCGenomics2013,14:39 Page10of16 http://www.biomedcentral.com/1471-2164/14/39 development, we analyzed only individual mice. Add- promoting survival of at least some individuals in the itionally, we had previously not taken advantage of gen- face of a continually changing environment. This inter- etic strains in which there is reduced DNA replication pretation is somewhat analogous to the concept of gen- fidelity withcorrespondingly higher ratesofsomatic mu- etic “buffering,” in which populations may tolerate tation. In the results we present here, comparison of tis- otherwise deleterious mutations in genes in order to sue relationships in two sibling mice with mutator maintain higher genetic diversity and thereby expedite phenotypes reveals details about how well overall pat- the rate of adaption [43]. Overall, our study offers gen- terns of development are conserved between different etic evidence to separate variable developmental events individuals. Results from our distance-based analysis from conserved ones, and delineates a model in which point to a stochastic model of development, in which development represents the sum of what can be effi- progenitors of different tissues and their exact genetic ciently specified in the genome balanced against the ef- composition are randomly determined. Additionally, the fort required to control entropic noise intrinsic to the highly symmetric shape of reconstructed cell lineage underlyingbiochemistry. trees in these mice, generated by phylogenetic inference Oneofthemostsignificanteventsduringdevelopment using mutations accumulating in single cells, similarly is gastrulation, when the single-layered blastula reorga- supports the apparently stochastic nature of lineage dif- nizes into the three classic germ layers, which subse- ferentiationoccurringduringembryogenesis. quently give rise to specialized cell types. Given that Ever since Waddington first proposed a probabilistic muscle, fibroblasts,andfat share acommon mesodermal model for how gene regulation modulates development origin, significant effort has focused on deciphering gen- in 1957 [32], stochastic contributions to cell fate deter- etic mechanisms determining lineage commitment of mination have been repeatedly demonstrated in studies progenitor cells to one cell type or the other [44,45]. employing various linage tracing techniques, including However, the relative timing of lineage determination dye injection [33], retroviral marking [34], and chimeras and the ultimate source of progenitors of muscle and fat formed from embryonic stem (ES) cells obtained from are still unknown. In this study, by inferring from simi- mixtures of differently pigmented mouse strains [35]. larity in somatic mutations in individual single cell For example, with respect to the latter, sibling litter- clones isolated from various tissues, we show that mates exhibit variable patterns of pigmentation, indicat- muscle fibroblasts and preadipocytes are similar in gen- ing that, at least in skin, mature tissues are randomly etic composition we show that muscle fibroblasts and derived from primordial progenitors. Yet, the simple fact preadipocytes are similar in genetic composition and that most mice (and other individuals within a species) separate into discrete lineages at a similar time during are patterned more-or-less the same suggests that there development. Our data suggest thatboth of these tissues are limits to stochastic effects occurring during differen- may descend from a pool of progenitor cells with mixed tiation. A goal of our study was to determine where and lineages, instead of from a single or a few progenitor when suchrestrictions mightoccur. cells with similar mitotic histories. We present a sche- Developmental stochasticity has been mathematically matic,modelingthesefindingsinFigure4. modeledandexperimentallyconcludedtobeaninescap- This notion resonates with recent discoveries of post- able consequence of gene transcription [36,37], epigen- natal mesenchymal stem cells (MSCs), a type of cell that etic gene regulation [38] and protein interaction [39]. holds the potential to differentiate into multiple lineages Ultimately, these processes presumably reflect the inher- in muscle, fat, and bone tissues, and which have been ent noise in the networks into which genes and their located as nonhematopoietic cells in bone marrow [47- products assemble, as governed by statistical and 49],pericytesencircling capillariesandmicrovessels[50], quantum mechanics [40-42]. However, this is not to say adipose tissue [51], and indeed from almost every post- that development is solely a random process, as our data natal connective tissue [52]. Given such a diversity of also indicate that during lineage specification, the timing postnatalMSCsinvariousanatomical locations,itisrea- and numbers of progenitor cell populations appear to be sonable to speculate that they could be derived from conservedbetween individuals. precursorswithdifferentgenetic composition.Wethere- An immediate question is how and why certain devel- fore propose a developmental model in which at the opmental events occur predictably while others appear early three germ layer stage, there might be a large pool to be random. Although our study does not provide of progenitor cells within mesoderm that possess mul- direct clues, it is reasonable to speculate that such a tiple lineage differentiation potentials, yet they them- balance between stochasticity and determinism is an selves arise from proliferative growth and can be evolutionary consequence that defines one species and distinguished from each other by the mitotic mutations distinguishes it from another but that at the same they bear. Such a mixed pool of progenitor cells gives time allows for beneficial diversity within a species, rise to precursors that initiate formation of muscle, fat,