ebook img

Comparison of multiple transcriptomes exposes unified and divergent features of quiescent and PDF

15 Pages·2017·3.36 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Comparison of multiple transcriptomes exposes unified and divergent features of quiescent and

Pietrosemolietal.SkeletalMuscle (2017) 7:28 DOI10.1186/s13395-017-0144-8 RESEARCH Open Access Comparison of multiple transcriptomes exposes unified and divergent features of quiescent and activated skeletal muscle stem cells Natalia Pietrosemoli1†, Sébastien Mella2,3†, Siham Yennek2,3,6†, Meryem B. Baghdadi2,3†, Hiroshi Sakai2,3†, Ramkumar Sambasivan4†, Francesca Pala2,3, Daniela Di Girolamo2,5 and Shahragim Tajbakhsh2,3* Abstract Background: Skeletal muscle satellite(stem) cellsare quiescent inadult mice and can undergo multiple rounds of proliferation and self-renewal following muscle injury. Several labs have profiled transcripts ofmyogenic cells during thedevelopmental and adult myogenesis with theaim ofidentifying quiescent markers. Here, wefocused onthe quiescent cell state and generated new transcriptomeprofiles that include subfractionations ofadult satellite cell populations, and an artificiallyinduced prenatal quiescent state, to identify core signaturesfor quiescent and proliferating. Methods: Comparison of available data offered challenges related to theinherent diversity of datasetsand biological conditions. We developed a standardized workflow to homogenize the normalization,filtering, and quality control steps for the analysis of gene expression profiles allowing theidentification up- and down-regulated genesandthesubsequentgenesetenrichmentanalysis.Tosharetheanalyticalpipelineofthiswork,wedeveloped Sherpa,aninteractiveShinyserverthatallowsmulti-scalecomparisonsforextractionofdesiredgenesetsfromthe analyzeddatasets.Thistoolisadaptabletocellpopulationsinothercontextsandtissues. Results:Amulti-scaleanalysiscomprisingeightdatasetsofquiescentsatellitecellshad207and542genescommonly up-anddown-regulated,respectively.Sharedup-regulatedgenesetsincludeanover-representationoftheTNFα pathwayviaNFKβsignaling,Il6-Jak-Stat3signaling,andtheapicalsurfaceprocesses,whileshareddown-regulated genesetsexhibitedanover-representationofMycandE2FtargetsandgenesassociatedtotheG2Mcheckpointand oxidativephosphorylation.However,virtuallyalldatasetscontainedgenesthatareassociatedwithactivationorcell cycleentry,suchastheimmediateearlystressresponsegenesFosandJun.Anempiricalexaminationoffixedand isolatedsatellitecellsshowedthattheseandothergeneswereabsentinvivo,butactivatedduringproceduralisolation ofcells. Conclusions:Throughthesystematiccomparisonandindividualanalysisofdiversetranscriptomicprofiles,weidentified genesthatwereconsistentlydifferentiallyexpressedamongthedifferentdatasetsandsharedunderlyingbiological processeskeytothequiescentcellstate.Ourfindingsprovideimpetustodefineanddistinguishtranscriptsassociated withtrueinvivoquiescencefromthosethatarefirstrespondinggenesduetodisruptionofthestemcellniche. Keywords:Skeletalmusclesatellitecells,Quiescence,Multi-leveltranscriptomiccomparisons,Sherpa *Correspondence:[email protected];[email protected] †Equalcontributors 2StemCellsandDevelopment,DepartmentofDevelopmentalandStemCell Biology,InstitutPasteur,75015Paris,France 3CNRSUMR3738,InstitutPasteur,75015Paris,France Fulllistofauthorinformationisavailableattheendofthearticle ©TheAuthor(s).2017OpenAccessThisarticleisdistributedunderthetermsoftheCreativeCommonsAttribution4.0 InternationalLicense(http://creativecommons.org/licenses/by/4.0/),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedyougiveappropriatecredittotheoriginalauthor(s)andthesource,providealinkto theCreativeCommonslicense,andindicateifchangesweremade.TheCreativeCommonsPublicDomainDedicationwaiver (http://creativecommons.org/publicdomain/zero/1.0/)appliestothedatamadeavailableinthisarticle,unlessotherwisestated. Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page2of15 Background signature of the quiescent state. This large compendium Most adult stem cell populations identified to date are ofexpressiondataoffersthefirstcomparisonandintegra- in a quiescent state [1]. Following tissue damage or tion of nine independent studies of the quiescent state of disruptionofthestemcellniche,skeletal muscle satellite mouse satellite cells, and we developed Sherpa, a shiny (stem) cells transit through different cell states from re- interactive web server to provide a user-friendly explor- versible cell cycle exit to a postmitotic multi-nucleate ation of the analysis. In addition, using a protocol for the state in myofibres. In mouse skeletal muscle, the tran- fixation and capture of mRNA directly from the tissue scription factor Pax7 marks satellite cells during quies- without the alteration in gene expression that could arise cence and proliferation, and it has been used to identify during the isolation procedure, which typically takes and isolate myogenic populations from skeletal muscle severalhourswithsolidtissues,wehaveempiricallytested [2, 3]. Myogenic cells have also been isolated by the expression of transcripts. Strikingly, several genes, fluorescence-activated cell sorting (FACS)using avariety including members of the Jun and Fos family, were found of surface markers, including α7-integrin, VCAM, and to be present in isolated satellite cells using conventional CD34 [4]. Although these cells have been extensively isolation procedures, but they were absent in vivo. These studied by transcriptome, and to a more limited extent findings, and the unique atlas that we report, will un- by proteome profiling, different methods have been used doubtedly improve our current understanding of the to isolate and profile myogenic cells thereby making molecular mechanisms governing the quiescent state and comparisons laborious and challenging. To address this contributetotheidentificationof critical regulatorygenes issue, it is necessary to generate comprehensive catalogs involvedindifferentcellstates. of gene expression data of myogenic cells across distinct states andindifferentconditions. Methods Soon after their introduction two decades ago, high- Individualdatasettranscriptomicanalysis throughput microarray studies started to be compiled The analysis comprised a total of nine datasets, three into common repositories that provide the community novel microarray datasets and six publiclyavailable data- access to the data. Several gene expression repositories sets [9–14], choosing only samples with overall similar for specific diseases, such as the Cancer Genome Atlas conditions. All datasets were analyzed independently (TCGA) [5], the Parkinson’s disease expression database following the same generalized pipeline based on ad hoc ParkDB [6], or for specific tissues, such the Allen Hu- R-implementedscripts(Fig.2). man and Mouse Brain Atlases [7, 8] among many, have been crucial in allowing scientists the comparison of Geneexpressionprofiles datasets, the application of novel methods to existing The microarray data compared activated satellite cells datasets, and thus a more global view of these biological (ASCs) and quiescent satellite cells (QSCs) from differ- systems. ent experiments. Table 1 describes the public datasets In this work, we generated transcriptome datasets that were taken into account for the analysis with the of satellite cells in different conditions and performed GEO [15] (Gene Expression Omnibus) identifications, comparisons with published datasets. Due to the diver- references,andsampledistribution.Thenewmousemicro- sity of platforms and formats of published datasets, this array datasets include the following comparisons: young wasnotreadilyachievable.Forthisreason,wedeveloped adult Quiescent(adult)/Activated (postnatal day 8) and an interactive tool called Sherpa (SHiny ExploRation Quiescent [high/low]/D3Activated [high/low], and Fetal_- tool for transcriPtomic Analysis) to provide comprehen- NICD[E17.5/E14.5].Table1presentsallsampledetails. sive access to the individual datasets analyzed in a homogeneous manner. This web server allows users to Animals,injuries,andcellsorting (i) identify differentially expressed genes of the individ- Animals were handled according to the national and ual datasets, (ii) identify the enriched gene sets of the European Community guidelines and the ethics commit- individual datasets, and (iii) effectively compare the teeoftheInstitutPasteur(CTEA)inFrance.Forisolation chosen datasets. Sherpa is adaptable and serves as a of quiescent satellite cells, Tg: Pax7-nGFP mice (6– repository for the integration and analysis of future tran- 12 weeks) [2] were anesthetized prior to the injury. Tibi- scriptomic data. It has a generic design that makes it alis anterior (TA) muscles were injured with notexin applicable to the analysis of other transcriptome datasets (10μl–10μM;Latoxan).CellswerethenisolatedbyFACS generated inavariety ofconditionsandtissues. using FACS ARIA III (BD Biosciences), MoFlo Astrios We analyzed gene expression profiles (GEPs) of acti- andLegacy(BeckmanCoulter)sorters.Pax7HiandPax7Lo vated and quiescent states of mouse satellite cells derived cells correspond to the 10% of cells with the highest and fromthreenewexperimentalsetupsandsixpubliclyavail- the lowest expression of nGFP, respectively, as defined able microarray datasets to define a consensus molecular previously[3]. Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page3of15 erimentalsetupsal,experimental, GSE81096Lukjanenkoetal.[14] 6QSC,5ASC 2016 Tibialisanterior,gastro-necmius,quadriceps M –Young:915w,–old:2024m C57BL/6(Janvier) Cardiotoxin CD34+/integrin-alpha7+/Lin- 72h(3d) IlluminaMouseRef-8v2.0 high-throughputexpherows.Thebiologic GSE70376García-Pratetal.[13] 4QSC,4ASC 2015 Tibialisanterior M Young:3m,old:–2024m C57BL/6 Cardiotoxin FACS:integrin-−alpha7+/Lin−−/CD31/CD45−/CD11b/Sca1- 72h(3d) Agilent028005SurePrintG3Mouse8x60kMicroarray ells.Threenewareshownint GSE38870Farinaetal.[12] 3QSC,3ASC 2012 Tibialisanterior F –36m C57BL/6xDBA2(??) Injury:BaCl2μ(50L1.2%) FACS:syndecan-3 12hor48hpostinjury Affymetrix430_2.0 andquiescentstatesofmousemusclestem(satellite)csatellitecells(ASCs)andquiescentsatellitecells(QSCs)columnsoftheTable GSE47177LiuetGSE3483GSE15155al.[9]Fukadaetal.[10]Pallafacchinaetal.[11] 3QSC,3QSC,3ASC3QSC,3ASC3ASC60h,3ASC84h 201320072010 HindlimbHindlimbDiaphragm,pectoralis,abdominalmuscles MFM,F –8w812w6w C57BL/6(Jackson)C57BL/7(nihonclea) CreER/+GFP/+Pax7Pax3(highGFP)eYFP/+R26 GFP/+GFP/QSCsinculturePax3,Pax3BaCl2+:mdx:mdxfor4dAdult;adultmdx;1wold;3dinculture −CreER/+FACS:Pax7;FACS:CD45ReportereYFP/+R26/SM/C-2.6+ 36h(1.5d),60h4dinculture3dinculture(2.5d),84h(3.5d)postinjury AffymetrixMouseAffymetrix430AAffymetrix430_2.0Gene1.0ST Table1Summaryofanalyzedtranscriptomicdatasetsofactivatedandsixpublicallyavailablemicroarraydatasetscomparingactivatedandtechnicaldetailsofeachexperimentareshowninthedifferent NICDRef/codeQuiescent[high/low]QuiescentFetalR26[E17.5/E14.5]D3Activatedactivated[high/low] Num.of3QSC_Pax7low,33QSC,33"QSC,"3samplesASC_Pax7low,3P8_ASC"ASCQSC_Pax7high,3ASC_Pax7high Date201320072015 AnatomyTibialisanteriorLimb,bodyForelimbswall,diaphragm SexMM,F ––Age68wP8,45wE14.5,E17.5 StrainC57BL/6B6.129 nGFP/+Cre+ReporterTg:Pax7-nGFPPax7Myf5:stop-NICDgfp/+(10%high,R2610%low) “”“”ActivationNotexinP8=activatedE14.5=activated QSCsReporterReporterReporterPurif. “”“”TimingQuiescentandE17.5(Q),E14.5(A)3dpostinjury PlatformAffymetrixMouseAffymetrixAffymetrixMouseGene1.0ST430_2.0Gene1.0ST,AffymetrixMouseGene2.0ST hdwmhours,days,weeks,months Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page4of15 For isolation of activated satellite cells,TA muscles (day 3 Westernblotanalysis postinjury (D3) and non-injured) were collected and sub- TotalproteinextractsfromsatellitecellsisolatedbyFACS jectedto4–5roundsofdigestioninasolutionof0.08%col- were run on a 4–12% Bis-Tris Gel NuPAGE (Invitrogen) lagenaseD(Roche)and0.1%trypsin(Gibco#31966)diluted and transferred on Amersham Hybond-P transfer mem- inDMEM-1%P/S(Invitrogen)supplementedwithDNAseI brane (Ge Healthcare). The membrane was then blocked at10μg/ml(Roche,11284932001)[2,3].Pax7HiandPax7Lo with 5% non-fat dry milk in TBS; probed with anti-JunD cellscorrespondtothe10%ofcellswiththehighestandthe (329) (1:1000, sc-74 Santa Cruz Biotechnology Inc.), anti- lowestexpressionofnGFP,respectively,asdefinedpreviously JunB (N-17) (1:1000, sc-46 Santa Cruz Biotechnology [3]. Inc.), or anti-c-Jun (H-79) (1:1000, sc-1694 Santa Cruz Skeletal muscle progenitors were obtained also from BiotechnologyInc.)overnight;washedandincubatedwith the forelimbs of E14.5 and E17.5 fetuses of Myf5CreCAP/ HRP-conjugated donkey anti-rabbit IgG secondary anti- +:R26Rstop-NICD-nGFP/+[16]compoundmice.Tissueswere body(1:3000);anddetectedbychemiluminescence(Pierce dissociated in DMEM, 0.1% collagenase D (Roche, ECL2westernblottingsubstrate,ThermoScientific)using 1088866), 0.25% trypsin (GIBCO, 15090-046), DNaseI theTyphoonimagingsystem.Afterextensivewashing,the 10 μg/ml for three consecutive cycles of 15 min at 37 °C membrane was incubated with anti-Histone H3 antibody in a water bath under gentle agitation. For each round, a (ab1691, 1:10,000; abcam) as a loading control. All West- supernatant containing dissociated cells was filtered ern blots were run in triplicate, and bands were quanti- through 70-μm cell strainer, and trypsin was inhibited tated in one representative gel. Quantification was done with foetal calf serum (FCS). Pooled supernatants from usingImageJsoftware. each round of digestion were centrifuged at 1600 rpm for 15 min at 4 °C, and pellet was re-suspended in cold Isolationoffixedmousemusclestemcellsandreal-time DMEM/1% PS/2%FCS and filtered through 40-μm cell PCR strainer. For empirical analysis of genes by RT-qPCR (e.g., Jun In other experiments, skeletal muscles from the limbs, and Fos), skeletal muscles were fixed immediately in body wall, and diaphragm were collected from pups at 0.5% for 1 h in paraformaldehyde (PFA) using a protocol postnatal day 8 (P8, mitotically active satellite cells) and based on the notion that transcripts are stabilized by 4–5 weeks old mice (quiescent satellite cells) of PFA fixation [18, 19]. Briefly, PFA fixed and unfixed Pax7nGFP/+ knock-in line [17]. Cells were isolated by skeletal muscles were minced as described [4]; fixed FACS based on NICD-GFP or Pax7-nGFP intensity, samples were incubated with collagenase at double the usingBDFACSARIA IIIandMoFlo Astrios sorters. normal concentration, and mRNA wasisolated following FACS based on size, granulosity, and GFP levels using a FACS Aria II (BD Bioscience). Total RNA was extracted Microarraysamplepreparation from fixed cells with RecoverAll™ (Total Nucleic Acid Total mRNAs were isolated using Qiagen RNAeasy® Mi- Isolation Kit Ambion, Thermo Fisher), according to cro Kit according to the manufacturer’s recommenda- manufacturer instructions. cDNA was prepared by tions; 5 ng of total RNA was reverse transcribed and random-primed reverse transcription (Super-Script II, amplified following the manufacturer’s protocols Invitrogen, 18,064–014), and real-time PCR was done (Ovation Pico WTA System v2 (Nugen Technologies, using SYBR Green Universal Mix (Roche, 13608700) Inc. 3302-12); Applause WTA Amp-Plus System (Nugen StepOne-Plus, Perkin-Elmer (Applied Biosystems). Spe- Technologies, Inc. 5510-24)), fragmented and biotin la- cific primers for each gene were designed, using the Pri- beled using the Encore Biotin Module (Nugen Tech- mer3Plus online software, to work under the same nologies, Inc. 4200-12). Gene expression wasdetermined cycling conditions. For each reaction, standard curves by hybridization of the labeled template to GeneChip for reference genes were constructed based on six four- microarrays Mouse Gene 1.0 ST (Affymetrix). fold serial dilutions of cDNA. All samples were run in Hybridization cocktail and posthybridization processing triplicate. The relative amounts of gene expression were were performed according to the “Target Preparation for calculatedwithRLP13expressionasaninternalstandard Affymetrix GeneChip Eukaryotic Array Analysis” proto- (calibrator). RT-qPCR primers used appear in Add- col found in the appendix of the Nugen protocol of the itional file 6:TableS2. fragmentation kit. Arrays were hybridized for 18 h and washed using fluidics protocol FS450 0007 on a Gene- Normalization,qualitycontrol,andfilteringofGEPs Chip Fluidic Station 450 (Affymetrix) and scanned with Gene expression profiles (GEPs) were processed using an Affymetrix GeneChip Scanner 3000, generating CEL standard quality control tools to obtain normalized, files for each array. Three biological replicates were run probeset-level expression data. For all raw datasets de- foreachcondition. rived from affymetrixchips, Robust Multi-Array Average Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page5of15 expression measure (rma) was used as normalization Genesetenrichmentanalysisonindividualsets method using the affy and the oligo R packages [20, 21]. Each dataset was tested for gene set enrichment All analyses were preferentially conducted at the probe- independently, using the CAMERA competitive test im- set level. Probesets were annotated to gene symbol and plemented in the Limma R package [27] and three gene gene ENTREZ using chip-specific annotations. For gene set collections from the mouse version of the Molecular level results, the probeset with the highest expression Signatures Database MSigDB v6 [28, 29]: (1) Hallmark variability was selected to represent the corresponding gene sets (H), which summarize and represent specific gene.Qualitycontrolswereperformedonrawdatausing well-defined biological states or processes displaying a relative log expression (RLE) and Normalized Unscaled coordinate gene expression; (2) Kyoto Encyclopedia of Standard Errors (NUSE) plots from the affyPLM R pack- Genes and Genomes (KEGG) canonical pathways (C2 age [22]. Sample distribution was examined using hier- CP:KEGG),derivedfromtheKyotoEncyclopediaofGenes archical clustering of the Euclidean distance and and Genomes [30]; and (3) Reactome canonical pathways principal component analysis from the stats [23] and (C2 CP:Reactome) from the curated and peer reviewed FactoMineRRpackages[24](seeAdditionalfile1:Figure pathwaydatabase[31](genesetanalysisinFigs.1and2). S1 for the resulting plots for dataset Quiescent [high/ low]/D3Activated [high/low]). The resulting plots of the remaining datasets are not shown, but they presented Multiplesetanalysis:determinationofthequiescent similar trends, which can be explored through the inter- signature active webserverSherpa. The combinatorial landscape of datasets was explored using the SuperExactTest [32] and the UpSetR [33] R Differentialexpressionanalysis packages to test and visualize the intersection of the Each dataset was individually analyzed to identify genes datasets.Additionally,theJaccardindex[34]ofsimilarity showing significant differential expression (DEGs) was calculated to assess the extent of similarity between between the ASC and the QSC (gene level analysis in statistically differentially expressed genes (DEGs) of each Fig. 1; differential analysis in Fig. 2). This analysis was pair of datasets. A significance ranking, based on several performed using the linear model method implemented criteria, was calculated for each individual dataset to in the Limma R package [25]. The basic statistic was the determine its presence or absence in the final dataset moderated t-statistic with a Benjamini and Hochberg’s ensemble, which was used for determining the gene multiple testing correction to control the false discovery signature. Once the dataset ensemble was defined, the rate (FDR)[26]. overlapping differentially up- and down-regulated genes Fig.1Generalframeworkoftheanalysis:anindividualdatasetanalysisfollowedbyamulti-setanalysis.Theindividualdatasetanalysisconsistedof(i) theanalysisofgeneexpressionprofiles(GEPs)ofeachdataset,includingnormalization,filteringandqualitycontrolcheckofeachrawdataset,andthe differentialanalysistoidentifydataset-specificdifferentiallyexpressedgenes(DEGs),(ii)theGeneSetEnrichmentAnalysis(GSEA)performedinthegene setspace.TheGSEAconsistedinidentifyingenrichedpathwaysfromthreegenesetsoftheMSigDBcollection[29](Hallmarkgenesets,CP:KEGGgene setsandCP:Reactomegenesets);(iii)themulti-setanalysistoassembleastudy-independentgenesignature,i.e.,alistofgenesspecifictothe quiescencestate Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page6of15 purpose, commonly up-regulated or down-regulated genes were used in a one-sided Fisher’s exact test imple- mented in R script with a Benjamini and Hochberg’s multiple testing correction of the P value to determine the enriched gene sets and the direction of such enrichment. Webapplication:Sherpa We developed an interactive web application for the exploration, analysis, and visualization of the individual datasets and their combination (http://sherpa.pasteur.fr). This application allows the user to effectively and efficiently analyze the individual datasets one by one (in- dividual dataset analysis) or as an ensemble of datasets (multi-set analysis) and was developed with the Shiny R package [36]. Results Thisstudyinvolvesanindividualdatasetanalysisfollowed byamulti-setanalysis(Fig.1).First, eachraw datasetwas normalized, filtered, and subjected to the same quality controls and checks. Gene-level differential analysis and genesetenrichmentanalysiswerethenperformed(Fig.2). Finally, a multi-set analysis assembled a platform- independent list of genes specific to the quiescent state. Fig.2Workflowofthestandardizedindividualdatasetanalysis.The When analyzing multiple microarray GEPs, however, sev- analysisoftheninedatasetswasperformedinaconsistentmanner eral issues needed to be addressed regarding the experi- foreachdatasetusingadhocRscripts.Itincludedthefirststepof datapreparationfollowedbyasecondstepofdataanalysis.GEPs mentalsetup,themicroarrayplatformsandthelaboratory wereprocessedusingstandardqualitycontroltoolstoobtain conditions [37]. First, the individual studies, even if re- normalized,probeset-levelexpressiondata.Forrawdatasetsderivedfrom lated, had different aims, experimental designs, and cell affymetrixchips,RobustMulti-ArrayAverageexpressionmeasure(rma) populations of interests (e.g., developmental stage and wasusedasnormalizationmethod.Allanalyseswereconductedat gender of mice). Second, the different microarray plat- probesetlevel.Probesetswereannotatedtogenesymbolandgene ENTREZusingchip-specificannotations.Qualitycontrolswereperformed forms contained different probes and probesets with spe- onrawdatausingRLEandNUSEplots.ThedistributionoftheQSCand cific locations and alternative splicing that might produce ASCsamplesaccordingtotheirGEPswasexploredusinghierarchical different expression results [38]. Finally, sample prepar- clusteringoftheEuclideandistanceandprincipalcomponentanalysis ation,protocols,anddatesofextractionsmighthaveinflu- (Additionalfile1:FigureS1).Statistically,differentiallyexpressedgenes enced array hybridization and introduced bias [39]. This (DEGs)wereidentifiedbetweentheASCandtheQSCgroupsusingthe linearmodelimplementedbytheLimmaRpackage[10].Geneset experimental heterogeneity required critical data process- enrichmentanalysiswasbasedonthreegenesetcollectionsfromthe ingtoensurestatisticallymeaningfulassumptionstodrive mouse version of the Molecular Signatures Database MSigDB biological interpretation and compile gene signatures. For v6.0[12,13]:(1)Hallmark,whichsummarizesandrepresentsspecific this,weuseda standardizedworkflowtoreducethetech- well-definedbiologicalstatesorprocessesdisplayingacoordinate nical variations between datasets. Specifically, this work- geneexpression;(2)KEGGcanonicalpathways,derivedfromtheKyoto EncyclopediaofGenesandGenomes[14];and(3)Reactomecanonical flow applied (i) the same normalization method for the pathwaysfromthecuratedandpeer-reviewedpathwaydatabase[15]. experiments having the same microarray chips, (ii) the Totestfortheenrichmentofthesegenesets,thecompetitivegene same quality control criteria to discard poor-quality sam- settestCAMERA[16]wasused ples, (iii) the same aggregation method for summarizing probesets into single genes, and (iv) the same filtering in (DEGs, as defined by the adjusted P value ≤0.05) were all datasets.Thefiltering ofthe datasets wasbased onthe usedtobuild thequiescent signature. same significance criteria which included a minimum number of differentially expressed genes, the presence of Genesetenrichmentanalysisonthequiescentsignature genesknowntobedifferentiallyexpressedbetweenquies- An over-representation analysis (ORA) [35] was applied cent and activated states from previous studies, and a to the quiescent signature using the previously described similarity measure among the datasets. Table 1 summa- gene collection (Hallmark, Kegg, Reactome). For this rizes the main biological and experimental variations in Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page7of15 this study, as well as the technical differences present in was considered to be an outlier and was not included in thedatasets. thefinal analysis. Three new sets of microarrays of quiescent versus The number of significantly up- and down-regulated activated satellite cell are reported here(seeTable 1). The genes (DEGs) resulting from the differential expression first one is part of a developmental and postnatal series analysis of the quiescent with respect to the activated that was reported previously [16] (E12.5 vs. E17.5), and states were calculated (Additional file 5: Table S1). DEGs here, P8 (postnatal day 8, in vivo proliferating) and 4– were identified as having |logFC|≥1 and a false discov- 5 week old (quiescent) mice were compared. The second ery rate FDR≤0.05. The statistical analysis was per- one is based on previously reported differences in quies- formed at the probeset level, and only those probesets centandproliferatingcellstatesinsubpopulationsofsatel- matching to genes are reported. On average, the datasets lite cells (quiescent: dormant, top 10% GFP+ cells vs. exhibited 1548 up-regulated genes with a standard primed, bottom 10% GFP+ cells isolated from Tg:Pax7- deviation of 1173 genes. The number of down-regulated nGFPmice;proliferating:3dayspostinjury[3]).Thethird genes corresponded to 2122, with a standard deviation dataset is based on previous observations that the Notch of 1658 genes. The lowest number of DEGs belonged to intracellular domain (NICD) when expressed constitu- the Fetal_NICD[E17.5/E14.5] dataset (39 up, 136 down), tively (Myf5Cre: R26stop-NICD) in prenatal muscle progeni- while the highest number of DEGs belonged to the tors leads to cell-autonomous expansion of the myogenic GSE70376 dataset (4367 up, 6346 down). Additionally, progenitor population (Pax7+/Myod−) and the absence of an analysis of the distribution of the logFC across the differentiation, followed by premature quiescence at late datasets revealed that there were significant differences fetalstages(E17.5)[16].Here,E17.5(quiescent)andE14.5 among the ranges and shapes of such distributions for (proliferating) prenatal progenitors were compared. Ex- eachdataset(Additional file2:FigureS2). cept for our datasets Quiescent(adult)/Activated(P8) and Fetal_NICD[E17.5/E14.5], all the studies were conducted Genesetlevelanalysisrevealscommonunderlying on adult mice (male and female) with ages ranging from biologicalprocessesacrossthedatasets 8weeksto6months. Despite the great difference among the number of DEGs While all datasets shared similar cell states (quiescent for the different sets, clear trends among the significantly (QSC) and activated (ASC) satellite cells), the experimen- enriched pathways were found (Fig. 3a). This heatmap tal procedures varied between studies. Activation of cells, showseachdatasetasacolumnandeachenrichedgeneset forinstance,wasachievedindifferentways:(i)invitro,by asarow.Thegenesetcollectionthatwastestedforenrich- culturingfreshlyisolatedsatellitecellsforseveraldaysand mentcorrespondstotheHallmarkgenesetcollectionfrom (ii) in vivo, by extracting ASCs from an injured muscle. MSigDB [40]. Enriched gene sets corresponding to over- Furthermore, for in vivo activation, several techniques expressedgenesareshowninred,whileenrichedgenesets were used to induce the injuries—BaCl , or the snake that were generally abundant in under-expressed genes are 2 venoms cardiotoxin or notexin. Cell extraction protocols shown in blue. Out of the 11 datasets, GSE38870 stood as also varied among the different studies: (i) using trans- an outlier for both over- and under-represented gene sets genic mice expressing a reporter gene that marks satellite compared to the rest. For the other ten datasets, most of cells (several alleles) or (ii) using a combination of anti- them showed an enrichment in the quiescent state for the bodies targeting surface cell antigens specific to satellite TNFA_SIGNALING_VIA_NFKB pathway (nine datasets), cells (several combinations, seeTable 1). Finally, the nine whileeightdatasetswereenrichedinUV_RESPONSE_DN, datasets that were examined in this study date from 2007 IL6_JAK_STAT3_SIGNALING, APICAL_SURFACE, and to 2016. During this period, microarray technologies KRAS-SIGNALING_DN pathways. Similarly, the ten data- evolved, and the different chips available may introduce setsshared similar trends for under-expressed genes inthe yet another source of variation among the compared pathways MYC_TARGETS_V1, E2F_TARGETS, G2M_ datasets. To carry out a statistically meaningful ana- CHECKPOINT, and OXYDATIVE_PHOSPORYLATION, lysis of these extensively heterogeneous datasets, crit- allofwhichareexpectedtobeabsentinthequiescentstate. ical data processing was required to interpret gene In total, two subnetworks corresponding to 8 under- and signatures as described in the workflow (Fig. 1). 15 over-expressed enriched gene sets could be distin- guished (Fig. 3b). A network representation of the top 3 most commonly found enriched gene sets (nodes, thick- Thenumberofdifferentiallyexpressedgenesvaries outlined circles) is shown in Fig. 3b for the over-expressed significantlyamongdifferentdatasets (TNFA_SIGNALING_VIA_NFKB, UV_RESPONSE_DN, I A total of 32 samples from ASCs and 34 samples from L6_JAK_STAT3_SIGNALING) and under-expressed (MY QSCs from the nine datasets were analyzed. After the C_TARGETS_V1, E2F_TARGETS, G2M_CHECKPOINT) quality control, one sample from the GSE38870 dataset categories. The size of each node corresponds to the total Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page8of15 a b Fig.3Enrichedgenesetsacrossindividualdatasets.Enriched(over-represented)genesetswithover-expressedgenesareshowninred;enrichedgene setswithunder-expressedgenesareshowninblue.aGenesetenrichmentprofilesusingtheHallmarkgenesetcollectionfromMSigDB[40],eachrow correspondstoageneset,andeachcolumncorrespondstoadataset.bNetworkrepresentationofthreemostcommonover-andunder-expressedgene sets(denotedbythethickborderonthenode)alongwiththegenesetssharinggeneswiththem(connectorlines).Nodesrepresentgenesetswitha nodecirclesizeproportionaltothenumberoftimesthegenesetsappearasenrichedinthedifferentdatasets(seepanela).Thicknessoftheconnecting linesisproportionaltothenumberofsharedgenesbetweennodes number of times that the gene set was enriched in all the differentially expressed within each dataset, in order to datasets, and the thickness of the interconnecting lines is identify genes commonly up- or down-regulated in the proportional to the number of genes shared between quiescent state. Although the aforementioned technical connected nodes. Gene sets sharing less than 10% of their and experimental heterogeneity could introduce noise in genesarenotshown.Wenotedalsothatdifferentgenesets this analysis, such variation was distinguishable from the hadavarying number ofgenesincommon(Fig. 3b); ifthe more stable, underlying common quiescent signature. gene overlap were large, those gene sets (and their Given that the distribution and ranges of the logFCs corresponding biological functions) will likely be also varied so drastically between datasets (Additional file 2: affected (i.e., activated or repressed). For the three most Figure S2), a single FC (fold change) threshold could not common enriched gene sets with under-expressed be chosen to be used for all datasets. Thus, for the genes, for example, we noted that gene set MYC_TAR- combinatorial analysis approach, we set out to maximize GETS_V1 shares most of its genes with gene sets the number of differentially expressed genes common to E2F_TARGETS and G2M_CHECKPOINT. This sug- all the datasets that were considered, where only the ad- gests that the three categories represented by these justed P value was used as a threshold to define DEGs. gene sets had an interplay of genes that displays them However, even in this low constrained scenario, combin- all as under-expressed. ing all the datasets together resulted in very few overlap- ping genes: 12 up (Arntl, Atf3, Atp1a2, Cdh13, Dnajb1, Determiningaquiescenttranscriptionalsignatureamong Enpp2, Ier2, Jun, Nfkbiz, Rgs4, Usp2, Zfp36) and 1 down alldatasets (Igfbp2). Alternatively, when certain datasets were To determine a consensus quiescent signature from the excluded from the analysis, the number of DEGs in- datasets, we compared the genes found to be creased (Fig.4a). Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page9of15 a b c Fig.4Differentcombinatoriallandscapesresultindifferentdegreesofstringencyforthelistofgenesdefiningthequiescentstateofsatellitecells.a Barplotindicatingthenumberofoverlappingdifferentiallyexpressedgenes(DEGs)foreachbestcombinationofintersections,fromdegree2to11. Thedotsunderneaththebarplotindicatethedatasetsincludedintheintersections.Thetotalnumberofup(UP)anddown(DOWN)DEGsforeach datasetareindicatedinlightgrayanddarkgray,respectively.PanelsbandcarethecoloredmatricesshowingtheJaccardindexbetweeneachpairof datasets,forUPDEGsandDOWNDEGs,respectively.DendrogramsshowthehierarchicalclusteringusingtheJaccardindexasEuclideandistance Combinatorialassessmentofdatasetsaccordingto first two closest datasets belonged to studies originating significanceandsimilaritycriteria from the same laboratory underscores the impact of To find the best combination of datasets defining a technical biases. The hierarchical clustering of the consistent and sufficiently large quiescent signature, we Euclidean distance of the Jaccard indexes shows that for ranked them according to their significance. This signifi- up- and down-regulated genes, the datasets Fetal_- cance was determined according to an ensemble of cri- NICD[E17.5/E14.5], GSE38870, and GSE81096 had a teria. First, the dataset should have a minimum number tendency to not group with the rest of the datasets. In of DEGs. Our Fetal_NICD[E17.5/E14.5] dataset, for in- addition to these criteria, others can be used to assess stance, had only 250 DEGs (Additional file 5: Table S1), the significance of the datasets. Choosing the datasets and using it in the analysis resulted in a dramatically re- according to the activation or extraction method of the duced number of overlapping DEGs (Additional file 3: cells, for example, would result in a more stringent Figure S3). A second criterion was the presence of genes ensembleofdatasets. known to be differentially expressed between quiescent Takingintoaccountthedatasetsignificance(basedonthe and activated states from previous studies. In this case, number ofDEGs and presenceofsomereported quiescent datasets GSE38870 and GSE81096 did not meet this cri- markers) and the low extent of overlap between Fetal_- terion, since they lacked genes known to be associated NICD[E17.5/E14.5], GSE38870, and GSE81096 datasets with or regulating the quiescent state such as Calcr, withrespect to theremaining datasets, these threedatasets Notch1, Chrdl2, Lama3, Pax7, and Bmp6 genes (unpub- were excluded from the multi-dataset analyses. The final lished data, see Fig. 7; [41–43]). As a third criterion, we ensemble comprised the eight remaining datasets which used the dataset similarity, which was assessed using the had207and542genescommonlyup-anddown-regulated, Jaccardindex (JI), and amatrix ofthe JIs for the up- and respectively (Fig. 4a). To further characterize these down-regulated genes was generated (Figs. 4b, c, re- commonly regulated genes, we performed an over- spectively). In both matrices, the closest pairs of datasets representation analysis (ORA) of the gene sets. An enrich- were GSE47177 at 60 h and GSE47177 at 84 h (JI=0.46 ment was detected for the 207 commonly up-regulated and 0.44 for the up- and down-regulated genes, respect- genes insevendifferent Hallmark genesets(Fig. 5a).Some ively), followed by the second pair of closest sets Quies- genesweresharedamongdifferentpathways(e.g., Atf3and cent [high]/D3Activated [high] and Quiescent [low]/ Il6werefoundinsixdifferentgenesets),whileotherswere D3Activated [low] (JI=0.39 and 0.33, for up- and down- found in one gene set only (e.g.,Tgfbr3, Spsb1). These re- regulated genes, respectively). The observation that the sultsareconsistentwiththeindividualgenesetenrichment Pietrosemolietal.SkeletalMuscle (2017) 7:28 Page10of15 a b Fig.5Geneexpressionofdifferentiallyexpressedgenes(DEGs)insatellitecells.aBinaryheatmapoftheover-representationanalysis.Eachcol- umnrepresentsoneenriched(over-represented)geneset,andeachrowcorrespondstoagene.Redcellsindicatethepresenceofthecorre- spondinggeneinagivengeneset.bNetworkrepresentationof39GOSlimtermsusedtocharacterizethecommonlyregulatedgenes insatellitecells.Nodesrepresentgenesetswithanodesizeproportionaltothegenesetsize.Edgesindicatethatgenesaresharedamongthe genesets.Thethicknessoftheedgeisproportionaltothenumberofsharedgenes.AlsoshownaretheheatmapsoflogFCforgenesbelonging toextracellularmatrix,nucleicacidbindingandcellcycleandproliferation,nucleicacidbinding,andsignaltransductionactivity,respectively.Each rowcorrespondstoageneandeachcolumncorrespondstoadataset.DendrogramsshowhierarchicalclusteringusingtheEuclideandistance analysis(seeFig.3)emphasizingthatthesegenesreflectthe To assign a global function to the commonly regulated global traits associated with the quiescent state. Note that genes, we annotated them using GOSlim terms, which onlyafractionofthe207geneswasfoundinknownexist- summarize broad terms based on Gene Ontology (GO) ing gene sets (57/207), leaving about three-quarters of the terms [44]. To identify categories of genes, we generated commonly up-regulated genes not associated with any heatmaps of the logFC in the different datasets for a sub- existing gene set. This finding was expected given that a set of the 207 UP genes belonging to the extracellular quiescent signature is yet to be defined, and thus current matrix, nucleic acid binding activity (+/− cell cycle prolif- genesetslacksuchannotations.Tofacilitatetheanalysisof eration), and signal transduction activity (Fig. 5b). transcriptomesasdescribedhere,wehavedevelopedanon- Unexpectedly, some genes associated with cell cycle pro- line interactive tool called Sherpa (Fig. 6). Sherpa allows liferation, such as c-Fos and c-Jun, were up-regulated in users to perform analyses on individual and on multiple thequiescentcellanalysesinalldatasets(Fig.7a).Tover- datasets.Eachindividualdatasetanalysisinvolvestheidenti- ify the transcriptional relevance of these genes in ficationofdifferentiallyexpressedgenes;comparisonofthe quiescent cells, weused a protocol toisolatesatellite cells expression of selected genes in the quiescent and activated in which a short fixation (PFA) treatment was performed states through tables, heatmaps, andvolcano plots;and ex- priortoharvestingthecellstoarrestdenovotranscription ploration of the distribution of the samples according to during the isolation protocol (see the “Methods” section). their variability through principal component analysis and Then, the expression level quantification was assessed at cluster analysis. The multiple dataset analysis allows the thetranscript(RT-qPCR)andprotein(Westernblot)level comparisonofselecteddatasetsaccordingtothecommonly at different time points after isolation for a number of differentiallyexpressedgenes.Alloftheseanalysesareinter- genes (Fig. 7b, c). Notably, quantifications of c-Jun, Jun B, active,astheyallowtheusertoselectthethresholdsoffold and Jun D levels showed that at time 0 (+PFA), these change(logFC)andfalsediscoveryrate(adj.Pvalue). genes were not detected in quiescent cells, neither at the

Description:
datasets, and thus a more global view of these biological systems. In this work, we generated transcriptome datasets of satellite cells in different .. WTA System v2 (Nugen Technologies,. Inc. 3302-12); Applause WTA Amp-Plus System (Nugen. Technologies, Inc. 5510-24)), fragmented and biotin la-.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.