ebook img

Task learning reveals signatures of sample-based internal models in rodent prefrontal cortex PDF

16 Pages·2016·1.38 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Task learning reveals signatures of sample-based internal models in rodent prefrontal cortex

bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. Task learning reveals signatures of sample-based internal models in rodent prefrontal cortex AbhinavSingha,AdrienPeyracheb,andMarkDHumphriesa,1 aFacultyofBiology,MedicineandHealth,UniversityofManchester,Manchester,M139PT,UK;bMcGillUniversity,Montreal,Canada ThismanuscriptwascompiledonJuly14,2016 Theinherentuncertaintyoftheworldsuggeststhatbrainsuseprob- prefrontal cortex (mPFC). Medial PFC is necessary for learn- abilisticinternalmodels.Wesoughttotestwhetherorhowneurons ing new rules or strategies [13, 14], and changes in mPFC represent such models in higher cortical regions, learn them, and neuron firing times correlates with successful rule learning usetheminbehaviour. Usingasamplingframework,wepredicted [15], suggesting that mPFC coding of task-related variables that trial-evoked and sleeping population activity represent the in- changes over learning. We thus hypothesised that mPFC en- ferredandexpectedprobabilitiesgeneratedfromaninternalmodel codes an internal model of a task, which is updated by task ofabehaviouraltask, andwouldbecomemoresimilarasthetask performance, and from which the statistical distributions of waslearnt. Totestthesepredictions,weanalysedpopulationactiv- population activity are generated. To test these hypotheses, ityfromrodentprefrontalcortexbefore,during,andaftersessionsof we analysed previously-recorded population activity from the learningrulesonamaze.Distributionsofactivitypatternsconverged medial prefrontal cortex of rats learning rules in a Y-maze betweentrialsandpost-learningsleepduringsuccessfulrulelearn- [16]. ing. Learning induced changes were greatest for patterns predict- ingcorrectchoiceandexpressedatthemaze’schoicepoint,consis- Results tentwithanupdatedinternalmodelofthetask.Ourresultssuggest sample-basedinternalmodelsareageneralcomputationalprinciple Rats with implanted tetrodes learnt one of three rules on ofcortex. a Y-maze: go left, go right, or go to the randomly-lit arm (Fig. 1A). Each recording session was a single day containing neuralensembles|statisticalinference|Bayestheorem 3 epochs totalling typically 1.5 hours: pre-task sleep/rest, behavioural testing on the task, and post-task sleep/rest. We How do we know what state the world is in? Behavioural focussedontensessionswheretheanimalreachedthelearning evidence suggests brains solve this problem using prob- criteria for a rule mid-session (Materials and Methods; 15-55 abilistic reasoning [1, 2]. Such reasoning implies that brains neuronspersession). Inthisway,wesoughttoisolatechanges representandlearninternalmodelsforthestatisticalstructure in population activity solely due to rule-learning. of the external world [1, 3, 4]. With these models, neurons could represent uncertainty about the world with probability Theorysketch.Here we outline our theoretical predictions for distributions, and update those distributions with new knowl- changes in population activity, derived from the inference-by- edge using the rules of probabilistic inference. Theoretical samplinghypothesis;afullaccountisgivenintheSIText. We work has elucidated potential mechanisms for how cortical soughttotesttheideathatthemPFCcontainsatleastonein- populations represent and compute with probabilities [2, 5–8], ternalmodelrelatedtotaskperformance,suchasrepresenting and shown how computational models of inference predict the relevant decision-variable (here, left or right) or the rule- aspects of cortical activity in sensory and decision-making dependent outcomes. Learning of the task should therefore tasks [e.g. 2, 9]. But experimental evidence for the neural representation of probabilistic internal models is lacking. An experimentally-accessible proposal is the recent SignificanceStatement inference-by-sampling hypothesis [7, 10–12]. This proposes Thecerebralcortexcontainsbillionsofneurons. Theactivity that cortical population activity at some time t is a sample ofoneneuronislostinthismorass, soitisthoughtthatthe from an underlying probability distribution. Cortical activ- co-ordinatedactivityofgroupsofneurons–“neuralensembles” ity evoked by external stimuli represents sampling from the –arethebasicelementofcorticalcomputation,underpinning model-generated“posterior"distributionthattheworldisina sensation,cognition,andaction.Butwhatdotheseensembles particular state. Spontaneous cortical activity represents sam- represent? Here we show that ensemble activity in rodent pling of the model in the absence of external stimuli, forming prefrontalcortexrepresentssamplesfromaninternalmodelof a model-generated “prior" for the expected properties of the theworld-aprobabilitydistributionthattheworldisinaspecific world. A key prediction is that the evoked and spontaneous state.Wefindthatthisinternalmodelisupdatedduringlearning population activity should converge over repeated experience, aboutchangestotheworld,andissampledduringsleep.These as the internal model adapts to match the relevant statistics resultssuggestthatprobability-basedcomputationisageneric of the external world. Just such a convergence has been ob- principleofcortex. served in small populations from ferret V1 over development [11]. Unknown is whether probabilistic internal models are A.S:designedtheanalyses;analysedthedata;discussedtheresults;commentedondrafts.A.P.: a general computational principle for cortex: whether they discussedtheresults;commentedondrafts. M.D.H:designedtheanalyses;analysedthedata; discussedtheresults;wrotethepaper. can be observed during learning, or in higher-order cortices, Theauthorsdeclarenoconflictofinterest or during ongoing behaviour. A natural candidate to address these issues is the medial 1Towhomcorrespondenceshouldbeaddressed.E-mail:mark.humphriesmanchester.ac.uk www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX PNAS | July14,2016 | vol.XXX | no.XX | 1–8 bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. A B learning Left Right Model Model + input Model* Generative model Choice point Prior Posterior Prior* Activity encodes Sleep Pre Task Sleep Post Epoch Start SleepPre Task SleepPost D(Pre|Correct) > D(Post|Correct) Prediction C Neuron D E 30 1234561 0 1 0 0 0 orrect)10-1 orrect) 10-1 c c 0 0 0 0 0 0 sk: 10-2 sk: 10-2 0 0 0 1 0 0 Ta Ta Time (ms) 20 1000 0000 1000 0000 0101 0000 n probability (1100--43 n probability ( 1100--43 er er 0 0 0 0 0 0 att att p p 0 0 0 0 0 0 10-4 10-3 10-2 10-1 10-4 10-3 10-2 10-1 0 0 0 0 1 0 10 pattern probability (Pre) pattern probability (Post) Fig.1. Activitypatterndistributionsduringrule-learning.(A)Y-mazetaskset-up(top);eachsessionincludedtheepochsofpre-tasksleep/rest,tasktrials,andpost-task sleep/rest(bottom)-Fig.4Agivesabreakdownpersession.Oneofthreetargetrulesforobtainingrewardwasenforcedthroughoutasession:goright;goleft;gotothe randomly-litarm.(B)Schematicoftheory.Ifprefrontalcortexencodesaninternalmodelofthetask,thenactivityduringthetaskisderivedfromtheinternalmodelplusthe relevantexternalinputs:thedistributionofactivityisthustheposteriordistributionovertheencodedtaskvariables.Duringsleep,thedistributionofactivityisderivedentirely fromtheinternalmodel,andthusisthepriordistributionovertheencodedtaskvariables.Updatestotheinternalmodelbytasklearning(creatingModel∗)willthenchangethe priordistributionencodedduringsleep(toPrior∗).Thetheoreticalpredictionisthenthattheactivitydistributioninpost-tasksleep,derivedfromthemodelofthecorrectrule,will beclosertothedistributiononthecorrecttrials,comparedtothepre-tasksleep.(C)Thepopulationactivityofsimultaneouslyrecordedspiketrainswasrepresentedasabinary activitypatterninsomesmalltime-bin(here2ms).(D)Scatterplotofthejointfrequencyofeveryoccurringpatterninpre-taskSWS(distributionP(Pre))andtask(distribution P(R))epochsforonesession.(E)ForthesamesessionasD,scatterplotofthejointfrequencyofeveryoccurringpatterninpost-taskSWS[P(Post)]andtask[P(R)] epochs. 2 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Singh etal. bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. update the internal model based on feedback from each trial’s TheconvergencebetweenthetaskP(R)andpost-taskSWS outcome. WetheorisedthatmPFCpopulationactivityoneach P(Post) distributions was also robust to both the choice of trial was sampling from the posterior distribution generated activity pattern binsize across an order of magnitude from 2 fromthismodel;andthat“spontaneous"activityinslow-wave to 20 ms (Fig. S2) and the choice of correct trials in the task sleep (SWS), occurring in the absence of task-related stimuli distribution P(R) (Fig. S3). and behaviour, samples the corresponding prior distribution Together, these results are consistent with the convergence (Fig. 1B). Consequently, updating the internal model from over learning of the posterior and prior distributions repre- task feedback should be reflected in changes to the posterior sented by mPFC population activity. They imply that mPFC and prior distributions generated from that model. encodes a task-related internal model that is updated by task Byrestrictingouranalysestosessionswithsuccessfullearn- feedback. ing, we expected the post-task SWS activity to be sampling from an internal model that has learnt the correct rule. To Wrong models lead to no convergence.Is this convergence compareposteriordistributionsamplesfromthesameinternal of trial-evoked and post-task SWS distributions inevitable? model, we considered population activity during correct trials To answer this, we made use of the 7 sessions in which the after the learning criteria were met – we call this distribution rats experienced a rule change. As rule changes occurred P(R). Our main prediction was thus that the distribution only after 10 consecutive correct trials [16], these sessions P(R) of activity during the correct trials would be more sim- are uniquely divided into periods when the internal model ilar to the distribution in post-task SWS [P(Post)] than in of the task was right and when it was wrong. Once wrong, pre-task SWS [P(Pre)]. Such a convergence of distributions the rat needed to find the correct new model. Consequently, wouldbeevidencethatatask-relatedinternalmodelinmPFC our theory predicts that the prior distribution in post-task was updated by feedback. SWSsleepisnotderivedfromthesameinternalmodelasthat used before the rule change. In other words, the posterior Activity distributions converge between task and post-task fromthepre-changetrialsandthepriorfromthespontaneous sleep.To test these hypotheses, we compared the statistical activity of post-task sleep are derived from different models, distributionsofactivitypatternsbetweentaskandsleepepochs. and should not converge. Activity patterns, the putative samples from the underlying Wetestedthispredictionbycomparingtheactivitypattern probability distribution, were characterised as a binary vector distributions in pre-change correct trials [P(R)] and in post- (or “word") of active and inactive neurons with a binsize of 2 task SWS [P(Post)]. There was no convergence between ms (Fig. 1C). Each recorded population of N neurons had the two distributions, when measured using either Kullback- the same sub-set of all 2N possible activity patterns in all Liebler divergence (3.6±11.7%; Fig. 2D-E) or the Hellinger epochs (Fig. S1) [17, 18]. Such a common set of patterns is distance (3.8±9.3%; Fig. 2I-J). For the effect sizes observed consistent with their being samples generated from the same for the learning sessions, there was sufficient power to recover form of internal model across both behaviour and sleep. the same effect size at α=0.05 with N =7 sessions (KLD: Ifactivitypatternsaresamplesfromaprobabilitydistribu- learningsessioneffectsized=0.96,rule-changesessionpower tion,thentwosimilarprobabilitydistributionswillberevealed = 0.7; Hellinger: d = 2.36, power ≈1), which argues against by the similar frequencies of sampling each pattern [11]. For low power causing the lack of convergence. each pair of epochs, we thus computed the distances between the two corresponding distributions of activity patterns (Fig. Convergence is a consequence of changes to correlations, 1D,E). We first used the information-theory based Kullback- notjustfiringrates.Populationfiringratedifferencesbetween Liebler divergence to measure the distance D(P|Q) between waking and sleep states [20, 21], and increases in SWS firing distributions P and Q in bits [11]. We found that in 9 of the after task-learning, could potentially account for the conver- 10 sessions the distribution P(R) of activity during the trials gence of distributions during learning. To control for this, wasclosertothedistributioninpost-taskSWS[P(Post)]than we used the “raster" model [20] to generate surrogate sets of in pre-task SWS [P(Pre)] (Fig. 2A). spike-trains that matched both the mean firing rates of each On average the task-evoked distribution of patterns was neuron, and the distribution of total population activity in 18.7±6.2% closer to the post-task SWS distribution than each time-bin (K =0,1,...,N spikes per bin). Consequently, the pre-task SWS distribution (Fig. 2B), showing a conver- theoccurrenceratesofparticularactivitypatternsintheraster gence between task-evoked and post-task SWS distributions. modelarethosepredictedtoarisefromneuronandpopulation Further, we found a robust convergence even at the level of firing rates alone. individual sessions (Fig. 2C). We found that firing rates could not account for the con- While the Kullback-Liebler divergence provides the most vergence between task and post-task SWS distributions. The completecharacterisationofthedistancebetweentwoprobabil- data-derived distance D(Post|R) was always smaller than the ity distributions, estimating it accurately from limited sample distance D(Post−model|R) predicted by the raster model data has known issues [19]. To check our results were robust, (Fig. 3A). This was true whether we used Kullback-Liebler we re-computed all distances using the Hellinger distance, a divergence or the Hellinger distance (Fig. 3C) to measure non-parametric measure that provides a lower bound for the distances between distributions. Kullback-Liebler divergence. Reassuringly, we found the same Our activity patterns are built from single units, unlike results: the distribution P(R) of activity during the trials previous work using multi-unit activity [11, 20, 22–24], so was consistently closer to the distribution in post-task SWS we expect our patterns to be sparse with rare synchronous [P(Post)] than in pre-task SWS [P(Pre)] (Fig. 2F-H; the activity. Indeed our data are dominated by activity patterns mean convergence between task-evoked and post-task SWS with K =0 and K =1 spikes (Fig. S4). If all patterns were distributions was 21±2.8%). K = 0 or K = 1, the raster model spike trains would be Singh etal. PNAS | July14,2016 | vol.XXX | no.XX | 3 bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. R: post-learn correct trials R: pre-change correct trials A B C D E 400 P=0.014 200 P=0.35 60 60 2 Distance (KLD, bits/s) 123000000 Convergence (%) −2240000 Session 468 Distance (KLD, bits/s) 11505000 Convergence (%) −2240000 KL divergence 10 −40 −40 −50 0 50 0 0 D(Pre|R)D(Post|R) −60 Convergence (%) D(Pre|R)D(Post|R) −60 F G H I J 0.08 P=0.001 0.05 P=0.41 60 60 2 H Distance (Hellinger) 000...000246 Convergence (%) −2240000 Session 468 Distance (Hellinger) 0000....00001234 Convergence (%) −2240000 ellinger distance −40 10 −40 0 −50 0 50 0 D(Pre|R)D(Post|R) −60 Convergence (%) D(Pre|R)D(Post|R) −60 Fig.2. Convergenceofactivitypatterndistributionsbetweenthetaskandpost-tasksleep.(A)Distancesbetweenthedistributionsofpatternfrequenciesinsleepandtask epochs;onedotpersession.D(X|Y):distancebetweenpatterndistributionsinepochsXandY:Pre:pre-taskSWS;Post:post-taskSWS;R:correcttasktrials.(B)Scatter ofconvergenceacrossallsessions(circles).ConvergenceisD(Pre|R)−D(Post|R)/D(Pre|R).Avaluegreaterthanzeromeansthattheactivitypatterndistributionin thetaskisclosertothedistributioninpost-taskSWSthanthedistributioninpre-taskSWS.Blacklinesgivemean±2s.e.m.(C)Data(dot)and95%bootstrappedconfidence interval(line)fortheconvergenceoftaskandpost-taskSWSactivitypatterndistributionsforeachsession.Black:sessionswithCIsabove0.(D)Distancesbetweenthe distributionsofpatternfrequenciesinsleepandtaskepochsduringrule-changesessions;onedotpersession.HerethedistributionP(R)isconstructedfromcorrecttask trialsbeforetherulechange.(E)ResultfrompanelDexpressedasconvergence.(F)-(J)AsA-E,usingHellingerdistance.AllP-valuesfrom1-tailedWilcoxonsignranktest, withN=10learningsessions(panelsA-CandF-H)orN =7rule-changesessions(panelsD-EandI-J). exactly equivalent to the data. Given the relative sparsity A B of K ≥ 2 patterns in our data, it is all the more surprising P=0.001 P=0.001 bits/s] 8 bits/s] 150 dthaetna-dtheraitvewdedfiosutrnidbustuiocnhsa. consistent lower distance for our R) - D(Post|R) [ 246 R) - D(Post|R) [ 10500 KL diverge isbnpuittkIhiteoesfno.rsleTllobaowutiiscvlhtteehoocankctclyttuhhrfeirrseot,nrmwuceeetodhafiepffcspeoelr-ieacencodct-ieatvhcabteteiivtoswaantemipeoenantadtpnaeartanatlytsaeswnrisnditsthm,odKodrdiase≥wtlrin2is- model| 0 model| 0 nce from data and from the raster model fitted to the complete ost- ost- data. We found that the data-derived distance D(Post|R) D(P -2 D(P -50 was always smaller than the distance D(Post−model|R) pre- dicted by the raster model (Fig. 3B-D). Across all sessions, C D themodel-predicteddistanceD(Post−model|R)wasbetween P=0.001 P=0.001 0.002 0.15 3%and46%greaterthanthedata-deriveddistanceD(Post|R) R) R) using Kullback-Liebler divergence, indicating that much of Post| Post| 0.1 He the convergence between task and SWS distributions could R) - D( 0.001 R) - D( llinge ncoontvbeergaecnccoeuonftteadskfoarnbdypfiorsitn-tgasrkatSeWs aSlodniset.riCbountisoenqsuiesnntloyt,dthuee D(Post-model| 0 D(Post-model| 0.005 r distance toof npReoeuparusoslnautrfiiionrnignlfiyg,r.ifnogrtrhateesse,Kbu≥t t2oascitmiviliatyrlypaptrteecrinsedicsotrrribeluattiioonnss, all convergence results held (Fig. S5) despite the order-of- -0.05 magnitude fewer sampled patterns. Fig.3. Convergenceiscausedbychangesincorrelation, notfiringrate. (A) Thedistancebetweenthetaskandpost-tasksleepdistributionsD(Post|R)is Convergenceisnotarecencyeffect.We examined periods of alwayssmallerthanpredictedbyfiringratechangesduringsleepaloneD(Post− SWSinordertomostlikelyobservethesamplingofaputative model|R),asgivenbytherastermodel. Blacklinesgivemean±2s.e.minall internal model in a static condition, with no external inputs panels.(B)AsinA,usingonlyactivitypatternswithK ≥2spikesfromdataand and minimal learning. But as correct task trials more likely model.(C)-(D)AsA-B,usingHellingerdistance.AllP-valuesfrom1-tailedWilcoxon signranktest,withN=10sessions. occur towards the end of a learning session, this raises the possibility that the closer match between task and post-task 4 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Singh etal. bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. SWS distributions are a recency effect, due to some trace or sampling rates during learning of a spatial navigation task. reverberation in sleep of the most recent task activity. Consequently, the statistical distributions of patterns in spon- The time-scales involved make this unlikely. Bouts of SWS taneous and task-evoked activity converge. Our analyses thus didnotstartuntiltypically8minutesaftertheendofthetask suggest mPFC encodes a probabilistic internal model of a (mean 397s, S.D. 188 s; Fig. 4A). Any reverberation would task, which is updated by behavioural outcomes, and uses thus have to last at least that long to appear in the majority population-activity sampling as the basis for inference. of post-task SWS distributions. Remarkably we observed the convergence of distributions TheinterveningperiodbeforethefirstboutofSWScontains usingpreciseactivitypatternsdownto2msresolution. Using quiet wakefulness and early sleep stages. If convergence was surrogate models, we showed that the convergence is due to a recency effect, then we would expect that distributions changes in correlations between neurons, rather than changes [P(Rest)] of activity patterns in this more-immediate “rest" in firing rates. Previous work observed fine structure in epochwouldalsoconvergewiththetaskdistributions. Wedid stimulus-evoked population activity patterns in retina [e.g. not find this: across sessions, there was no evidence that the 22, 23] and V1 [11]. We extend these results to show that distributioninpost-taskrest[P(Rest)]consistentlyconverged such fine time-scale correlation structure can be observed in onthedistributionduringtasktrials[P(R)](Fig. 4B-C;mean cortical regions for executive control, and be evoked by tasks. convergence was −8.7±18.7%). Not only is the observed Unexpectedly, we have shown that, despite their high tempo- convergence inconsistent with a recency effect, it seems also ral resolution, task information can be decoded from these selective for activity in SWS. patterns. How a cortical region encodes an internal model is an Distributions are updated by task-relevant activity patterns. intriguing open question. A strong candidate is the relative The above analysis rests on the idea that the distributions of strengthsofthesynapticconnectionsbothintoandwithinthe activity patterns are derived from an internal model of the encodingcorticalcircuit[7,8,10,12]. Theactivityofacortical task. This predicts that individual patterns should correlate circuitisstronglydependentonthepatternandstrengthofthe with some aspect of the task. We sought an unbiased way connections between its neurons [e.g. 31, 32]. Consequently, of testing this prediction, so considered the following. In definingtheunderlyingmodelasthecircuit’ssynapticnetwork our theory, the changes to the internal model over learning allowsbothmodel-basedinferencethroughsynaptically-driven shouldbedirectlyreflectedinthedifferencesbetweentheprior activity and model learning through synaptic plasticity [10]. distributions before and after learning. Consequently, if we Our results are distinct from previous observations of task- compare the sampling of activity patterns in pre-task sleep to specific replay during sleep in prefrontal cortex [33], including sampling in post-task sleep, then any patterns with changes reports [16] using the same data analysed here. In contrast to in their sampling should be from the updated model. This the work here, replay accounts do not consider the statistical means that these patterns should encode some aspect of the distributionsoftheobservedpatterns,noridentifythechanged task. patterns[beyondexampletemplatesinref.33],norrelatethem Remarkably, this is exactly what we found. For each co- totaskbehaviour;moreover,replayisdescribedforcoincident activation pattern, we found its ability to predict a trial’s activity on coarse time-scales greater than those used here outcome by its rate of occurrence on that trial (Fig 5A). by a factor of 50 ([ref. 16]) up to a factor of 10000 ([ref. 33]). When we compared this outcome prediction to the change in Theythusdonotaddressthestatisticalchangestopopulation- sampling between pre- and post-task sleep, we found a strong wide activity predicted by theories of probabilistic population correlation between the two (Fig. 5B-D). This correlation coding. was highly robust (Fig. 5E-G). The learnt internal model, Our theory proposes that spontaneous neural activity dur- as evidenced by the updated patterns sampled from it, was ing sleep is sampling a prior distribution generated from an seemingly encoding the task. internal model. We found that the set of activity patterns was remarkably conserved between sleeping and behaviour Outcome-predictive patterns occur around the choice point. (Fig. S1), despite the different global dynamics of cortex Consistent with the internal model being task-related, we between these states [34, 35], consistent with activity being further found that the outcome-predictive activity patterns generated from the same internal model in both states. This preferentially occurred around the choice point of the maze theory predicts that manipulating synaptic weights during (Fig. 6). Particularly striking was that patterns strongly sleep, changing the internal model, should change both the predictive of outcome rarely occurred in the starting arm prior and the posterior distributions over task variables. Re- (Fig. 6A). Together, the selective changes over learning to centworkhasshownthatinducingtask-specificrewardsignals outcome-specific(Fig. 5)andlocation-specific(Fig. 6)activity during sleep, likely altering synaptic weights, indeed immedi- patterns show that the convergence of distributions (Fig. 2) ately alters task behaviour on waking [36]. Our results thus is not a statistical curiosity, but is evidence for the updating suggest that casting sleeping and waking activity as prior of a behaviourally-relevant internal model. and posterior distributions generated from the same internal modelcouldbeafruitfulcomputationalframeworkforrelating cortical dynamics to behaviour. Discussion Prefrontal cortex has been implicated in both planning and MaterialsandMethods working memory during spatial navigation [25–28], and ex- ecutive control in general [29, 30]. Our results suggests a probabilistic basis for these functions. We find that moment- Taskandelectrophysiologicalrecordings.The data analysed here to-momentpatternsofmPFCpopulationactivitychangetheir werefromtenlearningsessionsandsevenrulechangesessionsin Singh etal. PNAS | July14,2016 | vol.XXX | no.XX | 5 bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. Rest SWS Error Correct A B C 10 P=0.65 P=0.014 400 50 8 Session 246 Post-task rest (s)1257050500000 Distance (KLD, bits/s) 123000000 Convergence (%) -500 0 0 -100 Post-task Post-task D(Pre|R)D(Rest|R) 0 2000 4000 6000 8000 10000 rest SWS Time(s) Fig.4. Convergenceisnotarecencyeffect.(A)Breakdownofeachlearningsessionintothedurationofitsstatecomponents.Thetaskepochisdividedintocorrect(red)and error(blue)trials,andinter-trialintervals(whitespaces).Trialdurationsweretypically2-4seconds,soarethinlinesonthisscale.Thepre-andpost-taskepochscontained quietwakingandlightsleepstates(“Rest"period)andidentifiedboutsofslow-wavesleep(“SWS").Inset:durationoftheRestperiodbetweentheendofthelasttrialandthe startofthefirstSWSbout(linesgivemean±2s.e.m.)(B)Distancesbetweenthedistributionsofpatternfrequenciesindifferentepochs;onedotpersession.D(X|Y): distancebetweenpatterndistributionsinepochsXandY:Pre:pre-taskSWS;Rest:immediatepost-taskrestperiod;R:correcttasktrials.ComparetoFig.2A.(C)Results frompanelBexpressedastheconvergencebetweenthedistributionsinthetaskandpost-taskrestperiod.Wealsore-plotheretheconvergencebetweenthetaskand post-taskSWSdistributions.(P-valuesfrom1-tailedWilcoxonsignranktest,withN=10sessions). the study of [ref. 16]. For full details on training, spike-sorting, work[11,20],wethuscomputeandreportthesymmetrisedKLD: andhistologysee[16]. FourLong-Evansmaleratswithimplanted D(P|Q)=(d(P|Q)+d(Q|P))/2. tetrodesinprelimbiccortexweretrainedontheY-mazetask(Fig. Thereare2N distinctpossibleactivitypatternsinarecording 1a). Each recording session consisted of a 20-30 minute sleep or withN neurons. Mostoftheseactivitypatternsareneverobserved, restepoch(pre-taskepoch),inwhichtheratremainedundisturbed soweexcludetheactivitypatternsthatarenotobservedineitherof inapaddedflowerpotplacedonthecentralplatformofthemaze, theepochswecompare. Theempiricalfrequencyoftheremaining followed by a task epoch, in which the rat performed for 20-40 activitypatternsisbiasedduetothelimitedlengthoftherecordings minutes, and then by a second 20-30 minute sleep or rest epoch [19]. To counteract this bias, we use the Bayesian estimator and (post-task epoch). Every trial started when the rat reached the quadraticbiascorrectionexactlyasdescribedin[11]. TheBerkes departure arm and finished when the rat reached the end of one estimatorassumesaDirichletpriorandmultinomiallikelihoodto of the choice arms. Correct choice was rewarded with drops of calculate the posterior estimate of the KLD; we use their code flavoured milk. Each rat had to learn the current rule by trial- (github.com/pberkes/neuro-kl)tocomputetheestimator. Wethen and-error, either: go to the left arm; go to the right arm; go to computeaKLDestimateusingallS activitypatterns,andusing thelitarm. Tomaintainconsistentcontextacrossallsessions,the S/2andS/4patternsrandomlysampledwithoutreplacement. By extra-mazelightcueswerelitinapseudo-randomsequenceacross fitting a quadratic polynomial to these three KLD estimates, we trials,whethertheywererelevanttotheruleornot. canthenusetheintercepttermofthequadraticfitasanestimate Weprimarilyanalysedheredatafromthetensessionsinwhich oftheKLDifwehadaccesstorecordingsofinfinitelength[19,37]. thepreviously-definedlearningcriteriaweremet: thefirsttrialofa The Hellinger distance for two discrete distributions P and bpleorcfokrmofaantcleeausnttitlhtrheeeceonndseocfutthiveeserseswioanrdwedastraibalosveaf8te0r%w.hIinchlattheer Q is D(P|Q) = 12Pni=1(√pi−√qi)2. To a first approximation, sessions the rats reached the criterion for changing the rule: ten this measures for each pair of probabilities (pi,qi) the distance betweentheirsquare-roots. Inthisform, D(P|Q)=0meansthe consecutivecorrecttrialsoroneerroroutof12trials. Thuseachrat distributionsareidentical,andD(P|Q)=1meansthedistributions learntatleasttworules,withsevenrule-changesessionsintotal. are mutually singular: all positive probabilities in P are zero in Tetroderecordingswerespike-sortedonlywithineachrecording Q,andvice-versa. TheHellingerdistanceisalowerboundforthe sessionforconservativeidentificationofstablesingleunits. Inthe KLD:2D(P|Q)≤KLD. 17 sessions we analyse here, the populations ranged in size from To compare distances between sessions we computed a nor- 15-55units. malised measure of “convergence". The divergence between a Activitypatterndistributions.ForapopulationofsizeN,wecharac- given pair of distributions could depend on many factors that terisedpopulationactivityfromtimettot+δasanN-lengthbinary differ between sessions, including that each recorded population vectorwitheachelementbeing1ifatleastonespikewasfiredby was a different size, and how much of the relevant population thatneuroninthattime-bin,and0otherwise. Inthemaintextwe for encoding the internal model we recorded. Consequently, the useabinsizeofδ=2msthroughout,andreporttheresultsofusing key comparison between the divergences D(Pre|R)−D(Post|R) largerbinsizesin(Fig. S2). Webuildpatternsusingthenumber also depends on these factors. To compare the difference in di- ofrecordedneuronsN,uptoamaximumof35forcomputational vergences across sessions, we computed a “convergence" score by tractability. Theprobabilitydistributionfortheseactivitypatterns normalising by the scale of the divergence in the pre-task SWS: wascompiledbycountingthefrequencyofeachpattern’soccurrence ((D(Pre|R)−D(Post|R))/D(Pre|R). We express this as a per- andnormalisingbythetotalnumberofpatternoccurrences. centage. Convergencegreaterthan0%indicatesthatthedistance between the task (R: correct trials) and post-task SWS (Post) Comparing distributions.We quantified the distance D(P|Q) be- distributions is smaller than that between the task and pre-task tweenprobabilitydistributionsP andQusingboththeKullback- SWS(Pre)distributions. Lieblerdivergence(KLD)andtheHellingerdistance. The KLD is an information theoretic measure to compare Statistics.Quoted measurement values are means ± s.e.m. All the similarity between two probability distributions. Let P = hypothesistestsusedthenon-parametricWilcoxonsigntestforaone- (p1,p2,...,pn) and Q = (q1,q2,...,qn) be two discrete probabil- sampletestthatthesamplemedianforthepopulationofsessions ity distributions, for n distinct possibilities – for us, these are isgreaterthanzero. Forlearningsessions,wehaveN =10sessions; afilnlepdosassibdle(Pi|nQd)ivi=duaPl ani=ct1ivpiitlny(ppqiait)t.ernTsh.isTmheeaKsuLreDisisntohtensydme-- fpolrotrumlee-acnhavnaglueses(Fanigd.t2h)ewireahpapvreoxNim=at7es9e5s%siocnosn.fiTdhenrocuegihnoteurtvawles metric, so that in general d(P|Q) 6= d(Q|P). Following prior givenby±2s.e.m. 6 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Singh etal. bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. A C D n) Error dia0.25 0.6 me R = 0.9 Proportion (trials) 00..24 Correct absolutechange(000.1..125 PN <= 02.70002 G 1 ed0.05 0 malis 0.5 Patt0ern frequ1e0ncy (per2 t0rial) Nor 00.5 P0(.p6redic0t .o7utco0m.8e) 0.9 huffled) 0 Bge 1 E F edian)0.25 R = 0.63 R (s Normalised absolute chan 0000....2468 alisedabsolutechange(m0000..01..1255 PN <= 09.50002 -0-.15-0.5 0 R (data)0.5 1 0 0.6 0.8 1 Norm 00.5 0.6 0.7 0.8 0.9 P(predict outcome) P(predict outcome) Fig.5. Codingoftrialoutcomebysampledactivitypatterns.(A)Exampledistributionsofapattern’sfrequencyconditionedontrialoutcomefromonesession.(B)Forall co-activationpatternsinonesession,ascatterplotofoutcomepredictionand(absolute)changeinpatternfrequencybetweenpre-andpost-taskSWS.Changeisnormalisedto themaximumchangeinthesession.(C)Distributionofchangeinpatternfrequencyaccordingtooutcomepredictionoveralltensessions.Colourintensitygivesthecumulative probabilityofatleastthatchange.Circlesgivethemedianabsolutechangeforeachdistribution.Inthisexample,distributionswerebuiltusingbinswith90data-pointseach. UnbinneddataareanalysedinFig.S6.(D)CorrelationofoutcomepredictionandmedianchangeinpatternoccurrencebetweensleepepochsfromC,overalltensessions. Redlineisthebest-fitlinearregression(P <0.0002,permutationtest).(E)-(F)AsC-D,fortheworst-casecorrelationobserved,with25data-pointsperbin.(G)Robustness ofcorrelationresults.SoliddotsplotthecorrelationcoefficientRbetweenoutcomepredictionandmedianchangeinpatternfrequencyobtainedfordifferentbinningsofthe data.ColoureddotscorrespondtopanelsC-DandE-F.LineseachgivetheentirerangeofRobtainedfroma5000-repeatpermutationtest;nonereachtheequivalentdata point(dashedlineshowsequality),indicatingalldatacorrelationshadP <0.0002. Bootstrappedconfidenceintervals(inFig. 2C,H)foreachses- errortrials(asintheexampleofFig. 5A).Wethenusedathresh- sion were constructed using 1000 bootstraps of each epoch’s ac- old T to classify trials as error or correct based on whether the tivity pattern distribution. Each bootstrap was a sample-with- frequency on that trial exceeded the threshold or not. We found replacement of activity patterns from the data distribution X thefractionofcorrectlyclassifiedcorrecttrials(truepositiverate) to get a sample distribution X∗. For a given pair of boot- andthefractionoferrortrialsincorrectlyclassifiedascorrecttrials strapped distributions X∗,Y∗ we then compute their distance (false positive rate). Plotting the false positive rates against the D∗(X∗|Y∗). Given both bootstrapped distances D∗(Pre|R) and truepositiveratesforallvaluesofT givestheROCcurve. Thearea D∗(Post|R), we then compute the bootstrapped convergence undertheROCcurvegivestheprobabilitythatarandomlychosen (D∗(Pre∗|R∗)−D∗(Post∗|R∗))/D∗(Pre∗|R∗). patternfrequencywillbecorrectlyclassifiedasfromacorrecttrial; wereportthisasP(predictoutcome). Rastermodel.Tocontrolforthepossibilitythatchangesinactivity pattern occurrence were due solely to changes in the firing rates Relationship of sampling change and outcome prediction.Within ofindividualneuronsandthetotalpopulation,weusedtheraster each session, we computed the change in each pattern’s occur- model exactly as described in [20]. For a given data-set of spike- rence between pre- and post-task SWS. These were normalised trainsN andbinsizeδ,therastermodelconstructsasyntheticset by the maximum change within each session. Maximally chang- ofspikessuchthateachsyntheticspike-trainhasthesamemean ingpatternswerecandidatesforthoseupdatedbylearningduring rateasitscounterpartinthedata,andthedistributionofthetotal the task. Correlation between change in pattern sampling and number of spikes per time-bin matches the data. In this way, it outcome prediction was done on normalised changes pooled over predictsthefrequencyofactivitypatternsthatshouldoccurgiven allsessions. Changescoreswerebinnedusingvariable-widthbins solelychangesinindividualandpopulationrates. ofP(predictoutcome),eachcontainingthesamenumberofdata- For Fig 3 we generated 1000 raster models per session using pointstoruleoutpowerissuesaffectingthecorrelation. Weregress thespike-trainsfromthepost-taskSWSinthatsession. Foreach P(predictoutcome)againstthemedianchangeineachbin,using generated raster model, we computed the distance between its themid-pointofeachbinasthevalueforP(predictoutcome). Our distributionofactivitypatternsandthedatadistributionforcorrect mainclaimisthatpredictionandchangearedependentvariables trials in the task D(Post−model|R). This comparison gives the (Fig. 5C-G).Totestthisclaim,wecomparedthedatacorrelation expecteddistancebetweentaskandpost-taskSWSdistributions againstthenullmodelofindependentvariables,bypermutingthe duetofiringratechangesalone. Weplotthedifferencebetweenthe assignmentofchangescorestotheactivitypatterns. Foreachper- meanD(Post−model|R)andthedataD(Post|R)inFig. 3. mutation,werepeatthebinningandregression. Wepermuted5000 timestogetthesamplingdistributionofthecorrelationcoefficient Outcomeprediction.Weexaminedthecorrelatesofactivitypattern R∗ predictedbythenullmodelofindependentvariables. Tocheck occurrencewithbehaviour. Toruleoutpurefiringrateeffects,we robustness,allanalyseswererepeatedforarangeoffixednumber excluded all patterns with K =0 and K =1 spikes, considering ofdata-pointsperbinbetween20and100. onlyco-activationpatternswithtwoormoreactiveneurons. To check whether individual activity patterns coded for the Relationshipoflocationandoutcomeprediction.Thelocationofev- outcome on each trial, we used standard receiver-operating char- acteristic (ROC) analysis. For each pattern, we computed the ery occurrence of a co-activation pattern was expressed as a nor- distributionofitsoccurrencefrequenciesseparatelyforcorrectand malizedpositiononthelinearisedmaze(0: startofdeparturearm; Singh etal. PNAS | July14,2016 | vol.XXX | no.XX | 7 bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. 1: end of the chosen goal arm). Our main claim is that activity weplotthemeanand95%rangeofthissamplingdistributionas patternsstronglypredictiveofoutcomeoccurpredominantlyaround thegreyregioninFig. 6B. thechoicepointofthemaze,andsopredictionandoverlapofthe choiceareaaredependentvariables(Fig. 6B).Totestthisclaim, ACKNOWLEDGMENTS. WethanktheHumphrieslab(JavierCa- wecomparedthisrelationshipagainstthenullmodelofindependent ballero,MatEvans,SilviaMaggi)fordiscussions;RasmusPetersen variables,bypermutingtheassignmentoflocationcentre-of-mass forcommentsonthemanuscript;andP.BerkesandM.Okunfor (medianandinterquartilerange)totheactivitypatterns. Foreach respectively making their KL divergence and raster model code publiclyavailable. A.S.andM.D.HweresupportedbyaMedical permutation, we compute the proportion of patterns whose in- ResearchCouncilSeniornon-ClinicalFellowshipawardtoM.D.H. terquartilerangeoverlapsthechoicearea,andbinasperthedata. A.P. was supported by Human Frontier Science Program Fellow- We permuted 5000 times to get the sampling distribution of the shipLT000160/2011-landNationalInstituteofHealthAwardK99 proportionspredictedbythenullmodelofindependentvariables: NS086915-01. A B models.PLoSComputBiol9:e1003311. 1 100 13. RagozzinoME,DetrickS,KesnerRP(1999)Involvementoftheprelimbic-infralimbicareas n 0.8 Goal arm oint (%) 80 14. RNofiectuhhreoEsrLco,id1Se9hn:a4tp5pi8rreo5f–rMo4n5Lt9a(4l2.0c0o7rt)exPrienlimbebhica/viniofrraallimflebxicibiinlitayctfiovratpiolanceimapnadirsremspeomnosreylfeoarrnminuglt.ipleJ Normalized positio 000...246 ChoiceDeparture Patterns around choice p 246000 11116785.... PLWBl9sttehaeu2eeaesonc6ynrkhzsr.hncaraosiihepncwkrerpgyhintAoeAcrrace,e,hnAasHleBaemp,sutaoe,KKpmrndbahtsehpulan-etóhtmesparniuPraleoi.e,nrssta(fsHr2nfllMoi0eeapnMD1xoratri0a,ctb,its)olMelBenrCrKatenesioccDnstehwahclieelehon(pr2cnerettko0snhinop0aueCtnu9nptl)phe(ooa2refntSei0Ktofpafal1r,enoom3oasWnn)s.ritltncPiaiaNaeiinolnlrneleapcgetuaootu.rriosruloNatSknsnetessiIxo,6.uenad2vrBJo-neu:aw4Nndnrt1iittend6ras3euge6g–oro:l4ods9iraugl2si2estca5l1Fteiirnn.–pPi2bei9.z7u3a(t:Nt26h4tioi0ao.e7n0tn4s9rN7eo)o–eaff4Rulsmn7repoe5piksu5olecar.faiytpl1imooa2fsci:n9strig1iuvb6illiteeny–- duringperceptualdecision-making.ProgNeurobiol103:156–193. 0 0 0.5 0.6 0.7 0.8 0.9 0.6 0.8 19. PanzeriS,SenatoreR,MontemurroMA,PetersenRS(2007)Correctingforthesamplingbias probleminspiketraininformationmeasures.JNeurophysiol98:1064–1072. P(predict outcome) P(predict outcome) 20. OkunMetal.(2012)Populationratedynamicsandmultineuronfiringpatternsinsensory cortex.JNeurosci32:17108–17119. Fig.6. Outcomepredictingactivitypatternsaresampledinthechoicearea. (A) 21. FiserJ,LengyelM,SavinC,OrbánG,BerkesP(2013)How(not)toassesstheimportance Scatterplotofeachpattern’soutcomepredictionandsamplelocationsinthemaze ofcorrelationsforthematchingofspontaneousandevokedactivity.p.arXiv:1301.6554. (dotismedianposition; greylineisinterquartilerange); allpositionsgivenasa 22. SchneidmanE,BerryMJ,SegevR,BialekW(2006)Weakpairwisecorrelationsimply proportionofthelinearisedmazefromstartofdeparturearm.Redlinesindicatethe stronglycorrelatednetworkstatesinaneuralpopulation.Nature440:1007–1012. approximatecentre(solid)andboundaries(dashed)ofthemaze’schoicearea(cfFig 23. TkacikGetal.(2014)Searchingforcollectivebehaviorinalargenetworkofsensoryneurons. 1A).(B)Proportionofactivitypatternswhoseinterquartilerangeofsamplelocations PLoSComputBiol10:e1003408. 24. GanmorE,SegevR,SchneidmanE(2015)Athesaurusforaneuralpopulationcode.Elife entersthechoicearea(blackdotsandline).Greyregionshowsmean(line)and95% 4:e06134. range(shading)ofproportionsfromapermutationtest.Thedataexceedtheupper 25. BaegEHetal.(2003)Dynamicsofpopulationcodeforworkingmemoryintheprefrontal limitofexpectedproportionsforalloutcome-predictivepatterns. cortex.Neuron40:177–188. 26. FujisawaS,AmarasinghamA,HarrisonMT,BuzsákiG(2008)Behavior-dependentshort- termassemblydynamicsinthemedialprefrontalcortex.NatNeurosci11:823–833. 27. ItoHT,ZhangSJ,WitterMP,MoserEI,MoserMB(2015)Aprefrontal-thalamo-hippocampal 1. KordingKP,WolpertDM(2004)Bayesianintegrationinsensorimotorlearning. Nature circuitforgoal-directedspatialnavigation.Nature522:50–55. 427:244–247. 28. SpellmanTetal.(2015)Hippocampal-prefrontalinputsupportsspatialencodinginworking 2. PougetA,BeckJM,MaWJ,LathamPE(2013)Probabilisticbrains:knownsandunknowns. memory.Nature522:309–314. NatNeurosci16:1170–1178. 29. MillerEK(2000)Theprefrontalcortexandcognitivecontrol.NatRevNeurosci1(1):59–65. 3. WolpertDM,GhahramaniZ,JordanMI(1995)Aninternalmodelforsensorimotorintegration. 30. SulJH,KimH,HuhN,LeeD,JungMW(2010)Distinctrolesofrodentorbitofrontalandmedial Science269:1880–1882. prefrontalcortexindecisionmaking.Neuron66:449–460. 4. DayanP,AbbotLF(2001)TheoreticalNeuroscienceed.anonymous.(MITPress,Cambridge, 31. CossellLetal.(2015)Functionalorganizationofexcitatorysynapticstrengthinprimaryvisual MA). cortex.Nature518:399–403. 5. ZemelRS,DayanP,PougetA(1998)Probabilisticinterpretationofpopulationcodes.Neural 32. OkunMetal.(2015)Diversecouplingofneuronstopopulationsinsensorycortex. Nature Comput10:403–430. 521:511–515. 6. MaWJ,BeckJM,LathamPE,Pouget1A(2006)Bayesianinferencewithprobabilisticpopu- 33. EustonDR,TatsunoM,McNaughtonBL(2007)Fast-forwardplaybackofrecentmemory lationcodes.NatNeurosci9:1432–1438. sequencesinprefrontalcortexduringsleep.Science318:1147–1150. 7. BuesingL,BillJ,NesslerB,MaassW(2011)Neuraldynamicsassampling: amodel 34. DestexheA,ContrerasD,SteriadeM(1999)Spatiotemporalanalysisoflocalfieldpotentials forstochasticcomputationinrecurrentnetworksofspikingneurons. PLoSComputBiol andunitdischargesincatcerebralcortexduringnaturalwakeandsleepstates.JNeurosci 7:e1002211. 19:4595–4608. 8. KappelD,HabenschussS,LegensteinR,MaassW(2015)NetworkplasticityasBayesian 35. SteriadeM,TimofeevI,GrenierF(2001)Naturalwakingandsleepstates:aviewfrominside inference.PLoSComputBiol11:e1004485. neocorticalneurons.JNeurophysiol85:1969–1985. 9. BeckJMetal.(2008)ProbabilisticpopulationcodesforBayesiandecisionmaking.Neuron 36. deLavilleonG,LacroixMM,Rondi-ReigL,BenchenaneK(2015)Explicitmemorycreation 60:1142–1152. duringsleepdemonstratesacausalroleofplacecellsinnavigation. NatNeurosci18:493– 10. FiserJ,BerkesP,OrbánG,LengyelM(2010)Statisticallyoptimalperceptionandlearning: 495. frombehaviortoneuralrepresentations.TrendsCognSci14:119–130. 37. StrongSP,KoberleR,deRuytervanSteveninckRR,BialekW(1998)Entropyandinformation 11. BerkesP,OrbánG,LengyelM,FiserJ(2011)Spontaneouscorticalactivityrevealshallmarks inneuralspiketrains.PhysRevLett80:197–200. ofanoptimalinternalmodeloftheenvironment.Science331:83–87. 12. HabenschussS,JonkeZ,MaassW(2013)Stochasticcomputationsincorticalmicrocircuit 8 | www.pnas.org/cgi/doi/10.1073/pnas.XXXXXXXXXX Singh etal. bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. SI Text external input, from the prior in spontaneous activity. Some downstream neurons, receiving these samples as a consecutive Neuralinference sequenceofinputs,canreconstructtheprobabilitydistribution just by summing their inputs over time. How do we know the current state of the world given some Forsimplicity,Berkesetal[2]consideredtheinstantaneous inputfromit? Ourinputisbothlimitedintimeandnoisy, so population activity as some binary vector indicating whether ourestimatesareinherentlyuncertain. Consequently,wehave eachneuronwasactiveorinactiveinaverysmalltimewindow. an inference problem: what is our best guess of the current This representation makes the distributions easy to measure stateoftheworldgivensomefinite,noisyinput? Wecanstate experimentally. this problem as being equivalent to inferring the probability Learning updates synaptic weights, altering the encoded distribution model. The prediction that posterior and prior distributions P(state|input,model) [1] converge over learning is thus neurally equivalent to the con- atsomegivenmomentintimet;inwords,thisistheprobability vergencebetweenthedistributionsofevokedandspontaneous of currently being in a given state, out of all possible states, population activity. given both the available input and some internal model of the world. Using Bayes’ theorem, we can make this dependence Evidenceforinference-by-samplinginvisualcortices on input and model explicit: Theseideasweredevelopedinthecontextofvisualprocessing, and particularly with reference to V1. In this context, the P(state|input,model)∝P(input|state,model)P(state|model) “state” of the world is the current view, and the input is the | {z } | {z }| {z } posterior likelihood prior information received by the retina. The proposed purpose [2] of inference in V1 is to infer the most likely low level visual The prior is the internal estimate of the current state before features – edges, for example – present in the current view, the observation input, the posterior is the estimate of the given the input to the retina. V1’s internal model is then a current state after observing input, and the improvement in statistical model of the low-level features, which can be built the estimate arises from the new information available in over a life-time’s experience of the world. input that is processed through the likelihood. All these are Consequently, Berkes and colleagues [2] tested the con- dependent on the model of the world we are using. This struction of this internal model by recording from area V1 internal model specifies how we interpret the inputs in the at different stages of development in the ferret. Natural im- likelihood, and generate the prior probabilities. If we change ages were used to probe the current posterior distribution themodel,wechangethesetwooperations,andsochangeour supported by the model, and darkness was used to probe the estimates of the current state of the world. We can think of current prior distribution. Over development, the activity the model as specifying what we expect to be relevant in the distribution evoked by natural images increased its similar- input, and what states we expect to be in. ity to the distribution during darkness. This increase was Onegoaloflearningisthustoupdatetheinternalmodelto robust to a series of controls for simultaneous changes in fir- match the statistical properties of the world. The better the ing rate statistics [2–4]. Their results are consistent with model, the better we will be able to predict the state of the the inference-by-sampling interpretation in which the internal external world. But as we can only access directly the inputs model is updated by experience with the world, so that the generated from those states, formally we say that learning posterior and prior distributions converge. seeks to maximise P(input|model) over all possible inputs at alltimestbychangingtheparametersofthemodel. Amodel Inference-by-samplinginhighercorticesoverlearning which always generates maximum values for P(input|model) duringbehaviour isthebestpossiblelearntinternalmodeloftheexternalworld. Theseresultscouldnotaddresslearningseparatelyfromdevel- Obtaining such a model necessarily means that we have expe- opment. Further, unknown is whether inference-by-sampling rienced all possible states giving rise to those inputs, so that can be observed in higher-order cortices, or during ongoing the prior P(state|model) is always accurate, and we obtain behaviour. no new information from the likelihood. Consequently, the There is no a priori reason to expect that inference-by- posteriorprobabilitybecomesalwaysproportionaltotheprior sampling would be restricted to primary sensory cortices. probability. A measure of learning is thus how close the prior Muchhasbeenwrittenaboutthegenericnatureofthecortical and posterior distributions have become. microcircuit [5, 6], so we might reasonably expect that, if an internalmodelisencodedbytheneuralcircuitinV1,soother Inference-by-sampling similar cortical circuits in other regions encode other internal The inference-by-sampling theory [1, 2] proposes that the models. model is encoded by the particular set and weight of connec- Compellingsupportforthishascomefrommodellingwork tionsinaneuralcircuit. Inthisview,theposteriordistribution by Maass and colleagues [7, 8]. Their models have shown how isencodedbytheactivityofthecircuitevokedbysomeinput. awiderangeofplausiblecorticalcircuitmodelsallproducethe Crucially, it predicts that the prior distribution is encoded necessarydynamicstosamplefromastatisticalmodelencoded by spontaneous activity of the same circuit, as this is solely by the circuit’s connections [7, 8]. Moreover, the models sampling the model. also replicate key properties of the firing statistics in cortex, Ifthecircuitisthemodel,thenthetheorypredictsthatthe including the close-to-Poisson irregularity of firing patterns. circuit’s instantaneous population activity is a sample from a These suggest that the inference-by-sampling hypothesis is probability distribution - from the posterior when receiving indeed a plausible generic computation for cortex. PNAS | July14,2016 | vol.XXX | no.XX | 1 bioRxiv preprint first posted online Sep. 21, 2015; doi: http://dx.doi.org/10.1101/027102. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. Inference of state is also a generic operation. Nothing in that we would observe changes that correlated with learning. Equation 2 limits its application to sensory information. We We reasoned that neural activity in clearly identified sleep might consider “state" in the sense used in the reinforcement periods before and after the session was a clear candidate for learning literature [9], as a generic description of the current spontaneous activity, as it occurred in the absence of external values of variables of the external world. Indeed, in forms of sensory input. We used slow-wave sleep periods to clearly reinforcement learning that depend on simulation of future delineate the presence of sleep. As the rats acquired the rule actions,“state"inthiscontextcanevenrefertothesimulated in that session then, if mPfC indeed encodes rule acquisition, values of variables in the external world - for which we would we expect that the spontaneous activity in sleep after the use the internal model to simulate possible outcomes. During sessionisdrawnfromtheinternalmodelrelatedtothecorrect behaviour, we might thus expect that an internal model is rule. learnt about the statistical dependence of outcomes on deci- Wecanonlybesurethatduringbehaviourthiscorrect-rule sions in particular contexts. modelwouldbesampledoncorrecttrials. Thisdoesnotimply The power of the inference-by-sampling hypothesis is that thatmPfCactivityiscausalfordecisionsonthosetrials-even we do not need to know the internal model to test for its in a monitoring or goal-encoding role, mPfC activity would existence. We need not specify an exact model to test the reflect whether or not the correct decision was made. The convergence of distributions in evoked and spontaneous activ- mPfC activity on error trials is unconstrained by the theory. ity, but such a convergence is evidence of an updated internal Consequently, we can only be sure that, if the inference-by- model. sampling hypothesis is true, then the distribution of samples Consequently, to test the generality of the inference-by- oncorrecttrialswouldconverge,onaverage,tothedistribution sampling hypothesis, we sought to test the convergence of in sleep after learning. distributions over learning using data from the medial pre- The final, subtle constraint is that overt behavioural signs frontal cortex (mPfC) of rats learning rules in a Y-maze task of learning likely indicates ongoing synaptic plasticity. For [10]. By looking at these data for a change to some internal example, on the same Y-maze, some pyramidal neurons in model in mPFC, we are assuming only that the model is re- mPfC change the timing of their spikes in relation to the hip- lated to the rule, but not any specific form of model. It could pocampal theta rhythm, indicating local circuit plasticity [13]. encode the set of task states and their transitions; it could If so, then the internal model is changing during behaviour. encode the current sequence of required actions; it could be But the internal model putatively sampled in the post-session a statistical model of outcomes. Supporting this assumption, sleep will be stable. To thus minimise the confound of these we know mPFC is necessary for successful acquisition of new changes during behaviour, and compare static posterior and rules[11,12],andthatmPFCpyramidalneuronschangetheir prior distributions [as per 2], we sought to identify where the firing patterns during acquisition of the rules used here [13]. internalmodelupdatingmayhavefinished. Ausefulproxyfor Eveniftheinterpretationoftheconvergenceofdistributions thisistheasymptoticbehaviouralperformance. Wethusused in the inference-by-sampling framework turns out to be incor- the trial at which the rat reached the learning criteria as the rect, the observation of such a convergence between waking indicatorofrelativestabilityintheinternalmodel. Allcorrect and spontaneous activity over learning still offers compelling trials from this trial onwards were then used to construct the clues to the nature of cortical computation. activitydistributionduringthetask-wecallthisdistribution P(R) in the main text, and distances measured between it and some other distribution P(X) we call D(X|R). Whatdistributionstocompare? Nonetheless,theinference-by-samplingtheoryplaceslimitson 1. FiserJ,BerkesP,OrbánG,LengyelM(2010)Statisticallyoptimalperceptionandlearning: frombehaviortoneuralrepresentations.TrendsCognSci14:119–130. exactlywhichactivitydistributionstocompare. IntheBerkes 2. BerkesP,OrbánG,LengyelM,FiserJ(2011)Spontaneouscorticalactivityrevealshallmarks et al. [2] study, this decision was made simple by the elegant ofanoptimalinternalmodeloftheenvironment.Science331:83–87. 3. OkunMetal.(2012)Populationratedynamicsandmultineuronfiringpatternsinsensory experimentaldesign. AstheymonitoredV1overdevelopment, cortex.JNeurosci32:17108–17119. so it was reasonable to expect the internal model to adapt 4. FiserJ,LengyelM,SavinC,OrbánG,BerkesP(2013)How(not)toassesstheimportance to the statistics of the world over a lifetime. Their tests at ofcorrelationsforthematchingofspontaneousandevokedactivity.p.arXiv:1301.6554. 5. ThomsonAM,LamyC(2007)Functionalmapsofneocorticallocalcircuitry.FrontNeurosci different developmental stages were samples of the current 1:19–42. posterior and prior distributions supported by the model. We 6. HarrisKD,ShepherdGMG(2015)Theneocorticalcircuit:themesandvariations.NatNeu- would not expect significant changes to the internal model rosci18:170–181. 7. BuesingL,BillJ,NesslerB,MaassW(2011)Neuraldynamicsassampling: amodel during their testing, as it was short on the time-scale of the forstochasticcomputationinrecurrentnetworksofspikingneurons. PLoSComputBiol developmentalchanges,andsotheycouldcomparetheirentire 7:e1002211. 8. HabenschussS,JonkeZ,MaassW(2013)Stochasticcomputationsincorticalmicrocircuit recorded distributions of evoked and spontaneous activity. In models.PLoSComputBiol9:e1003311. otherwords,theywereabletocomparetwodistributionsfrom 9. SuttonRS,BartoAG(1998)ReinforcementLearning: AnIntroduction. (MITPress,Cam- the same, static model. bridge,MA). 10. PeyracheA,KhamassiM,BenchenaneK,WienerSI,BattagliaFP(2009)Replayofrule- Our data on rats learning rules in a Y-maze allow us to learningrelatedneuralpatternsintheprefrontalcortexduringsleep.NatNeurosci12:916– addressiflearningoftheinternalmodelcanbeobserved. But 926. learning on short time-scales brings the confounding issue 11. RagozzinoME,DetrickS,KesnerRP(1999)Involvementoftheprelimbic-infralimbicareas oftherodentprefrontalcortexinbehavioralflexibilityforplaceandresponselearning. J that learning the model is happening online, while we are Neurosci19:4585–4594. monitoringactivity. Sowhatdistributionsshouldwecompare? 12. RichEL,ShapiroML(2007)Prelimbic/infralimbicinactivationimpairsmemoryformultiple taskswitches,butnotflexibleselectionoffamiliartasks.JNeurosci27:4747–4755. We chose the 10 training sessions in which the rat clearly 13. BenchenaneKetal.(2010)Coherentthetaoscillationsandreorganizationofspiketimingin acquired the present rule, so we could be reasonably sure thehippocampal-prefrontalnetworkuponlearning.Neuron66:921–936. 2 |

Description:
Abhinav Singha, Adrien Peyracheb, and Mark D Humphriesa,1. aFaculty of Biology . are the basic element of cortical computation, underpinning sensation . from this model; and that “spontaneous" activity in slow-wave . 6. 8. 10. −50. 0. 50. Session. Convergence (%). −60. −40. −20. 0. 20. 40
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.