
A Forward Model at Purkinje Cell Synapses Facilitates Cerebellar Anticipatory Control



Ivan Herreros-Alonso
SPECS lab, Universitat Pompeu Fabra, Barcelona, Spain
[email protected]

Xerxes D. Arsiwalla
SPECS lab, Universitat Pompeu Fabra, Barcelona, Spain

Paul F. M. J. Verschure
SPECS, UPF, Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain

arXiv:1701.07775v1 [q-bio.NC] 26 Jan 2017

Abstract

How does our motor system solve the problem of anticipatory control in spite of a wide spectrum of response dynamics from different musculo-skeletal systems, transport delays as well as response latencies throughout the central nervous system? To a great extent, our highly-skilled motor responses are a result of a reactive feedback system, originating in the brain-stem and spinal cord, combined with a feed-forward anticipatory system, that is adaptively fine-tuned by sensory experience and originates in the cerebellum. Based on that interaction we design the counterfactual predictive control (CFPC) architecture, an anticipatory adaptive motor control scheme in which a feed-forward module, based on the cerebellum, steers an error feedback controller with counterfactual error signals. Those are signals that trigger reactions as actual errors would, but that do not code for any current or forthcoming errors. In order to determine the optimal learning strategy, we derive a novel learning rule for the feed-forward module that involves an eligibility trace and operates at the synaptic level. In particular, our eligibility trace provides a mechanism beyond co-incidence detection in that it convolves a history of prior synaptic inputs with error signals. In the context of cerebellar physiology, this solution implies that Purkinje cell synapses should generate eligibility traces using a forward model of the system being controlled. From an engineering perspective, CFPC provides a general-purpose anticipatory control architecture equipped with a learning rule that exploits the full dynamics of the closed-loop system.
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

1 Introduction

Learning and anticipation are central features of cerebellar computation and function (Bastian, 2006): the cerebellum learns from experience and is able to anticipate events, thereby complementing a reactive feedback control by an anticipatory feed-forward one (Hofstoetter et al., 2002; Herreros and Verschure, 2013). This interpretation is based on a series of anticipatory motor behaviors that originate in the cerebellum. For instance, anticipation is a crucial component of acquired behavior in eye-blink conditioning (Gormezano et al., 1983), a trial-by-trial learning protocol where an initially neutral stimulus such as a tone or a light (the conditioning stimulus, CS) is followed, after a fixed delay, by a noxious one, such as an air puff to the eye (the unconditioned stimulus, US). During early trials, a protective unconditioned response (UR), a blink, occurs reflexively in a feedback manner following the US. After training though, a well-timed anticipatory blink (the conditioned response, CR) precedes the US. Thus, learning results in the (partial) transference from an initial feedback action to an anticipatory (or predictive) feed-forward one. Similar responses occur during anticipatory postural adjustments, which are postural changes that precede voluntary motor movements, such as raising an arm while standing (Massion, 1992). The goal of these anticipatory adjustments is to counteract the postural and equilibrium disturbances that voluntary movements introduce. These behaviors can be seen as feedback reactions to events that after learning have been transferred to feed-forward actions anticipating the predicted events.

Anticipatory feed-forward control can yield high performance gains over feedback control whenever the feedback loop exhibits transmission (or transport) delays (Jordan, 1996). However, even if a plant has negligible transmission delays, it may still have sizable inertial latencies. For example, if we apply a force to a visco-elastic plant, its peak velocity will be achieved after a certain delay; i.e., the velocity itself will lag the force.
An efficient way to counteract this lag will be to apply forces anticipating changes in the desired velocity. That is, anticipation can be beneficial even when one can act instantaneously on the plant. Given that, here we address two questions: what is the optimal strategy to learn anticipatory actions in a cerebellar-based architecture? And how could it be implemented in the cerebellum?

To answer that we design the counterfactual predictive control (CFPC) scheme, a cerebellar-based adaptive-anticipatory control architecture that learns to anticipate performance errors from experience. The CFPC scheme is motivated from neuro-anatomy and physiology of eye-blink conditioning. It includes a reactive controller, which is an output-error feedback controller that models brain stem reflexes actuating on eyelid muscles, and a feed-forward adaptive component that models the cerebellum and learns to associate its inputs with the error signals driving the reactive controller. With CFPC we propose a generic scheme in which a feed-forward module enhances the performance of a reactive error feedback controller steering it with signals that facilitate anticipation, namely, with counterfactual errors. However, within CFPC, even if these counterfactual errors that enable predictive control are learned based on past errors in behavior, they do not reflect any current or forthcoming error in the ongoing behavior.

In addition to eye-blink conditioning and postural adjustments, the interaction between reactive and cerebellar-dependent acquired anticipatory behavior has also been studied in paradigms such as visually-guided smooth pursuit eye movements (Lisberger, 1987). All these paradigms can be abstracted as tasks in which the same predictive stimuli and disturbance or reference signal are repeatedly experienced. In accordance with that, we operate our control scheme in trial-by-trial (batch) mode. With that, we derive a learning rule for anticipatory control that modifies the well-known least-mean-squares/Widrow-Hoff rule with an eligibility trace.
More specifically, our model predicts that, to facilitate learning, parallel fiber to Purkinje cell synapses implement a forward model that generates an eligibility trace. Finally, to stress that CFPC is not specific to eye-blink conditioning, we demonstrate its application with a smooth pursuit task.

2 Methods

2.1 Cerebellar Model

Figure 1: Anatomical scheme of a cerebellar Purkinje cell. The $x_j$ denote parallel fiber inputs to Purkinje synapses (in red) with weights $w_j$. $o$ denotes the output of the Purkinje cell. The error signal $e$, through the climbing fibers (in green), modulates synaptic weights.

We follow the simplifying approach of modeling the cerebellum as a linear adaptive filter, while focusing on computations at the level of the Purkinje cells, which are the main output cells of the cerebellar cortex (Fujita, 1982; Dean et al., 2010). Over the mossy fibers, the cerebellum receives a wide range of inputs. Those inputs reach Purkinje cells via parallel fibers (Fig. 1), that cross dendritic trees of Purkinje cells in a ratio of up to $1.5 \times 10^{6}$ parallel fiber synapses per cell (Eccles et al., 1967). We denote the signal carried by a particular fiber as $x_j$, $j \in [1, G]$, with $G$ equal to the total number of input fibers. These inputs from the mossy/parallel fiber pathway carry contextual information (interoceptive or exteroceptive) that allows the Purkinje cell to generate a functional output. We refer to these inputs as cortical bases, indicating that they are localized at the cerebellar cortex and that they provide a repertoire of states and inputs that the cerebellum combines to generate its output $o$. As we will develop a discrete time analysis of the system, we use $n$ to indicate time (or time-step). The output of the cerebellum at any time point $n$ results from a weighted sum of those cortical bases. $w_j$ indicates the weight or synaptic efficacy associated with the fiber $j$.
Thus, we have $x[n] = [x_1[n], \dots, x_G[n]]^\top$ and $w[n] = [w_1[n], \dots, w_G[n]]^\top$ (where the transpose, $\top$, indicates that $x[n]$ and $w[n]$ are column vectors) containing the set of inputs and synaptic weights at time $n$, respectively, which determine the output of the cerebellum according to

$$o[n] = x[n]^\top w[n] \tag{1}$$

The adaptive feed-forward control of the cerebellum stems from updating the weights according to a rule of the form

$$\Delta w_j[n+1] = f(x_j[n], \dots, x_j[1], e[n], \Theta) \tag{2}$$

where $\Theta$ denotes global parameters of the learning rule; $x_j[n], \dots, x_j[1]$, the history of the pre-synaptic inputs of synapse $j$; and $e[n]$, an error signal that is the same for all synapses, corresponding to the difference between the desired, $r$, and the actual output, $y$, of the controlled plant. Note that in drawing an analogy with the eye-blink conditioning paradigm, we use the simplifying convention of considering the noxious stimulus (the air-puff) as a reference, $r$, that indicates that the eyelids should close; the closure of the eyelid as the output of the plant, $y$; and the sensory response to the noxious stimulus as an error, $e$, that encodes the difference between the desired, $r$, and the actual eyelid closures, $y$. Given this, we advance a new learning rule, $f$, that achieves optimal performance in the context of eye-blink conditioning and other cerebellar learning paradigms.

2.2 Cerebellar Control Architecture

[Figure 2 panels. Left: cerebellum (cortex and nuclei) and inferior olive, pons, trigeminal nucleus, facial nucleus, eyelids (blink), CS (context, e.g.: sound, light), US (airpuff). Right: adaptive-anticipatory (feed-forward) layer with module FF; reactive (feedback) layer with controller C and plant P forming the feedback closed-loop system.]

Figure 2: Neuroanatomy of eye-blink conditioning and the CFPC architecture. Left: Mapping of signals to anatomical structures in eye-blink conditioning (De Zeeuw and Yeo, 2005); regular arrows indicate external inputs and outputs, arrows with inverted heads indicate neural pathways. Right: CFPC architecture. Note that the feedback controller, $C$, and the feed-forward module, $FF$, belong to the control architecture, while the plant, $P$, denotes an object controlled.
Other abbreviations: $r$, reference signal; $y$, plant's output; $e$, output error; $x$, basis signals; $o$, feed-forward signal; and $u$, motor command.

We embed the adaptive filter cerebellar module in a layered control architecture, namely the CFPC architecture, based on the interaction between brain stem motor nuclei driving motor reflexes and the cerebellum, such as the one established between the cerebellar microcircuit responsible for conditioned responses and the brain stem reflex circuitry that produces unconditioned eye-blinks (Hesslow and Yeo, 2002) (Fig. 2 left). Note that in our interpretation of this anatomy we assume that cerebellar output, $o$, feeds the lower reflex controller (Fig. 2 right). Put in control theory terms, within the CFPC scheme an adaptive feed-forward layer supplements a negative feedback controller steering it with feed-forward signals.

Our architecture uses a single-input single-output negative-feedback controller. The controller receives as input the output error $e = r - y$. For the derivation of the learning algorithm, we assume that both plant and controller are linear and time-invariant (LTI) systems. Importantly, the feedback controller and the plant form a reactive closed-loop system, that mathematically can be seen as a system that maps the reference, $r$, into the plant's output, $y$. A feed-forward layer that contains the above-mentioned cerebellar model provides the negative feedback controller with an additional input signal, $o$. We refer to $o$ as a counterfactual error signal, since although it mechanistically drives the negative feedback controller analogously to an error signal it is not an actual error. The counterfactual error is generated by the feed-forward module that receives an output error, $e$, as its teaching signal. Notably, from the point of view of the reactive layer closed-loop system, $o$ can also be interpreted as a signal that offsets $r$. In other words, even if $r$ remains the reference that sets the target of behavior, $r + o$ functions as the effective reference that drives the closed-loop system.
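As a minimal sketch of the two relations described in Section 2, the snippet below computes the feed-forward output of Eq. 1 and the signal that drives the feedback controller. The function names and the numerical values are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import numpy as np

def cerebellar_output(x, w):
    """Eq. 1: weighted sum of the cortical bases at one time step."""
    return float(np.dot(x, w))

def controller_input(r, y, o):
    """Signal driving the feedback controller: actual plus counterfactual error."""
    e = r - y        # actual output error
    return e + o     # algebraically equal to (r + o) - y

# Hypothetical basis values and synaptic weights at one time step.
x = np.array([0.2, 0.5, 0.1])
w = np.array([1.0, -0.4, 2.0])

o = cerebellar_output(x, w)
drive = controller_input(r=1.0, y=0.3, o=o)
```

Writing the controller input as `(r + o) - y` makes the "effective reference" reading explicit: the feed-forward signal shifts the target seen by the closed loop without altering the controller itself.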
3 Results

3.1 Derivation of the gradient descent update rule for the cerebellar control architecture

We apply the CFPC architecture defined in the previous section to a task that consists in following a finite reference signal $r \in \mathbb{R}^N$ that is repeated trial-by-trial. To analyze this system, we use the discrete time formalism and assume that all components are linear time-invariant (LTI). Given this, both reactive controller and plant can be lumped together into a closed-loop dynamical system, that can be described with the dynamics $A$, input $B$, measurement $C$ and feed-through $D$ matrices. In general, these matrices describe how the state of a dynamical system autonomously evolves with time, $A$; how inputs affect system states, $B$; how states are mapped into outputs, $C$; and how inputs instantaneously affect the system's output, $D$ (Astrom and Murray, 2012). As we consider a reference of a finite length $N$, we can construct the $N$-by-$N$ transfer matrix $T$ as follows (Boyd, 2008)

$$T = \begin{bmatrix} D & 0 & 0 & \dots & 0 \\ CB & D & 0 & \dots & 0 \\ CAB & CB & D & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{N-2}B & CA^{N-3}B & CA^{N-4}B & \dots & D \end{bmatrix}$$

With this transfer matrix we can map any given reference $r$ into an output $y_r$ using $y_r = Tr$, obtaining what would have been the complete output trajectory of the plant on an entirely feedback-driven trial. Note that the first column of $T$ contains the impulse response curve of the closed-loop system, while the rest of the columns are obtained shifting that impulse response down. Therefore, we can build the transfer matrix $T$ either in a model-based manner, deriving the state-space characterization of the closed-loop system, or in a measurement-based manner, measuring the impulse response curve. Additionally, note that $(I - T)r$ yields the error of the feedback control in following the reference, a signal which we denote with $e_0$.

Let $o \in \mathbb{R}^N$ be the entire feed-forward signal for a given trial. Given commutativity, we can consider that from the point of view of the closed-loop system $o$ is added directly to the reference $r$ (Fig. 2 right). In that case, we can use $y = T(r + o)$ to obtain the output of the closed-loop system when it is driven by both the reference and the feed-forward signal. The feed-forward module only outputs linear combinations of a set of bases.
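The construction of $T$ described above can be sketched as follows. This is an illustrative implementation under stated assumptions: the two-state system `A, B, C, D` is a hypothetical stable example rather than the plant used in the paper, and the helper names `impulse_response` and `transfer_matrix` are ours.

```python
import numpy as np

def impulse_response(A, B, C, D, N):
    """First N samples of the impulse response: D, CB, CAB, CA^2B, ..."""
    h = np.zeros(N)
    h[0] = D
    AkB = B.copy()                     # holds A^(k-1) B
    for k in range(1, N):
        h[k] = (C @ AkB).item()
        AkB = A @ AkB
    return h

def transfer_matrix(h):
    """Lower-triangular Toeplitz matrix whose columns are shifted copies of h."""
    N = len(h)
    T = np.zeros((N, N))
    for col in range(N):
        T[col:, col] = h[:N - col]
    return T

# Hypothetical stable two-state SISO system (not the paper's plant).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = 0.0
N = 50
T = transfer_matrix(impulse_response(A, B, C, D, N))

# Reference check: simulate the same system step by step.
r = np.sin(np.linspace(0.0, 3.0, N))
s = np.zeros((2, 1))
y_sim = np.zeros(N)
for n in range(N):
    y_sim[n] = (C @ s).item() + D * r[n]
    s = A @ s + B * r[n]

e0 = (np.eye(N) - T) @ r               # feedback-only tracking error
```

Because only the impulse response enters `transfer_matrix`, the same function serves the measurement-based route: a measured impulse response can be passed in place of the one computed from the state-space matrices.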
Let $X \in \mathbb{R}^{N \times G}$ be a matrix with the content of the $G$ bases during all the $N$ time steps of a trial. The feed-forward signal becomes $o = Xw$, where $w \in \mathbb{R}^G$ contains the mixing weights. Hence, the output of the plant given a particular $w$ becomes $y = T(r + Xw)$.

We implement learning as the process of adjusting the weights $w$ of the feed-forward module in a trial-by-trial manner. At each trial the same reference signal, $r$, and bases, $X$, are repeated. Through learning we want to converge to the optimal weight vector $w^*$ defined as

$$w^* = \arg\min_w c(w) = \arg\min_w \frac{1}{2} e^\top e = \arg\min_w \frac{1}{2} (r - T(r + Xw))^\top (r - T(r + Xw)) \tag{3}$$

where $c$ indicates the objective function to minimize, namely the $L_2$ norm or sum of squared errors. With the substitution $\tilde{X} = TX$ and using $e_0 = (I - T)r$, the minimization problem can be cast as a canonical linear least-squares problem:

$$w^* = \arg\min_w \frac{1}{2} (e_0 - \tilde{X}w)^\top (e_0 - \tilde{X}w) \tag{4}$$

On the one hand, this allows us to find the least squares solution for $w^*$ directly, that is, $w^* = \tilde{X}^\dagger e_0$, where $\dagger$ denotes the Moore-Penrose pseudo-inverse. On the other hand, and more interestingly, with $w[k]$ being the weights at trial $k$ and having $e[k] = e_0 - \tilde{X}w[k]$, we can obtain the gradient of the error function at trial $k$ with relation to $w$ as follows:

$$\nabla_w c = -\tilde{X}^\top e[k] = -X^\top T^\top e[k]$$

Thus, setting $\eta$ as a properly scaled learning rate (the only global parameter $\Theta$ of the rule), we can derive the following gradient descent strategy for the update of the weights between trials:

$$w[k+1] = w[k] + \eta X^\top T^\top e[k] \tag{5}$$

This solves for the learning rule $f$ in Eq. 2. Note that $f$ is consistent with both the cerebellar anatomy (Fig. 2 left) and the control architecture (Fig. 2 right) in that the feed-forward module/cerebellum only requires two signals to update its weights/synaptic efficacies: the basis inputs, $X$, and error signal, $e$.

3.2 $T^\top$ facilitates a synaptic eligibility trace

The standard least mean squares (LMS) rule (also known as Widrow-Hoff or decorrelation learning rule) can be represented in its batch version as $w[k+1] = w[k] + \eta X^\top e[k]$. Hence, the only difference between the batch LMS rule and the one we have derived is the insertion of the matrix factor $T^\top$.
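To make the batch rule of Eq. 5 concrete, here is a small numerical sketch that iterates the update and compares the result with the closed-form least-squares solution $w^* = \tilde{X}^\dagger e_0$. All quantities are assumptions for illustration: the impulse response is a hypothetical decaying exponential, the bases are random, and the learning rate is set from the largest singular value of $\tilde{X}$ so that the descent is stable.

```python
import numpy as np

rng = np.random.default_rng(0)
N, G = 40, 5

# Hypothetical closed-loop impulse response and its transfer matrix T.
h = 0.3 * 0.7 ** np.arange(N)
T = np.zeros((N, N))
for col in range(N):
    T[col:, col] = h[:N - col]

r = np.sin(np.linspace(0.0, np.pi, N))   # reference repeated on every trial
X = rng.standard_normal((N, G))          # basis signals (random, for illustration)
e0 = (np.eye(N) - T) @ r                 # error of the feedback loop alone
X_tilde = T @ X

w_star = np.linalg.pinv(X_tilde) @ e0    # closed-form solution of Eq. 4

# A step size below 2 / sigma_max(X_tilde)^2 keeps gradient descent stable.
eta = 1.0 / np.linalg.norm(X_tilde, 2) ** 2
w = np.zeros(G)
for k in range(5000):
    e = r - T @ (r + X @ w)              # error at trial k
    w = w + eta * X.T @ T.T @ e          # Eq. 5

final_error = np.linalg.norm(e0 - X_tilde @ w)
```

Dropping the factor `T.T` from the update line gives the batch LMS rule mentioned in Section 3.2, which makes it easy to compare the two rules on the same toy problem.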
Now we will show how this factor acts as a filter that computes an eligibility trace at each weight/synapse. Note that the update of a single weight, according to Eq. 5, becomes

$$w_j[k+1] = w_j[k] + \eta x_j^\top T^\top e[k] \tag{6}$$

where $x_j$ contains the sequence of values of the cortical basis $j$ during the entire trial. This can be rewritten as

$$w_j[k+1] = w_j[k] + \eta h_j^\top e[k] \tag{7}$$

with $h_j \equiv T x_j$. The above inner product can be expressed as a sum of scalar products

$$w_j[k+1] = w_j[k] + \eta \sum_{n=1}^{N} h_j[n] e[k, n] \tag{8}$$

where $n$ indexes the within-trial time-step. Note that $e[k]$ in Eq. 7 refers to the whole error signal at trial $k$ whereas $e[k, n]$ in Eq. 8 refers to the error value in the $n$-th time-step of the trial $k$. It is now clear that each $h_j[n]$ weighs how much an error arriving at time $n$ should modify the weight $w_j$, which is precisely the role of an eligibility trace. Note that since $T$ contains in its columns/rows shifted repetitions of the impulse response curve of the closed-loop system, the eligibility trace codes, at any time $n$, the convolution of the sequence of previous inputs with the impulse-response curve of the reactive layer closed-loop. Indeed, in each synapse, the eligibility trace is generated by a forward model of the closed-loop system that is exclusively driven by the basis signal.

Consequently, our main result is that by deriving a gradient descent algorithm for the CFPC cerebellar control architecture we have obtained an exact definition of the suitable eligibility trace. That definition guarantees that the set of weights/synaptic efficacies are updated in a locally optimal manner in the weights' space.

3.3 On-line gradient descent algorithm

The trial-by-trial formulation above allowed for a straightforward derivation of the (batch) gradient descent algorithm. As it lumped together all computations occurring in the same trial, it accounted for time within the trial implicitly rather than explicitly: one-dimensional time-signals were mapped onto points in a high-dimensional space.
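The convolution reading of the eligibility trace in Section 3.2 can be checked numerically. The sketch below is illustrative only: the impulse response and the pulse-shaped basis signal are hypothetical, and the variable names are ours.

```python
import numpy as np

N = 30
impulse = 0.5 * 0.8 ** np.arange(N)       # hypothetical closed-loop impulse response

# Transfer matrix built from shifted copies of the impulse response.
T = np.zeros((N, N))
for col in range(N):
    T[col:, col] = impulse[:N - col]

x_j = np.zeros(N)
x_j[5:10] = 1.0                            # hypothetical basis signal: a brief pulse

trace_matrix = T @ x_j                     # h_j = T x_j, as in Eq. 7
trace_conv = np.convolve(x_j, impulse)[:N] # causal convolution, cut to the trial length
```

The two arrays agree sample by sample, and the trace is zero before the basis becomes active, which reflects the causality encoded in the lower-triangular structure of $T$.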
However, after having established the gradient descent algorithm, we can implement the same rule in an on-line manner, dropping the repetitiveness assumption inherent to trial-by-trial learning and performing all computations locally in time. Each weight/synapse must have a process associated with it that outputs the eligibility trace. That process passes the incoming (unweighted) basis signal through a (forward) model of the closed-loop as follows:

$$s_j[n+1] = A s_j[n] + B x_j[n]$$

$$h_j[n] = C s_j[n] + D x_j[n]$$

where matrices $A$, $B$, $C$ and $D$ refer to the closed-loop system (they are the same matrices that we used to define the transfer matrix $T$), and $s_j[n]$ is the state vector of the forward model of the synapse $j$ at time-step $n$. In practice, each "synaptic" forward model computes what would have been the effect of having driven the closed-loop system with each basis signal alone. Given the superposition principle, the outcome of that computation can also be interpreted as saying that $h_j[n]$ indicates what would have been the displacement over the current output of the plant, $y[n]$, achieved feeding the closed-loop system with the basis signal $x_j$. The process of weight update is completed as follows:

$$w_j[n+1] = w_j[n] + \eta h_j[n] e[n] \tag{9}$$

At each time step $n$, the error signal $e[n]$ is multiplied by the current value of the eligibility trace $h_j[n]$, scaled by the learning rate $\eta$, and added to the current weight $w_j[n]$. Therefore, whereas the contribution of each basis to the output of the adaptive filter depends only on its current value and weight, the change in weight depends on the current and past values passed through a forward model of the closed-loop dynamics.

3.4 Simulation of a visually-guided smooth pursuit task

We demonstrate the CFPC approach in an example of a visual smooth pursuit task in which the eyes have to track a target moving on a screen. Even though the simulation does not capture all the complexity of a smooth pursuit task, it illustrates our anticipatory control strategy. We model the plant (eye and ocular muscles) with a two-dimensional linear filter that maps motor commands into angular positions.
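Returning to the on-line rule of Section 3.3, the per-synapse forward model and the update of Eq. 9 can be sketched as below. Several simplifications are assumptions of this sketch and not part of the paper: the closed-loop model has a single state, the bases are random, and the error sequence `e_signal` is supplied from outside instead of being produced by a running closed loop, which keeps the example self-contained. The function name `online_pass` is ours.

```python
import numpy as np

def online_pass(A, B, C, D, X, e_signal, w, eta):
    """Apply Eq. 9 at every time step; X is N-by-G, e_signal has length N."""
    N, G = X.shape
    S = np.zeros((A.shape[0], G))           # column j is the state s_j of synapse j
    H = np.zeros((N, G))                    # eligibility traces, kept for inspection
    for n in range(N):
        H[n] = (C @ S).ravel() + D * X[n]   # h_j[n] = C s_j[n] + D x_j[n]
        w = w + eta * H[n] * e_signal[n]    # Eq. 9
        S = A @ S + B @ X[n][None, :]       # s_j[n+1] = A s_j[n] + B x_j[n]
    return w, H

# Hypothetical one-state closed-loop model and inputs.
A = np.array([[0.8]])
B = np.array([[1.0]])
C = np.array([[0.5]])
D = 0.1
rng = np.random.default_rng(1)
N, G = 25, 3
X = rng.standard_normal((N, G))
e_signal = rng.standard_normal(N)
w0 = np.zeros(G)
eta = 0.01

w_final, H = online_pass(A, B, C, D, X, e_signal, w0, eta)
```

With a fixed error sequence, the accumulated on-line updates equal one batch step of Eq. 5, since the recursion reproduces $h_j = T x_j$ sample by sample. In a running closed loop the error would also depend on the changing weights, so the two versions would only agree approximately for small $\eta$.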
Our model is an extension of the model in (Porrill and Dean, 2007), even though in that work the plant was considered in the context of the vestibulo-ocular reflex. In particular, we use a chain of two leaky integrators: a slow integrator with a relaxation constant of 100 ms drives the eyes back to the rest position; the second integrator, with a fast time constant of 3 ms, ensures that the change in position does not occur instantaneously. To this basic plant, we add a reactive control layer modeled as a proportional-integral (PI) error-feedback controller, with proportional gain $k_p$ and integral gain $k_i$. The control loop includes a 50 ms delay in the error feedback, to account for both the actuation and the sensing latency. We choose gains such that reactive tracking lags the target by approximately 100 ms. This gives $k_p = 20$ and $k_i = 100$. To complete the anticipatory and adaptive control architecture, the closed-loop system is supplemented by the feed-forward module.

Figure 3: Behavior of the system. Left: Reference ($r$) and output of the system before ($y[1]$) and after learning ($y[50]$). Right: Error before ($e[1]$) and after learning ($e[50]$) and output acquired by the cerebellar/feed-forward component ($o[50]$). Both panels plot angular position (a.u.) against time (s).

The architecture implementing the forward model-based gradient descent algorithm is applied to a task structured in trials of 2.5 s duration. Within each trial, a target remains still at the center of the visual scene for 0.5 s, next it moves rightwards for 0.5 s with constant velocity, remains still for 0.5 s and repeats the sequence of movements in reverse, returning to the center. The cerebellar component receives 20 Gaussian basis signals ($X$) whose receptive fields are defined in the temporal domain, relative to trial onset, with a width (standard deviation) of 50 ms and spaced by 100 ms. The whole system is simulated using a 1 ms time-step. To construct the matrix $T$ we computed the impulse response of the closed-loop system.
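The trial structure and basis signals described above can be sketched as follows. Two values are assumptions of this sketch because the text does not state them: the center of the first Gaussian basis (0.3 s here) and the amplitude of the target displacement (1.0 a.u. here).

```python
import numpy as np

dt = 0.001                         # 1 ms time step
n_steps = 2500                     # one 2.5 s trial
t = np.arange(n_steps) * dt

# 20 Gaussian bases, standard deviation 50 ms, centers 100 ms apart.
# The first center (0.3 s) is an assumption, not a value from the paper.
centers = 0.3 + 0.1 * np.arange(20)
sigma = 0.05
X = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / sigma) ** 2)

# Target: still, rightward ramp, still, leftward ramp, still (0.5 s each).
# The displacement amplitude (1.0 a.u.) is an assumption.
amplitude = 1.0
velocity = np.zeros(n_steps)
velocity[500:1000] = amplitude / 0.5      # rightward movement, 0.5 s to 1.0 s
velocity[1500:2000] = -amplitude / 0.5    # return movement, 1.5 s to 2.0 s
r = np.cumsum(velocity) * dt              # target position over the trial
```

The resulting `X` has one row per millisecond and one column per basis, so it can be used directly as the matrix $X$ in $y = T(r + Xw)$ once a transfer matrix of matching size is available.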
At the first trial, before any learning, the output of the plant lags the reference signal by approximately 100 ms, converging to the position only when the target remains still for about 300 ms (Fig. 3 left). As a result of learning, the plant's behavior shifts from a reactive to an anticipatory mode, being able to track the reference without any delay. Indeed, the error that is sizable during the target displacement before learning almost completely disappears by the 50th trial (Fig. 3 right). That cancellation results from learning the weights that generate a feed-forward predictive signal that leads the changes in the reference signal (onsets and offsets of target movements) by approximately 100 ms (Fig. 3 right). Indeed, convergence of the algorithm is remarkably fast and by trial 7 it has almost converged to the optimal solution (Fig. 4).

Figure 4: Performance achieved with different learning rules. Representative learning curves of the forward model-based eligibility trace gradient descent (FM-ET), the simple Widrow-Hoff (WH) and the Widrow-Hoff algorithm with a delta-eligibility trace matched to error feedback delay (WH+50 ms) or with an eligibility trace exceeding that delay by 20 ms (WH+70 ms). Error is quantified as the relative root mean-squared error (rRMSE), scaled proportionally to the error in the first trial. Error of the optimal solution, obtained with $w^* = (TX)^\dagger e_0$, is indicated with a dashed line. The plot shows rRMSE against trial number.

To assess how much our forward-model-based eligibility trace contributes to performance, we test three alternative algorithms. In all cases we employ the same control architecture, changing the plasticity rule such that we either use no eligibility trace, thus implementing the basic Widrow-Hoff learning rule, or use the Widrow-Hoff rule extended with a delta-function eligibility trace that matches the latency of the error feedback (50 ms) or slightly exceeds it (70 ms). Performance with the basic WH model worsens rapidly whereas performance with the WH learning rule using a "pure delay" eligibility trace matched to the transport delay improves but not as fast as with the forward-model-based eligibility trace (Fig. 4).
Indeed, in this case, the best strategy for implementing a delayed delta eligibility trace is setting a delay exceeding the transport delay by around 20 ms, thus matching the peak of the impulse response. In that case (WH+70 ms), the system performs almost as well as with the forward-model eligibility trace. This last result implies that, even though the literature usually emphasizes the role of transport delays, eligibility traces also account for response lags due to intrinsic dynamics of the plant.

To summarize our results, we have shown with a basic simulation of a visual smooth pursuit task that generating the eligibility trace by means of a forward model ensures convergence to the optimal solution and accelerates learning by guaranteeing that it follows a gradient descent.

4 Discussion

In this paper we have introduced a novel formulation of cerebellar anticipatory control, consistent with experimental evidence, in which a forward model has emerged naturally at the level of Purkinje cell synapses. From a machine learning perspective, we have also provided an optimality argument for the derivation of an eligibility trace, a construct that was often thought of in more heuristic terms as a mechanism to bridge time-delays (Barto et al., 1983; Shibata and Schaal, 2001; McKinstry et al., 2006).

The first seminal works of cerebellar computational models emphasized its role as an associative memory (Marr, 1969; Albus, 1971). Later, the cerebellum was investigated as a device processing correlated time signals (Fujita, 1982; Kawato et al., 1987; Dean et al., 2010). In this latter framework, the use of the computational concept of an eligibility trace emerged as a heuristic construct that allowed compensating for transmission delays in the circuit (Kettner et al., 1997; Shibata and Schaal, 2001; Porrill and Dean, 2007), which introduced lags in the cross-correlation between signals. Concretely, that was referred to as the problem of delayed error feedback, due to which, by the time an error signal reaches a cell, the synapses accountable for that error are no longer the ones currently active, but those that were active at the time when the motor signals that caused the actual error were generated.
This view has however neglected the fact that beyond transport delays, response dynamics of physical plants also influence how past pre-synaptic signals could have related to the current output of the plant. Indeed, for a linear plant, the impulse-response function of the plant provides the complete description of how inputs will drive the system, and as such, integrates transmission delays as well as the dynamics of the plant.

Even though cerebellar microcircuits have been used as models for building control architectures, e.g., the feedback-error learning model (Kawato et al., 1987), our CFPC is novel in that it links the cerebellum to the input of the feedback controller, ensuring that the computational features of the feedback controller are exploited at all times. Within the domain of adaptive control, there are remarkable similarities at the functional level between CFPC and iterative learning control (ILC) (Amann et al., 1996), which is an input design technique for learning optimal control signals in repetitive tasks. The difference between our CFPC and ILC lies in the fact that ILC controllers directly learn a control signal, whereas the CFPC learns a counterfactual error signal that steers a feedback controller. However, the similarity between the two approaches can help in extending CFPC to more complex control tasks.

With our CFPC framework, we have modeled the cerebellar system at a very high level of abstraction: we have not included bio-physical constraints underlying neural computations, omitted known anatomical connections such as the cerebellar nucleo-olivary inhibition (Bengtsson and Hesslow, 2006; Herreros and Verschure, 2013) and made simplifications such as collapsing cerebellar cortex and nuclei into the same computational unit. On the one hand, such a choice of high-level abstraction may indeed be beneficial for deriving general-purpose machine learning or adaptive control algorithms. On the other hand, it is remarkable that in spite of this abstraction our framework makes fine-grained predictions at the micro-level of biological processes.
Namely, that in a cerebellar microcircuit (Apps and Garwicz, 2005), the response dynamics of secondary messengers (Wang et al., 2000) regulating plasticity of Purkinje cell synapses to parallel fibers must mimic the dynamics of the motor system being controlled by that cerebellar microcircuit. Notably, the logical consequence of this prediction, that different Purkinje cells should display different plasticity rules according to the system that they control, has been validated recording single Purkinje cells in vivo (Suvrathan et al., 2016).

In conclusion, we find that a normative interpretation of plasticity rules in Purkinje cell synapses emerges from our systems level CFPC computational architecture. That is, in order to generate optimal eligibility traces, synapses must include a forward model of the controlled subsystem. This conclusion, in the broader picture, suggests that synapses are not merely components of multiplicative gains, but rather the loci of complex dynamic computations that are relevant from a functional perspective, both in terms of optimizing storage capacity (Benna and Fusi, 2016; Lahiri and Ganguli, 2013) and fine-tuning learning rules to behavioral requirements.

Acknowledgments

The research leading to these results has received funding from the European Commission's Horizon 2020 socSMC project (socSMC-641321H2020-FETPROACT-2014) and by the European Research Council's CDAC project (ERC-2013-ADG341196).

References

Albus, J. S. (1971). A theory of cerebellar function. Mathematical Biosciences, 10(1):25–61.

Amann, N., Owens, D. H., and Rogers, E. (1996). Iterative learning control for discrete-time systems with exponential rate of convergence. IEE Proceedings - Control Theory and Applications, 143(2):217–224.

Apps, R. and Garwicz, M. (2005). Anatomical and physiological foundations of cerebellar information processing. Nature Reviews Neuroscience, 6(4):297–311.

Astrom, K. J. and Murray, R. M. (2012). Feedback Systems: An Introduction for Scientists and Engineers. Princeton University Press.

Barto, A. G., Sutton, R. S., and Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834–846.

Bastian, A. J. (2006). Learning to predict the future: the cerebellum adapts feedforward movement control. Current Opinion in Neurobiology, 16(6):645–649.

Bengtsson, F. and Hesslow, G. (2006). Cerebellar control of the inferior olive. Cerebellum (London, England), 5(1):7–14.

Benna, M. K. and Fusi, S. (2016). Computational principles of synaptic memory consolidation. Nature Neuroscience.

Boyd, S. (2008). Introduction to linear dynamical systems. Online Lecture Notes.

De Zeeuw, C. I. and Yeo, C. H. (2005). Time and tide in cerebellar memory formation. Current Opinion in Neurobiology, 15(6):667–74.

Dean, P., Porrill, J., Ekerot, C.-F., and Jörntell, H. (2010). The cerebellar microcircuit as an adaptive filter: experimental and computational evidence. Nature Reviews Neuroscience, 11(1):30–43.

Eccles, J., Ito, M., and Szentágothai, J. (1967). The cerebellum as a neuronal machine. Springer Berlin.

Fujita, M. (1982). Adaptive filter model of the cerebellum. Biological Cybernetics, 45(3):195–206.

Gormezano, I., Kehoe, E. J., and Marshall, B. S. (1983). Twenty years of classical conditioning with the rabbit.

Herreros, I. and Verschure, P. F. M. J. (2013). Nucleo-olivary inhibition balances the interaction between the reactive and adaptive layers in motor control. Neural Networks, 47:64–71.

Hesslow, G. and Yeo, C. H. (2002). The functional anatomy of skeletal conditioning. In A neuroscientist's guide to classical conditioning, pages 86–146. Springer.

Hofstoetter, C., Mintz, M., and Verschure, P. F. (2002). The cerebellum in action: a simulation and robotics study. European Journal of Neuroscience, 16(7):1361–1376.

Jordan, M. I. (1996). Computational aspects of motor control and motor learning. In Handbook of perception and action, volume 2, pages 71–120. Academic Press.

Kawato, M., Furukawa, K., and Suzuki, R. (1987). A hierarchical neural-network model for control and learning of voluntary movement. Biological Cybernetics, 57(3):169–185.

Kettner, R. E., Mahamud, S., Leung, H. C., Sitkoff, N., Houk, J. C., Peterson, B. W., and Barto, A. G. (1997). Prediction of complex two-dimensional trajectories by a cerebellar model of smooth pursuit eye movement. Journal of Neurophysiology, 77:2115–2130.

Lahiri, S. and Ganguli, S. (2013). A memory frontier for complex synapses. In Advances in Neural Information Processing Systems, pages 1034–1042.

Lisberger, S. (1987). Visual Motion Processing And Sensory-Motor Integration For Smooth Pursuit Eye Movements. Annual Review of Neuroscience, 10(1):97–129.

Marr, D. (1969). A theory of cerebellar cortex. The Journal of Physiology, 202(2):437–470.

Massion, J. (1992). Movement, posture and equilibrium: Interaction and coordination. Progress in Neurobiology, 38(1):35–56.

McKinstry, J. L., Edelman, G. M., and Krichmar, J. L. (2006). A cerebellar model for predictive motor control tested in a brain-based device. Proceedings of the National Academy of Sciences of the United States of America, 103(9):3387–3392.

Porrill, J. and Dean, P. (2007). Recurrent cerebellar loops simplify adaptive control of redundant and nonlinear motor systems. Neural Computation, 19(1):170–193.

Shibata, T. and Schaal, S. (2001). Biomimetic smooth pursuit based on fast learning of the target dynamics. In Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Conference on, volume 1, pages 278–285. IEEE.

Suvrathan, A., Payne, H. L., and Raymond, J. L. (2016). Timing rules for synaptic plasticity matched to behavioral function. Neuron, 92(5):959–967.

Wang, S. S.-H., Denk, W., and Häusser, M. (2000). Coincidence detection in single dendritic spines mediated by calcium release. Nature Neuroscience, 3(12):1266–1273.
