Fast Sigmoidal Networks via Spiking Neurons

Wolfgang Maass
Institute for Theoretical Computer Science, Technische Universität Graz, Graz, Austria

We show that networks of relatively realistic mathematical models for biological neurons can in principle simulate arbitrary feedforward sigmoidal neural nets in a way that has previously not been considered. This new approach is based on temporal coding by single spikes (respectively by the timing of synchronous firing in pools of neurons) rather than on the traditional interpretation of analog variables in terms of firing rates. The resulting new simulation is substantially faster and hence more consistent with experimental results about the maximal speed of information processing in cortical neural systems. As a consequence we can show that networks of noisy spiking neurons are "universal approximators" in the sense that they can approximate with regard to temporal coding any given continuous function of several variables. This result holds for a fairly large class of schemes for coding analog variables by firing times of spiking neurons. This new proposal for the possible organization of computations in networks of spiking neurons has some interesting consequences for the type of learning rules that would be needed to explain the self-organization of such networks. Finally, the fast and noise-robust implementation of sigmoidal neural nets by temporal coding points to possible new ways of implementing feedforward and recurrent sigmoidal neural nets with pulse stream VLSI.

1 Introduction

Sigmoidal neural nets are among the most powerful and flexible computational models known today. In addition they have the advantage of allowing "self-organization" via a variety of quite successful learning algorithms.
Unfortunately the computational units of sigmoidal neural nets differ strongly from biological neurons, and it is particularly dubious whether sigmoidal neural nets provide a useful paradigm for the organization of fast computations in cortical neural systems. Traditionally one views the firing rate of a neuron as the representation of an analog variable in analog computations with spiking neurons, in particular in the simulation of sigmoidal neural nets by spiking neurons.

Neural Computation 9 (1997) © 1997 Massachusetts Institute of Technology

However, with regard to fast cortical computations, this view is inconsistent with experimental data. Perrett et al. (1982) and Thorpe and Imbert (1989) have demonstrated that visual pattern analysis and pattern classification can be carried out by humans in just 100 ms, in spite of the fact that it involves a minimum of 10 synaptic stages from the retina to the temporal lobe. The same speed of visual processing has been measured by Rolls and others in macaque monkeys. Furthermore they have shown that a single cortical area involved in visual processing can complete its computation in just 20 to 30 ms (Rolls 1994; Rolls and Tovee 1994). On the other hand, the firing rates of neurons involved in these computations are usually below 100 Hz, and hence at least 20 to 30 ms would be needed just to sample the current firing rate of a neuron. Thus a coding of analog variables by firing rates is quite dubious in the context of fast cortical computations.

Experimental evidence accumulated during the past few years indicates that many biological neural systems use the timing of single action potentials (or "spikes") to encode information (Abeles et al. 1993; Bialek and Rieke 1992; Bair et al. 1994; Ferster and Spruston 1995; Hopfield 1995; Kempter et al. 1996; Sejnowski 1995;
Softky 1994; Thorpe and Imbert 1989; Rieke et al. 1996). In addition, various experiments have shown that biological neurons are able to fire in vitro with high timing precision (Bryant and Segundo 1976; Segundo 1994; Mainen and Sejnowski 1995).

We show in this article that there exists a completely different way of simulating sigmoidal neural nets with spiking neurons that is based on temporal coding with single spikes (and on temporal coding by synchronous firings of pools of neurons in a more noise-robust interpretation). This simulation is based on the observation that in the presence of some other excitation that moves the membrane potential close to the firing threshold, individual excitatory postsynaptic potentials (EPSPs) or inhibitory postsynaptic potentials (IPSPs) (or volleys of synchronized postsynaptic potentials (PSPs)) are able to shift the firing time of a neuron. This mechanism is particularly easy to analyze if we work in a range where all PSPs can be approximated well by linear functions. For this range one can show that the resulting firing time is linearly related to the weighted sum of the firing times of the presynaptic neurons, with the weights corresponding to the efficacies ("strengths") of the involved synapses. We will explain this key observation in a bit more detail at the end of this section, after defining the formal model of a noisy spiking neuron. Although this model ignores many of the intricate details of a biological neuron (e.g., nonlinearities in dendritic integration), one may argue that it underestimates, rather than overestimates, the computational capabilities of a biological neuron. We are not making explicit use of the noise in spiking neurons. Rather we show that the computational mechanism is robust with respect to various types of noise.

A complementary approach for simulating artificial neural nets by spiking neurons with temporal coding has recently been proposed (Hopfield 1995).
Hopfield's construction yields basically a simulation of radial basis function (RBF) units, where the weights of the RBF units are stored in the delays between synapses and soma of a spiking neuron. Hence Hopfield's construction provides an efficient way of implementing a look-up table with spiking neurons (with some very nice invariance regarding the strength of the stimulus). However, in contrast to the construction considered here, his system is based on "grandmother neurons," and it is not geared toward providing an informative output in a situation where the input (s_1, ..., s_n) does not match (up to a factor) one of the fixed set of stored patterns (because it is, for example, a superposition of several stored patterns). Furthermore Hopfield's construction provides no method for simulating multilayer neural nets. In addition, in contrast to our construction, it assigns no computational or learning-related role to the efficacy (i.e., strength) of synapses between biological neurons.

We describe in the remainder of this section the precise models for sigmoidal neural nets and noisy spiking neurons that we consider, and at the end of this section describe the key mechanism of our simulation. The main construction of this article is given in Section 2, and our main result is stated in the theorem at the end of that section. In Section 3 we show that this result implies that networks of noisy spiking neurons are "universal approximators." We also prove that this result holds for a fairly large class of schemes for temporal coding of analog variables. In Section 4 we briefly indicate some new perspectives about the organization of learning in biological neural systems that follow from this approach.

We point out that this is not an article about biology but about computational complexity theory.
Its main results (given in Sections 2 and 3) are rigorous theoretical results about the computational power of common mathematical models for networks of spiking neurons. However, some informal comments have been added (after the theorem in Section 2, as well as in Sections 4 and 5) in order to facilitate a discussion of the biological relevance of this mathematical model and its theoretical consequences.

The computational unit of a sigmoidal neural net is a sigmoidal gate (σ-gate) G, which assigns to analog input numbers x_1, ..., x_{n−1} ∈ [0, γ] an output of the form

σ(Σ_{i=1}^{n−1} α_i · x_i + α_n).

The function σ: R → [0, γ] is called the activation function of G, α_1, ..., α_{n−1} are the weights of G, and α_n is the bias of G. These are considered adjustable parameters of G in the context of a learning process. The parameter γ > 0 determines the scale of the analog computations carried out by the neural net. For convenience we assume that each σ-gate G has an additional input x_n with some constant value c ∈ (0, γ] available. Hence after rescaling α_n, the function f_G that is computed by G can be viewed as a restriction of the function

(x_1, ..., x_n) ↦ σ(Σ_{i=1}^{n} α_i · x_i)

to arguments with x_n = c. The original choice for the activation function σ in Rumelhart et al. (1986) has been the logistic sigmoid function σ(y) = 1/(1 + e^{−y}). Many years of practical experience with sigmoidal neural nets have shown that the exact form of the activation function σ is not relevant for the computational power and learning capabilities of such neural nets, as long as σ is nondecreasing and almost everywhere differentiable, the limits lim_{y→−∞} σ(y) and lim_{y→∞} σ(y) have finite values, and σ increases approximately linearly in some intermediate range. Gradient-descent learning procedures such as backpropagation formally require that σ is differentiable everywhere, but practically one can just as well use the piecewise linear "linear-saturated" activation function π_γ: R → [0, γ] defined by

π_γ(y) = 0, if y < 0;  π_γ(y) = y, if 0 ≤ y ≤ γ;  π_γ(y) = γ, if y > γ.
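As an illustration, a π_γ-gate is easy to sketch in a few lines of Python; the function names are ours, not from the article, and the bias is assumed to have been absorbed into an extra input held at a constant value, as described above:

```python
def pi_gamma(y, gamma):
    """Linear-saturated activation pi_gamma: clip y to the interval [0, gamma]."""
    return min(max(y, 0.0), gamma)

def sigmoidal_gate(s, alpha, gamma):
    """Output f_G of a pi_gamma-gate: the weighted sum of the inputs s,
    saturated to [0, gamma]."""
    return pi_gamma(sum(a * x for a, x in zip(alpha, s)), gamma)

# Example: gamma = 1, weights (0.5, 0.5) -> an average, saturated to [0, 1]
out = sigmoidal_gate([0.4, 0.8], [0.5, 0.5], 1.0)  # 0.5*0.4 + 0.5*0.8 = 0.6
```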
if05y5y Y ify>y. As a model for a spiking neuron we take the common model of a Leaky integrate-and-fire neuron with noise, in the formulation of the somewhat more general spike response model of Gershler and van Hemmen (1994). The onlv specific assumption needed for the consbction in this article is that po&+aphc potent;als can be dexrtbed (or at least approxmated) by a llnear funchon dumg some rnrhal segment Actually theconstmchons of thtb art& appear to be of ~ntereset ven ~f this asumphon 15n ot sattsfied, but in that case thev are harder to analvze theoreticaliv. We consider networks that consist of a finite set V of spiking neurons, aE, . s.e,t: RE* G + V R x f oVr eoafc shv. sn v,an .pas o.es se, a( uw. ve)iz Eh tE w(w.h, e>re 0R aCn :d= aI xr eEs pRo: nxs e> f 0u1n)c. atinodn a thmhold funchon (3, Rt + R' for each neuron u c V Each responsefunctmnc, , modelsetther an EPSPor an 1151' lhr typral ahaifw F. of cE RE+P sis a tnhde lsPeSt Posf IfSi rmindg~ tcimaleesd aon f Fa~keeuurroe1 n u, then the potential at the higger zone of neuron v at time t is given by Furthermore one considers a threshold function Q,(t - t') that quantifies the "reluctance" of v to fire again at time t if its Last previous firing was at time P. Thus (3&) is extremely large for small x and then approaches 0,(0) for larger x. In a noise-& model, a neuron v fires at time t as swn as P,,(t) reaches @At- P). The precise form of this threshold function (3, is not important for the conshuctions in this article, since we consider here only computations that relv on the timinz of the first spike in a spike hain. Thus it suffices to assume that O,(t - t') = 0,(0) for sufficiently large values oft - t' and Fast Sigmoidal Networks via Spiking Neumns 283 Figure I. The lypncal shapeof mhtbalory and exotalory postsynaphc polenlials al a bmlo@cal neuron (We assume that the resnng mcmbrane pomennal has thc value 0.) 
that Θ_v(x) is so large for x ∈ (0, T_out − T_in] that it exceeds any potential P_v(t) that occurs in the construction of Section 2. The latter condition (which amounts to the assumption of a sufficiently long refractory period) will prevent iterated firing of neuron v during the critical time interval [T_out − γ, T_out].

The construction in Section 2 is robust with respect to several types of noise that make the model of a spiking neuron biologically more realistic. As in the model for a leaky integrate-and-fire neuron with noise, we allow that the potentials P_v(t) and the threshold functions Θ_v(t − t') are subject to some additive noise. Hence P_v(t) is replaced by

P_v^noisy(t) := P_v(t) + α_v(t),

and Θ_v(t − t') is replaced by

Θ_v^noisy(t − t') := Θ_v(t − t') + β_v(t),

where α_v(t) and β_v(t) describe the impact of some unknown (or even adversarial) source of noise (which might, for example, result from synapse failures). One assumes in most previous theoretical studies that α_v(t), β_v(t) are distributed according to some specific probability distribution (e.g., white noise), whereas our subsequent constructions allow that α_v(t), β_v(t) are some arbitrary functions with bounded absolute value (e.g., "systematic noise").

In a simpler model for a noisy spiking neuron, one assumes that a neuron v fires exactly at those time points t when P_v^noisy(t) reaches from below the value Θ_v^noisy(t − t'). We consider in this article a biologically more realistic model, where, as in Gerstner and van Hemmen (1994), the size of the difference P_v^noisy(t) − Θ_v^noisy(t − t') governs just the probability that neuron v fires. The choice of the exact firing time is left up to some unknown stochastic processes, and it may, for example, occur that v does not fire in a time interval I during which P_v^noisy(t) − Θ_v^noisy(t − t') > 0, or that v fires "spontaneously" at a time t when P_v^noisy(t) − Θ_v^noisy(t − t') < 0.
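The model requires only this qualitative dependence of firing on the difference P_v^noisy(t) − Θ_v^noisy(t − t'), not any particular functional form. Purely as a hypothetical illustration, an instantaneous firing probability with a logistic dependence on that difference would be one concrete choice with the right limiting behavior:

```python
import math

def fire_probability(diff, sigma=1.0):
    """Hypothetical firing probability as a function of
    diff = P_v^noisy(t) - Theta_v^noisy(t - t'): close to 1 when diff is
    strongly positive, close to 0 when strongly negative. The logistic shape
    is our illustration only; the text assumes just these limiting properties."""
    return 1.0 / (1.0 + math.exp(-diff / sigma))
```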
For the subsequent constructions we need only the following assumption about the firing mechanism: For any time interval I of length greater than 0, the probability that v fires during I is arbitrarily close to 1 if P_v^noisy(t) − Θ_v^noisy(t − t') is sufficiently large for t ∈ I (up to the time when v fires), and the probability that v fires during I is arbitrarily close to 0 if P_v^noisy(t) − Θ_v^noisy(t − t') is sufficiently negative for all t ∈ I.

It turns out that it suffices to assume only the following rather weak properties of the response functions ε_{u,v}: Each response function ε_{u,v}: R⁺ → R is either excitatory or inhibitory. All excitatory response functions ε_{u,v}(t) have the value 0 for t ∈ [0, d_{u,v}] and the value t − d_{u,v} for t ∈ [d_{u,v}, d_{u,v} + Δ], where d_{u,v} ≥ 0 is some fixed delay and Δ > 0 is some other constant. Furthermore we assume that ε_{u,v}(t) stays above some positive constant for all t ∈ [d_{u,v} + Δ, d_{u,v} + Δ + γ]. With regard to inhibitory response functions ε_{u,v}, we assume that ε_{u,v}(t) = 0 for t ∈ [0, d_{u,v}] and ε_{u,v}(t) = −(t − d_{u,v}) for t ∈ [d_{u,v}, d_{u,v} + Δ]. Furthermore we assume that ε_{u,v}(t) = 0 for all sufficiently large t.

Finally we need a mechanism for increasing the firing threshold Θ := Θ_v(0) of a "rested" neuron v (at least for a short period). One biologically plausible assumption that would account for such an increase is that neuron v receives a large number of IPSPs from randomly firing neurons that arrive on synapses that are far away from the trigger zone of v, so that each of them has barely any effect on the dynamics of the potential at the trigger zone, but together they contribute a rather steady negative summand BN⁻ to the potential at the trigger zone. Other possible explanations for the increase of the firing threshold Θ could be based on the contribution of inhibitory interneurons whose IPSPs arrive close to the soma and are time locked to the onset of the stimulus, or on long-lasting inhibitions such as those mediated by GABA_B receptors.
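As a concrete sketch of the pieces of the model introduced so far — a piecewise-linear excitatory response function and the resulting potential P_v(t) — the following few lines may help; names and parameter values are ours, not from the article:

```python
DELTA = 2.0  # length of the linearly rising segment (the constant Delta above)

def eps(t, d=1.0):
    """Excitatory response function as assumed in the text: 0 up to the delay d,
    then rising with slope 1 for a segment of length DELTA. Beyond that segment
    the text only requires the value to stay above some positive constant for a
    while; holding it constant here is our own simplification."""
    if t < d:
        return 0.0
    return min(t - d, DELTA)

def potential(t, firing_times, weights, d=1.0):
    """Potential P_v(t) at the trigger zone: weighted sum of response functions,
    assuming one spike per presynaptic neuron u at firing time t_u."""
    return sum(w * eps(t - tu, d) for tu, w in zip(firing_times, weights))
```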
Formally we assume that each neuron v receives some negative (i.e., inhibitory) potential BN⁻ < 0 that can be assumed to be constant during the time intervals that are considered in the following arguments.

In comparison with other models for spiking neurons, this model allows more general noise than the models considered in Gerstner and van Hemmen (1994) and Maass (1995). On the other hand this model is somewhat less general than the one considered in Maass (1996a).

Having defined the formal model, we can now explain the key mechanism of the constructions in more detail. It is well known that incoming EPSPs and IPSPs are able to shift the firing time of a biological neuron. We explore this effect in the mathematical model of a spiking neuron, showing that in principle it can be used to carry out complex analog computations in temporal coding. Assume that a spiking neuron v receives PSPs from presynaptic neurons a_1, ..., a_n, that w_i is the weight (efficacy) for the synapse from a_i to v, and that d_i is the time delay from a_i to v. Then there exists a range of values for the parameters where the firing time t_v of neuron v can be written in terms of the firing times t_{a_i} of the presynaptic neurons a_i as

t_v = (Θ − BN⁻ + Σ_{i=1}^n w_i · (t_{a_i} + d_i)) / Σ_{i=1}^n w_i.

Hence in principle a spiking neuron is able to compute in temporal coding of inputs and outputs a linear function (where the efficacies of synapses encode the coefficients of the linear function, as in rate coding of analog variables). The calculations at the beginning of Section 2 show that this holds precisely if there is no noise and the PSPs are at time t_v all in their initial linearly rising or linearly decreasing phase. However, for a biological interpretation, it is interesting to know that even if the firing times t_{a_i} (or more precisely their effective values t_{a_i} + d_i) lie further apart, this mechanism computes a meaningful approximation to a linear function.
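A minimal sketch of this closed-form firing time, under the idealized assumption that every PSP is still in its linearly rising phase with unit slope when v fires (function name and parameter values are ours):

```python
def firing_time(spike_times, delays, weights, theta, bn=0.0):
    """Solve theta = sum_i w_i * (t - t_i - d_i) + bn for t, the regime in
    which each PSP contributes linearly with slope w_i. The result is a
    weighted mean of the effective input times t_i + d_i, shifted by
    (theta - bn) / sum_i w_i."""
    wsum = sum(weights)
    drive = sum(w * (t + d) for t, d, w in zip(spike_times, delays, weights))
    return (theta - bn + drive) / wsum

# An input neuron that fires earlier pulls the output spike earlier,
# in proportion to its synaptic weight:
t_both_late = firing_time([5.0, 5.0], [1.0, 1.0], [1.0, 1.0], theta=2.0)
t_one_early = firing_time([4.0, 5.0], [1.0, 1.0], [1.0, 1.0], theta=2.0)
```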
It employs (through the natural shape of PSPs) an interesting adaptation of outliers among the t_{a_i} + d_i: input neurons a_i that fire too late (relative to the average) lose their influence on the determination of t_v, and input neurons a_i that fire extremely early have the same impact as neurons a_i that fire somewhat later (but still before the average). Remark 2 in Section 2 provides a more detailed discussion of this effect.

The goal of the next section is to prove a rigorous theoretical result about the computational power of formal models for networks of spiking neurons. We are not claiming that this construction (which is designed exclusively for that purpose) provides a blueprint for the organization of fast analog computations in biological neural systems. However, it provides the first theoretical model that is able to explain the possibility of fast analog computations with noisy spiking neurons. Some remarks about the possible biological relevance of details of this construction can be found after the theorem in Section 2.

2 The Main Construction

Consider an arbitrary π_γ-gate G, for some γ > 0, which computes a function f_G: [0, γ]^n → [0, γ]. Let α_1, ..., α_n ∈ R be the weights of G. Thus we have

f_G(s_1, ..., s_n) = 0, if Σ_{i=1}^n α_i · s_i < 0;  Σ_{i=1}^n α_i · s_i, if 0 ≤ Σ_{i=1}^n α_i · s_i ≤ γ;  γ, if Σ_{i=1}^n α_i · s_i > γ

for arbitrary inputs s_1, ..., s_n ∈ [0, γ].

Figure 2: The simulation of a sigmoidal gate by a spiking neuron v in temporal coding.

For the sake of simplicity, we first consider the case of noise-free spiking neurons (i.e., α_v ≡ β_v ≡ 0, and each neuron v fires whenever P_v(t) crosses Θ_v(t − t') from below). Then we describe the changes that are needed in this construction for the general case of noisy spiking neurons.

We construct for a given π_γ-gate G and for an arbitrary given parameter ε > 0 with ε < γ a network N_{G,ε} of spiking neurons that approximates f_G with precision ε; that is, the output N_{G,ε}(s_1, ..., s_n) of N_{G,ε} satisfies |N_{G,ε}(s_1, ..., s_n) − f_G(s_1, ..., s_n)| ≤ ε for all s_1, ..., s_n ∈ [0, γ].
In order to be able to scale the size of the weights according to the given gate G, we assume that N_{G,ε} receives an additional input s_0 that is given, like the other input variables s_1, ..., s_n, in temporal coding. Thus we assume that there are n + 1 input neurons a_0, ..., a_n, where a_i fires at time T_in − s_i (and T_in is some constant). We will discuss at the end of this section (in Remarks 5 and 6) biologically more plausible variations of the construction where a_0 and T_in are not needed.

We construct a spiking neuron v in N_{G,ε} that receives n + 1 PSPs h_0(t), ..., h_n(t), which result from the firing of the n + 1 input neurons a_0, ..., a_n at times T_in − s_i (see Figure 2). In addition v receives some auxiliary PSPs from other spiking neurons in N_{G,ε}, whose timing depends only on T_in.

The firing time t_v of this neuron v will provide the output N_{G,ε}(s_1, ..., s_n) of the network N_{G,ε} in temporal coding; that is, v will fire at time T_out − N_{G,ε}(s_1, ..., s_n) for some T_out that does not depend on s_1, ..., s_n.

Let w_{a_i,v} be the weight of the synapse from input neuron a_i to neuron v, i = 0, ..., n. We assume that the delay d_{a_i,v} between a_i and v is the same for all input neurons a_0, ..., a_n, and we write d for this common delay. Then we can describe for i = 0, ..., n the impact of the firing of a_i at time T_in − s_i on the potential at the trigger zone of neuron v at time t by the EPSP or IPSP

h_i(t) = w_{a_i,v} · ε_{a_i,v}(t − (T_in − s_i)),

which has on the basis of our assumptions the value

h_i(t) = 0, if t − (T_in − s_i) < d;  h_i(t) = w_i · (t − (T_in − s_i) − d), if d ≤ t − (T_in − s_i) ≤ d + Δ,

where w_i = w_{a_i,v} in the case of an EPSP and w_i = −w_{a_i,v} in the case of an IPSP. We assume that neuron v has not fired for a sufficiently long time, so that its threshold function Θ_v(t − t') can be assumed to have a constant value Θ when the PSPs h_0(t), ..., h_n(t) arrive at the trigger zone of v.
Furthermore we assume for the moment that besides these n + 1 PSPs only BN⁻ influences the potential at the trigger zone of v. This contribution BN⁻ is assumed to have a constant value in the time interval considered here. Then if no noise is present, the time t_v of the next firing of v can be described by the equality

Θ = Σ_{i=0}^n h_i(t_v) + BN⁻ = Σ_{i=0}^n w_i · (t_v − (T_in − s_i) − d) + BN⁻,   (2.1)

provided that

−s_j ≤ t_v − T_in − d ≤ Δ − s_j for j = 0, ..., n.   (2.2)

We assume from now on that a fixed value s_0 = 0 is chosen for the extra input s_0. Then equation 2.1 is equivalent to

t_v = (Θ − BN⁻ + Σ_{i=0}^n w_i · (T_in − s_i + d)) / Σ_{i=0}^n w_i.   (2.3)

This t_v satisfies equation 2.2 if −s_j ≤ t_v − T_in − d ≤ Δ − s_j for j = 0, ..., n; hence for any s_j ∈ [0, γ] if

0 ≤ t_v − T_in − d ≤ Δ − γ.   (2.4)

We set w_i := λ · α_i for i = 1, ..., n, where α_1, ..., α_n are the weights of the simulated π_γ-gate G, and λ > 0 is some not-yet-determined factor (that we will later choose sufficiently large in order to make sure that neuron v fires close to the time t_v given by equation 2.3 even in the presence of noise). We choose w_0 so that Σ_{i=0}^n w_i = λ. This implies that equation 2.3 with

T_out := (Θ − BN⁻)/λ + T_in + d

is now equivalent to

t_v = T_out − Σ_{i=1}^n α_i · s_i,   (2.5)

and equation 2.4 is equivalent to

0 ≤ (Θ − BN⁻)/λ − Σ_{i=1}^n α_i · s_i ≤ Δ − γ.   (2.6)

Hence, provided that γ, Δ, and BN⁻ are chosen in relationship to Θ and λ so that

γ ≤ (Θ − BN⁻)/λ ≤ Δ − γ,   (2.7)

we have satisfied equation 2.4, and therefore achieved that the firing time t_v of neuron v provides in temporal coding the output f_G(s_1, ..., s_n) = Σ_{i=1}^n α_i · s_i of the simulated π_γ-gate G for all inputs s_1, ..., s_n ∈ [0, γ] with Σ_{i=1}^n α_i · s_i ∈ [0, γ].

In order to simulate G also for s_1, ..., s_n ∈ [0, γ] with other values of Σ_{i=1}^n α_i · s_i, we make sure that v fires at a time with distance at most ε to T_out − γ if Σ_{i=1}^n α_i · s_i is larger than γ, and that v fires with distance at most ε to time T_out if Σ_{i=1}^n α_i · s_i < 0. For that purpose, we add inhibitory and excitatory neurons that fire at times that depend on T_in but not on the inputs s_1, ..., s_n.
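The algebra leading from equation 2.1 to equation 2.5 can be checked numerically. The sketch below solves equation 2.1 in the noise-free linear regime and confirms that the resulting firing time equals T_out − Σ α_i · s_i; all parameter values are arbitrary illustrations (chosen so that the relevant range conditions hold), not values from the article:

```python
def simulate_firing(s, alpha, lam=10.0, theta=25.0, bn=-5.0, t_in=100.0, d=1.0):
    """Noise-free firing time of neuron v in the construction of Section 2,
    assuming every PSP is in its linearly rising phase at time t_v.
    Returns (t_v solved from equation 2.1, T_out - sum(alpha_i * s_i))."""
    w = [lam * a for a in alpha]        # w_i = lam * alpha_i for i >= 1
    weights = [lam - sum(w)] + w        # w_0 chosen so that sum of all w_i = lam
    inputs = [0.0] + list(s)            # fixed extra input s_0 = 0
    wsum = sum(weights)                 # equals lam by construction
    # Solve theta = sum_i w_i * (t - (T_in - s_i) - d) + bn for t:
    t_v = (theta - bn + sum(wi * (t_in - si + d)
                            for wi, si in zip(weights, inputs))) / wsum
    t_out = (theta - bn) / lam + t_in + d
    return t_v, t_out - sum(a * si for a, si in zip(alpha, s))

t_v, predicted = simulate_firing([0.2, 0.3], [0.5, 0.5])
```

For these illustrative values, both expressions give t_v = 104 − 0.25 = 103.75, matching equation 2.5.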
The activity of these auxiliary inhibitory and excitatory neurons may also shift the firing time t_v of v if Σ_{i=1}^n α_i · s_i ∈ [0, γ], but at most by ε.

According to the previously described construction, we have by equation 2.5 that t_v = T_out if s_i = 0 for i = 1, ..., n (and therefore Σ_{i=1}^n α_i · s_i = 0). Furthermore the parameters have been chosen so that equation 2.2 is satisfied for this case, which implies that each of the PSPs h_i(t) is at time T_out still within the initial segment of length Δ of its nonzero segment. This implies that for any values of s_i ∈ [0, γ] the PSP h_i(t) is at time T_out − γ not further advanced than the end of the initial segment of length Δ of its