ebook img

Statistical Mechanics and Visual Signal Processing PDF

34 Pages·0.3 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Statistical Mechanics and Visual Signal Processing

PUPT-1435 cond-mat/9401072 STATISTICAL MECHANICS AND VISUAL SIGNAL PROCESSING 4 9 9 MARC POTTERS(1,2) ANDWILLIAM BIALEK(1) 1 n a (1) NEC Research Institute, 4 Independence Way, Princeton, New Jersey 08540 J 0 (2) Department of Physics, Princeton University, Princeton, New Jersey 08544 3 1 v January 1994 2 7 0 1 0 4 9 / t a m - d The nervous system solves a wide variety of problems in signal processing. n In many cases the performance of the nervous system is so good that it appo- o raches fundamental physical limits, such as the limits imposed by diffraction c andphotonshotnoiseinvision. Inthis paperweshowhowto use the language : v of statistical field theory to address and solve problems in signal processing, i X that is problems in which one must estimate some aspect of the environment from the data in an array of sensors. In the field theory formulation the op- r a timal estimator can be written as an expectation value in an ensemble where the input data act as external field. Problems at low signal-to-noise ratio can be solved in perturbation theory, while high signal-to-noise ratios are treated with a saddle-point approximation. These ideas are illustrated in detail by an example ofvisualmotionestimationwhichis chosento modela problemsolved by the fly’s brain. In this problem the optimal estimator has a rich structure, adapting to various parameters of the environment such as the mean-square contrast and the correlation time of contrast fluctuations. This structure is in qualitative accord with existing measurements on motion sensitive neurons in the fly’s brain, and we argue that the adaptive properties of the optimal esti- mator may help resolve conlficts among different interpretations of these data. Finally we propose some crucial direct tests of the adaptive behavior. 1 1 Introduction Imagine walking along a busy city street. As we walk, we are almost unaware of the myriad tasks which our brains are performing: We use a combination of visual and vestibular signals to keep ourselves upright, sensors in our feet and leg muscles help adjust our stride to the terrain, we listen for cars and people behind us, vision helps us recognize the caf´e we are approaching and identifies our friend who will meet us there, and all these senses combine to provide us with a trajectory which reaches our goal and avoids obstacles. All of these tasks involve signal processing, and we have the qualitative impression that we (and other animals) are quite good at solving these problems. The goal of this paper is to show that statistical mechanics provides the natural language for formulating and solving signal processing problems, and that the structure of the statisticalmechanicsmodels providesa predictive theory ofhow realbrains solve the corresponding problems. 1.1 Physical limits and biological signal processing Signal processing is in essence a physics problem. As an example, in vision the preciseformulationofanytaskmustbeginwiththefactthatimagesareblurred bydiffractionandcorruptedbyphotonshotnoise. Theseirreduciblelimitations in the quality of the input signal limit the reliability with which the brain (or any device) can estimate what is really going on in the visual environment— we can never be truly certain of what we are looking at, nor can we know its exact trajectory, and so on. There are physical limitations from the hardware as well—cellsof a certainsize are boundto generateelectricalnoise,signalsare passed from one point to another along cables of limited information capacity, ... . Remarkably, these physical limitations to the reliability of computation are actually relevant to the operation of real brains. Indeed there are many cases where the performance of the nervous system is essentially equal to the limitthatonecalculatesfromfirstprinciples[2,4];examplesrangefromphoton countingintoadsandhumans[1,13]tovisualmotionestimationinflies[7,44]to acousticcoding infrogsandcrickets[39]andecho delayestimationinbats [48]. These observations strongly suggest that a theory of optimal signal processing will help us understand the computational strategies of brains. 1.2 Choosing a problem Ofthemanyexamplesofsignalprocessinginthenervoussystem,ourdiscussion ismotivatedmostdirectlybytheproblemofmotionestimationinthefly’svisual system. Inmanyinsectsthevisualsystemproducesmovementsignalswhichare usedtocontroltheflightmusclesandtherebystabilizeflight. Inthecaseofflies, visually guided flight behaviorhas been studied both by measuring flight paths duringnaturalbehaviors[30,51,52,53]andbyexaminingthetorquesproduced 2 by flies hanging from a torsion balance in response to movements of controlled patterns across the visual field [20, 36]. The input to the motion computation comesfromasingleclassofphotoreceptorcellswhicharearrayedintheregular lattice of the compound eye, and the signal and noise properties of these cells are extremely well characterized. The output of the movement computation can be monitored in a handful of identified cells of the lobula complex [15, 19], anddestructionof individuallobula plate neuronsproduces clear deficits in the fly’s opto-motor behavior [18]. The fact that these neurons are “identified” has a technical meaning: cells of essentially identical morphology occur in each fly, and these cells have responses to visual stimuli which are quantitatively reproducible from individual to individual. Thus the cells can be named and numberedbasedoneither structure or function; under favorableconditions one canrecordfromthecellH1,whichcodeswidefield,horizontalmovementsofthe visual field, for periods of up to five days [43]. The accessibility of quantitative measurements at each of several layers in the nervous system—photoreceptors, second order neurons, motion-sensitive cells, flight behavior—makes the fly a nearly ideal testing ground for theories of neural signal processing. 1.3 Relation to previous work Signal processing is of course an enormous field with a diverse set of applica- tions [29, 33, 50]. At the core of this field is the description of random func- tions, whether they be functions of time (e.g., sounds) or space (e.g., images). Nonetheless, the bulk of the literature on signal processing does not seem to make extensive use of the functional integral methods which seem so natural fromtheperspectiveofstatisticalmechanics. In1988,however,ZweigandLack- ner [28] studied a model signal processing problem, reconstructing the configu- ration of a randomly moving plate using a set of sparse and noisy observations of displacements at particular points along the plate. By choosing the random motions from the Boltzmann distribution they emphasized the connection to statistical mechanics. Traditional approaches to the problem of “optimal estimation” begin by defining a class of estimation strategies and then use variational methods to searchthisclassforthe bestestimator. ThisapproachgoesbacktoWiener and Kolmogorov, who found the optimal linear filters for recovering and extrapo- lating signals in noisy backgrounds. In this analysis the causality of filters is a crucial constraint, and hence the key mathematical ingredient is the role of the causalanalytic structure of filters in solving the integralequations which result from the variational definition of optimality. The Wiener-Kolmogorov results essentiallycloseabroadclassofquestionsrelatedtooptimalestimationbylinear filters,butleavecompletelyopenproblemswheretheoptimalestimatormaybe a non-linear function of the input data. In fact, the classical approach gives us little hintabouttheconditionsfortheimportanceofnon-linearity. Weshallsee thatthestatisticalmechanicsapproachgivesustoolsforgoingbeyondlinear(or 3 lowest order non-linear) estimators. Rather than searching through a limited class of estimators for a local optimum, the mapping to a statistical mechanics problem allows us to directly construct the globally optimal estimator. In the case of vision, it is widely recognized that unconstrained variational approachestoestimationresultinill-posedproblems[35]. The “regularization” of these ill-posedproblems can be givena probabilistic interpretation, in which the optimal estimator reflects a compromise between fitting the available data and taking account of a priori knowledge about the expected distribution of incoming signals; this is the same structure described by Zweig and Lackner in their model problem. Different regularization terms thus represent different hypotheses about the statistical structure of the visual environment, but there has been very little experimental work to characterize these statistics [42]. Fol- lowing Geman and Geman [16], there has been considerable focus on “Markov random field” models for image statistics, which are known to be equivalent to Boltzmann distributions with local Hamiltonians, but there has been relatively littleworkexploitingthefullstatisticalmechanicsstructureofeventhesesimple models. In particular, if one takes the statistical models seriously as models of theproblemsthataresolvedbyrealbrains,thereistheclearpredictionthatthe signalprocessingstrategiesofthebrainmustadapttothestatisticalstructureof the environment. This sort of adaptation is much richer than that convention- ally described in the biological literature, and we shall explore these predicted effects in some detail since we feel that they constitute the key experimental signature of optimal estimation. Theconnectionbetweenstatisticalmechanicsandsignalprocessinghasbeen used in previous work to study vision at very low photon flux, where the low signal-to-noise ratio (SNR) leads to a universal “pre-processing” stage in any estimation task [6, 37, 38]. The structure of this pre-processing step is in good agreementwith the responses of cells in the first stage of retinalsignal process- ing,andasfarasweknowthisisthefirstexampleofasuccessfulparameter-free prediction of neural responses. Most sensory systems encode incoming signals insequencesofdiscretepulses,termedspikesoractionpotentials,andtheprob- lemofdecodingthesepulsesequencescanagainbegivenastatisticalmechanics formulation [9]; the resulting predictions concerning the algorithms for optimal decoding have been confirmed in experiments on a wide variety of organisms [5]. Finally, the low SNR limit of motion estimation in fly vision has also been studied [3, 4, 40], and this analysis was crucial in establishing that the fly does make optimal estimates, but we shall see that by restricting attention to low SNR one misses a large amount of structure which is likely relevant under nat- ural conditions. Attempts to go beyond the low SNR perturbation theory have been confined to model problems [8, 41]. 4 2 Simple Examples of Signal Processing A typicalsignal processing problemconsists of extracting information from the outputofa devicecorruptedby noise. The signalmightbe encodedin the data in a complex way or might even be present only statistically (as in the case of predictionoftimeseries). Thetaskisthen,giventhestatisticalpropertiesofthe quantities of interest, to find the optimal estimate of the signal using the data. To do so, one must decide on a definition of the quality of an estimate. There aretwoobviouschoices: maximumlikelihoodandleast-mean-squareerror. The maximum likelihood estimate corresponds to the signal that has the highest probability given the observationof the data. The least-mean-squareestimator is the one that would on average minimize χ2, the distance square to the true value; it is equalto the conditionalmean. There areother costfunctions whose minimizations lead to different estimators. Before we proceed with our main problem, let us explain our approach on a few simple models where everything can be computed exactly. 2.1 One variable estimation Consider the following problem: estimate a signalconsisting of a single number s using the output y of a detector contaminated by noise, y =s+η. Since the noise is randomand the true signalis unknown to the observer,this problemis aprobabilisticone. Tofindthebestestimateofthesignal,weneedtoconstruct the probability of s given the output y. We use Bayes’ theorem to write down the conditional probability of s given y: P[y s]P[s] P[sy]= | . (1) | P[y] P[y s] describes the detection process: how the output is related to the | input signal and the probabilistic nature of the noise. P[s] reflects the a-priori knowledge of the signal, and P[y] is independent of s and thus serves as a normalization. Suppose firstthat both signalandnoise are chosenfrom independent Gaus- sian distributions with variance S and N respectively, i.e. 1 s2 1 (y s)2 P[s]= exp and P[y s]= exp − . (2) √2πS −2S | √2πN − 2N (cid:20) (cid:21) (cid:20) (cid:21) After a little bit of algebra, we find S+N (S+N)(s Sy )2 P[sy]= exp − S+N . (3) | r2πNS "− 2NS # The conditional probability distribution is thus also Gaussian with mean Sy S+N and variance NS . In this example, the maximum and the average of the S+N 5 conditional probability distribution are equal and are given by S s = y. (4) e S+N (cid:18) (cid:19) The best estimate of the input is thus a linear function of the detector output. ThiswillbetrueingeneralwhenevertheprobabilitydistributionsareGaussian. Itisnotnecessarilythecasewheneitherthe signalorthe noiseisnon-Gaussian or when the output is a non-linear function of the signal. To illustrate the emergence of non-linear estimators, consider a small mod- ification of our toy-model. Suppose now that the signal is positive and taken froman exponentialdistributionwith mean 1/αandthat the noise is Gaussian and additive but has a variance proportional to the signal (i.e. N = βs).The analog of Eq. (3) is now 1 (y s)2 P[sy]= exp αs − , (5) | Z√s − − 2βs (cid:18) (cid:19) where Z is an s-independent normalization factor. The maximum likelihood estimator, which maximizes Eq. (5), is given by β2+4y2(1+2αβ) β sML = − . (6) e 2(1+2αβ) p To compute the least mean-square estimator, we need to average s with respect to P[sy]. After some algebra and a trip to our favorite integral table | we find β y sLMS = + | | . (7) e 1+2αβ √1+2αβ In this example, the two definitions of optimal estimation lead to different results; both are non-linear functions of the output variable y. This of course comes from the non-Gaussian nature of the signal and the dependence of the noiseuponthesignal. Amusingly,inthis casethe optimalestimatorisnoteven monotonic in the detector output y. 2.2 Causal linear filtering Let us now consider the estimation of continuous signals by doing the time dependent version of our Gaussian toy-model. Suppose now that s(t) and η(t) are functions of time chosen from Gaussian distributions with power spectra S(ω) and N(ω). In functional integral language [14, 25]: 1 1 ∞ dω s(ω)2 P[s]= exp | | , (8) ZS "−2Z−∞ 2π S(ω) # 6 whereZ isanill-definednormalizationconstantwhichasusualneverentersin S thecomputations. ThereisasimilarequationforP[y s]. Wefindtheconditional | probability distribution 1 1 ∞ dω y(ω) s(ω)2 1 ∞ dω s(ω)2 P[sy]= exp | − | | | . (9) | Z "−2Z−∞ 2π N(ω) − 2Z−∞ 2π S(ω) # To find s(ω) we compute the “equation of motion” and find y h i S(ω) s (ω)= y(ω), (10) e S(ω)+N(ω) (cid:18) (cid:19) which can be written in the time domain: ∞ ∞ dω S(ω) s (t)= dτg(τ)y(t τ) where g(τ)= e−iωτ . (11) e − 2π S(ω)+N(ω) Z−∞ Z−∞ We see that the problem of estimating continuous functions of time is just the problemofestimatingtheindependentFouriercomponents;foreachcomponent we have an equation identical to Eq. (4) for the estimation of a single variable. It should be clear that this simplicity is tied to the assumption of Gaussian distributions for both the signal and the noise. The filter g(τ) is acausal, since it extends symmetrically both in the past and in the future. If we actually want to build a device which takes the y(t) as input and returns an estimate of the signal s(t), then this device must be causal. Itisnaturaltoaskwhatistheoptimalcausalestimator,onethatwould requiredknowledgeofthe outputy(t)onlyuptothe presentor,moregenerally, up to a small delay time τ in the future. The delay time τ could be negative, ◦ ◦ that would be a predictor instead of an estimator, but the same method would apply. This problem was solved by Wiener [54] and Kolmogorov [26, 27]. We give here another derivation in a slightly different context. Wiener assumed linearfiltering andfound thatthe optimalcausalfilter onlyrequiredknowledge ofthe signalandnoise powerspectra. We will assume thatthe signalandnoise are chosen from Gaussian distribution with specified power spectra and find that the optimal estimator is a linear filter. Thetaskistoestimates( τ )giventheoutputuptotimet=0. Onceagain ◦ − wewriteourestimatorastheconditionalaverageoftherandomvariables( τ ) ◦ − upon observing y(t < 0). This is equivalent to averaging first with respect to the conditional probability of s given all of y and then integrating out y+(t) using the conditional probability P[y+(t)y−(t)] where | y+(t>0) = y(t) y−(t>0) = 0 (12) y+(t<0) = 0 y−(t<0) = y(t). We need to write down the probability distribution of y+(t) given y−(t). P[y(t)] P[y+(t)y−(t)]= (13) | P[y−(t)] 7 1 1 ∞ dω y(ω)2 P[y(t)]= exp | | (14) Z "−2Z−∞ 2π (N(ω)+S(ω))# The key trick [54] is to write 1 Ψ(ω)2 = , (15) | | N(ω)+S(ω) where Ψ(ω) is chosen such that it has no poles in the upper half plane. The probability distribution in the time domain is then 1 1 ∞ ∞ dω 2 P[y(t)]= exp dτ y(ω)Ψ(ω)e−iωτ . (16) Z "−2Z−∞ (cid:12)Z−∞ 2π (cid:12) # (cid:12) (cid:12) (cid:12) (cid:12) The function x(τ) defined by (cid:12) (cid:12) ∞ dω x(τ)= y(ω)Ψ(ω)e−iωτ (17) 2π Z−∞ is causal (i.e. it only depends on the values of y(t) for t<τ). 1 1 0 ∞ dω 2 P[y−,y+]= exp dτ y−(ω)Ψ(ω)e−iωτ Z "−2Z−∞ (cid:12)Z−∞ 2π (cid:12) (cid:12) (cid:12) 1 ∞ ∞ dω (cid:12) (cid:12)2 dτ (y−(cid:12)(ω)+y+(ω))Ψ(ω)e−iωτ (cid:12) (18) − 2Z0 (cid:12)Z−∞ 2π (cid:12) # (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) P[y−]= [dy+]P[y−,y+] (19) Z To do the integral over y+(t) we change variables to x+(τ) defined as x(τ) forτ >0. Wenoticethatthisisalineartransformation,andsincex(τ)iscausal the transformation matrix is lower triangular. Therefore the Jacobian of this transformation is independent of y−(t). The Gaussian integral over x(τ) gives an overall factor that can be absorbed in the normalization of our probability distribution. 1 1 0 ∞ dω 2 P[y−]= exp dτ y−(ω)Ψ(ω)e−iωτ (20) Z "−2Z−∞ (cid:12)Z−∞ 2π (cid:12) # (cid:12) (cid:12) (cid:12) (cid:12) 1 1 ∞ (cid:12)∞ dω (cid:12) 2 P[y+(t)y−(t)]= exp dτ y−(ω)+y+(ω) Ψ(ω)e−iωτ | Z "−2Z0 (cid:12)Z−∞ 2π (cid:12) # (cid:12) (cid:0) (cid:1) (cid:12)(21) (cid:12) (cid:12) (cid:12) (cid:12) 8 As mentioned earlier, we first average s using P[sy] and then average the re- | sult with respect to P[y+ y−]. The result of the first averaging is the acausal | estimator (11) evaluated at t= τ , ◦ − ∞ dω s ( τ )= eiωτ◦y(ω)S(ω)Ψ(ω)Ψ¯(ω). (22) nc ◦ − 2π Z−∞ Since s ( τ ) is linear in y(t), averaging with respect to (21) amounts to re- nc ◦ − placing y(t) by its average value which solves the equation of motion: ∞ dω y−(ω)+y+(ω) Ψ(ω)e−iωτ =0 for τ >0. (23) 2π Z−∞ (cid:0) (cid:1) Imposing (23) amounts to replacing y(ω)Ψ(ω) by 0 dτeiωτ ∞ dω′y(ω′)Ψ(ω′)e−iω′τ. (24) 2π Z−∞ Z−∞ in equation (22). The causal estimator is therefore given by s ( τ )= ∞ dωeiωτ◦S(ω)Ψ¯(ω) 0 dτeiωτ ∞ dω′y(ω′)Ψ(ω′)e−iω′τ (25) c ◦ − 2π 2π Z−∞ Z−∞ Z−∞ Wecaneasilyextendthis totheslightlymoregeneralcasewheretheoutput if known up to time t and we want to estimate the signal at time t τ . After ◦ − rewriting things a little bit, we obtain ∞ dω s (t τ )= e−iωty(ω)k(ω). (26) c ◦ − 2π Z−∞ k(ω)=Ψ(ω) ∞dτeiωτ ∞ dω′eiω′(τ◦−τ)S(ω′)Ψ¯(ω′) (27) 2π Z0 Z−∞ We recognize this result as the Wiener filter [54, p. 86]. Note that we are using a different definition of Ψ(ω) and the opposite sign convention in Fourier transforms. But more importantly, our derivation shows that non-linear terms would not help the causal estimation of Gaussian signals with Gaussian noise. 3 Optimal Rigid Motion Estimation Wenowturntotheproblemofvisualmotionestimation. Althoughourprimary motivationistopresentatheoryoftheproblemsolvedbythefly’svisualsystem, our formulationis fairly generaland should be applicable to any visual system, or even to the design of artificial systems. It is important to realize that our “model” is a model of the problem the fly is solving, not a model of the neural circuitry which implements the solution. Thus we need not concern ourselves with the details of fly neuroanatomy; what is important is that we capture the essential features of the fly’s environment which make the problem of motion estimation non-trivial. 9 3.1 The model The fly wants to estimate its own rotation as it tries to fly in a straight line but is pushed around by the wind. If the fly flew in a straight line it would see a contrast pattern C[x,t] that changes both as a function of time t and position x in the visual field; we define contrast as C = (I I )/I where I is o o − the light intensity at a point and I its average value. Since we are interested o only in horizontal motion estimation we give a one-dimensional description, where x is just the azimuthal angle or yaw. When the fly turns along some angular trajectory θ(t) it sees a modified contrast pattern C[x θ(t),t] then − each photoreceptor cell produces a voltage given by V (t)= T(τ) M(x x)C[x θ(t τ),t τ]+δV (t), (28) n n n − − − − Z Z where T(τ) is the temporal impulse response of the cell, M(x) the angular acceptance profile of the photoreceptor and δV (t) is the voltage noise. The n photoreceptors form an array of size N and of angular separation φ . We will ◦ assume that the noise is independent in each photoreceptor, Gaussian and ad- ditivewithpowerspectrumN(ω). Asintheprevioussection,weuseEq. (1)to write the conditional probability distribution where now θ(t) is the signal and V (t) are the detector output. Equation (28) and the voltage noise power n { } spectrumdetermine the probabilityofobserving V (t) givenatrajectoryθ(t) n { } and a contrast pattern C(x,t). Since this contrast pattern is unknown to the observer,one needs to averageover all possible such contrast. 1 dω V (ω) V¯ (ω) 2 n n P[θ(t) V (t) ]= exp − P [θ(t)], (29) n |{ } Z * "−Xn Z 2π (cid:12)(cid:12) 2N(ω) (cid:12)(cid:12) #+C where V¯ (ω)=T(ω) dteiωt dxM(x x )C(x θ(t),t). (30) n n − − Z Z Brackets mean the average over P[C(x,t)], the spatio-temporal contrast distri- bution. For now we will keep the distribution P[C(x,t)] and P[θ(t)] as general aspossible,exceptthatwewillassumeP[θ(t)]tobe left-rightandtime reversal invariant. We will choose the optimal estimator to be the average θ˙(t) in the distribution (29) which, as mentioned in the first section, minimizes χ2. We can expand the norm square in (29) and drop the V 2 term since it does not n | | depend on C(x,t) nor θ(t) and can be absorbed in the overallnormalization, θ˙ (t)= 1 θ˙(t)exp dω V¯n(ω) 2 V¯n∗(ω)Vn(ω) , (31) e Z(V)* "−Xn Z 2π (cid:12)(cid:12)2N(ω)(cid:12)(cid:12) − N(ω) #+C,θ where dω V¯ (ω) 2 V¯∗(ω)V (ω) Z(V)= exp n n n . (32) * "−Xn Z 2π (cid:12)(cid:12)2N(ω)(cid:12)(cid:12) − N(ω) #+C,θ 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.