AN INTRODUCTION TO STOCHASTIC DIFFERENTIAL EQUATIONS
VERSION 1.2

Lawrence C. Evans
Department of Mathematics
UC Berkeley

Chapter 1: Introduction
Chapter 2: A crash course in basic probability theory
Chapter 3: Brownian motion and "white noise"
Chapter 4: Stochastic integrals, Itô's formula
Chapter 5: Stochastic differential equations
Chapter 6: Applications
Exercises
Appendices
References

PREFACE

These are an evolving set of notes for Mathematics 195 at UC Berkeley. This course is for advanced undergraduate math majors and surveys, without too many precise details, random differential equations and some applications.

Stochastic differential equations is usually, and justly, regarded as a graduate level subject. A really careful treatment assumes the students' familiarity with probability theory, measure theory, ordinary differential equations, and perhaps partial differential equations as well. This is all too much to expect of undergrads. But white noise, Brownian motion and the random calculus are wonderful topics, too good for undergraduates to miss out on.

Therefore as an experiment I tried to design these lectures so that strong students could follow most of the theory, at the cost of some omission of detail and precision. I for instance downplayed most measure theoretic issues, but did emphasize the intuitive idea of σ-algebras as "containing information". Similarly, I "prove" many formulas by confirming them in easy cases (for simple random variables or for step functions), and then just stating that by approximation these rules hold in general. I also did not reproduce in class some of the more complicated proofs provided in these notes, although I did try to explain the guiding ideas.

My thanks especially to Lisa Goldberg, who several years ago presented the class with several lectures on financial applications, and to Fraydoun Rezakhanlou, who has taught from these notes and added several improvements. I am also grateful to Jonathan Weare for several computer simulations illustrating the text.

CHAPTER 1: INTRODUCTION

A. MOTIVATION

Fix a point $x_0 \in \mathbb{R}^n$ and consider then the ordinary differential equation:
\[
\text{(ODE)}\qquad
\begin{cases}
\dot{x}(t) = b(x(t)) & (t > 0)\\
x(0) = x_0,
\end{cases}
\]
where $b : \mathbb{R}^n \to \mathbb{R}^n$ is a given, smooth vector field and the solution is the trajectory $x(\cdot) : [0,\infty) \to \mathbb{R}^n$.

[Figure: trajectory of the differential equation]

Notation. $x(t)$ is the state of the system at time $t \ge 0$, and $\dot{x}(t) := \frac{d}{dt}x(t)$. □

In many applications, however, the experimentally measured trajectories of systems modeled by (ODE) do not in fact behave as predicted:

[Figure: sample path of the stochastic differential equation]

Hence it seems reasonable to modify (ODE), somehow to include the possibility of random effects disturbing the system. A formal way to do so is to write:
\[
(1)\qquad
\begin{cases}
\dot{X}(t) = b(X(t)) + B(X(t))\,\xi(t) & (t > 0)\\
X(0) = x_0,
\end{cases}
\]
where $B : \mathbb{R}^n \to \mathbb{M}^{n \times m}$ (= space of $n \times m$ matrices) and $\xi(\cdot) :=$ $m$-dimensional "white noise".

This approach presents us with these mathematical problems:
• Define the "white noise" $\xi(\cdot)$ in a rigorous way.
• Define what it means for $X(\cdot)$ to solve (1).
• Show (1) has a solution, discuss uniqueness, asymptotic behavior, dependence upon $x_0$, $b$, $B$, etc.

B. SOME HEURISTICS

Let us first study (1) in the case $m = n$, $x_0 = 0$, $b \equiv 0$, and $B \equiv I$. The solution of (1) in this setting turns out to be the $n$-dimensional Wiener process, or Brownian motion, denoted $W(\cdot)$.
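Although the rigorous construction of $W(\cdot)$ waits until Chapter 3, it is easy to draw approximate sample paths on a computer, using only the fact (established there) that the increments of Brownian motion over disjoint time intervals are independent Gaussian random variables with mean zero and variance equal to the length of the interval. The following is a minimal sketch, not from the original notes; the function name and parameters are ours, chosen for illustration:

```python
import numpy as np

def brownian_path(T=1.0, n=1000, rng=None):
    """Approximate one sample path of 1-dimensional Brownian motion on [0, T].

    Each increment W(t + dt) - W(t) is simulated as an independent N(0, dt)
    random variable; the path is the cumulative sum, with W(0) = 0.
    """
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    dW = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n)  # increments
    t = np.linspace(0.0, T, n + 1)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    return t, W

t, W = brownian_path()
print(W[-1])  # one sample of W(1), which is distributed N(0, 1)
```

Plotting $W$ against $t$ produces exactly the kind of jagged "sample path" pictured above.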
Thus we may symbolically write
\[
\dot{W}(\cdot) = \xi(\cdot),
\]
thereby asserting that "white noise" is the time derivative of the Wiener process.

Now return to the general case of the equation (1), write $\frac{d}{dt}$ instead of the dot:
\[
\frac{dX(t)}{dt} = b(X(t)) + B(X(t))\,\frac{dW(t)}{dt},
\]
and finally multiply by "$dt$":
\[
\text{(SDE)}\qquad
\begin{cases}
dX(t) = b(X(t))\,dt + B(X(t))\,dW(t)\\
X(0) = x_0.
\end{cases}
\]
This expression, properly interpreted, is a stochastic differential equation. We say that $X(\cdot)$ solves (SDE) provided
\[
(2)\qquad X(t) = x_0 + \int_0^t b(X(s))\,ds + \int_0^t B(X(s))\,dW \quad \text{for all times } t > 0.
\]

Now we must:
• Construct $W(\cdot)$: See Chapter 3.
• Define the stochastic integral $\int_0^t \cdots\,dW$: See Chapter 4.
• Show (2) has a solution, etc.: See Chapter 5.

And once all this is accomplished, there will still remain these modeling problems:
• Does (SDE) truly model the physical situation?
• Is the term $\xi(\cdot)$ in (1) "really" white noise, or is it rather some ensemble of smooth, but highly oscillatory functions? See Chapter 6.

As we will see later these questions are subtle, and different answers can yield completely different solutions of (SDE). Part of the trouble is the strange form of the chain rule in the stochastic calculus:

C. ITÔ'S FORMULA

Assume $n = 1$ and $X(\cdot)$ solves the SDE
\[
(3)\qquad dX = b(X)\,dt + dW.
\]
Suppose next that $u : \mathbb{R} \to \mathbb{R}$ is a given smooth function. We ask: what stochastic differential equation does
\[
Y(t) := u(X(t)) \quad (t \ge 0)
\]
solve? Offhand, we would guess from (3) that
\[
dY = u'\,dX = u'b\,dt + u'\,dW,
\]
according to the usual chain rule, where $' = \frac{d}{dx}$. This is wrong, however! In fact, as we will see,
\[
(4)\qquad dW \approx (dt)^{1/2}
\]
in some sense. Consequently if we compute $dY$ and keep all terms of order $dt$ or $(dt)^{1/2}$, we obtain
\[
\begin{aligned}
dY &= u'\,dX + \frac{1}{2}u''(dX)^2 + \dots\\
&= u'(b\,dt + dW) + \frac{1}{2}u''(b\,dt + dW)^2 + \dots \qquad \text{(from (3))}\\
&= \Big(u'b + \frac{1}{2}u''\Big)\,dt + u'\,dW + \{\text{terms of order } (dt)^{3/2} \text{ and higher}\}.
\end{aligned}
\]
Here we used the "fact" that $(dW)^2 = dt$, which follows from (4). Hence
\[
dY = \Big(u'b + \frac{1}{2}u''\Big)\,dt + u'\,dW,
\]
with the extra term "$\frac{1}{2}u''\,dt$" not present in ordinary calculus.

A major goal of these notes is to provide a rigorous interpretation for calculations like these, involving stochastic differentials.

Example 1. According to Itô's formula, the solution of the stochastic differential equation
\[
\begin{cases}
dY = Y\,dW\\
Y(0) = 1
\end{cases}
\]
is
\[
Y(t) := e^{W(t) - \frac{t}{2}},
\]
and not what might seem the obvious guess, namely $\hat{Y}(t) := e^{W(t)}$. □

Example 2. Let $P(t)$ denote the (random) price of a stock at time $t \ge 0$. A standard model assumes that $\frac{dP}{P}$, the relative change of price, evolves according to the SDE
\[
\frac{dP}{P} = \mu\,dt + \sigma\,dW
\]
for certain constants $\mu > 0$ and $\sigma$, called respectively the drift and the volatility of the stock. In other words,
\[
\begin{cases}
dP = \mu P\,dt + \sigma P\,dW\\
P(0) = p_0,
\end{cases}
\]
where $p_0$ is the starting price. Using once again Itô's formula we can check that the solution is
\[
P(t) = p_0\, e^{\sigma W(t) + \left(\mu - \frac{\sigma^2}{2}\right)t}. \quad\square
\]

[Figure: a sample path for stock prices]
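The Itô correction in Example 2 is easy to see numerically. The sketch below is ours, for illustration only (the parameter values and step size are arbitrary): it approximates the SDE $dP = \mu P\,dt + \sigma P\,dW$ by the simplest time-stepping scheme, $P_{k+1} = P_k + \mu P_k\,\Delta t + \sigma P_k\,\Delta W_k$ (the Euler–Maruyama method), and compares the result with the exact solution above driven by the same Brownian increments:

```python
import numpy as np

# Parameters for the stock-price model dP = mu*P dt + sigma*P dW
# (values chosen arbitrarily, for illustration).
mu, sigma, p0, T, n = 0.1, 0.3, 1.0, 1.0, 100_000

rng = np.random.default_rng(0)
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Brownian increments
W_T = dW.sum()                              # value of W at time T

# Euler-Maruyama time stepping: P_{k+1} = P_k + mu*P_k*dt + sigma*P_k*dW_k
P = p0
for k in range(n):
    P += mu * P * dt + sigma * P * dW[k]

# Exact solution from Ito's formula, driven by the same path
P_exact = p0 * np.exp(sigma * W_T + (mu - 0.5 * sigma**2) * T)

print(P, P_exact)   # close for small dt, and equal in the limit dt -> 0
```

Replacing the exponent by the "obvious" guess $\sigma W(T) + \mu T$ makes the two outputs visibly disagree, which is the $\frac{1}{2}u''\,dt$ term at work.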
CHAPTER 2: A CRASH COURSE IN BASIC PROBABILITY THEORY.

A. Basic definitions
B. Expected value, variance
C. Distribution functions
D. Independence
E. Borel–Cantelli Lemma
F. Characteristic functions
G. Strong Law of Large Numbers, Central Limit Theorem
H. Conditional expectation
I. Martingales

This chapter is a very rapid introduction to the measure theoretic foundations of probability theory. More details can be found in any good introductory text, for instance Bremaud [Br], Chung [C] or Lamperti [L1].

A. BASIC DEFINITIONS.

Let us begin with a puzzle:

Bertrand's paradox. Take a circle of radius 2 inches in the plane and choose a chord of this circle at random. What is the probability this chord intersects the concentric circle of radius 1 inch?

Solution #1. Any such chord (provided it does not hit the center) is uniquely determined by the location of its midpoint. Thus
\[
\text{probability of hitting inner circle} = \frac{\text{area of inner circle}}{\text{area of larger circle}} = \frac{1}{4}.
\]

Solution #2. By symmetry under rotation we may assume the chord is vertical. The diameter of the large circle is 4 inches and the chord will hit the small circle if it falls within its 2-inch diameter. Hence
\[
\text{probability of hitting inner circle} = \frac{2 \text{ inches}}{4 \text{ inches}} = \frac{1}{2}.
\]

Solution #3. By symmetry we may assume one end of the chord is at the far left point of the larger circle. The angle $\theta$ the chord makes with the horizontal lies between $\pm\frac{\pi}{2}$, and the chord hits the inner circle if $\theta$ lies between $\pm\frac{\pi}{6}$. Therefore
\[
\text{probability of hitting inner circle} = \frac{\frac{2\pi}{6}}{\frac{2\pi}{2}} = \frac{1}{3}. \quad\square
\]

PROBABILITY SPACES. This example shows that we must carefully define what we mean by the term "random". The correct way to do so is by introducing as follows the precise mathematical structure of a probability space. We start with a set, denoted $\Omega$, certain subsets of which we will in a moment interpret as being "events".

DEFINITION. A σ-algebra is a collection $\mathcal{U}$ of subsets of $\Omega$ with these properties:
(i) $\emptyset, \Omega \in \mathcal{U}$.
(ii) If $A \in \mathcal{U}$, then $A^c \in \mathcal{U}$.
(iii) If $A_1, A_2, \dots \in \mathcal{U}$, then
\[
\bigcup_{k=1}^{\infty} A_k,\ \bigcap_{k=1}^{\infty} A_k \in \mathcal{U}.
\]
Here $A^c := \Omega - A$ is the complement of $A$.

DEFINITION. Let $\mathcal{U}$ be a σ-algebra of subsets of $\Omega$. We call $P : \mathcal{U} \to [0,1]$ a probability measure provided:
(i) $P(\emptyset) = 0$, $P(\Omega) = 1$.
(ii) If $A_1, A_2, \dots \in \mathcal{U}$, then
\[
P\Big(\bigcup_{k=1}^{\infty} A_k\Big) \le \sum_{k=1}^{\infty} P(A_k).
\]
(iii) If $A_1, A_2, \dots$ are disjoint sets in $\mathcal{U}$, then
\[
P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \sum_{k=1}^{\infty} P(A_k).
\]
It follows that if $A, B \in \mathcal{U}$, then $A \subseteq B$ implies $P(A) \le P(B)$.

DEFINITION. A triple $(\Omega, \mathcal{U}, P)$ is called a probability space provided $\Omega$ is any set, $\mathcal{U}$ is a σ-algebra of subsets of $\Omega$, and $P$ is a probability measure on $\mathcal{U}$.

Terminology.
(i) A set $A \in \mathcal{U}$ is called an event; points $\omega \in \Omega$ are sample points.
(ii) $P(A)$ is the probability of the event $A$.
(iii) A property which is true except for an event of probability zero is said to hold almost surely (usually abbreviated "a.s.").

Example 1. Let $\Omega = \{\omega_1, \omega_2, \dots, \omega_N\}$ be a finite set, and suppose we are given numbers $0 \le p_j \le 1$ for $j = 1, \dots, N$, satisfying $\sum p_j = 1$. We take $\mathcal{U}$ to comprise all subsets of $\Omega$. For each set $A = \{\omega_{j_1}, \omega_{j_2}, \dots, \omega_{j_m}\} \in \mathcal{U}$, with $1 \le j_1 < j_2 < \dots < j_m \le N$, we define
\[
P(A) := p_{j_1} + p_{j_2} + \dots + p_{j_m}. \quad\square
\]

Example 2. The smallest σ-algebra containing all the open subsets of $\mathbb{R}^n$ is called the Borel σ-algebra, denoted $\mathcal{B}$. Assume that $f$ is a nonnegative, integrable function, such that $\int_{\mathbb{R}^n} f\,dx = 1$. We define
\[
P(B) := \int_B f(x)\,dx
\]
for each $B \in \mathcal{B}$. Then $(\mathbb{R}^n, \mathcal{B}, P)$ is a probability space. We call $f$ the density of the probability measure $P$. □

Example 3. Suppose instead we fix a point $z \in \mathbb{R}^n$, and now define
\[
P(B) := \begin{cases} 1 & \text{if } z \in B\\ 0 & \text{if } z \notin B \end{cases}
\]
for sets $B \in \mathcal{B}$. Then $(\mathbb{R}^n, \mathcal{B}, P)$ is a probability space. We call $P$ the Dirac mass concentrated at the point $z$, and write $P = \delta_z$. □

A probability space is the proper setting for mathematical probability theory. This means that we must first of all carefully identify an appropriate $(\Omega, \mathcal{U}, P)$ when we try to solve problems.
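The source of Bertrand's paradox becomes vivid if we simulate the three chord-choosing procedures. The following Monte Carlo sketch is ours, for illustration; it samples chords of the radius-2 circle according to each of the three recipes above and estimates the probability that the chord meets the inner circle of radius 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6
R, r = 2.0, 1.0   # outer and inner radii, in inches

# Model 1: midpoint uniform over the large disk; the chord meets the
# inner circle exactly when its midpoint lies in the inner disk.
d = R * np.sqrt(rng.random(n))          # distance of midpoint from center
print((d <= r).mean())                  # -> about 1/4

# Model 2: vertical chord, horizontal position uniform on (-R, R).
x = rng.uniform(-R, R, n)
print((np.abs(x) <= r).mean())          # -> about 1/2

# Model 3: one endpoint fixed on the circle, angle with the horizontal
# uniform on (-pi/2, pi/2); the chord meets the inner circle when
# |theta| <= arcsin(r/R) = pi/6.
theta = rng.uniform(-np.pi / 2, np.pi / 2, n)
print((np.abs(theta) <= np.pi / 6).mean())  # -> about 1/3
```

The three estimates approach 1/4, 1/2 and 1/3 respectively: each recipe is a perfectly consistent probability model, and the "paradox" is only that the phrase "at random" fails to single out one of them.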
The reader should convince himself or herself that the three "solutions" to Bertrand's paradox discussed above represent three distinct interpretations of the phrase "at random", that is, three distinct models of $(\Omega, \mathcal{U}, P)$. Here is another example.

Example 4 (Buffon's needle problem). The plane is ruled by parallel lines 2 inches apart and a 1-inch long needle is dropped at random on the plane. What is the probability that it hits one of the parallel lines?

The first issue is to find some appropriate probability space $(\Omega, \mathcal{U}, P)$. For this, let
\[
\begin{cases}
h = \text{distance from the center of the needle to the nearest line,}\\
\theta = \text{angle } \big(\le \tfrac{\pi}{2}\big) \text{ that the needle makes with the horizontal.}
\end{cases}
\]

[Figure: the dropped needle, showing the angle θ and the distance h to the nearest line]

These fully determine the position of the needle, up to translations and reflection. Let us next take
\[
\begin{cases}
\Omega = \underbrace{[0, \tfrac{\pi}{2})}_{\text{values of } \theta} \times \underbrace{[0,1]}_{\text{values of } h},\\[1ex]
\mathcal{U} = \text{Borel subsets of } \Omega,\\[1ex]
P(B) = \dfrac{2 \cdot \text{area of } B}{\pi} \ \text{ for each } B \in \mathcal{U}.
\end{cases}
\]

We denote by $A$ the event that the needle hits a horizontal line. We can now check that this happens provided $h \le \frac{\sin\theta}{2}$. Consequently $A = \{(\theta, h) \in \Omega \mid h \le \frac{\sin\theta}{2}\}$, and so
\[
P(A) = \frac{2(\text{area of } A)}{\pi} = \frac{2}{\pi}\int_0^{\pi/2} \frac{\sin\theta}{2}\,d\theta = \frac{1}{\pi}. \quad\square
\]

RANDOM VARIABLES. We can think of the probability space as being an essential mathematical construct, which is nevertheless not "directly observable". We are therefore interested in introducing mappings $X$ from $\Omega$ to $\mathbb{R}^n$, the values of which we can observe.
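For instance, the function equal to 1 if the needle in Example 4 hits a line and 0 if not (the indicator of the event $A$) is such a mapping, and averaging many observed values of it estimates $P(A) = \frac{1}{\pi}$. A minimal simulation sketch, ours and with an arbitrary sample size:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Drop the needle: sample (theta, h) uniformly from
# Omega = [0, pi/2) x [0, 1], which realizes P(B) = 2*area(B)/pi.
theta = rng.uniform(0.0, np.pi / 2, n)
h = rng.uniform(0.0, 1.0, n)

# X is the observable random variable: 1 if the needle hits a line.
X = (h <= np.sin(theta) / 2).astype(float)

print(X.mean(), 1 / np.pi)   # the sample mean approximates P(A) = 1/pi
```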