Representations and weak convergence methods for the analysis and approximation of rare ... PDF

104 Pages·2013·0.83 MB·English

Representations and weak convergence methods for the analysis and approximation of rare events

A Short Course of the Scuola di Dottorato in Scienze matematiche, Università degli Studi di Padova

Paul Dupuis
Division of Applied Mathematics
Brown University
Providence, RI 02896 USA

Outline:
1. Introduction and Examples
2. General Theory and Relative Entropy
3. Canonical Problem I – Sanov's Theorem
4. Canonical Problem II – Small Noise Diffusions
5. Freidlin-Wentzell Theory and Moderate Deviations
6. Processes That Are Not Functionals of an IID Noise Process
7. Extracting Information From the Variational Problem
8. An Overview of Importance Sampling for Rare Events
9. The Subsolutions Approach to Importance Sampling
10. The Subsolutions Approach to Importance Sampling, Cont'd
11. The Empirical Measure of a Markov Chain
12. Current Developments and Related Problems

The goal of these notes is to introduce the reader to methods for characterizing and analyzing rare events, and for the construction and analysis of related Monte Carlo numerical approximations. The approach to both topics is based on weak convergence theory and relative entropy representations for exponential integrals. Some of the ideas and methods presented in these notes first appeared in A Weak Convergence Approach to the Theory of Large Deviations with Richard Ellis, which in particular has a very detailed discussion of many of the nice properties of relative entropy we will use. Many new topics, including infinite dimensional problems and the analysis of Monte Carlo, will appear in a forthcoming book with Amarjit Budhiraja with the same title as these notes: Representations and Weak Convergence Methods for the Analysis and Numerical Approximation of Rare Events.

These notes were prepared as part of a short course given at Dipartimento di Matematica, Università degli Studi di Padova, from 20 to 31 May 2013.
The author would like to thank the department, and in particular Markus Fischer, for their warm hospitality.

Lecture 1: Introduction and Examples

1 The setting and statement of a large deviation principle

Let $S$ denote a Polish space with Borel $\sigma$-algebra $\mathcal{B}(S)$. Typical examples of $S$ in these notes will be $\mathbb{R}^d$ (Euclidean $d$-dimensional space), $C([0,T]:\bar S)$ (the set of continuous functions mapping $[0,T]$ to $\bar S$) and $\mathcal{P}(\bar S)$ (the set of probability measures on $(\bar S,\mathcal{B}(\bar S))$), where $\bar S$ is itself a Polish space.

Let $\{X_n,n\in\mathbb{N}\}$ be $S$-valued random variables on $(\Omega,\mathcal{F},P)$, with distributions $\mu_n(B)=P\{X_n\in B\}$, $B\in\mathcal{B}(S)$.

Definition 1 A function $I:S\to[0,\infty]$ is called a rate function if $\{x\in S:I(x)\le M\}$ is compact for all $M\in[0,\infty)$.

Definition 2 The sequence of random variables $\{X_n,n\in\mathbb{N}\}$ (or equivalently the sequence of distributions $\{\mu_n,n\in\mathbb{N}\}$) is said to satisfy the large deviation principle (LDP) with rate function $I$ if $I$ is a rate function, and if

1. for all open sets $O\in\mathcal{B}(S)$
$$\liminf_{n\to\infty}\frac{1}{n}\log P\{X_n\in O\}\ge-\inf_{x\in O}I(x);$$
2. for all closed sets $F\in\mathcal{B}(S)$
$$\limsup_{n\to\infty}\frac{1}{n}\log P\{X_n\in F\}\le-\inf_{x\in F}I(x).$$

In a very rough sense, one can think of this as saying
$$P\{X_n\in B_\delta(x)\}\approx e^{-nI(x)},$$
where $\delta>0$ is small and $B_\delta(x)=\{y:d(y,x)<\delta\}$, with $d$ the metric on $S$. For $C\in\mathcal{B}(S)$ let $I(C)=\inf_{x\in C}I(x)$. If $I(C^\circ)=I(\bar C)=I(C)$, then $C$ is called an $I$-continuity set and we have
$$\lim_{n\to\infty}\frac{1}{n}\log P\{X_n\in C\}=-I(C).$$

Variational problems arise naturally in large deviation problems because of the following elementary consequence of exponential scaling.

Lemma 1 Let sequences $\{a_n\},\{b_n\}\subset[0,\infty]$ be given such that
$$-\frac{1}{n}\log a_n\to u\in[0,\infty],\qquad-\frac{1}{n}\log b_n\to v\in[0,\infty].$$
Then
$$-\frac{1}{n}\log(a_n+b_n)\to\min\{u,v\}.$$
Thus if $a_n$ and $b_n$ are probabilities scaling like $a_n\approx e^{-nu}$ and $b_n\approx e^{-nv}$, then the decay rate of $a_n+b_n$ is given by the smaller of $u$ and $v$.

Remark 1 The scaling parameter $n\in\mathbb{N}$ is sometimes replaced by $\varepsilon>0$, with $n$ corresponding to $1/\varepsilon$.

2 Examples

The following examples illustrate various applications. Proofs of the LDP for some (but not all) of these models will be given. Some will also be used in the discussion of Monte Carlo methods.

Example 1 (Multi-dimensional random walk, insurance risk) Let $Z_i$ be independent and identically distributed (iid) with distribution $\mu\in\mathcal{P}(\mathbb{R}^d)$. For $x\in\mathbb{R}^d$ and $n\in\mathbb{N}$ define
$$X^n_{i+1}=X^n_i+\frac{1}{n}Z_i,\qquad X^n_0=x,$$
and the piecewise linear interpolation
$$X^n(t)=X^n_i+\left[X^n_{i+1}-X^n_i\right](nt-i),\qquad t\in\left[\frac{i}{n},\frac{i+1}{n}\right].$$
Assume that $EZ_i<0$ component-wise and for $M\in(0,\infty)^d$ let
$$\tau^n=\inf\{t\ge0:X^n_j(t)\ge M_j\text{ for some }j=1,\dots,d\}.$$
The problem of estimating $P\{\tau^n<\infty\}$ arises in insurance risk, with the correlation between different components of $Z_i$ modeling correlation between different sectors or firms.¹ Assume the log-moment generating function satisfies
$$H(\alpha)\doteq\log Ee^{\langle\alpha,Z_i\rangle}<\infty\quad\text{for all }\alpha\in\mathbb{R}^d.$$

¹ In fact, the origins of large deviation theory can be traced to applications in insurance through H. Cramér.

Figure 1: Two dimensional escape set, unscaled random walk

Let $L(\beta)$ be the Legendre transform of $H$:
$$L(\beta)\doteq\sup_{\alpha\in\mathbb{R}^d}\left[\langle\alpha,\beta\rangle-H(\alpha)\right],\qquad\beta\in\mathbb{R}^d.$$
Then for each $T<\infty$, $\{X^n,n\in\mathbb{N}\}$ satisfies the LDP on $C([0,T]:\mathbb{R}^d)$ with rate function
$$I_T(\phi)=\begin{cases}\int_0^T L(\dot\phi(t))\,dt & \text{if }\phi\text{ is absolutely continuous and }\phi(0)=x,\\ \infty & \text{otherwise.}\end{cases}$$
Thus for a given $\phi$, the rough interpretation gives an estimate for the probability that $X^n$ "tracks" $\phi$ in the form
$$P\left\{\sup_{0\le t\le T}|X^n(t)-\phi(t)|\le\delta\right\}\approx e^{-nI_T(\phi)}.$$
Although it does not follow directly from the LDP, one can use the large deviation estimates on $[0,T]$ (or argue directly using weak convergence methods) to show that
$$-\frac{1}{n}\log P\{\tau^n<\infty\}\to\inf\{I_T(\phi):\phi_j(T)\ge M_j\text{ for some }j=1,\dots,d,\ T<\infty\}.\qquad(1)$$
The idea behind the reduction to finite time estimates is straightforward. One first shows using the upper bound alone that if the event is to occur at all, then it must happen with overwhelming probability before some fixed finite time $T$ (see, e.g., [34, Lemma 2.2, Chapter 5]). Specifically, one shows that given any $K<\infty$ there is $T<\infty$ such that
$$\limsup_{n\to\infty}\frac{1}{n}\log P\{\tau^n\in[T,\infty)\}\le-K.$$
It follows that $P\{\tau^n<\infty\}$ and $P\{\tau^n\in[0,T]\}$ have the same decay for sufficiently large but finite $T$, and an application of the LDP to $P\{\tau^n\in[0,T]\}$ gives (1).

Remark 2 The large deviation approximation in this and other examples is guaranteed to give the correct rate of decay. Depending on the particular application, however, the estimate itself may not be as accurate as one needs. The rate of decay is often well suited to qualitative issues (e.g., control and design). If a more accurate approximation to $P\{\tau^n<\infty\}$ is desired, then the large deviation information is very useful in the design of Monte Carlo schemes. This application is the subject of Lectures 8–10.

Figure 2: Dynamics of tracking loop with no noise

Example 2 (Metastability for diffusion processes, a PLL type example).
Various algorithms in adaptive control, suboptimal filtering, and elsewhere are designed to reject noise and keep a parameter near a desired operating point [44, 42, 19]. Large deviation theory gives natural measures of the performance of these algorithms. An example is the following diffusion model of a phase-locked loop:
$$dX^\varepsilon_1=-a\pi X^\varepsilon_1\,dt+b\left(\sin(\pi X^\varepsilon_2)\,dt+\sqrt{\varepsilon}\,dW\right)$$
$$dX^\varepsilon_2=-\pi X^\varepsilon_1\,dt$$
with $X^\varepsilon_2$ a measure of the "tracking error." Here $a$ and $b$ are parameters to be selected for the loop design, and higher order loops have a larger dimension and more parameters to select. The "noiseless" system ($\varepsilon=0$) is illustrated in Figure 2. The diffusion model arises from a device driven by wide bandwidth noise and high carrier frequency $\omega_\gamma$ as indicated in Figure 3, after the noise is approximated by a Brownian motion and high frequency terms due to the double angle formula and the multiplexer are dropped, with $X^\varepsilon_2=\theta-\hat\theta_\rho$.

Figure 3: Tracking loop driven by signal plus noise

Performance measures would include, e.g., $P_0\{\tau^\varepsilon<T\}$, where $\tau^\varepsilon=\inf\{t\ge0:|X^\varepsilon_2(t)|\ge\pi\}$, and $P_0$ denotes probability given $X^\varepsilon(0)=0$. Given $\phi$ with $\phi(0)=0$, let $\mathcal{S}_\phi$ be the set of $u\in L^2([0,T]:\mathbb{R})$ such that for all $t\in[0,T]$
$$\phi_1(t)=-\int_0^t a\pi\phi_1(s)\,ds+\int_0^t b\left(\sin(\pi\phi_2(s))+u(s)\right)ds,$$
$$\phi_2(t)=-\int_0^t\pi\phi_1(s)\,ds.$$
The rate function for $\{X^\varepsilon,\varepsilon\in(0,1)\}$ with this initial condition is
$$I_T(\phi)=\inf\left\{\int_0^T\frac{1}{2}u(t)^2\,dt:u\in\mathcal{S}_\phi\right\},$$
where the infimum over the empty set is $\infty$. (We call this the control form of the rate function since we view $u$ as a control and $\phi$ as a controlled state. In contrast, the rate function for the previous example was in a calculus of variations form.)
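The control form of the rate function suggests a simple numerical experiment. Pick a control $u$, integrate the controlled equations for $\phi$ from $\phi(0)=0$, and record $\frac12\int u^2$. The Python sketch below does this with an explicit Euler step.

The function name `pll_control_cost`, the uniform time grid, and the choice to stop accumulating cost at the first grid time where $|\phi_2|\ge\pi$ are our own illustrative assumptions, not part of the notes. After that time one can set $u=0$ at no extra cost. Any control that drives $\phi$ into $\{|\phi_2|\ge\pi\}$ therefore yields, up to discretization error, an upper bound on $\inf\{I_T(\phi):|\phi_2(t)|\ge\pi\text{ for some }t\in[0,T]\}$. The sketch does not compute that infimum itself.

```python
import math


def pll_control_cost(u, a, b, T, level=math.pi):
    """Euler sketch of the control-form cost for the phase-locked loop example.

    u     : list of control values on a uniform grid of [0, T] (hypothetical discretization)
    a, b  : loop parameters of the model
    level : exit level for |phi_2| (pi in the text)

    Starting from phi(0) = (0, 0), integrate
        phi_1' = -a*pi*phi_1 + b*(sin(pi*phi_2) + u)
        phi_2' = -pi*phi_1
    while accumulating 1/2 * integral of u^2. Returns (cost, exit_time) at the
    first grid time with |phi_2| >= level, or (math.inf, None) if the
    discretized path never reaches that set on [0, T].
    """
    if not u:
        raise ValueError("u must contain at least one control value")
    dt = T / len(u)
    phi1, phi2 = 0.0, 0.0
    cost = 0.0
    for k, uk in enumerate(u):
        # Explicit Euler step: both derivatives use the state before the update.
        d1 = -a * math.pi * phi1 + b * (math.sin(math.pi * phi2) + uk)
        d2 = -math.pi * phi1
        phi1 += d1 * dt
        phi2 += d2 * dt
        cost += 0.5 * uk * uk * dt
        if abs(phi2) >= level:
            return cost, (k + 1) * dt
    return math.inf, None


# Hypothetical usage: constant control u = 5 with a = b = 1 on [0, 5].
cost, t_exit = pll_control_cost([5.0] * 5000, a=1.0, b=1.0, T=5.0)
print(cost, t_exit)
```

With $u\equiv0$ the path stays at the stable point $(0,0)$ and the function returns an infinite cost. Minimizing the returned cost over a parametrized family of controls would give a crude numerical estimate of the decay rate; that refinement is not shown.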
It follows from the LDP that
$$\varepsilon\log P_0\{\tau^\varepsilon<T\}\to-\inf\{I_T(\phi):|\phi_2(t)|\ge\pi\text{ for some }t\in[0,T]\}.$$
Again, non-asymptotic approximations to $P_0\{\tau^\varepsilon<T\}$ are very useful.

Example 3 (Empirical measure large deviations and MCMC) Consider an ergodic Markov chain $\{X_i,i\in\mathbb{N}_0\}$ with state space $\mathcal{S}$ and unique stationary distribution $\pi$. The empirical measure or normalized occupation measure of the chain is defined for $n\in\mathbb{N}$ by
$$\mu^n(A)=\frac{1}{n}\sum_{i=0}^{n-1}\delta_{X_i}(A),\qquad A\in\mathcal{B}(\mathcal{S}),$$
where $\delta_x$ is the Dirac measure that puts probability 1 at $x$.

Under appropriate conditions, $\mu^n$ converges in an appropriate topology to $\pi$ with probability one (w.p.1). In fact, this property is used in many important applications in the physical and biological sciences, statistics, and elsewhere as a method of numerically approximating $\pi$. In this setting, a large deviation principle for $\{\mu^n,n\in\mathbb{N}\}$ will give information on the likelihood that $\mu^n$ is near an alternative "target" measure besides $\pi$. Since many chains can have the same invariant distribution, the rate function can then be used to compare the numerical efficiency of the possible chains. Large deviation theory for the empirical measure is also used in many other areas, such as information theory and statistics. We will discuss this example and such an application in Lecture 11. The large deviation theory for the empirical measure of a Markov chain was originally developed in the papers [12, 13].

Example 4 (Queueing and data loss) An area where large deviation theory has been very active is the analysis of stochastic network models, and especially models for communication. In this example we present a simple model that involves choosing parameters to achieve a desired loss rate.

Figure 4: A tandem queue with two service rates

The tandem queueing model is depicted in Figure 4.
The second queue has a finite buffer, and data is lost when the process reaches this buffer. When the second queue is small the first queue serves at rate $\mu_1$. However, when the second queue exceeds a threshold the first queue reduces its service rate to $\nu$. The problem is to determine a threshold so that a prescribed and very small loss rate holds. Since the buffer is expected to be relatively large it is scaled by $n$, as is the threshold, which is given by $\theta n$ with $\theta\in(0,1)$. The dynamics and partition of the state space for the system as well as the state space for the rescaled system (introduced below) are illustrated in Figure 5. The queueing process $(Q_1(t),Q_2(t))$ is modeled as a jump Markov process with the indicated rates. Since a queue cannot be negative, jump rates are zero for jumps that would cause the state to leave $(\mathbb{N}_0)^2$. The problem of interest is to estimate quantities such as
$$P_{(1,0)}\{Q_2\text{ exceeds }n\text{ before }(Q_1,Q_2)\text{ reaches }(0,0)\},$$
where $P_{(1,0)}$ denotes probability given $(Q_1(0),Q_2(0))=(1,0)$.

Figure 5: Jump rates and partition of the state space for the scaled system.

Large deviation estimates can be proved for the scaled system defined by $(Q^n_1(t),Q^n_2(t))=(Q_1(nt),Q_2(nt))/n$ [21]. Expressing the quantity mentioned above in terms of the scaled system gives
$$P_{(1/n,0)}\{Q^n_2\text{ exceeds }1\text{ before }(Q^n_1,Q^n_2)\text{ reaches }(0,0)\}.$$
Although this quantity involves an a priori potentially unbounded time interval, one can show as with the first example that if the event is to occur at all, it will happen with overwhelming probability before some fixed time $T$, which allows a reduction to the finite time LDP.
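A crude Monte Carlo estimate of the loss probability above only needs the embedded jump chain of $(Q_1,Q_2)$. The event "reach the buffer before emptying" does not depend on the exponential holding times. The Python sketch below simulates that chain under several stated assumptions:

- Figure 5 with the actual rates is not reproduced here, so the arrival rate `lam` and the second-queue service rate `mu2` are hypothetical parameters.
- Reaching the buffer is taken to mean $Q_2\ge n$.
- The first queue is assumed to switch from $\mu_1$ to $\nu$ exactly when $Q_2>\theta n$.

```python
import random


def tandem_loss_probability(n, theta, lam, mu1, nu, mu2, num_runs, seed=0):
    """Crude Monte Carlo estimate of P_(1,0){Q2 reaches n before (Q1, Q2) = (0, 0)}.

    Hypothetical rates: arrivals to queue 1 at rate lam; queue 1 serves at
    rate mu1, reduced to nu while Q2 > theta*n; queue 2 serves at rate mu2.
    Only the embedded jump chain is simulated.
    """
    if n < 1 or lam < 0 or min(mu1, nu, mu2) <= 0:
        raise ValueError("need n >= 1, lam >= 0 and positive service rates")
    rng = random.Random(seed)
    threshold = theta * n
    hits = 0
    for _ in range(num_runs):
        q1, q2 = 1, 0
        while True:
            if q2 >= n:                # buffer of queue 2 reached: data loss
                hits += 1
                break
            if q1 == 0 and q2 == 0:    # system emptied first
                break
            # Jump rates are zero for jumps that would leave (N_0)^2.
            serve1 = (nu if q2 > threshold else mu1) if q1 > 0 else 0.0
            serve2 = mu2 if q2 > 0 else 0.0
            r = rng.random() * (lam + serve1 + serve2)
            if r < lam:
                q1 += 1                        # arrival to queue 1
            elif r < lam + serve1 or serve2 == 0.0:
                # Transfer from queue 1 to queue 2 (the second test guards
                # against r rounding up to the total rate when Q2 = 0).
                q1, q2 = q1 - 1, q2 + 1
            else:
                q2 -= 1                        # departure from queue 2
    return hits / num_runs


p = tandem_loss_probability(n=10, theta=0.5, lam=1.0, mu1=2.0, nu=1.5, mu2=2.0,
                            num_runs=10_000)
print(p)
```

For large $n$ the event becomes so rare that essentially every run ends at $(0,0)$ and the crude estimate is uninformative. This is the regime where the importance sampling schemes of Lectures 8–10 are needed.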
Owing to the presence of boundaries and an interface across which the rates suffer a discontinuity, the large deviations analysis presents a number of difficulties, and this example falls into the category of large deviation theory for processes with "discontinuous statistics" [15, 16, 45].

Example 5 (Occupancy models) Consider a large number of urns into which a large number of tokens will be distributed according to some randomized rule. A quantity of interest in this context is the empirical measure of the urns according to the number of tokens they contain. Thus if there are $n$ urns and $Tn$ tokens to be distributed, then we are interested in the distribution of
$$\Gamma^n(T)\doteq\left(\Gamma^n_0(T),\Gamma^n_1(T),\dots,\Gamma^n_J(T),\Gamma^n_{J+}(T)\right),$$
where $\Gamma^n_j(T)$ is the fraction of urns that contain exactly $j$ tokens, and $\Gamma^n_{J+}(T)$ is the fraction that contain strictly more than $J$ tokens, after all have been distributed. (One can also consider the infinite dimensional empirical measure, but to simplify we restrict to the finite dimensional case here.)

To be specific, suppose that the urn chosen for a given token is selected uniformly, and independent of the selection for all other tokens. One can then consider the evolution of the occupancy vector $\Gamma^n(i/n)$, where at (continuous) time $i/n$ exactly $i$ tokens have been placed. The placement of the next token into the various categories indexed by $j=0,1,\dots,J,J+$ will be determined by the vector $\Gamma^n(i/n)$, since each urn is equally likely to receive the next token. Let $y^n_i$ have the conditional distribution
$$P\{y^n_i=j\mid\mathcal{F}^n_i\}=\Gamma^n_j(i/n),$$
where $\mathcal{F}^n_i=\sigma(\Gamma^n(k/n),0\le k\le i)$. When $y^n_i=j$, the class of urns of type $j$ is reduced by 1 while the class of type $j+1$ is increased by 1. Writing $e_j$ for the unit vector of category $j$ (with $e_{J+1}$ identified with $e_{J+}$), the dynamics of $\Gamma^n(i/n)$ are therefore given by
$$\Gamma^n((i+1)/n)=\begin{cases}\Gamma^n(i/n)+\frac{1}{n}\left(e_{y^n_i+1}-e_{y^n_i}\right) & \text{if }y^n_i=j\in\{0,1,\dots,J\},\\ \Gamma^n(i/n) & \text{if }y^n_i=J+.\end{cases}$$
This is the same scaling as we have seen in some of the other examples, and indeed one can prove an LDP for the piecewise linear continuous time process defined by
$$\Gamma^n(t)=\Gamma^n(i/n)+\left[\Gamma^n((i+1)/n)-\Gamma^n(i/n)\right](nt-i),\qquad t\in\left[\frac{i}{n},\frac{i+1}{n}\right].$$
This particular occupancy problem has a number of applications for which a large deviations analysis is relevant. One example is the testing of random number generators [32], where the urns correspond to a finite uniform partition of $[0,1]$, the tokens to iid $U[0,1]$ random variables, and a token is assigned to an urn if the random variable falls into the corresponding subset of the partition. The distribution of $\Gamma^n(T)$ gives a very sensitive measure of the degree to which the variates are truly iid $U[0,1]$. Another application is the dimensioning of optical switches in communication networks [47]. A last application is the empirical distribution of the number of lottery players who have selected the same combination (note that in many lotteries the number of combinations and players may be on the order of $10^8$). Other rules of placement that increase or decrease the likelihood that a given urn is selected depending on its current state are also of interest, and the various schemes go by names such as Bose–Einstein, Maxwell–Boltzmann, and Fermi–Dirac statistics [48].

Example 6 (Performance analysis in rare event Monte Carlo) Our final example concerns the analysis of Monte Carlo schemes (such as importance sampling) which might be used to provide approximations more accurate than the large deviation approximation.
To be concrete, we consider importance sampling, and its application to the first example. The standard measure of accuracy for such a scheme is the variance of a single sample, and due to unbiasedness the minimization of variance is equivalent
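To see importance sampling for the first example in its simplest form, the sketch below makes several illustrative assumptions:

- $d=1$ and $x=0$.
- Gaussian increments $Z_i\sim N(m,\sigma^2)$ with $m<0$.

Then $H(\alpha)=\alpha m+\alpha^2\sigma^2/2$ and $L(\beta)=(\beta-m)^2/(2\sigma^2)$. Since $X^n$ is linear between grid points, $P\{\tau^n<\infty\}=P\{\max_k S_k\ge nM\}$ with $S_k=Z_0+\dots+Z_{k-1}$.

The code uses the classical exponential-tilting estimator usually attributed to Siegmund. This is a standard textbook scheme, not the subsolution-based method developed in Lectures 8–10. Increments are drawn from the tilted law $N(-m,\sigma^2)$, which corresponds to the positive root $\alpha^*=-2m/\sigma^2$ of $H(\alpha)=0$. Each run is then weighted by the likelihood ratio $e^{-\alpha^*S_N}$ at the first passage index $N$.

The sketch also evaluates the decay rate in (1) numerically. By convexity of $L$, straight-line paths $\phi(t)=\beta t$ suffice, so the rate is $\inf_{\beta>0}M L(\beta)/\beta$, which in the Gaussian case equals $\alpha^*M$.

```python
import math
import random


def gaussian_L(beta, m, sigma):
    """Legendre transform of H(alpha) = alpha*m + alpha**2 * sigma**2 / 2."""
    return (beta - m) ** 2 / (2.0 * sigma ** 2)


def ld_rate_numeric(m, sigma, M, beta_max=10.0, grid=100_000):
    """Grid approximation of inf_{beta>0} M * L(beta) / beta.

    A straight line phi(t) = beta*t reaches level M at time M/beta with cost
    (M/beta) * L(beta); by convexity of L these paths suffice in (1).
    """
    best = math.inf
    for k in range(1, grid + 1):
        beta = beta_max * k / grid
        best = min(best, M * gaussian_L(beta, m, sigma) / beta)
    return best


def siegmund_estimate(m, sigma, M, n, num_samples, seed=0):
    """Importance sampling estimate of P{max_k S_k >= n*M}, Z_i ~ N(m, sigma^2), m < 0.

    Increments are sampled from the tilted law N(m + alpha*sigma^2, sigma^2)
    = N(-m, sigma^2), where alpha = -2m/sigma^2 solves H(alpha) = 0, so the
    likelihood ratio at the first passage index N is exp(-alpha * S_N).
    """
    if m >= 0 or sigma <= 0 or M <= 0:
        raise ValueError("need m < 0, sigma > 0 and M > 0")
    alpha = -2.0 * m / sigma ** 2
    rng = random.Random(seed)
    level = n * M
    total = 0.0
    for _ in range(num_samples):
        s = 0.0
        while s < level:               # positive drift under the tilted law: ends w.p.1
            s += rng.gauss(-m, sigma)
        total += math.exp(-alpha * s)  # each term is at most exp(-alpha * n * M)
    return total / num_samples


m, sigma, M, n = -1.0, 1.0, 1.0, 10
rate = ld_rate_numeric(m, sigma, M)
est = siegmund_estimate(m, sigma, M, n, num_samples=2000)
print(rate, -2.0 * m / sigma ** 2)   # numerical rate vs. alpha* M
print(est, math.exp(-n * rate))      # IS estimate vs. large deviation approximation
```

For $m=-1$, $\sigma=1$, $M=1$ both rates equal 2. Every weighted sample is bounded by $e^{-2n}$, which is one way to see why such an estimator can have a small relative variance where crude sampling fails.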
