
ON STABILITY OF HAWKES PROCESS

DMYTRO KARABASH

Abstract. For the Hawkes process (a long-memory point process $P$ with intensity rate $r(t) = \lambda\bigl(g_0(t) + \sum_{\tau<t,\,\tau\in S} h(t-\tau)\bigr)$), some existence and stability properties are studied. The new approach to the Hawkes process presented in this paper allows one to enlarge the class of conditions under which the uniqueness of the invariant distribution of the process can be shown; unlike previous results, $\lambda$ is allowed to be non-Lipschitz and even discontinuous. Multi-type and other generalizations are provided.

arXiv:1201.1573v3 [math.PR] 14 Dec 2012

Date: first draft May 12, 2011; last revision December 17, 2012.
Partially supported by National Science Foundation grant DMS-0904701 and DARPA.

1. Introduction

1.1. Hawkes Process Informally. This paper studies the Hawkes process $P$, a particular type of time-homogeneous, self-exciting, locally finite point process on $\mathbb{R}$ with long memory. A realization of $P$ is a random locally finite subset $S$ of $\mathbb{R}$. Any locally finite point process is characterized by an intensity rate $r(t,\omega)$, the intensity of the point process at time $t$ conditioned on the history of the process up to time $t$. If we denote the random set of the point process by $S(\omega)$, then this can be expressed as

$$ \mathbb{P}\bigl[\#\,|S(\omega)\cap[t,t+\delta t]| \ge 1 \mid \mathcal{F}_t\bigr] = r(t,\omega)\,\delta t + o(\delta t) \qquad (1) $$

where $\mathcal{F}_t$ is the $\sigma$-field for $S_t(\omega) := S(\omega)\cap(-\infty,t)$ generated by the elementary events $\{S(\omega)\cap I \neq \emptyset\}$, where the interval $I$ varies over all subintervals of $(-\infty,t)$.

It is convenient to view the process as a random subset of $\mathbb{R}_+ := [0,\infty)$, and thus the intensity rate function $r(t,\omega)$ is defined for $t \in \mathbb{R}_+$. The Hawkes process is defined by three $\mathbb{R}_+$-valued functions on $\mathbb{R}_+$. Given these three functions $\lambda, h, g_0$, the intensity rate is

$$ r(t,\omega) = \lambda\Bigl(g_0(t) + \sum_{\tau\in S_t} h(t-\tau)\Bigr) \qquad (2) $$

where $g_0$ is some initial condition.

Written above is an informal description of the process.
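The informal description above already suggests a simulation recipe by thinning: wait an exponential time with the current intensity as an upper bound, then accept or reject the candidate point. Below is a minimal sketch for the concrete choices $\lambda(z) = \alpha + \beta z$, $h(t) = e^{-t}$, $g_0 = 0$ (these choices, the function name, and the parameters are illustrative assumptions, not taken from the paper):

```python
import math
import random

def simulate_hawkes(T, alpha=0.5, beta=0.5, seed=0):
    """Thinning simulation of a Hawkes process on [0, T] with
    lambda(z) = alpha + beta*z, kernel h(t) = exp(-t), g0 = 0.
    Because h is decreasing, the intensity only decays between events,
    so the intensity at the start of each waiting interval is a valid
    upper bound for thinning."""
    rng = random.Random(seed)
    points = []
    t, z = 0.0, 0.0           # z = sum of h(t - tau) over past points tau
    while True:
        M = alpha + beta * z  # current intensity, an upper bound until next jump
        t += rng.expovariate(M)
        if t >= T:
            break
        z *= math.exp(-t - (-t + math.log(math.exp(0))))  # placeholder, replaced below
        break
    # restart cleanly (the loop above is unrolled here for clarity)
    points, t, z = [], 0.0, 0.0
    while True:
        M = alpha + beta * z
        w = rng.expovariate(M)
        t += w
        if t >= T:
            break
        z *= math.exp(-w)                      # kernel decay over the wait
        if rng.random() <= (alpha + beta * z) / M:
            points.append(t)                   # accepted: a point of the process
            z += 1.0                           # h(0) = 1 for h(t) = exp(-t)
    return points
```

For $\beta < 1$ the long-run rate is roughly $\alpha/(1-\beta)$, consistent with the branching picture developed later in the paper; a general $h$ would need a genuine Ogata-style upper envelope rather than the decaying-kernel shortcut used here.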
The description is similar to ones found in the literature except for the term $g_0$. Usually, instead of starting from an initial condition $g_0$, the process is defined by giving $S_0$, i.e. the points of the process before time 0. This can be done by letting

$$ g_0(s) = \sum_{\tau\in S_0} h(s-\tau) \qquad (3) $$

where $S_0$ is a locally finite collection of points in $(-\infty,0)$. The reason for this approach is not only a minute formal generalization of the process but a different perspective, which we describe informally in the following paragraph and formally in Section 2. The perspective is to view the Hawkes process as a solution of a stochastic partial differential equation with jumps. In fact, the Hawkes process corresponds to the solution of one of the simplest such equations, where the rate of jumps depends only on $g(0)$, the jump is always $h$, and the continuous part of the evolution is an infinitesimal shift operator. Yet even this simplest example is hard to analyze when it is non-linear.

There is a built-in invariance under time translation due to the form of (2). The evolution of the impulse function $g_t : \mathbb{R}_+ \to \mathbb{R}_+$ is defined by

$$ g_t(s) = g_0(t+s) + \sum_{\tau\in S_t} h(t+s-\tau) \qquad (4) $$

It is an $\mathcal{F}_t$-measurable Markov process (it does not contain all the past information; for example, when $h(t) = e^{-t}$). The next point of time $\tau\in S$ has the conditional distribution

$$ \mathbb{P}[\tau \ge t \mid \mathcal{F}_t] = \exp\Bigl[-\int_0^t \lambda(g_t(s))\,ds\Bigr] \qquad (5) $$

at which point $g_s(\cdot)$ jumps up by $h(\cdot)$. Thus $g_t$ is in itself a Markovian process.

1.2. Results. The two main results of this paper, Theorems 2 and 3, generalize the Brémaud-Massoulié theorem [1] in different directions and prove that, under certain conditions on $\lambda$ and $h$, for a wide class $\mathcal{C}$ of initial conditions $g_0$ the distributions of $g_t(\cdot)$ converge to the same limit. Furthermore, the limiting distribution is supported on $\mathcal{C}$. In addition to these theorems we also observe several facts well known for attractive systems. These are collected in Proposition 1.
Rigorous statements of all results are in Sections 4 and 5.

1.3. Historical Perspective and References. Point processes were first studied by Erlang [4] in connection with queueing theory, and Hawkes processes were first introduced in [6] to study self-exciting point processes. For further references please see [2, 9]. The results of this paper are the first that consider non-Lipschitz $\lambda$ and $\lambda$ with jumps. Another related line of work is large deviations: for non-Lipschitz $\lambda$ but $h$ restricted to a sum of exponentials see [13]; for general $h$ but Lipschitz $\lambda$ see [14]; and for linear $\lambda$, with an explicitly computed large deviation function and other limit theorems, see [15].

Currently Hawkes processes are used to model many phenomena, ranging from queues and population control; to mutations and virus spread; to defaults and jumps in the markets; to neuroactivity and social networks; and finally to the modeling of artificial intelligence and creative thinking. Most real-world applications involve multi-dimensional Hawkes processes, yet this paper considers only the one-dimensional Hawkes process for the sake of clarity. The rough scope of possible generalizations to the multi-dimensional Hawkes process (as well as generalized $\lambda$ and $h$) is outlined in Section 7.

1.4. Example. The simplest example comes from population control theory. Consider an asexual population in a certain region as a subset of time, where each point in time represents either an immigrant into the region or a birth inside the region, and where immigrants behave the same as newborns. Further assume that the rate of immigration is $\alpha$ and that the number of children of each member of the population is $\beta$, with each child's birth date distributed according to $\frac{h(t-X)}{\|h\|_1}\,dt$, where $X$ is the birth date of the parent. This corresponds to a Hawkes process with $\lambda(z) = \alpha + \beta z$ and $h$ as given.
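The population example can be sampled directly from its branching description: immigrants are roots arriving at rate $\alpha$, and each point spawns a Poisson($\beta$) number of children displaced by the normalized kernel. A minimal sketch, assuming the concrete kernel $h(t)=e^{-t}$ (so displacements are Exp(1)) and $\beta<1$ so clusters are finite; all names are hypothetical:

```python
import math
import random

def _poisson(rng, lam):
    """Knuth's method for a Poisson(lam) sample; fine for small lam."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def cluster_hawkes(T, alpha=1.0, beta=0.5, seed=1):
    """Galton-Watson (cluster) construction of the linear example:
    immigrants form a Poisson(alpha) process on [0, T]; each point has
    Poisson(beta) children displaced by Exp(1), i.e. the normalized
    kernel h(t) = exp(-t).  Returns the sorted union of all points."""
    rng = random.Random(seed)
    frontier, t = [], 0.0
    while True:                          # immigrant (root) arrivals
        t += rng.expovariate(alpha)
        if t >= T:
            break
        frontier.append(t)
    points = []
    while frontier:                      # grow each family tree
        parent = frontier.pop()
        points.append(parent)
        for _ in range(_poisson(rng, beta)):
            child = parent + rng.expovariate(1.0)  # displacement ~ h
            if child < T:
                frontier.append(child)
    return sorted(points)
```

The expected number of points on $[0,T]$ is roughly $\alpha T/(1-\beta)$, since each immigrant founds a tree of mean size $1/(1-\beta)$; this is the Galton-Watson viewpoint made precise in Section 3.3.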
This is an example of a Hawkes process where $\lambda$ is linear, and it also provides the relation to Galton-Watson trees, which happens to be an easy way of seeing many properties of Hawkes processes.

2. Rigorous definition of the process and results

2.1. Definition of Hawkes Process.

Definition 1. A Hawkes process is a random collection $P$ of points in $\mathbb{R}_+ := [0,\infty)$ characterized by a triplet of functions $(\lambda, h, g_0)$, where $\lambda, h, g_0 : [0,\infty) \to [0,\infty)$. The conditional Poisson intensity at time $t$ is

$$ r_t := \lambda\Bigl(g_0(t) + \sum_{0\le\tau<t,\,\tau\in S} h(t-\tau)\Bigr) \qquad (6) $$

where the sum is over all previous points in $S_t := S\cap[0,t]$. The function $g_0$ describes the initial condition and the functions $\lambda$ and $h$ describe the evolution of the process. We denote this Hawkes process by the quadruple $(\mathbb{P}, \lambda, h, g)$. We will define the coupling of these different measures $\mathbb{P}_g$ by a canonical construction in Subsection 3.1.

It is convenient to set up the state space $X$ of locally $L_1$, $\mathbb{R}_+$-valued functions on $\mathbb{R}_+$. Let $M(X)$ be the space of probability distributions on $X$. Then the process can be realized as an $X$-valued process. Start from $g_0 \in X$ and consider the random evolution determined by the semigroup $T_t$ defined by $g_{t+s} = T_s g_t$. The evolution has two components: the deterministic flow $(T_t g)(x) = g(t+x)$ for $t$ up to a stopping time $\tau$, at which point $(T_\tau g)(x) = g(\tau+x) + h(x)$, with the distribution of $\tau = \tau(g)$ given by

$$ \mathbb{P}_g[\tau \ge t] = \exp\Bigl[-\int_0^t \lambda(g(s))\,ds\Bigr]. \qquad (7) $$

After $\tau$, the semigroup continues as before and a new clock starts. Each time a stopping time occurs there is a point in $P$. The generator of the semigroup $T_t$ is given by

$$ A := D + \lambda(g(0))(\theta_h - \mathrm{Id}) \qquad (8) $$

where $\theta_h$ is a shift operator defined by $(\theta_h F)(g) = F(g+h)$ and $D$ is the derivative of the push-forward of the time evolution, defined by

$$ DF(g) := \lim_{\epsilon\to 0} \frac{F(\theta_\epsilon g) - F(g)}{\epsilon} \qquad (9) $$

where $\theta_\epsilon$ is again a shift operator, defined by $\theta_\epsilon g(s) = g(s+\epsilon)$.
It might be slightly easier to understand the generator $A$ in its infinitesimal form $A_\delta$, applied to a test functional $F$:

$$ A_\delta F(g) := F(\theta_\delta g) - F(g) + \lambda(g(0))\bigl(F(g+h) - F(g)\bigr)\delta + o(\delta) \qquad (10) $$

Then this $X$-valued process $g_t$ satisfies

$$ g_t(s) = g_0(t+s) + \sum_{\tau\in S_t} h(t+s-\tau). \qquad (11) $$

For any initial condition $g_0(\cdot)$, we have a random Markovian evolution $g_t(\cdot)$. Then $\lambda(z_t)$ will be the intensity of the point process, where

$$ z_t := g_t(0) = g_0(t) + \sum_{\tau\in S_t} h(t-\tau). \qquad (12) $$

It is sometimes more convenient to consider, instead of $S_t$, the process

$$ Q_t = (N_t, g_t) \qquad (13) $$

where $N_t = \#(S_t)$ is the number of jumps from time 0 to time $t$. Then $Q_t$ can be viewed as a point process via $N_t$ and as a Markovian process via $g_t$.

Remark 1. We assume $\int_0^\infty h(t)\,dt = 1$ without loss of generality. Indeed, one can always achieve this when $\|h\|_1 = \int h(t)\,dt < \infty$ by observing that the triples $(\lambda(z), h(t), g(t))$ and $\bigl(\lambda(z\|h\|_1),\, h(t)/\|h\|_1,\, g(t)/\|h\|_1\bigr)$ produce the same Hawkes process, with the same distribution of $P$. On the other hand, if $\|h\|_1 = \infty$ and $\inf_{z\in\mathbb{R}_+}\lambda(z) > 0$, then $\lim_{t\to\infty} g_t(0) = \infty$ a.s., and hence the stationary version of the process is a Poisson process of intensity $\lim_{z\to\infty}\lambda(z)$, if such a limit exists.

2.2. Metrics and Measures. We equip the space $X$ with the $L_1^{loc}$ metric

$$ \forall g,f \in X, \quad d_X(g,f) = \sum_{n=1}^\infty \frac{1}{2^n}\cdot\frac{\int_0^n |g(s)-f(s)|\,ds}{1+\int_0^n |g(s)-f(s)|\,ds} \qquad (14) $$

The weak convergence in Proposition 1 will take place in the space of distributions on $X$.

Definition 2. Let $D(X)$ be the Skorohod space of $X$-valued processes, equipped with the Skorohod $J_1$-metric $d$ and the $\sigma$-field $\mathcal{G}$ generated by open balls in $d$, and let $\{\mathcal{G}_t\}$ be the corresponding filtration.
Let $g^*$ be the non-initial part of the impulse function:

$$ g_t^*(s) = \sum_{\tau\in S_t} h(t+s-\tau) \qquad (15) $$

and, for all $g, \tilde g \in D(X)$, let

$$ d^*(g,\tilde g) := d(S(g), S(\tilde g)) = d(g^*, \tilde g^*) \qquad (16) $$

be the pull-back of $d$ under the map $S : D(X) \to D(X)$, $S : g \mapsto g^*$, defined according to (15) as

$$ \bigl(S(g)\bigr)_t(s) = g_t(s) - g_0(t+s) \qquad (17) $$

Let $\mu_{f,s}$ be the distribution of $g_s$, the impulse function at time $s$, starting from the initial condition $f$.

The distribution of the whole process $g$ starting from the initial condition $g_0$ will be denoted by $\mathbb{P}_{g_0}$, and the associated expectation by $\mathbb{E}_{g_0}$. If the initial condition is random and given via some measure $\mu$, then we denote

$$ \mathbb{P}_\mu = \int \mathbb{P}_f\,\mu(df), \qquad \mathbb{E}_\mu = \int \mathbb{E}_f\,\mu(df). \qquad (18) $$

Similarly, $\mathbb{P}^*_{g_0}, \mathbb{E}^*_{g_0}, \mathbb{P}^*_\mu, \mathbb{E}^*_\mu$ will be used as the analogous expressions for $g^*$.

In Theorems 2 and 3 we are going to consider the total variation $d_{TV}$ of $\mathbb{P}^*$ starting from two different initial conditions, which can also be viewed as the total variation of the corresponding $\mathbb{P}$ but under a different $\sigma$-field $\mathcal{G}^* = \mathcal{G}\circ S^{-1}$. Furthermore, let $d_{TV,T}$ be the total variation after time $T$, that is, the total variation with the sigma-algebra $\mathcal{G}\circ\bigl(S|_{[T,\infty)}\bigr)^{-1}$, where the operator $S|_{[T,\infty)}$ is defined by

$$ \bigl(S|_{[T,\infty)}(g)\bigr)_t(s) = \mathbb{1}_{t>T}\,\bigl(g_t(s) - g_0(t+s)\bigr) \qquad (19) $$

3. Tools

The two tools we are going to use are coupling and the parent-child structure. These two allow one to get first intuitions about the process rather quickly, since the process then cannot be "too different" from the example we have given in Subsection 1.4. The coupling is done via the canonical coupling of point processes, which we discuss first.

3.1. Canonical Construction. One way to construct point processes is to use a canonical space. To any measure $m$ one can associate a Poisson process that, for any measurable $A$, assigns a random number of points in $A$ with Poisson distribution of rate $m(A)$.
In our case the canonical measure $\mathbb{P}$ will be the Poisson point process corresponding to Lebesgue measure on $\mathbb{R}_+\times\mathbb{R}_+$. This process is defined by the property that any region $A \subset \mathbb{R}_+\times\mathbb{R}_+$ of area $|A|$ has its number of points distributed according to Poisson with rate $|A|$. Given $\lambda, h, g$, the measure $\mathbb{P}$ induces a measure $\mathbb{P}^{\lambda,h}_g$ consistent with the definition above by letting $\tau$ be part of the corresponding point process $P$ if there is a point on the strip $\{\tau\}\times[0,\lambda(z_t))$ of the canonical plane $\mathbb{R}_+\times\mathbb{R}_+$. In the rest of the paper we will write $\mathbb{P}_g = \mathbb{P}^{\lambda,h}_g$, since $\lambda$ and $h$ will be the same unless stated otherwise.

This induces a canonical coupling on the collection of measures $(\mathbb{P}_g)$ indexed by initial conditions $g$. This coupling is in no way unique, but it is maximal for non-decreasing $\lambda$ and two different initial conditions one of which is larger than the other (in the point-wise partial ordering). The coupling is maximal in the sense that the number of points in the difference of these two processes is stochastically dominated by that under any other coupling.

3.2. Coupling/Stochastic Domination. Here we present one particular coupling that we use.

Lemma (Stochastic Domination). Suppose $(\mathbb{P},\lambda,h,g_0)$ and $(\widetilde{\mathbb{P}},\tilde\lambda,\tilde h,\tilde g_0)$ are two Hawkes processes satisfying

$$ \forall x\in\mathbb{R}_+, \quad h(x)\le\tilde h(x), \quad g(x)\le\tilde g(x) \qquad (20) $$

$$ \forall x\le y, \quad \lambda(x)\le\tilde\lambda(y) \qquad (21) $$

Then there exists a coupling such that $P\subset\widetilde P$, i.e. $S\subset\widetilde S$. Note that $h, \tilde h$ are not necessarily normalized.

Proof. If $a(t), b(t)$ are two intensities with $a(t)\le b(t)$ for all $t$, one can couple the point processes so that the two processes jump together with rate $a(t)$ and the second one jumps by itself at rate $b(t)-a(t)$. This can also be done if $a(t,\omega)\le b(t,\omega)$ for all $\omega\in\Omega$; in fact this is done naturally by our construction. Hence it remains to prove that

$$ \lambda(g_t(0)) \le \tilde\lambda(\tilde g_t(0)). \qquad (22) $$
We know that up to the first jump in $\widetilde P$ we have $g_t(0) = g_0(t)$ and $\tilde g_t(0) = \tilde g_0(t)$. However, we know from (20) and (21) that

$$ \lambda(g_0(t)) \le \tilde\lambda(\tilde g_0(t)) \qquad (23) $$

which in turn implies that (22) holds up to the first jump of $\widetilde P$. We then claim by induction that it holds for all times. Indeed, at the time of a jump the order $g\le\tilde g$ is preserved, because there is no jump in $P$ before the first jump of $\widetilde P$, while since $h\le\tilde h$ the jump of $\tilde g$ is larger. □

3.3. Parent-Child Structure and Branching Representation. We add the following parent-child structure to obtain a random forest structure embedded in time. This is useful due to the connection to Galton-Watson trees, or branching processes.

Start with a Hawkes process with functions $(\lambda,h,g_0)$ and let $P = \{\tau_1,\tau_2,\ldots\}$, where $(\tau_i)$ is an increasing sequence, i.e. $i<j \implies \tau_i<\tau_j$. Let us also denote

$$ \lambda_0(z) := \lambda(z)-\lambda(0) \qquad (24) $$

Now to each $\tau_i$ we associate a randomly chosen parent element $p(\tau_i)$ from $P\cup\{-\infty\}$, where $-\infty$ represents having no parent and being a root node. Given a sequence $(\tau_1,\tau_2,\ldots,\tau_i)$, the parent $p(\tau_i)$ is an independent random variable with the following law:

$$ \mathbb{P}[p(\tau_i)=-\infty \mid \tau_1,\ldots,\tau_i] = \frac{\lambda(0)}{\lambda(z_{\tau_i})} + \frac{\lambda_0(z_{\tau_i})\,g(\tau_i)}{\lambda(z_{\tau_i})\,z_{\tau_i}} \qquad (25) $$

$$ \mathbb{P}[p(\tau_i)=\tau_j \mid \tau_1,\ldots,\tau_i] = \mathbb{1}_{j<i}\,\frac{\lambda_0(z_{\tau_i})\,h(\tau_i-\tau_j)}{\lambda(z_{\tau_i})\,z_{\tau_i}} \qquad (26) $$

Note that the $p(\tau_i)$ form a collection of mutually independent but not identically distributed variables. Figure 1 provides a particular visualization of the parent-child structure that is consistent with the above description.

[Figure 1 (plot omitted): for both panels, $\lambda(z)=1+z$ and $h(t)=1/(1+t)^2$; each region corresponds to the children of the point of the corresponding color.]

We are specifically interested in the linear case, which provides a dual description of the same process, as described in the following tool:

Lemma (Branching Process Equivalence). When $\lambda(z)=\bar\lambda(z)=A+Bz$, roots appear with rate $A+Bg(t)$ and each tree is a branching process with the number of branches having distribution Poisson($B$). The distribution of the age of a parent $\tau$ at the time of birth of a child $\tau_j$ is given by

$$ \mathbb{P}[\tau_j-\tau>t \mid p(\tau_j)=\tau] = \int_t^\infty h(s)\,ds. \qquad (27) $$

Proof. By the above scheme we see that the intensity of roots is given by

$$ \mathbb{P}[p(\tau_i)=-\infty]\,\lambda(z_{\tau_i}) = \lambda(0) + B z_{\tau_i}\,\frac{g(\tau_i)}{z_{\tau_i}} = A + Bg(t). \qquad (28) $$

Now each new point creates an area of size $B\|h\|_1 = B$, and hence the number of children is Poisson($B$). The shape of the area is given via $h$, which then proves (27). □

Definition 3. We can further split the points into classes. All roots are generated with rate $A+Bg(t)$, and in the linear case we can distinguish the ones generated from $A$ and the ones from $Bg(t)$. A root coming from the $Bg(t)$ portion will be called initial, and the initial points are all the points in the trees of those initial roots. The set of all initial points will be denoted by $I$.

4. Existence

We begin with preliminary results regarding existence and ergodicity. Let us consider the following hypothesis on the Hawkes process.

Hypothesis 1. The function $\lambda$ satisfies

$$ \exists A,B\ge 0,\ \forall z\in\mathbb{R}, \quad \lambda(z)\le\bar\lambda(z) := A+Bz \qquad (29) $$

which brings us to our first result:

Proposition 1. Suppose the Hawkes process satisfies Hypothesis 1. Then the following three statements hold:
(i) The Hawkes process is well defined for all times.
(ii) When Hypothesis 1 is satisfied with $B<1$, there exists an invariant distribution for the process $g_t$.
(iii) If in addition $\lambda$ is non-decreasing and we start from the zero initial condition, i.e. $g_0=0$, then the distribution of $g_t$ converges weakly to $\mu_0$, the unique minimal measure, and hence is ergodic.

Proof of Proposition 1: (i) Consider the Hawkes process with rate $\bar\lambda(z)=A+Bz$; by coupling this process is dominating, and by the Galton-Watson representation it is well defined for all times.
Hence the dominated process is well defined for all times.

(ii) If $B<1$, then the dominating process has finite average density; then so does the dominated process, and we can obtain an invariant distribution $\mu_{g_0}$ by taking a Cesàro limit along a diagonalization subsequence

$$ \mu_{g_0} := \lim_{t_k\to\infty} \frac{1}{t_k}\int_0^{t_k} \mu_{s,g_0}\,ds \qquad (30) $$

where $\mu_{s,g_0}$ is the marginal distribution of the impulse function at time $s$ given the initial condition $g_0$.

(iii) The expected size of each tree is finite, and hence we have finite density. Since $\lambda$ is non-decreasing, the process is order-preserving. Hence

$$ \mathbb{E}_0\bigl[f(g(t+s_1),\ldots,g(t+s_k))\bigr] \qquad (31) $$

is non-decreasing in $t$ for any non-decreasing $f$, which implies convergence of $\mu_{s,0}$ to $\mu_0$. This gives us weak convergence. We claim that the invariant distribution obtained is the unique minimal one (by point-wise ordering). To see this, consider any other invariant measure $\tilde\mu$ and choose the initial condition randomly according to $\tilde\mu$; then we have point-wise domination, and by coupling we see that $\tilde\mu$ dominates $\mu$. Hence $\mu$ is also an extremal invariant measure and hence ergodic. □

5. Uniqueness

Definition 4. Let an invariant distribution $\mu$ be supported on a class of functions $\mathcal{C}$, meaning that

$$ \mu(\mathcal{C}) = 1 \qquad (32) $$

is satisfied. Then we say that the pair $(\lambda,h)$ is $(\mathcal{C},\mu)$-stable if, starting from any initial condition $g_0$ in $\mathcal{C}$, the distribution $\mu_{t,g_0}$ of the impulse function at time $t$ converges to $\mu$:

$$ \lim_{t\to\infty} \mu_{t,g_0} \overset{d}{=} \mu \qquad (33) $$

where $\overset{d}{=}$ signifies that the limit is in the distributional sense.

Now we turn to the main results of this paper. Here is our second hypothesis:

Hypothesis 2. The function $\lambda$ is non-decreasing and satisfies

$$ \sup_{x\in\mathbb{R}_+}\bigl(\lambda(x+s)-\lambda(x)\bigr) \le \phi(s) \qquad (34) $$

for some concave non-decreasing $\phi$ satisfying

$$ \int_0^\infty \phi(H(s))\,ds = C < \infty, \quad \text{where } H(s) = \int_s^\infty h(t)\,dt \qquad (35) $$

Theorem 2.
If the Hawkes process satisfies both Hypothesis 1 with $B<1$ and Hypothesis 2, then the pair $(\lambda,h)$ is $(\mathcal{C},\mu)$-stable as in Definition 4, with

$$ \mathcal{C} := \Bigl\{ g \in C(\mathbb{R}_+) : \int_0^\infty \phi(g(s))\,ds < \infty \Bigr\} \qquad (36) $$

and $\mu$ being the $\mu_0$ from Proposition 1, part (iii).

Remark 2. In this theorem we relax the Lipschitz condition with constant 1 on $\lambda$ that was imposed in [1] (in a slightly different form, since $\|h\|_1$ was not normalized there). When we do have this assumption, the results of [1] follow as a corollary of Theorem 2. In fact the results follow even under the weaker assumption that $\lambda$ is Lipschitz with some constant $L$:

$$ \forall x,y\in\mathbb{R}_+, \quad |\lambda(x)-\lambda(y)| \le L|x-y| \qquad (37) $$

Before we start with the proof of Theorem 2, let us prove the following lemma:

Lemma 2.1. If $\mu$ is a stationary distribution of the impulse function $g_t$ of a Hawkes process, then

$$ \mathbb{E}_\mu[g(s)] = \mathbb{E}_\mu[g(0)]\,H(s). \qquad (38) $$

Proof. By stationarity,

$$ \mathbb{E}_\mu[g(s)] = \mathbb{E}_\mu\Bigl[\int_s^\infty g(0)h(t)\,dt\Bigr] = \mathbb{E}_\mu[g(0)]\int_s^\infty h(t)\,dt = \mathbb{E}_\mu[g(0)]\,H(s). \qquad (39) $$

□

Proof of Theorem 2: We use a recurrence argument. It is enough to show that there exists some class $\mathcal{C}'\subset\mathcal{C}$ such that:

(i) $f\in\mathcal{C}'$ implies that $\mathbb{P}^*_f$ and $\mathbb{P}^*_0$ have non-trivial overlap:

$$ I(\mathbb{P}^*_f,\mathbb{P}^*_0) := 1 - d_{TV}(\mathbb{P}^*_f,\mathbb{P}^*_0) \ge \delta > 0 \qquad (40) $$

(ii) $\mathcal{C}'$ is recurrent with respect to $\mathcal{C}$, in the sense that, starting from any point in class $\mathcal{C}$, the impulse function will enter $\mathcal{C}'$ at some time in the future, i.e.

$$ \forall g_0\in\mathcal{C}, \quad \mathbb{P}_{g_0}[\forall t\in\mathbb{R}_+,\ g_t\notin\mathcal{C}'] = 0 \qquad (41) $$
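As a concrete instance of the hypotheses in play (an illustrative choice of ours, not taken from the paper): $\lambda(z)=\sqrt z$ is non-Lipschitz at $0$, yet it satisfies (34) with $\phi(s)=\sqrt s$ (concave and non-decreasing), and for $h(t)=e^{-t}$ we get $H(s)=e^{-s}$, so the integral in (35) equals $\int_0^\infty e^{-s/2}\,ds = 2 < \infty$. A quick numerical sanity check of both facts:

```python
import math

def sup_increment(s, xs):
    """Numerically evaluate sup over a grid xs of sqrt(x+s) - sqrt(x);
    for lambda(z) = sqrt(z) this should equal sqrt(s), attained at x = 0."""
    return max(math.sqrt(x + s) - math.sqrt(x) for x in xs)

def integral_phi_H(upper=80.0, steps=200000):
    """Trapezoid rule for int_0^upper phi(H(s)) ds with phi(s) = sqrt(s)
    and H(s) = exp(-s), i.e. int exp(-s/2) ds; the exact value is
    2*(1 - exp(-upper/2)), approximately 2."""
    dx = upper / steps
    ys = [math.exp(-(i * dx) / 2.0) for i in range(steps + 1)]
    return sum((ys[i] + ys[i + 1]) * dx / 2.0 for i in range(steps))
```

This is exactly the kind of $\lambda$ excluded by the Lipschitz condition of [1] but covered by Theorem 2, which is the paper's headline relaxation.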
