On the Azuma inequality in spaces of subgaussian of rank p random variables

Krzysztof Zajkowski
Institute of Mathematics, University of Bialystok
Ciolkowskiego 1M, 15-245 Bialystok, Poland
[email protected]

arXiv:1701.03099v1 [math.PR] 11 Jan 2017

Abstract

For $p > 1$ let $\varphi_p(x) = x^2/2$ if $|x| \le 1$ and $\varphi_p(x) = \frac{1}{p}|x|^p - \frac{1}{p} + \frac{1}{2}$ if $|x| > 1$. For a random variable $\xi$ let $\tau_{\varphi_p}(\xi)$ denote $\inf\{c \ge 0 : \forall_{\lambda\in\mathbb{R}}\ \ln E\exp(\lambda\xi) \le \varphi_p(c\lambda)\}$; $\tau_{\varphi_p}$ is a norm on the space $Sub_{\varphi_p}(\Omega) = \{\xi : \tau_{\varphi_p}(\xi) < \infty\}$ of $\varphi_p$-subgaussian random variables, which we call subgaussian of rank $p$ random variables. For $p = 2$ we have the classic subgaussian random variables. The Azuma inequality gives an estimate on the probability of the deviations from zero of a zero-mean martingale $(\xi_n)_{n\ge 0}$ with bounded increments. In its classic form it is assumed that $\xi_0 = 0$. In this paper we show a version of the Azuma inequality under the assumption that $\xi_0$ is any subgaussian of rank $p$ random variable.

2010 Mathematics Subject Classification: 60E15
Key words: Hoeffding-Azuma's inequality, $\varphi$-subgaussian random variables

1 Introduction

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Hoeffding's inequality says that for bounded independent zero-mean random variables $\xi_1, \dots, \xi_n$ such that $P(\xi_i \in [a_i, b_i]) = 1$, $i = 1, \dots, n$, the following estimate on the probability of the deviation of their sum $S_n = \sum_{i=1}^n \xi_i$ from zero holds:
\[
P(|S_n| \ge \varepsilon) \le 2\exp\Big(-\frac{2\varepsilon^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big).
\]
The proof is based on Hoeffding's lemma: if $\xi$ is a random variable with mean zero such that $P(\xi \in [a,b]) = 1$, then
\[
E\exp(\lambda\xi) \le \exp\Big(\frac{1}{8}(b-a)^2\lambda^2\Big); \tag{1}
\]
compare [5]. Let us observe that the above inequality means that the centered bounded random variable $\xi$ is a subgaussian random variable. Let us recall the notion of a subgaussian random variable, which was introduced by Kahane in [6].
A random variable $\xi$ is subgaussian if there exists a number $c \in [0, \infty)$ such that for every $\lambda \in \mathbb{R}$ the following inequality holds:
\[
E\exp(\lambda\xi) \le \exp\Big(\frac{c^2\lambda^2}{2}\Big),
\]
that is, the moment generating function of $\xi$ is majorized by the moment generating function of a centered gaussian random variable with variance $c^2$ (see Buldygin and Kozachenko [3] or [2, Ch.1]). In terms of cumulant generating functions this condition takes the form $\ln E\exp(\lambda\xi) \le c^2\lambda^2/2$.

For a random variable $\xi$ the number $\tau(\xi)$ defined as
\[
\tau(\xi) = \inf\Big\{c \ge 0 : \forall_{\lambda\in\mathbb{R}}\ \ln E\exp(\lambda\xi) \le \frac{c^2\lambda^2}{2}\Big\}
\]
is a norm on the space $Sub(\Omega) = \{\xi \in L(\Omega) : \tau(\xi) < \infty\}$ of subgaussian random variables, where $L(\Omega)$ denotes the family of all real-valued random variables defined on $\Omega$. The space $Sub(\Omega)$ is a Banach space with respect to the norm $\tau$; see [2, Ch.1, Th.1.2].

Immediately from the definition of $\tau$ we have that
\[
\ln E\exp(\lambda\xi) \le \frac{\tau(\xi)^2\lambda^2}{2}.
\]
Using this inequality one can show an estimate of the tail distribution of $\xi$ in the form
\[
P(|\xi| \ge \varepsilon) \le 2\exp\Big(-\frac{\varepsilon^2}{2\tau(\xi)^2}\Big);
\]
see [2, Ch.1, Lem.1.3]. If we do not know the value of the norm $\tau(\xi)$ but know some upper bound on it, then we can formulate the above inequality with this bound substituted for $\tau(\xi)$. And so for a centered random variable $\xi$ essentially bounded by the interval $[a,b]$, by virtue of the inequality (1), we have that $\tau(\xi) \le (b-a)/2$ and we get Hoeffding's inequality for one variable. Because for a sum of independent subgaussian random variables the inequality
\[
\tau\Big(\sum_{i=1}^n \xi_i\Big)^2 \le \sum_{i=1}^n \tau(\xi_i)^2
\]
holds (see [2, Ch.1, Lem.1.7]), combining these facts we obtain the proof of Hoeffding's inequality; compare [2, Ch.1, Cor.1.3].

2 Spaces of subgaussian of rank p random variables

One can generalize the notion of subgaussian random variables to classes of $\varphi$-subgaussian r.v.s (see [2, Ch.2]).
A continuous even convex function $\varphi(x)$ ($x \in \mathbb{R}$) is called an N-function if the following conditions hold:
(a) $\varphi(0) = 0$ and $\varphi(x)$ is monotone increasing for $x > 0$,
(b) $\lim_{x\to 0} \varphi(x)/x = 0$ and $\lim_{x\to\infty} \varphi(x)/x = \infty$.
It is called a quadratic N-function if, in addition, $\varphi(x) = ax^2$ for all $|x| \le x_0$, with $a > 0$ and $x_0 > 0$. The quadratic condition is needed to ensure nontriviality of classes of $\varphi$-subgaussian random variables (see [2, Ch.2, p.67]).

Let $\varphi$ be a quadratic N-function. A random variable $\xi$ is said to be $\varphi$-subgaussian if there is a constant $c > 0$ such that $\ln E\exp(\lambda\xi) \le \varphi(c\lambda)$. The $\varphi$-subgaussian standard (norm) $\tau_\varphi(\xi)$ is defined as
\[
\tau_\varphi(\xi) = \inf\{c \ge 0 : \forall_{\lambda\in\mathbb{R}}\ \ln E\exp(\lambda\xi) \le \varphi(c\lambda)\};
\]
the space $Sub_\varphi(\Omega) = \{\xi \in L(\Omega) : \tau_\varphi(\xi) < \infty\}$ with the norm $\tau_\varphi$ is a Banach space (see [2, Ch.2, Th.4.1]).

Now we define a class of such spaces.

Definition 2.1. For $p > 1$ let
\[
\varphi_p(x) =
\begin{cases}
\dfrac{x^2}{2}, & \text{if } |x| \le 1,\\[2mm]
\dfrac{1}{p}|x|^p - \dfrac{1}{p} + \dfrac{1}{2}, & \text{if } |x| > 1.
\end{cases}
\]

The functions $\varphi_p$ are examples of quadratic N-functions. Let us observe that if $1 < p' < p$ then $\varphi_{p'} \le \varphi_p$ and, in consequence, $Sub_{\varphi_{p'}}(\Omega) \subset Sub_{\varphi_p}(\Omega)$, since $\tau_{\varphi_{p'}}(\xi) \ge \tau_{\varphi_p}(\xi)$ for any $\xi \in Sub_{\varphi_{p'}}(\Omega)$. Let us emphasize that the spaces $\{Sub_{\varphi_p}(\Omega) : p > 1\}$ form an increasing family with respect to $p$.

For the sake of completeness of our presentation we show that any centered bounded random variable is subgaussian of any rank $p$. In general it is a $\varphi$-subgaussian random variable for any quadratic N-function $\varphi$ (see [4, Ex.3.1]).

Proposition 2.2. Let $\varphi$ be an N-function such that $\varphi(x) = x^2/2$ for $|x| \le 1$. If $\xi$ is a bounded random variable with $E\xi = 0$ then $\xi \in Sub_\varphi(\Omega)$.

Proof. Here and below $\psi_\xi(\lambda) = \ln E\exp(\lambda\xi)$ denotes the cumulant generating function of $\xi$. If $\xi = 0$ with probability one then
\[
0 = \psi_\xi(\lambda) \le \varphi(c\lambda)
\]
for $\lambda \in \mathbb{R}$, $c \ge 0$ and any N-function $\varphi$. Let us recall that if $\xi$ is bounded but nonconstant then $\psi_\xi$ is strictly convex on $\mathbb{R}$, and $\psi_\xi''$ is positive on the whole of $\mathbb{R}$. Let $|\xi| \le d$ almost surely; then
\[
\psi_\xi(\lambda) \le d|\lambda| = \varphi\big(\varphi^{(-1)}(d|\lambda|)\big) \le \varphi\big(d\lambda\,\varphi^{(-1)}(1)\big)
\]
for $|\lambda| > 1/d$ (see [2, Ch.2, Lem.2.3]), where $\varphi^{(-1)}(x)$, $x \ge 0$, is the inverse function of $\varphi(x)$, $x \ge 0$. For $|\lambda| \le 1$, by the Taylor theorem, we get
\[
\psi_\xi(\lambda) = \psi_\xi(0) + \psi_\xi'(0)\lambda + \frac{1}{2}\psi_\xi''(\theta)\lambda^2 = \frac{1}{2}\psi_\xi''(\theta)\lambda^2
\]
for some $\theta$ between $0$ and $\lambda$. Let $c = \max_{\lambda\in[-1,1]} \psi_\xi''(\lambda)$; then $\psi_\xi(\lambda) \le \frac{1}{2}(\sqrt{c}\lambda)^2$. Let us observe that for $|\lambda| \le 1/\sqrt{c}$ we have $\frac{1}{2}(\sqrt{c}\lambda)^2 = \varphi(\sqrt{c}\lambda)$. Without loss of generality we may assume that $d > \sqrt{c}$, that is $1/d < 1/\sqrt{c}$. Taking $b = \max\{d\varphi^{(-1)}(1), \sqrt{c}\}$ we get
\[
\psi_\xi(\lambda) \le \varphi(b\lambda)
\]
for every $\lambda \in \mathbb{R}$, which implies $\xi \in Sub_\varphi(\Omega)$.

Let $\varphi(x)$ ($x \in \mathbb{R}$) be a real-valued function. The function $\varphi^*(y)$ ($y \in \mathbb{R}$) defined by $\varphi^*(y) = \sup_{x\in\mathbb{R}}\{xy - \varphi(x)\}$ is called the Young-Fenchel transform or the convex conjugate of $\varphi$ (in general, $\varphi^*$ may take the value $\infty$). It is known that if $\varphi$ is a quadratic N-function then $\varphi^*$ is a quadratic N-function too. For instance, since our $\varphi_p$ is a differentiable (even at $\pm 1$) function, one can easily check that $\varphi_p^* = \varphi_q$ for $p, q > 1$ with $1/p + 1/q = 1$.

An exponential estimate for the tail distribution of a random variable $\xi$ belonging to the space $Sub_\varphi(\Omega)$ is as follows:
\[
P(|\xi| \ge \varepsilon) \le 2\exp\Big(-\varphi^*\Big(\frac{\varepsilon}{\tau_\varphi(\xi)}\Big)\Big); \tag{2}
\]
see [2, Ch.2, Lem.4.3]. Moreover, $\xi \in Sub_\varphi(\Omega)$ if and only if $E\xi = 0$ and there exist constants $C > 0$ and $D > 0$ such that
\[
P(|\xi| \ge \varepsilon) \le C\exp\Big(-\varphi^*\Big(\frac{\varepsilon}{D}\Big)\Big)
\]
for every $\varepsilon > 0$; compare [2, Ch.2, Cor.4.1].

By virtue of the above facts one can give examples of subgaussian of rank $p$ random variables.

Example 2.3. Let $q > 1$ and let a random variable $\xi$ have the double Weibull distribution with the density function
\[
g_\xi(x) = \frac{1}{2}|x|^{q-1}\exp\Big\{-\frac{1}{q}|x|^q\Big\}.
\]
Then $\xi \in Sub_{\varphi_p}(\Omega)$, where $1/p + 1/q = 1$.

Example 2.4. Let $\xi \sim \mathcal{N}(0,1)$; then $\eta = |\xi|^{2/q} - E|\xi|^{2/q} \in Sub_{\varphi_p}(\Omega)$.

3 On the Azuma inequality

Before we prove a general form of Azuma's inequality, we first show an upper bound on the norm of a centered bounded random variable in $Sub_{\varphi_p}(\Omega)$ for any $p > 1$.

Lemma 3.1. Let $\xi$ be a bounded random variable such that $P(\xi \in [a,b]) = 1$ and $E\xi = 0$.
Let $c$ denote the number $(b-a)/2$ and $d$ the number $\max\{-a, b\}$. Then for every $p > 1$ the norm
\[
\tau_{\varphi_p}(\xi) \le \gamma_r = \frac{c^2}{2d}\Big\{r\Big[2\Big(\frac{d}{c}\Big)^2 + \frac{1}{r} - \frac{1}{2}\Big]\Big\}^{1/r},
\]
where $r = \min\{p, 2\}$.

Proof. If $p \ge 2$ then $r = 2$ and $\gamma_r = \gamma_2 = c$. By Hoeffding's lemma we have that $\tau_{\varphi_2}(\xi) \le c$, and because $\tau_{\varphi_p}(\xi) \le \tau_{\varphi_2}(\xi)$ for every $p \ge 2$, the Lemma follows.

Assume now that $1 < p < 2$; then $r = p$. Let us note that there exists a very simple estimate of the cumulant generating function of $\xi$:
\[
\psi_\xi(\lambda) = \ln E\exp(\lambda\xi) \le \ln E\exp(d|\lambda|) = d|\lambda|.
\]
We can form a majorant of $\psi_\xi$ from this function and Hoeffding's bound. Solving the equation $d\lambda = c^2\lambda^2/2$ we obtain $\lambda = 2d/c^2$. Let us emphasize that for $1 < p < 2$ the function
\[
f(\lambda) := \min\Big\{\frac{c^2\lambda^2}{2},\ d|\lambda|\Big\} =
\begin{cases}
\dfrac{c^2\lambda^2}{2}, & \text{if } |\lambda| \le \dfrac{2d}{c^2},\\[2mm]
d|\lambda|, & \text{if } |\lambda| > \dfrac{2d}{c^2}
\end{cases}
\]
is a majorant of $\psi_\xi$. Let us observe now that $f(2d/c^2) = 2(d/c)^2 \ge 2$, since $d \ge c$. We now find $\gamma_p$ such that $\varphi_p(\gamma_p\, 2d/c^2) = 2(d/c)^2$. Notice that $\varphi_p([-1,1]) = [0, 1/2]$. It follows that in solving the equation $\varphi_p(\gamma_p\, 2d/c^2) = 2(d/c)^2$ we should use the form of $\varphi_p(x)$ for $|x| > 1$, i.e. $\varphi_p(x) = \frac{1}{p}|x|^p - \frac{1}{p} + \frac{1}{2}$. The solution of the equation
\[
\frac{1}{p}\Big|\gamma_p\frac{2d}{c^2}\Big|^p - \frac{1}{p} + \frac{1}{2} = 2\Big(\frac{d}{c}\Big)^2
\]
has the form
\[
\gamma_p = \frac{c^2}{2d}\Big\{p\Big[2\Big(\frac{d}{c}\Big)^2 + \frac{1}{p} - \frac{1}{2}\Big]\Big\}^{1/p}.
\]
Let us emphasize that for $p \in (1, 2]$ we have the inequality
\[
f(\lambda) \le \varphi_p(\gamma_p\lambda).
\]
Since $f$ is a majorant of $\psi_\xi$, we get that
\[
\psi_\xi(\lambda) \le \varphi_p(\gamma_p\lambda)
\]
for every $\lambda \in \mathbb{R}$. Thus, by the definition of the norm $\tau_{\varphi_p}$ at $\xi$,
\[
\tau_{\varphi_p}(\xi) \le \gamma_r = \frac{c^2}{2d}\Big\{r\Big[2\Big(\frac{d}{c}\Big)^2 + \frac{1}{r} - \frac{1}{2}\Big]\Big\}^{1/r}
\]
in the case $r = p$, which completes the proof.

Remark 3.2. Let us note once again that $\gamma_2 = c$. Because $\gamma_p$ is the solution of the equation $\varphi_p(\gamma_p\, 2d/c^2) = 2(d/c)^2$ and $\varphi_p \ge \varphi_{p'}$ for $p \ge p'$, we have $\gamma_p \le \gamma_{p'}$. More precisely, $\gamma_p$ is strictly increasing as $p$ decreases to $1$. Thus the limit $\lim_{p\searrow 1}\gamma_p$ exists. Denote it by $\gamma_1$. One can check that $\gamma_1 = d + c^2/(4d)$.
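
The formula of Lemma 3.1 and the claims of Remark 3.2 can be checked numerically. The Python sketch below is only an illustration: the function names `phi` and `gamma` and the support $[a,b] = [-1,3]$ are assumptions chosen for the example and do not come from the paper. It evaluates $\gamma_p$ for several ranks and prints both sides of the defining equation $\varphi_p(\gamma_p\,2d/c^2) = 2(d/c)^2$ together with the limit value $\gamma_1 = d + c^2/(4d)$.

```python
import math

def phi(p, x):
    """phi_p from Definition 2.1: quadratic on [-1, 1], power-type outside."""
    x = abs(x)
    if x <= 1.0:
        return x * x / 2.0
    return x ** p / p - 1.0 / p + 0.5

def gamma(p, a, b):
    """Bound gamma_r of Lemma 3.1 for a centered variable supported in [a, b]."""
    c = (b - a) / 2.0
    d = max(-a, b)
    r = min(p, 2.0)
    return c * c / (2.0 * d) * (r * (2.0 * (d / c) ** 2 + 1.0 / r - 0.5)) ** (1.0 / r)

# Hypothetical support of the centered variable; it gives c = 2 and d = 3.
a, b = -1.0, 3.0
c, d = (b - a) / 2.0, max(-a, b)

# Ranks decreasing from 2 towards 1.
ranks = [2.0, 1.75, 1.5, 1.25, 1.01]
gammas = [gamma(p, a, b) for p in ranks]

for p, g in zip(ranks, gammas):
    # gamma_p should satisfy phi_p(gamma_p * 2d / c^2) = 2 (d / c)^2.
    lhs = phi(p, g * 2.0 * d / (c * c))
    rhs = 2.0 * (d / c) ** 2
    print(f"p = {p:5.2f}  gamma_p = {g:.6f}  lhs = {lhs:.6f}  rhs = {rhs:.6f}")

# Limit value stated in Remark 3.2.
print("gamma_1 =", d + c * c / (4.0 * d))
```

For this choice of $a$ and $b$ the printed values of $\gamma_p$ start at $c$ for $p = 2$ and grow towards $\gamma_1$ as $p$ decreases, in line with Remark 3.2.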
Now we can formulate our main result.

Theorem 3.3. Let $\xi_0$ be a subgaussian of rank $p$ random variable such that $\tau_{\varphi_p}(\xi_0) \le d_0$ and let $(\xi_n)_{n\ge 0}$ be a martingale with bounded increments, i.e. $|\xi_n - \xi_{n-1}| \le d_n$ almost surely for $n = 1, 2, \dots$. Let $c$ denote $\sqrt{\sum_{i=1}^n d_i^2}$ and $d$ the number $\sum_{i=1}^n d_i$. Then
\[
P(|\xi_n| \ge \varepsilon) \le 2\exp\Big(-\varphi_q\Big(\frac{\varepsilon}{\sqrt[r]{\gamma_r^r + d_0^r}}\Big)\Big),
\]
where $1/p + 1/q = 1$, $r := \min\{p, 2\}$ and $\gamma_r = \frac{c^2}{2d}\{r[2(d/c)^2 + 1/r - 1/2]\}^{1/r}$.

Proof. First we recall an argument which gives an estimate on the moment generating function of martingales with bounded increments similar to the one for sums of independent random variables. For the martingale $(\xi_n)_{n\ge 0}$ we have
\[
E\exp(\lambda\xi_n) = E\Big(\exp(\lambda\xi_{n-1})\,E\big(\exp(\lambda(\xi_n - \xi_{n-1})) \,\big|\, \mathcal{F}_{n-1}\big)\Big),
\]
where $\mathcal{F}_{n-1}$ denotes the $\sigma$-field generated by the random variables $\xi_0, \xi_1, \dots, \xi_{n-1}$.

Now we find a bound for $E(\exp(\lambda(\xi_n - \xi_{n-1})) \mid \mathcal{F}_{n-1})$. Let $\eta_n := (\xi_n - \xi_{n-1})/d_n$. Observe that $-1 \le \eta_n \le 1$ a.s. By convexity of the natural exponential function we get
\[
\exp(\lambda(\xi_n - \xi_{n-1})) = \exp(d_n\lambda\eta_n) \le \frac{1+\eta_n}{2}\exp(d_n\lambda) + \frac{1-\eta_n}{2}\exp(-d_n\lambda),
\]
and, in consequence,
\[
E\big(\exp(d_n\lambda\eta_n) \,\big|\, \mathcal{F}_{n-1}\big) \le \frac{1}{2}\exp(d_n\lambda) + \frac{1}{2}\exp(-d_n\lambda),
\]
since $E(\eta_n \mid \mathcal{F}_{n-1}) = 0$. By virtue of the inequality $\frac{1}{2}\exp(d_n\lambda) + \frac{1}{2}\exp(-d_n\lambda) \le \exp(\lambda^2 d_n^2/2)$ one gets
\[
E\exp(\lambda\xi_n) \le \exp\Big(\frac{\lambda^2 d_n^2}{2}\Big)E\exp(\lambda\xi_{n-1})
\]
and, inductively,
\[
E\exp(\lambda\xi_n) \le \exp\Big(\frac{\lambda^2\sum_{i=1}^n d_i^2}{2}\Big)E\exp(\lambda\xi_0).
\]
Taking the logarithm of both sides we obtain
\[
\ln E\exp(\lambda\xi_n) \le \frac{\lambda^2\sum_{i=1}^n d_i^2}{2} + \ln E\exp(\lambda\xi_0),
\]
that is
\[
\psi_{\xi_n}(\lambda) \le \varphi_2\Big(\lambda\Big(\sum_{i=1}^n d_i^2\Big)^{1/2}\Big) + \psi_{\xi_0}(\lambda). \tag{3}
\]
Because $\psi_{\xi_0}(\lambda) \le \varphi_p(d_0\lambda)$ and the random variable $\xi_n - \xi_0$ is bounded ($|\xi_n - \xi_0| \le \sum_{i=1}^n d_i$ a.s.), by Lemma 3.1 we can rewrite the above estimate on $\psi_{\xi_n}$ as follows:
\[
\psi_{\xi_n}(\lambda) \le \varphi_p(\gamma_r\lambda) + \varphi_p(d_0\lambda), \tag{4}
\]
where $\gamma_r$ is as in Lemma 3.1 (with $c = \sqrt{\sum_{i=1}^n d_i^2}$ and $d = \sum_{i=1}^n d_i$).

The composition of $\varphi_p$ with the function $\sqrt[r]{\,\cdot\,}$ is still convex. By properties of N-functions (see [2, Ch.2, Lem.2.2]), for $\lambda > 0$ we get
\[
\varphi_p(\gamma_r\lambda) + \varphi_p(d_0\lambda) = \varphi_p\big(\sqrt[r]{\gamma_r^r\lambda^r}\big) + \varphi_p\big(\sqrt[r]{d_0^r\lambda^r}\big) \le \varphi_p\big(\lambda\sqrt[r]{\gamma_r^r + d_0^r}\big),
\]
which combined with (4) gives
\[
\psi_{\xi_n}(\lambda) \le \varphi_p\big(\lambda\sqrt[r]{\gamma_r^r + d_0^r}\big).
\]
Because $\varphi_p$ is an even function, the above inequality is valid for any $\lambda$. It means that the random variable $\xi_n \in Sub_{\varphi_p}(\Omega)$ and its norm satisfies $\tau_{\varphi_p}(\xi_n) \le \sqrt[r]{\gamma_r^r + d_0^r}$. Recall that the convex conjugate $\varphi_p^* = \varphi_q$, where $1/p + 1/q = 1$. By (2) and the above estimate of $\tau_{\varphi_p}(\xi_n)$ we obtain our inequality.

Remark 3.4. For $\xi_0 = 0$ a.s. we have $d_0 = 0$ and we can assume that $\xi_0$ is subgaussian of rank 2 (classic subgaussian). Recall that if $p = 2$ then $q = 2$ and $\varphi_q(x) = x^2/2$. In this case $\gamma_r = \gamma_2 = c$ and we get the classic form of the Hoeffding-Azuma inequality.

Let us observe that $\xi_0 = 0$ a.s. is subgaussian of any rank $p$. Consider more precisely the case $1 < p < 2$. If $1 < p < 2$ then the Hölder conjugate $q > 2$. Let us recall that $\varphi_q(\varepsilon) \ge \varphi_2(\varepsilon)$ for $\varepsilon \in \mathbb{R}$ and, moreover, $\varphi_q(\varepsilon) > \varphi_2(\varepsilon)$ if $|\varepsilon| > 1$. Because $c = \gamma_2 < \gamma_p$, where $c = \sqrt{\sum_{i=1}^n d_i^2}$, there exists exactly one $\varepsilon_p > 0$ such that
\[
\varphi_q\Big(\frac{\varepsilon_p}{\gamma_p}\Big) = \varphi_2\Big(\frac{\varepsilon_p}{c}\Big),
\]
i.e. $\varepsilon_p$ is the unique solution of the equation
\[
\frac{1}{q}\Big(\frac{\varepsilon_p}{\gamma_p}\Big)^q - \frac{1}{q} + \frac{1}{2} = \frac{\varepsilon_p^2}{2c^2}.
\]
If $|\varepsilon| < \varepsilon_p$ then $\varphi_q(\varepsilon/\gamma_p) < \varphi_2(\varepsilon/c)$, and $\varphi_q(\varepsilon/\gamma_p) > \varphi_2(\varepsilon/c)$ if $|\varepsilon| > \varepsilon_p$. It follows that for $|\varepsilon| > \varepsilon_p$ the estimate
\[
P(|\xi_n| \ge \varepsilon) \le 2\exp\Big(-\varphi_q\Big(\frac{\varepsilon}{\gamma_p}\Big)\Big)
\]
is sharper than the Hoeffding-Azuma inequality.
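
As a numerical illustration of Theorem 3.3 and of the comparison in Remark 3.4, the following Python sketch evaluates the right-hand side of the theorem and locates the crossover point $\varepsilon_p$ by bisection. The function names (`phi`, `gamma`, `azuma_bound`, `crossover`), the increments $d_i = 1$ and the rank $p = 1.5$ are assumptions made for the example; none of them is taken from the paper.

```python
import math

def phi(p, x):
    """phi_p from Definition 2.1."""
    x = abs(x)
    return x * x / 2.0 if x <= 1.0 else x ** p / p - 1.0 / p + 0.5

def gamma(r, c, d):
    """gamma_r from Lemma 3.1, expressed through c and d."""
    return c * c / (2.0 * d) * (r * (2.0 * (d / c) ** 2 + 1.0 / r - 0.5)) ** (1.0 / r)

def azuma_bound(eps, p, d0, incs):
    """Right-hand side of Theorem 3.3 for increment bounds incs = [d_1, ..., d_n]."""
    c = math.sqrt(sum(x * x for x in incs))
    d = sum(incs)
    r = min(p, 2.0)
    q = p / (p - 1.0)                      # Hoelder conjugate of p
    norm = (gamma(r, c, d) ** r + d0 ** r) ** (1.0 / r)
    return 2.0 * math.exp(-phi(q, eps / norm))

def crossover(p, incs, iters=200):
    """Bisection for eps_p of Remark 3.4 (case xi_0 = 0 and 1 < p < 2)."""
    c = math.sqrt(sum(x * x for x in incs))
    d = sum(incs)
    q = p / (p - 1.0)
    g = gamma(p, c, d)
    diff = lambda e: phi(q, e / g) - phi(2.0, e / c)
    lo, hi = g, 2.0 * g                    # diff(g) < 0 because g > c
    while diff(hi) <= 0.0:                 # enlarge the bracket until the sign changes
        hi *= 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if diff(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return hi

incs = [1.0, 1.0, 1.0, 1.0]                # hypothetical bounds d_1 = ... = d_4 = 1
eps_p = crossover(1.5, incs)
print("eps_p =", eps_p)
for eps in (eps_p - 1.0, eps_p + 1.0):
    print(eps, azuma_bound(eps, 1.5, 0.0, incs), azuma_bound(eps, 2.0, 0.0, incs))
```

With `p = 2.0` and `d0 = 0.0` the function `azuma_bound` reduces to the classic expression $2\exp(-\varepsilon^2/(2c^2))$, which gives a quick consistency check of the sketch.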
Consequently, in the case $\xi_0 = 0$, Theorem 3.3 supplements the Hoeffding-Azuma inequality. Moreover, in this Theorem we considered the case when $\xi_0$ is any subgaussian of rank $p$ random variable.

Many other examples of concentration inequalities can be found, for instance, in [7]. Let us emphasize that most of them concern the case of independent summands. The Azuma inequality deals with dependent ones. Let us note that the original paper of Azuma [1] considers bounded increments satisfying some general conditions that hold for martingale increments. What is most important to us is that we can find a bound on the norm of their sums in the spaces of subgaussian of rank $p$ random variables, which allows us to get an estimate for the tail probabilities.

Applications of such estimates may be multiple. I would like to draw attention to an application to the proof of strong laws of large numbers for dependent random variables in these spaces (see [8]).

References

[1] K. Azuma, Weighted sums of certain dependent random variables, Tôhoku Mathematical Journal 19 (1967), 357-367.
[2] V. Buldygin, Yu. Kozachenko, Metric Characterization of Random Variables and Random Processes, Amer. Math. Soc., Providence, RI, 2000.
[3] V. Buldygin, Yu. Kozachenko, Subgaussian random variables, Ukrainian Math. J. 32 (1980), 483-489.
[4] R. Giuliano Antonini, Yu. Kozaczenko, T. Nikitina, Spaces of $\varphi$-sub-Gaussian random variables, Rend. Accad. Naz. Sci. XL Mem. Mat. Appl. 27(5) (2003), 95-124.
[5] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (1963), 13-30.
[6] J.P. Kahane, Local properties of functions in terms of random Fourier series (French), Stud. Math. 19 (1960), no. 1, 1-25.
[7] C. McDiarmid, Concentration, in Probabilistic Methods for Algorithmic Discrete Mathematics, 1998, 195-248.
[8] K. Zajkowski, On the strong law of large numbers for $\varphi$-subgaussian random variables, arXiv:1607.03035.