
The empirical Christoffel function in Statistics and Machine Learning

Jean B. Lasserre
LAAS-CNRS and Institute of Mathematics, University of Toulouse
LAAS, 7 avenue du Colonel Roche, 31077 Toulouse, France.

Edouard Pauwels
IRIT, Université Toulouse 3 Paul Sabatier
118 route de Narbonne, 31062 Toulouse, France.

arXiv:1701.02886v2 [cs.LG] 3 Apr 2017

Abstract

We illustrate the potential in statistics and machine learning of the Christoffel function, or more precisely, its empirical counterpart associated with a counting measure uniformly supported on a finite set of points. Firstly, we provide a thresholding scheme which allows one to approximate the support of a measure from a finite subset of its moments with strong asymptotic guarantees. Secondly, we provide a consistency result which relates the empirical Christoffel function and its population counterpart in the limit of large samples. Finally, we illustrate the relevance of our results on simulated and real world datasets for several applications in statistics and machine learning: (a) density and support estimation from finite samples, (b) outlier and novelty detection and (c) affine matching.

1. Introduction

The main claim of this paper is that the Christoffel function (a tool from Approximation Theory) can prove to be very useful in statistics and machine learning applications. The Christoffel function is associated with a finite measure and a degree parameter $d$. It has an important history of research with strong connection to orthogonal polynomials (Szegő, 1974; Dunkl and Xu, 2001), interpolation and approximation theory (Nevai, 1986; De Marchi et al., 2014). Its typical asymptotic behavior as $d$ increases is of particular interest because, in some specific settings, it provides very relevant information on the support and the density of the associated input measure. Important references include Máté and Nevai (1980); Máté et al.
(1991); Totik (2000); Gustafsson et al. (2009) in a single dimension, Bos (1994); Bos et al. (1998); Xu (1999); Berman (2009); Kroó and Lubinsky (2013) for specific multivariate settings and Kroó and Lubinsky (2012) for ratios of mutually absolutely continuous measures. The topic is still a subject of active research, but regarding properties of the Christoffel function, a lot of information is already available.

The present work shows how properties of the Christoffel function can be used successfully in some statistics and machine learning applications. To the best of our knowledge, this is the first attempt in such a context, along with the recent work of Lasserre and Pauwels (2016) and Malyshkin (2015). More precisely, we consider the empirical Christoffel function, a specific case where the input measure is a scaled counting measure uniformly supported on a set (a cloud) of datapoints. This methodology has three distinguishing features: (i) it is extremely simple and involves no optimization procedure, (ii) it scales linearly with the number of observations (one pass over the data is sufficient), and (iii) it is affine invariant. These three features prove to be especially important in all the applications that we consider.

In Lasserre and Pauwels (2016) we have exhibited a striking property of some distinguished family of sum-of-squares (SOS) polynomials $(Q_d)_{d \in \mathbb{N}}$, indexed by their degree ($2d \in \mathbb{N}$), and easily computed from empirical moments associated with a cloud of $n$ points in $\mathbb{R}^p$ which we call $X$. The associated family of sublevel sets $S_{\alpha,d} = \{x : Q_d(x) \le \alpha\}$, for various values of $\alpha > 0$, approximates the global shape of the original cloud of points $X$. The degree index $d$ can be used as a tuning parameter, trading off regularity of the polynomial $Q_d$ with the fitness of the approximation of the shape of $X$ (as long as the cloud contains sufficiently many points).
Remarkably, even with relatively low degree $d$, the sets $S_{\alpha,d}$ capture accurately the shape of $X$, and so provide a compact (algebraic) encoding of the cloud.

In fact the reciprocal function $x \mapsto Q_d(x)^{-1}$ is precisely the Christoffel function $\Lambda_{\mu_n,d}$ associated to the empirical counting measure supported on $X$ and the degree index $d$. Some properties of the Christoffel function stemming from approximation theory suggest that it could be exploited in a statistical learning context by considering its empirical counterpart. The purpose of this work is to push this idea further. In particular we investigate (a) further properties of the Christoffel function which prove to be relevant in some machine learning applications, (b) statistical properties of the empirical Christoffel function as well as (c) further applications to well known machine learning tasks.

Contributions

This paper significantly extends Lasserre and Pauwels (2016) in several directions. Indeed our contribution is threefold:

I. We first provide a thresholding scheme which allows one to approximate the compact support $S$ of a measure with strong asymptotic guarantees. This result rigorously establishes the property that, as $d$ increases, the scaled Christoffel function decreases to zero outside $S$ and remains positive in the interior of $S$.

II. In view of potential applications in statistics and machine learning we provide a rationale for using the empirical Christoffel function in place of its population counterpart in the limit of large sample size. We consider a compactly supported population measure $\mu$ as well as an empirical measure $\mu_n$ uniformly supported on a sample of $n$ vectors in $\mathbb{R}^p$, drawn independently from $\mu$. For each fixed $d$ we show a highly desirable strong asymptotic property as $n$ increases. Namely, the empirical Christoffel function $\Lambda_{\mu_n,d}(\cdot)$ converges, uniformly in $x \in \mathbb{R}^p$, to $\Lambda_{\mu,d}(\cdot)$, almost surely with respect to the draw of the random sample.

III.
We illustrate the benefits of the empirical Christoffel function in some important applications, mainly in statistics and machine learning. The rationale for such benefits builds on approximation properties of the Christoffel function combined with our consistency result. In particular, we first show on simulated data that the Christoffel function can be useful for density estimation and support inference. In Lasserre and Pauwels (2016) we have described how the Christoffel function yields a simple procedure for intrusion detection in networks, and here we extend these results by performing a numerical comparison with well established methods for novelty detection on a real world dataset. Finally we show that the Christoffel function is also very useful to perform affine matching and inverse affine shuffling of a dataset.

Organisation of the paper

Section 2 describes the notation and definitions which will be used throughout the paper. In Section 3 we introduce the Christoffel function, outline some of its known properties and describe our main theoretical results. Applications are presented in Section 4 where we consider both simulated and real world data as well as a comparison with well established machine learning methods. For clarity of exposition most proofs and technical details are postponed to the Appendix in Section 6.

2. Notation, definitions and Preliminary results

2.1 Notation and definitions

We fix the ambient dimension to be $p$ throughout the text. For example, we will manipulate vectors in $\mathbb{R}^p$ as well as $p$-variate polynomials with real coefficients. We denote by $X$ a set of $p$ variables $X_1, \ldots, X_p$ which we will use in mathematical expressions defining polynomials. We identify monomials from the canonical basis of $p$-variate polynomials with their exponents in $\mathbb{N}^p$: we associate to $\alpha = (\alpha_i)_{i=1\ldots p} \in \mathbb{N}^p$ the monomial $X^\alpha := X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_p^{\alpha_p}$ whose degree is $\deg(\alpha) := \sum_{i=1}^p \alpha_i = |\alpha|$.
We use the expressions $<_{gl}$ and $\le_{gl}$ to denote the graded lexicographic order, a well ordering over $p$-variate monomials. This amounts to, first, using the canonical order on the degree and, second, breaking ties between monomials of the same degree using the lexicographic order with $X_1 = a, X_2 = b, \ldots$ For example, the monomials in two variables $X_1, X_2$ of degree less than or equal to 3, listed in this order, are:

$$1,\ X_1,\ X_2,\ X_1^2,\ X_1 X_2,\ X_2^2,\ X_1^3,\ X_1^2 X_2,\ X_1 X_2^2,\ X_2^3.$$

We denote by $\mathbb{N}^p_d$ the set $\{\alpha \in \mathbb{N}^p ;\ \deg(\alpha) \le d\}$ ordered by $\le_{gl}$. $\mathbb{R}[X]$ denotes the set of $p$-variate polynomials: linear combinations of monomials with real coefficients. The degree of a polynomial is the highest of the degrees of its monomials with nonzero coefficients (for the null polynomial, we use the convention that its degree is 0 and it is $\le_{gl}$ smaller than all other monomials). We use the same notation, $\deg(\cdot)$, to denote the degree of a polynomial or of an element of $\mathbb{N}^p$. For $d \in \mathbb{N}$, $\mathbb{R}[X]_d$ denotes the set of $p$-variate polynomials of degree at most $d$. We set $s(d) = \binom{p+d}{d}$, the number of monomials of degree less than or equal to $d$.

We will denote by $v_d(X)$ the vector of monomials of degree less than or equal to $d$ sorted by $\le_{gl}$, i.e., $v_d(X) := (X^\alpha)_{\alpha \in \mathbb{N}^p_d} \in \mathbb{R}[X]_d^{s(d)}$. With this notation, we can write a polynomial $P \in \mathbb{R}[X]_d$ as $P(X) = \langle \mathbf{p}, v_d(X) \rangle$ for some real vector of coefficients $\mathbf{p} = (p_\alpha)_{\alpha \in \mathbb{N}^p_d} \in \mathbb{R}^{s(d)}$ ordered using $\le_{gl}$. Given $x = (x_i)_{i=1\ldots p} \in \mathbb{R}^p$, $P(x)$ denotes the evaluation of $P$ with the assignments $X_1 = x_1, X_2 = x_2, \ldots, X_p = x_p$. Given a Borel probability measure $\mu$ and $\alpha \in \mathbb{N}^p$, $y_\alpha(\mu)$ denotes the moment $\alpha$ of $\mu$, i.e., $y_\alpha(\mu) = \int_{\mathbb{R}^p} x^\alpha \, d\mu(x)$. Finally, for $\delta > 0$ and every $x \in \mathbb{R}^p$, let $B_\delta(x) := \{y \in \mathbb{R}^p : \|y - x\| \le \delta\}$ be the closed Euclidean ball of radius $\delta$ centered at $x$. We use the shorthand notation $B$ to denote the closed Euclidean unit ball.
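The graded lexicographic ordering and the monomial vector $v_d$ can be sketched in a few lines of Python. This is only an illustration of the notation, not code from the paper; the helper names `graded_lex_exponents` and `monomial_vector` are hypothetical.

```python
from itertools import product
from math import comb

import numpy as np


def graded_lex_exponents(p, d):
    """Exponents alpha in N^p with |alpha| <= d, in graded lexicographic order."""
    exps = [a for a in product(range(d + 1), repeat=p) if sum(a) <= d]
    # Sort by total degree first; within one degree, larger powers of X_1
    # (then X_2, ...) come first, hence the negated exponent tuple.
    return sorted(exps, key=lambda a: (sum(a), tuple(-ai for ai in a)))


def monomial_vector(x, exps):
    """Evaluate v_d(x) = (x^alpha) for alpha listed in `exps`."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in exps])


# Two variables, degree at most 3: same order as the example in the text.
exps = graded_lex_exponents(p=2, d=3)
print(exps)
print(len(exps) == comb(2 + 3, 3))  # s(d) = binom(p + d, d)

# The polynomial P = 1 + 2 * X_1 * X_2 written as <p, v_d(X)>.
coeffs = np.zeros(len(exps))
coeffs[exps.index((0, 0))] = 1.0
coeffs[exps.index((1, 1))] = 2.0
value = coeffs @ monomial_vector([2.0, 3.0], exps)  # P(2, 3)
```

Storing exponents as tuples keeps the identification between monomials and elements of $\mathbb{N}^p$ explicit, which is convenient when indexing moment matrices later on.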
Recall that the Lebesgue volume $\mathrm{vol}(B_\delta(x))$ of such a ball satisfies:

$$\mathrm{vol}(B_\delta(x)) = \frac{\pi^{p/2}}{\Gamma\left(\frac{p}{2} + 1\right)} \, \delta^p, \quad \forall x \in \mathbb{R}^p.$$

Furthermore, let $\omega_p := \frac{2\pi^{\frac{p+1}{2}}}{\Gamma\left(\frac{p+1}{2}\right)}$ denote the surface of the $p$ dimensional unit sphere in $\mathbb{R}^{p+1}$. Throughout the paper, we will only consider measures of which all moments are finite.

Moment matrix

For a finite Borel measure $\mu$ on $\mathbb{R}^p$ denote by $\mathrm{supp}(\mu)$ its support, i.e., the smallest closed set $\Omega \subset \mathbb{R}^p$ such that $\mu(\mathbb{R}^p \setminus \Omega) = 0$. The moment matrix of $\mu$, $M_d(\mu)$, is a matrix indexed by monomials of degree at most $d$ ordered by $\le_{gl}$. For $\alpha, \beta \in \mathbb{N}^p_d$, the corresponding entry in $M_d(\mu)$ is defined by $M_d(\mu)_{\alpha,\beta} := y_{\alpha+\beta}(\mu)$, the moment $\int x^{\alpha+\beta} \, d\mu$ of $\mu$. When $p = 2$, letting $y_\alpha = y_\alpha(\mu)$ for $\alpha \in \mathbb{N}^2_4$, we have

$$M_2(\mu): \quad \begin{array}{c|cccccc}
 & 1 & X_1 & X_2 & X_1^2 & X_1 X_2 & X_2^2 \\ \hline
1 & 1 & y_{10} & y_{01} & y_{20} & y_{11} & y_{02} \\
X_1 & y_{10} & y_{20} & y_{11} & y_{30} & y_{21} & y_{12} \\
X_2 & y_{01} & y_{11} & y_{02} & y_{21} & y_{12} & y_{03} \\
X_1^2 & y_{20} & y_{30} & y_{21} & y_{40} & y_{31} & y_{22} \\
X_1 X_2 & y_{11} & y_{21} & y_{12} & y_{31} & y_{22} & y_{13} \\
X_2^2 & y_{02} & y_{12} & y_{03} & y_{22} & y_{13} & y_{04}
\end{array}$$

The matrix $M_d(\mu)$ is positive semidefinite for all $d \in \mathbb{N}$. Indeed, for any $\mathbf{p} \in \mathbb{R}^{s(d)}$, let $P \in \mathbb{R}[X]_d$ be the polynomial with vector of coefficients $\mathbf{p}$; then $\mathbf{p}^T M_d(\mu) \mathbf{p} = \int_{\mathbb{R}^p} P(x)^2 \, d\mu(x) \ge 0$. We also have the identity $M_d(\mu) = \int_{\mathbb{R}^p} v_d(x) v_d(x)^T \, d\mu(x)$ where the integral is understood elementwise.

Sum of squares (SOS)

We denote by $\Sigma[X] \subset \mathbb{R}[X]$ (resp. $\Sigma[X]_d \subset \mathbb{R}[X]_d$) the set of polynomials (resp. polynomials of degree at most $d$) which can be written as a sum of squares of polynomials. Let $P \in \mathbb{R}[X]_{2m}$ for some $m \in \mathbb{N}$; then $P$ belongs to $\Sigma[X]_{2m}$ if there exist a finite $J \subset \mathbb{N}$ and a family of polynomials $P_j \in \mathbb{R}[X]_m$, $j \in J$, such that $P = \sum_{j \in J} P_j^2$. It is obvious that sum of squares polynomials are always nonnegative. A further interesting property is that this class of polynomials is connected with positive semidefiniteness. Indeed, $P$ belongs to $\Sigma[X]_{2m}$ if and only if

$$\exists\, Q \in \mathbb{R}^{s(m) \times s(m)}, \quad Q = Q^T, \quad Q \succeq 0, \quad P(x) = v_m(x)^T Q \, v_m(x), \quad \forall x \in \mathbb{R}^p.$$
(2.1)

As a consequence, every real symmetric positive semidefinite matrix $Q \in \mathbb{R}^{s(m) \times s(m)}$ defines a polynomial in $\Sigma[X]_{2m}$ by using the representation (2.1).

Orthonormal polynomials

We define a classical (Szegő, 1974; Dunkl and Xu, 2001) family of orthonormal polynomials, $\{P_\alpha\}_{\alpha \in \mathbb{N}^p_d}$ ordered according to $\le_{gl}$, which satisfies for all $\alpha \in \mathbb{N}^p_d$

$$\langle P_\alpha, P_\beta \rangle_\mu = \delta_{\alpha = \beta}, \qquad \langle P_\alpha, X^\beta \rangle_\mu = 0 \ \text{ if } \beta <_{gl} \alpha, \qquad \langle P_\alpha, X^\alpha \rangle_\mu > 0. \qquad (2.2)$$

Existence and uniqueness of such a family is guaranteed by the Gram-Schmidt orthonormalization process following the $\le_{gl}$ ordering on monomials and by the positivity of the moment matrix, see for instance Dunkl and Xu (2001) Theorem 3.1.11.

Let $D_d(\mu)$ be the lower triangular matrix whose rows are the coefficients of the polynomials $P_\alpha$ defined in (2.2) ordered by $\le_{gl}$. It can be shown that $D_d(\mu) = L_d(\mu)^{-T}$, where $L_d(\mu)$ is the Cholesky factorization of $M_d(\mu)$. Furthermore, there is a direct relation with the inverse moment matrix as $M_d(\mu)^{-1} = D_d(\mu)^T D_d(\mu)$ (Helton et al. (2008) Proof of Theorem 3.1).

3. The Christoffel function and its empirical counterpart

3.1 The Christoffel function

Let $\mu$ be a finite Borel measure on $\mathbb{R}^p$ with all moments finite and such that its moment matrix $M_d(\mu)$ is positive definite for every $d = 0, 1, \ldots$. For every $d$, define the function $\kappa_{\mu,d} : \mathbb{R}^p \times \mathbb{R}^p \to \mathbb{R}$ by:

$$(x, y) \mapsto \kappa_{\mu,d}(x, y) := \sum_{\alpha \in \mathbb{N}^p_d} P_\alpha(x) P_\alpha(y) = v_d(x)^T M_d(\mu)^{-1} v_d(y), \qquad (3.1)$$

where the family of polynomials $\{P_\alpha\}_{\alpha \in \mathbb{N}^p_d}$ is defined in (2.2) and the last equality follows from properties of this family (see also Lasserre and Pauwels (2016)). The kernel $(x, y) \mapsto K(x, y) := \sum_{\alpha \in \mathbb{N}^p} P_\alpha(x) P_\alpha(y)$ is a reproducing kernel on $L_2(\mu)$ because

$$P_\alpha(X) = \int K(X, y) P_\alpha(y) \, d\mu(y), \quad \forall \alpha \in \mathbb{N}^p,$$

that is, the $(P_\alpha)$ are eigenvectors of the associated operator on $L_2(\mu)$, and so

$$p(X) = \int K(X, y) \, p(y) \, d\mu(y), \quad \forall p \in \mathbb{R}[X].$$
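For an empirical measure, the relations between the moment matrix, its Cholesky factor, the orthonormal polynomials and the kernel $\kappa_{\mu,d}$ can be checked numerically. The following is a minimal sketch under stated assumptions: the sample is synthetic, the helper names are hypothetical, and the identity $D_d(\mu) = L_d(\mu)^{-T}$ is read with the convention $M_d(\mu) = L_d(\mu)^T L_d(\mu)$ ($L_d(\mu)$ upper triangular), which is the transpose of the lower-triangular factor returned by NumPy.

```python
from itertools import product

import numpy as np


def graded_lex_exponents(p, d):
    """Exponents of total degree <= d in graded lexicographic order."""
    exps = [a for a in product(range(d + 1), repeat=p) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-ai for ai in a)))


def monomial_matrix(points, exps):
    """Row i is v_d(x_i); column j is the monomial with exponent exps[j]."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return np.stack([np.prod(points ** np.array(a), axis=1) for a in exps], axis=1)


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))   # synthetic cloud (assumption)
exps = graded_lex_exponents(p=2, d=3)

V = monomial_matrix(X, exps)                # shape (n, s(d))
M = V.T @ V / len(X)                        # empirical moment matrix M_d(mu_n)

L_low = np.linalg.cholesky(M)               # NumPy convention: M = L_low @ L_low.T
D = np.linalg.inv(L_low)                    # rows = coefficients of the P_alpha

# Kernel (3.1) computed in two ways at two arbitrary points.
x, y = np.array([0.2, -0.5]), np.array([-0.7, 0.1])
vx, vy = monomial_matrix(x, exps)[0], monomial_matrix(y, exps)[0]
kernel_from_polys = (D @ vx) @ (D @ vy)             # sum_alpha P_alpha(x) P_alpha(y)
kernel_from_moments = vx @ np.linalg.solve(M, vy)   # v_d(x)^T M^{-1} v_d(y)

# Reproducing property for an arbitrary polynomial q of degree <= d.
c = rng.normal(size=len(exps))                      # coefficients of q
reproduced = np.mean((V @ np.linalg.solve(M, vx)) * (V @ c))
direct = vx @ c                                     # q(x)
```

The matrix `D @ M @ D.T` is the Gram matrix of the polynomials $P_\alpha$ in $L_2(\mu_n)$, so it should be the identity up to rounding.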
The function $x \mapsto \Lambda_{\mu,d}(x) := \kappa_{\mu,d}(x, x)^{-1}$ is called the Christoffel function associated with $\mu$ and $d \in \mathbb{N}$. The following result states a fundamental extremal property of the Christoffel function.

Theorem 3.1 (see e.g. Dunkl and Xu (2001); Nevai (1986)) Let $\xi \in \mathbb{R}^p$ be fixed, arbitrary. Then

$$\Lambda_{\mu,d}(\xi) = \min_{P \in \mathbb{R}[X]_d} \left\{ \int_{\mathbb{R}^p} P(x)^2 \, d\mu(x) \ :\ P(\xi) = 1 \right\}. \qquad (3.2)$$

The Christoffel function plays an important role in orthogonal polynomials and the theory of interpolation and approximation, see e.g. Szegő (1974); Dunkl and Xu (2001). One is particularly interested in the asymptotics of the normalized Christoffel function $x \mapsto s(d) \Lambda_{\mu,d}(x)$ as $d \to \infty$. The subject has a very long history in the univariate case, see Nevai (1986) for a detailed historical account prior to the 80's. The first quantitative asymptotic result was given in Máté and Nevai (1980) and was later improved by Máté et al. (1991) and Totik (2000). In the multivariate setting, precise results are known in some particular cases such as balls, spheres and simplices (Bos, 1994; Bos et al., 1998; Xu, 1996, 1999; Kroó and Lubinsky, 2013) but much remains to be done for the general multivariate case. A typical example of asymptotic result is given under quite general (and technical) conditions in Kroó and Lubinsky (2013, 2012). This work shows that, as $d \to \infty$, the ratio $\frac{\Lambda_{\nu,d}}{\Lambda_{\mu,d}}$ of two Christoffel functions associated to two mutually absolutely continuous measures $\mu$ and $\nu$ converges to the density $\frac{d\nu}{d\mu}(x)$ on the interior of their common support.

Remark 3.2 Notice that Theorem 3.1 also provides a means to compute the numerical value $\Lambda_{\mu,d}(x)$ for $x \in \mathbb{R}^p$, fixed, arbitrary. Indeed, (3.2) is a convex quadratic programming problem which can be solved efficiently, even in high dimension, using first order methods such as projected gradient descent and its stochastic variants. This is particularly interesting when the nonsingular moment matrix $M_d(\mu)$ is large.
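The two routes to $\Lambda_{\mu,d}(\xi)$ mentioned above (the inverse moment matrix in (3.1) and the quadratic program (3.2)) can be compared on an empirical measure. The sketch below is illustrative only: the data are synthetic, the helper names are hypothetical, and the quadratic program is solved through its KKT linear system rather than by the first order methods mentioned in Remark 3.2.

```python
from itertools import product

import numpy as np


def graded_lex_exponents(p, d):
    """Exponents of total degree <= d in graded lexicographic order."""
    exps = [a for a in product(range(d + 1), repeat=p) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-ai for ai in a)))


def monomial_matrix(points, exps):
    """Row i is v_d(x_i)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return np.stack([np.prod(points ** np.array(a), axis=1) for a in exps], axis=1)


def christoffel(xi, M, exps):
    """Lambda(xi) = 1 / (v_d(xi)^T M^{-1} v_d(xi)), following (3.1)."""
    v = monomial_matrix(xi, exps)[0]
    return 1.0 / (v @ np.linalg.solve(M, v))


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))   # synthetic cloud (assumption)
exps = graded_lex_exponents(p=2, d=3)
V = monomial_matrix(X, exps)
M = V.T @ V / len(X)                         # empirical moment matrix

xi = np.array([0.3, -0.4])
lam_inverse = christoffel(xi, M, exps)

# Problem (3.2) in coefficients: minimize c^T M c subject to c^T v_d(xi) = 1.
# KKT conditions: 2 M c + nu * v = 0 and v^T c = 1.
v = monomial_matrix(xi, exps)[0]
s = len(exps)
kkt = np.block([[2.0 * M, v[:, None]], [v[None, :], np.zeros((1, 1))]])
rhs = np.zeros(s + 1)
rhs[-1] = 1.0
c_opt = np.linalg.solve(kkt, rhs)[:s]
lam_qp = c_opt @ M @ c_opt

# Any other feasible polynomial has a larger second moment.
c_rand = rng.normal(size=s)
c_rand /= c_rand @ v
lam_feasible = c_rand @ M @ c_rand

# The function is much smaller away from the cloud than inside it.
lam_inside = christoffel([0.0, 0.0], M, exps)
lam_outside = christoffel([3.0, 3.0], M, exps)
```

Forming $M_d(\mu_n)$ takes a single pass over the data, after which each evaluation costs one linear solve of size $s(d)$.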
We next provide additional insights on Theorem 3.1 and solutions of (3.2).

Theorem 3.3 For any $\xi \in \mathbb{R}^p$, the optimization problem in (3.2) is convex with a unique optimal solution $P^*_d \in \mathbb{R}[X]_d$ defined by

$$P^*_d(X) = \frac{\kappa_{\mu,d}(X, \xi)}{\kappa_{\mu,d}(\xi, \xi)} = \Lambda_{\mu,d}(\xi) \, \kappa_{\mu,d}(X, \xi). \qquad (3.3)$$

In addition,

$$\Lambda_{\mu,d}(\xi) = \int P^*_d(x)^2 \, d\mu(x) = \int P^*_d(x) \, d\mu(x), \qquad (3.4)$$

$$\Lambda_{\mu,d}(\xi) \, \xi^\alpha = \int x^\alpha P^*_d(x) \, d\mu(x), \quad \alpha \in \mathbb{N}^p_d. \qquad (3.5)$$

The proof is postponed to Section 6. Interestingly, each of the orthonormal polynomials $(P_\alpha)_{\alpha \in \mathbb{N}^p}$ also satisfies an important and well-known extremality property.

Theorem 3.4 (see e.g. Szegő (1974)) Let $\alpha \in \mathbb{N}^p$ be fixed, arbitrary and let $d = |\alpha|$. Then up to a multiplicative positive constant, $P_\alpha$ is the unique optimal solution of

$$\min_{P \in \mathbb{R}[x]_d} \left\{ \int P^2(x) \, d\mu(x) \ :\ P(x) = x^\alpha + \sum_{\beta <_{gl} \alpha} \theta_\beta x^\beta \ \text{ for some } \{\theta_\beta\}_{\beta <_{gl} \alpha} \right\}. \qquad (3.6)$$

Finally, we highlight the following important property which will be useful in the sequel.

Theorem 3.5 (See e.g. Lasserre and Pauwels (2016)) $\Lambda_{\mu,d}$ is invariant by change of polynomial basis $v_d$, change of the origin of $\mathbb{R}^p$ or change of basis in $\mathbb{R}^p$.

3.2 When µ is the Lebesgue measure

In this section we consider the important case of the Lebesgue measure on a compact set $S \subset \mathbb{R}^p$ such that $\mathrm{cl}(\mathrm{int}(S)) = S$. It is known that in this case the Christoffel function encodes information on the set $S$; see for example the discussion in Section 3.1. In particular, the scaled Christoffel function remains positive on the interior of $S$. We push this idea further and present a new result asserting that it is possible to recover the set $S$ with strong asymptotic guarantees by carefully thresholding the corresponding scaled Christoffel function. For any measurable set $A$, denote by $\mu_A$ the uniform probability measure on $A$, that is $\mu_A = \lambda_A / \lambda(A)$ where $\lambda$ is the Lebesgue measure and $\lambda_A$ is its restriction to $A$ ($\lambda_A(A') = \lambda(A \cap A')$ for any measurable set $A'$).
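The affine invariance stated in Theorem 3.5 can be checked numerically for an empirical measure: evaluating the Christoffel function of the cloud at $\xi$ should give the same value as evaluating the Christoffel function of the transformed cloud at the transformed point. A minimal sketch, assuming a synthetic sample and an arbitrary invertible map $x \mapsto Ax + b$ chosen for illustration (all names below are hypothetical):

```python
from itertools import product

import numpy as np


def graded_lex_exponents(p, d):
    """Exponents of total degree <= d in graded lexicographic order."""
    exps = [a for a in product(range(d + 1), repeat=p) if sum(a) <= d]
    return sorted(exps, key=lambda a: (sum(a), tuple(-ai for ai in a)))


def monomial_matrix(points, exps):
    """Row i is v_d(x_i)."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    return np.stack([np.prod(points ** np.array(a), axis=1) for a in exps], axis=1)


def empirical_christoffel(xi, sample, d):
    """Christoffel function of the empirical measure of `sample`, evaluated at xi."""
    exps = graded_lex_exponents(sample.shape[1], d)
    V = monomial_matrix(sample, exps)
    M = V.T @ V / len(sample)
    v = monomial_matrix(xi, exps)[0]
    return 1.0 / (v @ np.linalg.solve(M, v))


rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=(800, 2))    # synthetic cloud (assumption)

A = np.array([[2.0, 1.0], [0.5, 1.5]])       # invertible: det = 2.5
b = np.array([0.3, -0.2])
Y = X @ A.T + b                              # transformed cloud

xi = np.array([0.1, 0.6])
lam_original = empirical_christoffel(xi, X, d=3)
lam_transformed = empirical_christoffel(A @ xi + b, Y, d=3)
```

The agreement holds because composing with an invertible affine map acts on $\mathbb{R}[X]_d$ as an invertible linear change of basis, which leaves $v_d^T M_d^{-1} v_d$ unchanged; in floating point the match is only up to the conditioning of the moment matrix.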
Threshold and asymptotics

The main idea is to use quantitative lower bounds on the scaled Christoffel function, $s(d) \Lambda_{\mu_S,d}$, on the interior of $S$ (Lemma 6.2) and upper bounds outside $S$ (Lemma 6.6). Combining these bounds, one proves the existence of a sequence of thresholds of the scaled Christoffel function which estimate $S$ in a strongly consistent manner. Let us introduce the following notation and assumption.

Assumption 3.6

(a) $S \subset \mathbb{R}^p$ is a compact set such that $\mathrm{cl}(\mathrm{int}(S)) = S$.

(b) $\{\delta_k\}_{k \in \mathbb{N}}$ is a decreasing sequence of positive numbers converging to 0. For every $k \in \mathbb{N}$, let $d_k$ be the smallest integer such that:

$$2^{\,3 - \frac{\delta_k d_k}{\delta_k + \mathrm{diam}(S)}} \, d_k^p \left( \frac{e}{p} \right)^p \exp\left( \frac{p^2}{d_k} \right) \le \alpha_k \qquad (3.7)$$

where $\mathrm{diam}(S)$ denotes the diameter of the set $S$, and

$$\alpha_k := \frac{\delta_k^p \, \omega_p}{\lambda(S)} \cdot \frac{(d_k + 1)(d_k + 2)(d_k + 3)}{(d_k + p + 1)(d_k + p + 2)(2 d_k + p + 6)}.$$

Remark 3.7 (On Assumption 3.6)

• $d_k$ is well defined. Indeed, since $\delta_k$ is positive, the left hand side of (3.7) goes to 0 as $k \to \infty$ while the right hand side remains bounded for increasing values of $d_k$.

• From the definition of $d_k$ and the fact that $\delta_k$ is decreasing, the sequence $\{d_k\}_{k \in \mathbb{N}}$ is non decreasing.

• Given $\{\delta_k\}_{k \in \mathbb{N}}$, computing $d_k$ can be done recursively and only requires the knowledge of $\mathrm{diam}(S)$ and $\lambda(S)$.

• A similar condition can be enforced if only upper bounds on $\mathrm{diam}(S)$ and on $\lambda(S)$ are available. In this case, replace these quantities by their upper bounds in (3.7) to obtain a similar result.

We are now ready to state the first main result of this section, whose proof is postponed to Section 6 for the sake of clarity of exposition. Recall the definition of the Hausdorff distance $d_H(X, Y)$ between two subsets $X, Y$ of $\mathbb{R}^p$:

$$d_H(X, Y) = \max \left\{ \sup_{x \in X} \inf_{y \in Y} \mathrm{dist}(x, y), \ \sup_{y \in Y} \inf_{x \in X} \mathrm{dist}(x, y) \right\}.$$

Theorem 3.8 Let $S \subset \mathbb{R}^p$, $\{\delta_k\}_{k \in \mathbb{N}}$, $\{\alpha_k\}_{k \in \mathbb{N}}$ and $\{d_k\}_{k \in \mathbb{N}}$ satisfy Assumption 3.6. For every $k \in \mathbb{N}$ let $S_k \subset \mathbb{R}^p$ be the set defined by

$$S_k := \{ x \in \mathbb{R}^p : s(d_k) \Lambda_{\mu_S, d_k}(x) \ge \alpha_k \}.$$

Then $d_H(S_k, S) \to 0$ as $k \to \infty$.
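The recursive computation of $d_k$ mentioned in Remark 3.7 can be sketched as a direct scan over integer degrees. The code below assumes that condition (3.7) reads $2^{3 - \delta_k d_k / (\delta_k + \mathrm{diam}(S))} \, d_k^p \, (e/p)^p \exp(p^2/d_k) \le \alpha_k$, with $\alpha_k = \delta_k^p \omega_p (d_k+1)(d_k+2)(d_k+3) / \big(\lambda(S)(d_k+p+1)(d_k+p+2)(2d_k+p+6)\big)$; the function names and the example set (the unit disk in $\mathbb{R}^2$) are illustrative choices, not taken from the paper.

```python
from math import gamma, log, pi


def sphere_surface(p):
    """omega_p = 2 pi^{(p+1)/2} / Gamma((p+1)/2)."""
    return 2.0 * pi ** ((p + 1) / 2) / gamma((p + 1) / 2)


def log_lhs(d, delta, diam, p):
    """Natural logarithm of the left hand side of (3.7), for numerical stability."""
    return ((3.0 - delta * d / (delta + diam)) * log(2.0)
            + p * log(d) + p * (1.0 - log(p)) + p ** 2 / d)


def alpha(d, delta, vol, p):
    """Threshold alpha_k of Assumption 3.6, as a function of the degree d."""
    num = delta ** p * sphere_surface(p) * (d + 1) * (d + 2) * (d + 3)
    den = vol * (d + p + 1) * (d + p + 2) * (2 * d + p + 6)
    return num / den


def smallest_degree(delta, diam, vol, p, d_start=1, d_max=10**7):
    """Smallest integer d >= d_start satisfying (3.7)."""
    for d in range(d_start, d_max + 1):
        if log_lhs(d, delta, diam, p) <= log(alpha(d, delta, vol, p)):
            return d
    raise ValueError("no admissible degree found below d_max")


# Illustrative example: S is the closed unit disk in R^2.
p, diam, vol = 2, 2.0, pi
deltas = [0.4, 0.2, 0.1]

degrees = []
d_prev = 1
for delta in deltas:
    # The sequence d_k is non decreasing, so the scan can resume from d_{k-1}.
    d_prev = smallest_degree(delta, diam, vol, p, d_start=d_prev)
    degrees.append(d_prev)

print(list(zip(deltas, degrees)))
```

Resuming the scan from the previous degree is valid because decreasing $\delta_k$ increases the left hand side of (3.7) and decreases $\alpha_k$, so a degree rejected for $\delta_{k-1}$ stays rejected for $\delta_k$.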
Extension to more general probability measures

Theorem 3.8 can easily be extended to probability measures that are more general than uniform distributions, in which case we consider the following alternative assumption.

Assumption 3.9

(a) $S \subset \mathbb{R}^p$ is a compact set such that $\mathrm{cl}(\mathrm{int}(S)) = S$.

(b) The function $w : \mathrm{int}(S) \to [w_-, +\infty)$ is integrable on $\mathrm{int}(S)$ with $w_- > 0$. The measure $\mu$ is such that for any measurable set $A$, $\mu(A) = \int_{A \cap S} w(x) \, dx$ and $\mu(S) = 1$. $\{\delta_k\}_{k \in \mathbb{N}}$ is a decreasing sequence of positive numbers whose limit is 0. For every $k \in \mathbb{N}$, let $d_k$ be the smallest integer such that:

$$2^{\,3 - \frac{\delta_k d_k}{\delta_k + \mathrm{diam}(S)}} \, d_k^p \left( \frac{e}{p} \right)^p \exp\left( \frac{p^2}{d_k} \right) \le \alpha_k \qquad (3.8)$$

where

$$\alpha_k := w_- \, \delta_k^p \, \omega_p \, \frac{(d_k + 1)(d_k + 2)(d_k + 3)}{(d_k + p + 1)(d_k + p + 2)(2 d_k + p + 6)}.$$

Under Assumption 3.9 we obtain the following analogue of Theorem 3.8.

Theorem 3.10 Let $S \subset \mathbb{R}^p$, $w : S \to [w_-, +\infty)$, $\{\delta_k\}_{k \in \mathbb{N}}$, $\{\alpha_k\}_{k \in \mathbb{N}}$ and $\{d_k\}_{k \in \mathbb{N}}$ satisfy Assumption 3.9. For every $k \in \mathbb{N}$, let

$$S_k := \{ x \in \mathbb{R}^p : s(d_k) \Lambda_{\mu_S, d_k}(x) \ge \alpha_k \}.$$

Then $d_H(S_k, S) \to 0$ as $k \to \infty$.

The proof is postponed to Section 6.

3.3 Discrete approximation via the empirical Christoffel function

In this section $\mu$ is a probability measure on $\mathbb{R}^p$ with compact support $S$. We focus on the statistical setting where information on $\mu$ is available only through a sample of points drawn independently from the given distribution $\mu$. In this setting, for every $n \in \mathbb{N}$, let $\mu_n$ denote the empirical measure uniformly supported on an independent sample of $n$ points distributed according to $\mu$. It is worth emphasizing that in principle $\Lambda_{\mu_n,d}$ is easy to compute and requires the inversion of a square matrix of size $s(d)$, see (3.1). Alternatively, the numerical evaluation of $\Lambda_{\mu_n,d}(x)$ at $x \in \mathbb{R}^p$, fixed arbitrary, reduces to solving a convex quadratic programming problem, which can be done efficiently even in high dimension, see Remark 3.2.

Our second main result is for fixed $d \in \mathbb{N}$ and relates the population Christoffel function $\Lambda_{\mu,d}$ and its empirical version $\Lambda_{\mu_n,d}$, as $n$ increases.
We proceed by distinguishing what happens far from $S$ and close to $S$. First, Lemma 6.8 ensures that both Christoffel functions associated with $\mu$ and $\mu_n$ vanish far from $S$ so that the influence of this region can be neglected. On the other hand, when closer to $S$ one remains in a compact set and the strong law of large numbers allows us to conclude.

Theorem 3.11 Let $\mu$ be a probability measure on $\mathbb{R}^p$ with compact support. Let $\{X_i\}_{i \in \mathbb{N}}$ be a sequence of i.i.d. $\mathbb{R}^p$-valued random variables with common distribution $\mu$. For $n = 1, 2, \ldots$, define the (random) empirical probability measure $\mu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}$. Then, for every $d \in \mathbb{N}^*$ such that the moment matrix $M_d(\mu)$ is invertible, and as $n$ increases, it holds that

$$\sup_{x \in \mathbb{R}^p} \left\{ \left| \Lambda_{\mu_n,d}(x) - \Lambda_{\mu,d}(x) \right| \right\} \xrightarrow{\ a.s.\ } 0. \qquad (3.9)$$

Equivalently

$$\left\| \Lambda_{\mu_n,d} - \Lambda_{\mu,d} \right\|_\infty \xrightarrow{\ a.s.\ } 0, \qquad (3.10)$$

where $\|\cdot\|_\infty$ denotes the usual "sup-norm".

A detailed proof can be found in Section 6. Theorem 3.11 is a strong result which states a highly desirable property, namely that almost surely with respect to the random draw of the sample, the (random) function $\Lambda_{\mu_n,d}(\cdot)$ converges to $\Lambda_{\mu,d}(\cdot)$ uniformly in $x \in \mathbb{R}^p$ as $n$ increases.

4. Applications

4.1 Rationale

In this section we describe some applications for which properties of the Christoffel function prove to be very useful in a statistical context. We only consider the case of bounded support. A relevant property of the scaled Christoffel function is that it encodes information on the support and the density of a population measure $\mu$:

• Máté et al. (1991), Totik (2000) and Kroó and Lubinsky (2012) provide asymptotic results involving the density of the input measure.

• Theorem 3.8 provides asymptotic results related to the support of the input measure.

The support and density of a measure are of interest in many statistical applications. However, the aforementioned results are limited to population measures which are not accessible in a statistical setting.
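The uniform convergence of Theorem 3.11 can be observed in a small univariate simulation where the population moments are known in closed form. The sketch below is an illustration under stated assumptions, not an experiment from the paper: $\mu$ is taken to be the uniform probability measure on $[-1, 1]$ (so that $y_k(\mu) = 1/(k+1)$ for even $k$ and $0$ for odd $k$), the degree is fixed to $d = 4$, and the supremum over $\mathbb{R}$ is approximated by a maximum over a grid on $[-2, 2]$.

```python
import numpy as np

d = 4                                         # fixed degree (assumption)
grid = np.linspace(-2.0, 2.0, 801)            # proxy for the supremum over R
V_grid = np.vander(grid, d + 1, increasing=True)


def christoffel_on_grid(M):
    """Lambda(x) = 1 / (v_d(x)^T M^{-1} v_d(x)) for every x in the grid."""
    kappa = np.einsum("ij,ij->i", V_grid, np.linalg.solve(M, V_grid.T).T)
    return 1.0 / kappa


# Population moment matrix of the uniform probability measure on [-1, 1].
k = np.arange(d + 1)[:, None] + np.arange(d + 1)[None, :]
M_pop = np.where(k % 2 == 0, 1.0 / (k + 1), 0.0)
lam_pop = christoffel_on_grid(M_pop)

# One i.i.d. sequence; mu_n uses its first n terms.
rng = np.random.default_rng(3)
draws = rng.uniform(-1.0, 1.0, size=200_000)

sizes = [100, 10_000, 200_000]
errors = []
for n in sizes:
    V = np.vander(draws[:n], d + 1, increasing=True)
    M_emp = V.T @ V / n                       # empirical moment matrix M_d(mu_n)
    errors.append(np.max(np.abs(christoffel_on_grid(M_emp) - lam_pop)))

print(dict(zip(sizes, errors)))
```

The monomial basis is used here only because $d$ is small; for larger degrees an orthogonal basis would be better conditioned, and by Theorem 3.5 it would give the same function.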
In the context of empirical Christoffel functions, Theorem 3.11 suggests that these properties still hold (at least in the limit of large samples) when one uses the empirical measure $\mu_n$ in place of the population measure $\mu$. Combining these ideas suggests the use of the empirical Christoffel function in statistical applications such as (a) density estimation, (b) support inference or (c) outlier detection. This is illustrated on simulated and real world data and we compare the performance with well established methods for the same purpose. Finally, we also describe another application, namely inversion of affine shuffling, whose links with statistics are less clear.

All results presented in this section are mainly for illustrative purposes. In particular, the choice of the degree $d$ as a function of the sample size $n$ was done empirically and a precise quantitative analysis is a topic of future research beyond the scope of the present paper.

4.2 Density estimation

Most asymptotic results regarding the scaled Christoffel function suggest that the limiting behaviour involves the product of a boundary effect term and a density term. Hence if one knows both the Christoffel function and the boundary effect term, one has access to the density term. Unfortunately, this boundary term is only known in specific situations, the most typical example being the Euclidean ball. Hence, in the present state of knowledge, one of the following is assumed to hold true.

• The support of the population measure $\mu$ is $S$ and $\lim_{d \to \infty} s(d) \Lambda_{\lambda_S,d}$ exists (possibly unknown).

• The support is unknown but contains a set $S$ with the same property as above. In this case, we consider the restriction of the population measure $\mu$ to $S$. Note that a sample from the restriction is easily obtained from a sample from $\mu$ by rejection.

In both cases, assuming that $\mu$ has a density $h$ on $S$, it is expected that the ratios $\frac{\Lambda_{\mu,d}}{\Lambda_{\lambda_S,d}}$ or $\frac{s(d) \Lambda_{\mu,d}}{\lim_{d \to \infty} s(d) \Lambda_{\lambda_S,d}}$ converge to $h$.
An example of such a result in the univariate setting is the following.

Theorem 4.1 (Theorem 5 Máté et al. (1991)) Suppose that $\mu$ is supported on $[-1, 1]$ with density $h \ge a > 0$. Then for almost every $x \in [-1, 1]$,

$$\lim_{d \to \infty} d \, \Lambda_{\mu,d}(x) = \pi h(x) \sqrt{1 - x^2}.$$

Extensions include Totik (2000) for general support and Kroó and Lubinsky (2012) for the multivariate setting. Combining Theorems 4.1 and 3.11 suggests that the empirical Christoffel function can be used for density estimation. For illustration purposes, we set $\mu$ to be the restriction of a Gaussian to $[-1, 1]$. We perform the following experiment for given $n, d \in \mathbb{N}$.
