Uniqueness of the maximum likelihood k estimator for -monotone densities 0 Arseni Seregin∗ 1 0 Department of Statistics, Box 354322 2 University of Washington Seattle, WA 98195-4322 n e-mail: [email protected] a J Abstract: Weproveuniquenessofthemaximumlikelihoodestimatorfor theclassofk−monotonedensities. 2 1 AMS 2000 subject classifications:Primary62G07. Keywordsand phrases:uniqueness,k–monotonedensity,mixturemod- ] els, density estimation, maximum likelihood, nonparametric estimation, T shapeconstraints. S . h t 1. Introduction a m The family M of k-monotone densities on R is a mixture model defined by k + [ the basic density g0(x)≡k(1−x)+k−1. That means that every density f ∈Mk 1 is a multiplicative convolution of some mixing probability distribution G and v g : 0 2 +∞ 1 x 8 f(x)= g dG(y). 0 y y 7 Z0 (cid:18) (cid:19) 1 This class generalizes classes of monotone (1−monotone) and convex decreas- . 1 ing (2-monotone)densities on R . As k increases,the classes M decrease and + k 0 the class of completely monotone densities lies in the intersection of all M . k 0 Balabdaoui and Wellner [2009]provesconsistency andBalabdaoui and Wellner 1 [2007] gives asymptotic limit theory for the MLE of a k−monotone density. : v For k = 1 the uniqueness of the MLE follows from a well known character- i X ization of the Grenander estimator, for k = 2 the uniqueness was proved in r Groeneboom et al. [2001] and for k = 3 in Balabdaoui [2004]. In Jewell [1982] a the uniqueness and weak consistency was proved for the MLE of a completely monotone density. In Lindsay and Roeder [1993] a general method was devel- oped which allows to prove uniqueness of the MLE in mixture models defined by strictly totally positive kernelssuchas the exponentialkernelfor completely monotonedensities.However,fork−monotonedensitiesthekernelisnotstrictly totally positive as follows fromSchoenberg and Whitney [1953] and uniqueness of the MLE for k > 3 has not been investigated before. Below, we answer this question proving the following theorem: ∗ResearchsupportedinpartbyNSFgrantDMS-0804587 1 imsart-generic ver. 2009/08/13 file: Uniqueness-of-k-monotone-MLE.tex date: January 12, 2010 Arseni Seregin/Uniqueness of the k-monotone MLE 2 Theorem 1.1. Let k ≥2 be a positive integer and X ,...,X be i.i.d. random 1 n variables with k-monotone density f ∈M . With probability 1 the MLE fˆ is 0 k n uniquelydefinedbysomediscretemixingprobabilitymeasureGˆ .Thecardinality n of the support set supp(Gˆ ) is less or equal than n. If supp(Gˆ ) consists of n n points Y < ··· < Y , m < n then there exists a subset of order statistics 1 m X <···<X such that: (i1) (im) X <Y <X , (ij) j (ij+k) where we assume X =+∞ for l>m. Also we have Y >X . (il) m (n) 2. Proof With probability 1 we have that the order statistics are distinct: 0<X <···<X . (1) (n) We assume that X =+∞ for l>n. (l) Theproofconsistsofseverallemmas.Firstlemmaprovidesatoolforcounting zeroes of a function. Lemma 2.1. Let A⊆R be an interval. Consider a function f ∈Ck(A). Then the following inequality is true for the number of zeroes N(f,A) of f counted with multiplicities: N(f(k),A)≥N(f,A)−k. Here the multiplicity of zero x of the function f ∈ Ck(A), k ≥ 0, x ∈ A is an integer n≤k+1 such that: f(y)=···=f(n−1)(y)=0 and either n=k+1 or f(n)(y)6=0. Proof. Let x , 1≤i≤m be zeroes of f on A and n be its multiplicities. Then i i byRolle’stheoremthederivativef′ ∈Ck−1(A)hasm−1zeroesbetweenpoints x withmultiplicitiesatleast1andalsozeroesx withmultiplicitiesn −1.Thus: i i i m m ′ N(f ,A)≥m−1+ (n −1)=−1+ n =N(f,A)−1. i i i=1 i=1 X X Applying this inequality k times we obtain the result. Proof of the second lemma builds the necessary tools for the main result. Note, that discreteness of the mixing measure Gˆ actually follows from gen- n eral results about the MLE for mixture models proved in Lindsay [1983a]. We repeat a part of that argument to make our proof self-contained and also to provide referencesto the classic convexanalysis textbook Rockafellar[1970] for an interested reader. imsart-generic ver. 2009/08/13 file: Uniqueness-of-k-monotone-MLE.tex date: January 12, 2010 Arseni Seregin/Uniqueness of the k-monotone MLE 3 Lemma 2.2. There exists a set Y of points Y < ··· < Y , m ≤ n such that 1 m k−monotone density fˆ defined by a mixing probability measure Gˆ is the MLE n n if and only if supp(Gˆ )⊆Y. In other words, any MLE has the form: n m k(Y −x)k−1 fˆ (x)= a j + , (2.1) n j Yk j=1 j X where a ≥ 0, a = 1 and Y > X . Moreover, the vector (fˆ (X ))n is j j j (j) n (i) i=1 unique. P Proof. Following Lindsay [1983a, 1995] we define a curve Γ parameterized by y ∈R as: + 1 x n k(y−X )k−1 n Γ(y)= g = (i) + . y 0 y yk (cid:18) (cid:18) (cid:19)(cid:19)i=1 !i=1 Sinceg (x/y)asafunctionofyisboundedandequalszeroat0and+∞wehave 0 thatΓiscompact.Supposethatf ∈M withcorrespondingmixingprobability k measure G. Then the vector (f(X ))n belongs to the set Γ¯ =cl(conv(Γ)): (i) i=1 +∞ (f(X ))n = Γ(y)dG(y) (i) i=1 Z0 Since Γ is compact it follows that Γ¯ is also compact and Γ¯ = conv(Γ) by n (Rockafellar [1970], Theorem 17.2). Therefore, the continuous function z i=1 i attains its maximum on Γ¯, where z are the coordinates in the space Rn. Let us denote S ≡argmax n z . i Q+ Γ¯ i=1 i Since the intersection of Γ and the interior ri(Rn) of Rn is not empty we have S ⊂ ri(Rn). The fuQnction n z is strictly co+ncave, t+herefore S consists + i=1 i of a single point b = (b )n > 0. Therefore for any MLE fˆ it follows that the i i=1 Q n vector (fˆ (X ))n is equal to b and unique. The gradient of n z at b is n (i) i=1 i=1 i proportional to 1/b≡(1/b )n . We have dim(Γ¯)=n. Inidie=e1d if we consider n points t ∈(X Q,X ) then i (i) (i+1) the vectors Γ(t ) are linearly independent (Schoenberg and Whitney [1953] or i directobservation).By(Rockafellar[1970],Theorem27.4)thevector1/bbelongs to the normal cone of Γ¯ at b. Since 1/b > 0 we have b ∈ ∂Γ¯ and the plane α defined by the equation n z /b = n is a support plane of Γ¯ at b. Thus for i=1 i i v =k/nb we have: i i P n p(y)≡yk− v (y−X )k−1 ≥0, i (i) + i=1 X for all y ≥0 and p(y)=0 if and only if y =0 or Γ(y)∈α. Let us denote by Y the set of y such that Γ(y)∈α i.e. Γ(Y)=α∩Γ. The intersection α∩Γ¯ is an exposed face (Rockafellar [1970], p.162) of Γ¯. By Theorem 18.3 from Rockafellar [1970] we have α∩Γ¯ = conv(Γ(Y)) and by Theorem 18.1 we have supp(Gˆ )⊆Y. n imsart-generic ver. 2009/08/13 file: Uniqueness-of-k-monotone-MLE.tex date: January 12, 2010 Arseni Seregin/Uniqueness of the k-monotone MLE 4 Thefunctionp(y)belongstoC(k−2)(R );aty =0ithasazeroofmultiplicity + k−1andateachy ∈Y ithas a zeroofmultiplicity greateror equalthan 2.By Lemma 2.1 we have: N(p(k−2),[0,X ])≥N(p,[0,X ])−(k−2)≥2#(Y ∩[0,X ])+1. (j) (j) (j) The function p(k−2)(y) is piecewise quadratic and therefore on each interval [X ,X ] it may have at most two zeroes and on [0,X ) it has only one (i) (i+1) (1) zero at zero. Thus: 1+2(j−1)≥N(p(k−2),[0,X ])≥2#(Y ∩[0,X ])+1 (j) (j) orequivalentlythesetY hasnomorethanj−1pointson[0,X ].Inparticular, (j) the set Y contains no more than n points. Let Y < ··· < Y be the points of 1 m Y. Then we have Y >X . j (j) This implies that for any MLE fˆ the support of the corresponding mixing n measure Gˆ is a subset of Y and thus any MLE fˆ has the form: n n m k(Y −x)k−1 fˆ (x)= a j + , n j Yk j=1 j X where a ≥0, a =1. j j The last lemPma proves uniqueness of the MLE. Lemma2.3. ThediscretemixingprobabilitymeasureGˆ whichdefinesanMLE n is unique. Proof. Suppose there exist two different MLEs fˆ1 and fˆ2. Then by Lemma 2.2 n n we have: m k(Y −x)k−1 fˆl(x)= al j + n j Yk j=1 j X and therefore the function q(x) defined by: m k(Y −x)k−1 q(x)=fˆ1(x)−fˆ2(x)= s j + , n n j Yk j=1 j X where s = a1 −a2, admits at least n zeroes X and not all s are equal to j j j (i) j zero. Therefore, by Lemma 2.1: N(q(k−2),[0,X ))≥N(q,[0,X ))−(k−2)≥i−1−(k−2). i i Thefunctionq(k−2)ispiecewiselinearwithknotsatY .Therefore,N(q(k−2),[0,X ))≤ j i #{Y ∩[0,X )}+1 and: (i) #{Y ∩[0,X )}≥i−k. (i) This implies: Yi−k <X(i), imsart-generic ver. 2009/08/13 file: Uniqueness-of-k-monotone-MLE.tex date: January 12, 2010 Arseni Seregin/Uniqueness of the k-monotone MLE 5 and by Lemma 2.2 Yi−k <X(i) <Yi. Now,byCorollary1ofSchoenberg and Whitney[1953]wehavedet((Y −X )k)m > i (j) + i,j=1 0. Thus the vectors Γ(Y ), 1≤i≤m are independent and all s =0. This con- i i tradiction completes the proof. Note, that from Lemma 2.2 it follows that any mixing probability measure onafinitesetY ={Y }definesMLE.Lemma2.3showsthattheMLEisunique j andhencethevectorsΓ(Y )areindependent.Inparticular,thereexistsasubset j of order statistics X <···<X such that: (i1) (im) det (Y −X )k−1 m >0. j (il) + j,l=1 (cid:0) (cid:1) Therefore, by Corollary 1 of Schoenberg and Whitney [1953] we obtain Yj−k < X <Y . (ij) j The equality (2.1) also implies that for any MLE Y >X since otherwise m (n) the last coordinate of all vectors Γ(Y ) is equal to zero and fˆ (X ) = 0 and j n (n) this completes the proof of Theorem 1.1. Acknowledgments Iwouldliketo thankmy advisor,ProfessorJonA.Wellner,forintroducingthis problem to me and for helpful comments and suggestions. References Balabdaoui, F. (2004). Nonparametric estimation of k-monotone density: A new asymptotic distribution theory. Ph.D. thesis, University of Washington, Department of Statistics. Balabdaoui, F. and Wellner, J. (2009). Estimation of a k-monotone den- sity:characterizations,consistencyandminimaxlowerbounds. Statist.Neerl. to appear. Balabdaoui, F. and Wellner, J. A. (2007). Estimation of a k-monotone density: limit distribution theory and the spline connection. Ann. Statist. 35 2536–2564. URLhttp://dx.doi.org.offcampus.lib.washington.edu/10.1214/009053607000000262 Groeneboom, P., Jongbloed, G. and Wellner, J. A. (2001). Estimation of a convex function: characterizations and asymptotic theory. Ann. Statist. 29 1653–1698. Jewell, N. P. (1982). Mixtures of exponentialdistributions. Ann. Statist. 10 479–484. Lindsay,B. G.(1983a).Thegeometryofmixturelikelihoods:ageneraltheory. Ann. Statist. 11 86–94. Lindsay, B. G. (1983b). The geometry of mixture likelihoods. II. The expo- nential family. Ann. Statist. 11 783–792. imsart-generic ver. 2009/08/13 file: Uniqueness-of-k-monotone-MLE.tex date: January 12, 2010 Arseni Seregin/Uniqueness of the k-monotone MLE 6 Lindsay, B. G. (1995). Mixture Models: Theory, Geometry and Applications, vol. 5. NSF-CBMS Regional Conference Series in Probability and Statistics, IMS, HaywardCA. Lindsay, B. G. and Roeder, K. (1993). Uniqueness of estimation and identifiability in mixture models. Canad. J. Statist. 21 139–147. URLhttp://dx.doi.org.offcampus.lib.washington.edu/10.2307/3315807 Rockafellar,R. T.(1970). Convex analysis. PrincetonMathematicalSeries, No. 28, Princeton University Press, Princeton, N.J. Schoenberg, I. J. and Whitney, A. (1953). On Po´lya frequence functions. III. The positivity of translation determinants with an application to the interpolationproblembysplinecurves. Trans. Amer. Math. Soc.74246–259. imsart-generic ver. 2009/08/13 file: Uniqueness-of-k-monotone-MLE.tex date: January 12, 2010