Some Theoretical Results Regarding the Polygonal Distribution 7 1 0 2 Hien D. Nguyen and Geoffrey J. McLachlan ∗ n a January 17, 2017 J 7 1 ] E Abstract M The polygonal distributions are a class of distributions that can be . t defined via the mixture of triangular distributions over the unit interval. a t s The class includes the uniform and trapezoidal distributions, and is an [ alternative to the beta distribution. We demonstrate that the polygonal 1 v densities are dense in the class of continuous and concave densities with 2 bounded second derivatives. Pointwise consistency and Hellinger consis- 1 5 tency results for the maximum likelihood (ML) estimator are obtained. 4 0 Ausefulmodelselectiontheoremisstatedaswellasresultsforarelated . 1 distributionthatisobtainedviathepointwisesquareofpolygonaldensity 0 7 functions. 1 : v i X 1 Introduction r a Piecewise approximation is among the core calculus techniques for the analysis of complex functions (cf. Larson & Bengzon (2013, Ch. 1)). In particular, ∗HDN is at the Department of Mathematics and Statistics, La Trobe University, Mel- bourne. GJM is at the School of Mathematics and Physics, University of Queensland, St. Lucia. (*Correspondingauthoremail: [email protected]) 1 piecewise-linearfunctionshavebeenknowntoprovideusefulapproximationsin theanalysisofcurvesandintegrals;seeforexampletheuseoflinearsplinesfor functional approximation and the trapezoid method for quadrature in Sections 9.3 and 12.1 of Khuri (2003), respectively. In statistics, there have been numerous uses of piecewise-linear functions as models for probability distributions and densities. The simplest of these distributions is the triangular distribution, which is often used as a teaching tool for understanding the fundamentals of distribution theory (cf. Doanne (2004) and Price & Zhang (2007)). Furthermore, triangular distributions have beenusedinnumerousapplications,includingtaskcompletiontimemodelingin ProgramEvaluationandReviewTechniques(PERT),aswellasforthemodeling of securities prices on the New York Stock Exchange, and haul times modeling of civil engineering data; see Kotz & Van Dorp (2004, Ch. 1) for details. Beyondthesimpletriangulardistribution,numerouspiecewise-linearmodels havebeenproposedforthemodelingofrandomphenomena. Theseincludethe Grenander estimator for monotone densities (Grenander, 1956), triangular ker- neldensityestimators(Silverman,1986,Ch.3),thedisjointedpiecewise-density estimatorsofBeirlantetal.(1999),thetrapezoidaldistributionsofvanDorp& Kotz (2003), the general segmented distributions of Vander Wielen & Vander Wielen (2015), and the mixture of triangular distributions of Ho et al. (2016). Differing from the mixture formation of Ho et al. (2016), Karlis & Xekalaki (2001) considered the polygonal densities that are defined via the following characterization. Let X [0,1] be a random variable and let Z [g] ([g] = 1,2,...,g ). ∈ ∈ { } Suppose that P(Z =i) = πi ≥ 0 for i ∈ [g] ( gi=1πi = 1), and that the P 2 following conditional characterization holds for i [g]: ∈ 2x/θi, if x<θi, f(xZ =i)=f(x;θi)= (1) | 2(1 x)/(1 θi), if x θi, − − ≥ if θi (0,1); f(xZ =i) = 2(1 x) if θi = 0; and f(xZ =i) = 2x if θi = 1. ∈ | − | We call the distribution that is specified by the marginal density, g g f(x;θ)= πif(xZ =i)= πif(x;θi), (2) | i=1 i=1 X X a polygonal distribution with g-components and parameter θ Θg, where ∈ g Θg = θ> =(π1,...,πg,θ1,...,θg):πi 0, πi =1,θi [0,1], and i [g] . ( ≥ ∈ ∈ ) i=1 X In Karlis & Xekalaki (2008), it is shown that the polygonal distribution in- cludesasspecialcasestheuniformdistribution,aswellasthetrapezoidaldistri- butions of van Dorp & Kotz (2003). The recent reports of Kacker & Lawrence (2007) and Goncalves & Amaral-Turkman (2008) have demonstrated the po- tentialuseoftrapezoidaldistributionsinmetrologyandgenomics,respectively. AlongwiththerecentefficientalgorithmforML(maximumlikelihood)estima- tionofpolygonaldistributionsbyNguyen&McLachlan(2016),webelievethat an exploration of some theoretical properties of the polygonal distribution is now warranted as it has the potential to be a widely-applicable model for data analysis. In this article, we deduce firstly that the polygonal distribution can be con- sistently estimated via ML estimation. Secondly, we show that the polygonal distribution is dense within the class of concave functions, with respect to the supremum norm. Thirdly, we compute bounds on the Hellinger divergence (cf. 3 Gibbs & Su (2002)) bracketing entropy for the polygonal distribution. These entropyboundsallowsustodotwothings. One,wecanproduceanexponential inequality that bounds the convergence of the ML function in Hellinger diver- gence. And two, the entropy bounds allow us to deduce a penalized likelihood- based model selection criterion for choosing the number of components for an optimal polygonal distribution fit. 2 Consistency of the Maximum Likelihood Esti- mator LetXn ={Xj}nj=1 beanIID(independentandidenticallydistributed)random samplefromapolygonaldistributionwithparameterθ0> = π10,...,πg0,θ10,...,θg0 ∈ Θ0, where (cid:0) (cid:1) Θ0g =(θ∈Θg :E[L(θ;Xn)]=υs∈uΘpgE[L(υ;Xn)]). Let the log-likelihood function for the sample be n n g L(θ;Xn)= logf(Xi;θ)= log πif(Xi;θi), i=1 i=1 i=1 X X X define the ML estimator as any θˆn Θn, where ∈ Θn = θ Θ :L(θ;X )= supL(υ;X ) . g ( ∈ g n υ∈Θg n ) In Nguyen & McLachlan (2016), an efficient MM algorithm (minorization– maximization; seeHunter&Lange(2004))wasderived,forthecomputationof θˆn,whenXnisfixedatsomerealizedvalue. Wenowinvestigatetheconsistency of the ML estimator. Write the Euclidean-vector norm as ; the following k·k2 4 result is adapted from van der Vaart (1998). Lemma 1 (van der Vaart, 1998, Thm. 5.13). Assume that (i) logf(x;θ) is upper-semicontinuous, and that (ii) sup logf(X;θ) is measurable and θ∈T Esupθ∈Tlogf(X;θ) < ∞ for every sufficiently small ball T ⊂ Θg. If θ˜n is an estimator, such that L θ˜n;Xn L(θ0;Xn) o(1), for some θ0 Θ0, ≥ − ∈ (cid:16) (cid:17) then for every (cid:15)>0 and compact set K Θg, ⊂ lim P inf θ˜n θ (cid:15) and θ˜n K =0. n→∞ θ∈Θ0g(cid:13) − (cid:13)2 ≥ ∈ ! (cid:13) (cid:13) (cid:13) (cid:13) We observe that f(X;θ) is continuous for every θ Θg since it is the ∈ convexcombinationofcontinuousfunctions,andthusassumption(i)isfulfilled ascontinuityimpliessemicontinuity. Further,sincef(x;θ)isalsocontinuousin X andhasboundedsupport,wehavethefulfillmentofassumption(ii)sincef is continuousinbothxandθ,wehavesup sup logf(x;θ)< forevery x∈[0,1] θ∈T ∞ ball T∈Θ, and thus Esupθ∈Tlogf(X;θ)<∞ for every underlying polygonal distribution. Lastly, we can take θ˜n =θˆn as it fulfills the condition of Lemma 1. We therefore have the following result. Theorem 1. The ML estimator θˆn ∈Θng is consistent in the sense that lim P inf θˆn θ (cid:15) and θˆn K =0, n→∞ θ∈Θ0g(cid:13) − (cid:13)2 ≥ ∈ ! (cid:13) (cid:13) (cid:13) (cid:13) for every (cid:15)>0 and K Θg. ⊂ Remark 1. Theorem 1 makes two allowances that are important for mixture models, such as the polygonal distribution. First, the ML estimator θˆn can be one of many possible global maximizers of the log-likelihood function. This is importantasthelackofidentifiabilitybeyondapermutationofthecomponent labels (cf. Titterington et al. (1985, Sec. 3.1)) guarantees that if global max- 5 ima of the log-likelihood exists then there are at least g! of them. Secondly, andsimilarly,sincethelackofidentifiabilitybeyondapermutationalsoimplies thatforanygivengenerativedensityfunctionf(x;θ0),thereareatleastg! 1 − othermodels,whichcanbeobtainedbypermutingthecomponentlabelsofthe parameter elements, that would generate the same random variable. Thus, the allowanceforthegenerativemodeltoarisefromasetofpossibleparameterval- ues,ratherthanasingularparametervalue,providesabroaderbutmoreuseful form of consistency in the context of mixture models. We note that this form of consistency is equivalent to the quotient-space consistency of Redner (1981) andRedner&Walker(1984),aswellastheextremumestimatorconsistencyof Amemiya (1985). 3 Approximations via Polygonal Distributions We start by noting that every density of form (1) is a concave function in x [0,1],andthuswehavethefactthat(2)isalsoconcaveinx,bycomposition. ∈ Proposition 1. For any θ Θg and g N, the polygonal distribution of form ∈ ∈ (2) is concave over the domain of x [0,1]. ∈ Proof. Since each triangular component of form (1) is concave, and since (2) is a convex composition of triangular components, we obtain the desired result due to the preservation of concavity under positive summation (cf. Boyd & Vandenberghe (2004, Sec. 3.2)). Wenextconsiderregularpiecewise-linearapproximationsoftwicecontinuously- differentiable, positive, and concave functions h (on x [0,1]). That is, we ∈ consider approximations of the form h(x)≈l(x)=(cid:26)h(ti)+ h(titi)−−hti(+ti1+1)(x−ti) if ti ≤x≤ti+1, (3) 6 for i [g], where ti = (i 1)/g. The following result is well known, although ∈ − we follow the reporting of de Boor (2001, Ch. 3). Lemma 2 (de Boor, 2000, Ch. 3). For any twice continuously-differentiable function h over x [0,1], the piecewise-linear approximation l(x) (defined by ∈ (3)) satisfies the bound 1 sup |h(x)−l(x)|≤ 8g2 sup |h00(x)|. (4) x [0,1] x [0,1] ∈ ∈ Defineapolygonalapproximationasanyfunctionofform(2)overx [0,1], ∈ where θ Θ˜g+1 and ∈ Θ˜g = θ> =(π1,...,πg+1,θ1,...,θg):πi 0,θi [0,1], and i [g] . ≥ ∈ ∈ (cid:8) (cid:9) Weseektodemonstratethatpolygonalapproximationscanbeusedtoconstruct apiecewise-linearapproximationforanycontinuous,non-negative,andconcave function h. Start by setting θi = (i 1)/g, for each i [g+1]. Note the implication − ∈ that the polygonal approximation f is discontinuous only at the nodes θi. In Karlis & Xekalaki (2008), it is observed that k g π π f(x)=2(1−x) π1+ 1 iθi!+2x θii +πg+1!, i=2 − i=k+1 X X islinearineveryinterval[θk,θk+1],fork [g]. Forbrevity,thesummationsare ∈ taken to be zero if the end points are incoherent. Notingthatl(θi)=h(θi),foreachi [g+1],itsufficestomatchthevalues ∈ off(x)ateachθi,sincethelinesegmentbetweenanytwopointsinaCartesian space is uniquely determined. Thus, we are required to solve the system of 7 equations h(0)=2π1, h(1)=2πg+1, and k k 1 h(0) π h − = 2(g k+1) + i (5) (cid:18) g (cid:19) − " 2g i=2 g−i+1# X g π h(1) +2(k 1) i + , − " i 1 2g # i=k+1 − X foreachk / 1,g+1 . Thisisalinearsystemoffullrankandthusissolvable. ∈{ } Wemustfinallyvalidatethatthesolutionisnon-negativeunderourhypothesis (i.e. πi∗ ≥ 0 solves the system, for i ∈ [g]). Since h(0) ≥ 0 and h(1) ≥ 0, we have π1∗ ≥0 and πg∗+1 ≥0. For any i∈/ {1,g+1}, we can check that (g i+1)(i 1) i 1 i i 2 πi∗ = − 2g − 2h −g −h g −h −g (cid:20) (cid:18) (cid:19) (cid:18) (cid:19) (cid:18) (cid:19)(cid:21) solves (5). We obtain the desired outcome by noting that the definition of concavity implies h([a+b]/2) [h(a)+h(c)]/2, for 0 a < b 1. Via ≥ ≤ ≤ Lemma 2, we can now state our approximation theorem. Theorem 2. The class of of polygonal approximations is dense within the class of twice continuously-differentiable, non-negative, and concave functions with bounded second derivative, over the unit interval. That is, for any twice continuously-differentiable, non-negative, and concave function h with bounded second derivative, over the unit interval, there exists a g N and f(x;θ) (with ∈ θ∈Θ˜g+1), such that for every (cid:15)>0, supx∈[0,1]|h(x)−f(x;θ)|<(cid:15). Proof. For any h, we have a piecewise-linear approximation l, such that (4) holds. Wehavealsoshownthatunderthehypothesis,foranylwithgsegments, there exists a polygonal approximation with g+1 components that is equal to 8 l. Thus, one can replace l with f in (4) and note that the second derivative is bounded in order to obtain the result. Remark 2. The obtained result is a finite approximation theorem. That is, for anylevelofaccuracy(cid:15)>0,thereexistsapolygonalapproximationwithafinite numberofcomponentsg+1< thatcanprovidetherequiredegreeofaccuracy. ∞ This makes Theorem 2 comparable to denseness results for mixtures of shifted andscaleddensities,suchasDasGupta(2008,Thm. 33.2). Oneadvantagethat Theorem 2 has over DasGupta (2008, Thm. 33.2) and the likes is that we have beenabletoestablishthecomponentweightsmustbenon-negative(i.e. πi 0, ≥ i [g+1]). In comparison, no such guarantees are made by DasGupta (2008, ∈ Thm. 33.2); see also the denseness results in Cheney & Light (2000). Weconsiderinpassingarelatedfamilyofdistributionsthatcanbeobtained viathepointwisesquareofapolygonalapproximation. Thatisdefinethefamily of density functions , where S S = s(x)=f2(x;θ):θ∈Θ˜g+1,kf(x;θ)k22 =1, and g∈N n o 1/2 and kh(x)k2 = 01h2(x)dx is the L2-norm over the unit interval. Define ´ (cid:16) (cid:17) theKLdivergence(Kullback-Leibler;Kullback&Leibler(1951))fromadensity f to h (defined on x [0,1]) as K(h f) = 1h(x)logf(x)dx. The following ∈ || 0 h(x) ´ specialization of Massart (2007, Prop. 7.20) allows for the use of Theorem 2 to derive an approximation theorem for densities in class . As the specialization S is simple, we provide it without proof. Lemma 3 (Massart,2007,Prop. 7.20). Let ¯1/2 be a convex cone in such T L∞ that 1 ¯1/2 and let 1/2 be the set of elements from ¯1/2 with 2-norm equal ∈T T T L 9 to 1. If = t2 :t 1/2 , then for any density h T ∈T (cid:8) (cid:9) 2 min 1,inf K(h t) 12 inf sup h1/2(x) u(x) . (cid:26) t∈T || (cid:27)≤ u∈T¯1/2x∈[0,1](cid:12) − (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) Proposition 2. For any twice continuously-differentiable, non-negative, and concave density function h with bounded second derivative (on x [0,1]), there ∈ exists a density s , such that for any (cid:15) (0,1), infs K(h s) (cid:15). ∈S ∈ ∈S || ≤ Proof. Define ¯1/2 to be the set of polygonal approximations with potentially S any g N. Since the polygonal approximations are continuous on a bounded ∈ domain,theyareelementsof . Further,thesumandpositiveproductofany L∞ twopolygonalapproximationisanotherpolygonalapproximation,thustheclass formsaconvexcone. Theelement1isin ¯1/2 sincethepolygonaldistributions S includestheuniformdistribution. Next,define 1/2 bethesetofelementsfrom S ¯1/2 with 2-norm equal to 1, and subsequently = t2 :t 1/2 . Since S L S ∈S concavity is conserved under increasing and concave co(cid:8)mposition (c(cid:9)f. Boyd & Vandenberghe (2004, Eqn. 3.10)), the function h1/2 is concave under the hypothesis. Thus, by Theorem 2, there exists a u ¯1/2 such that ∈S inf sup h1/2(x) u(x) δ u ¯1/2x [0,1] − ≤ ∈S ∈ (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) (cid:12) for any δ > 0. We can then take any δ that makes (cid:15) = 12δ2 < 1 to complete the proof. 4 Hellinger Bracketing Results WeestablishedtheconsistencyoftheMLestimatorwithrespecttoconvergence inprobabilityoftheparameterestimateθˆn ∈Θng tosomeelementwithintheset ofmaximizersoftheexpectedlog-likelihoodΘ0. Sincethelog-likelihoodiscon- g 10