arXiv:1102.2413v2 [cs.IT] 6 Jan 2013

Optimal prefix codes for pairs of geometrically-distributed random variables

Frédérique Bassino, Julien Clément, Gadiel Seroussi, Fellow, IEEE, and Alfredo Viola

Abstract—Optimal prefix codes are studied for pairs of independent, integer-valued symbols emitted by a source with a geometric probability distribution of parameter $q$, $0 < q < 1$. By encoding pairs of symbols, it may be possible to reduce the redundancy penalty of symbol-by-symbol encoding, while preserving the simplicity of the encoding and decoding procedures typical of Golomb codes and their variants. It is shown that optimal codes for these so-called two-dimensional geometric distributions are parameter-singular, in the sense that a prefix code that is optimal for one value of the parameter $q$ cannot be optimal for any other value of $q$. This is in sharp contrast to the one-dimensional case, where codes are optimal for positive-length intervals of the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give a compact characterization of optimal codes for all values of the parameter $q$, as was done in the one-dimensional case. Instead, optimal codes are characterized for a discrete sequence of values of $q$ that provides good coverage of the unit interval. Specifically, optimal prefix codes are described for $q = 2^{-1/k}$ ($k \ge 1$), covering the range $q \ge \frac{1}{2}$, and $q = 2^{-k}$ ($k > 1$), covering the range $q < \frac{1}{2}$. The described codes produce the expected reduction in redundancy with respect to the one-dimensional case, while maintaining low complexity coding operations.

Index terms—geometric distributions, prefix codes, Huffman codes, Golomb codes, codes for countable alphabets, lossless compression

This work was supported in part by ECOS project U08E02, by PDT project 54/178 2006–2008, and by CSIC project (Universidad de la República) fondos 2009–2011. A. Viola’s work was done in part while he was visiting GREYC, Université de Caen and the Laboratoire d’Informatique Gaspard-Monge, Université de Marne la Vallée, France. Parts of this paper were presented at the 2006 Data Compression Conference, and at the 2006 IEEE International Symposium on Information Theory.
F. Bassino is with LIPN UMR 7030, Université Paris 13 - CNRS, France.
J. Clément is with GREYC UMR 6072, CNRS, Université de Caen, ENSICAEN, France.
G. Seroussi is with Hewlett-Packard Laboratories, Palo Alto, CA 94304, USA, and with Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay.
A. Viola is with Instituto de Computación, Facultad de Ingeniería, Universidad de la República, Montevideo, Uruguay (e-mail: viola@fing.edu.uy).
Copyright (c) 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE.

I. INTRODUCTION

In 1966, Golomb [1] described optimal binary prefix codes for some geometric distributions over the nonnegative integers, namely, distributions with probabilities $p(i)$ of the form

$$p(i) = (1-q)\,q^{i}, \quad i \ge 0,$$

for some real-valued parameter $q$, $0 < q < 1$. In [2], these Golomb codes were shown to be optimal for all geometric distributions. These distributions occur, for example, when encoding run lengths (the original motivation in [1]), and in image compression when encoding prediction residuals, which are well-modeled by two-sided geometric distributions. Optimal codes for the latter were characterized in [3], based on some combinations and variants of Golomb codes. Codes based on the Golomb construction have the practical advantage of allowing the encoding of a symbol $i$ using a simple explicit computation on the integer value of $i$, without recourse to nontrivial data structures or tables. This has led to their adoption in many practical applications (cf. [4], [5]).

Symbol-by-symbol encoding, however, can incur significant redundancy relative to the entropy of the distribution, even when dealing with sequences of independent, identically distributed random variables. One way to mitigate this problem, while keeping the simplicity and low latency of the encoding and decoding operations, is to consider short blocks of $d > 1$ symbols, and use a prefix code for the blocks. In this paper, we study optimal prefix codes for pairs (blocks of length $d = 2$) of independent, identically distributed geometric random variables, namely, distributions on pairs of nonnegative integers $(i,j)$ with probabilities of the form

$$P(i,j) = p(i)\,p(j) = (1-q)^{2}\, q^{i+j}, \quad i,j \ge 0. \tag{1}$$

We refer to this distribution as a two-dimensional geometric distribution (TDGD), defined on the alphabet of integer pairs $\mathcal{A} = \{(i,j) \mid i,j \ge 0\}$. For succinctness, we denote a TDGD of parameter $q$ by TDGD($q$).

Aside from the mentioned practical motivation, the problem is of intrinsic combinatorial interest. It was proved in [6] (see also [7]) that, if the entropy$^1$ $-\sum_{a \in \mathcal{A}} P(a) \log P(a)$ of a distribution over a countable alphabet $\mathcal{A}$ is finite, optimal codes exist and can be obtained, in the limit, from Huffman codes for truncated versions of the alphabet. However, the proof does not give a general way for effectively constructing optimal codes, and in fact, there are few families of distributions over countable alphabets for which an effective construction is known [8], [9]. An algorithmic approach to building optimal codes is presented in [9], which covers geometric distributions and various generalizations. The approach, though, is not applicable to TDGDs, as explicitly noted in [9].

$^1$ $\log x$ and $\ln x$ will denote, respectively, the base-2 and the natural logarithm of $x$.

Some characteristic properties of the families of optimal codes for geometric and related distributions in the one-dimensional case turn out not to hold in the two-dimensional case. Specifically, the optimal codes described in [1] and [3] correspond to binary trees of bounded width, namely, the number of codewords of any given length is upper-bounded by a quantity that depends only on the code parameters. Also, the family of optimal codes in each case partitions the parameter space into regions of positive volume, such that all the corresponding distributions in a region admit the same optimal code. These properties do not hold in the case of optimal codes for TDGDs. In particular, optimal codes for TDGDs turn out to be parameter-singular, in the sense that if a code $\mathcal{T}_q$ is optimal for TDGD($q$), then $\mathcal{T}_q$ is not optimal for TDGD($q'$) for any parameter value $q' \ne q$. This result is presented in Section III. (A related but somewhat dual problem, namely, counting the number of distinct trees that can be optimal for a given source over a countable alphabet, is studied in [10].)

An important consequence of this singularity is that any set containing optimal codes for all values of $q$ must be uncountable, and, thus, it would be infeasible to give a compact characterization of such a set, as was done in [1] or [3] for one-dimensional cases.$^2$ Thus, from a practical point of view, the best we can expect is to characterize optimal codes for countable sequences of parameter values. In this paper, we present such a characterization, for a sequence of parameter values that provides good coverage of the range $0 < q < 1$. Specifically, in Section IV, we describe the construction of optimal codes for TDGD($q$) with $q = 2^{-1/k}$ for integers $k \ge 1$,$^3$ covering the range $q \ge \frac{1}{2}$, and in Section V, we do so for TDGD($q$) with $q = 2^{-k}$ for integers $k > 1$, covering the range $q < \frac{1}{2}$ (thus, overall, we show optimal codes for all values of $q$ such that $-\log q$ is either an integer or the inverse of one). In the case $q < \frac{1}{2}$, we observe that, as $k \to \infty$ ($q \to 0$), the optimal codes described converge to a limit code, in the sense that the codeword for any given pair $(a,b)$ remains the same for all $k > k_0(a,b)$, where $k_0$ is a threshold that can be computed from $a$ and $b$ (this limit code is also mentioned, without proofs, in [11]). The codes in both constructions are of unbounded width. However, they are regular [12], in the sense that the corresponding infinite trees have only a finite number of non-isomorphic whole subtrees (i.e., subtrees consisting of a node and all of its descendants). This allows for deriving recursions and explicit expressions for the average code length, as well as feasible encoding/decoding procedures. Notice that, to the best of our knowledge, the only case for which an optimal code for a TDGD had been characterized prior to this work was the trivial case $q = \frac{1}{2}$, in which case encoding each component of $(i,j)$ separately with a unary code (i.e., a Golomb code of order one) has zero redundancy, and is thus optimal (cf. also [11]).

$^2$ Loosely, by a compact characterization we mean one in which each code is characterized by a finite number of finite parameters, which drive the corresponding encoding/decoding procedures.
$^3$ These are the same distributions for which optimality of Golomb codes was originally established in [1].

Practical considerations, and the redundancy of the new codes, are discussed in Section VI, where we present redundancy plots and comparisons with symbol-by-symbol Golomb coding and with the optimal code for a TDGD for each plotted value of $q$ (optimal average code lengths for arbitrary values of $q$ were estimated numerically to sufficiently high precision). We also derive an exact expression for the asymptotic oscillatory behavior of the redundancy of the new codes as $q \to 1$. The study confirms the redundancy gains over symbol-by-symbol encoding with Golomb codes, and the fact that the discrete sequence of codes presented provides a good approximation to the full class of optimal codes over the range of the parameter $q$.

Our constructions and proofs of optimality rely on the technique of Gallager and Van Voorhis [2], which was also used in [3]. As noted in [2], most of the work and ingenuity in applying the technique goes into discovering appropriate “guesses” of the basic components on which the construction iterates, and in describing the structure of the resulting codes. With the correct guesses, the proofs are straightforward. The technique of [2] is reviewed in Section II, where we also introduce some definitions and notation that will be useful throughout the paper.

II. PRELIMINARIES

A. Definitions

We are interested in encoding the alphabet $\mathcal{A}$ of integer pairs $(i,j)$, $i,j \ge 0$, using a binary prefix code $C$ (we will refer to $C$ plainly as a code, the binary and prefix properties being assumed throughout). As usual, we associate $C$ with a rooted (infinite) binary tree, whose leaves correspond, bijectively, to symbols in $\mathcal{A}$, and where each branch is labeled with a binary digit. The binary codeword assigned to a symbol is “read off” the labels on the path from the root to the corresponding leaf. The depth of a node $x$ in a tree $T$, denoted $\mathrm{depth}_T(x)$, is the number of branches on the path from the root to $x$. By extension, the depth (or height) of a finite tree is defined as the maximal depth of any of its nodes. A level of $T$ is the set of all nodes at a given depth $\ell$ (we refer to this set as level $\ell$). Let $n_\ell^T$ denote the number of leaves in level $\ell$ of $T$ (we will sometimes omit the superscript $T$ when clear from the context). We refer to the sequence $\{n_\ell^T\}_{\ell \ge 0}$ as the profile of $T$. Two trees will be considered equivalent if their profiles are identical. Thus, for a code $C$, we are only interested in its tree profile, or, equivalently, the length distribution of its codewords. Given the profile of a tree, and an ordering of $\mathcal{A}$ in decreasing probability order, it is always possible to define a canonical tree (say, by assigning leaves in alphabetical order; see, e.g., [13]) that uniquely defines a code for $\mathcal{A}$. The notion of tree equivalence adopted implies that given a tree, we can arbitrarily permute the nodes at any level, since such a permutation leaves the profile invariant. This will allow us to make, without loss of generality, certain assumptions on the structure of the tree. In particular, we will often make the assumption that if a tree contains, say, at least $2^j$ leaves at a certain level $\ell$, then there is a set of $2^j$ leaves at level $\ell$ that have a common ancestor$^4$ $\nu$ at level $\ell - j$ (an alphabetically ordered tree, in fact, always has this property).

$^4$ We use the usual “family” terminology for trees: nodes have children, parents, ancestors and descendants. We also use the common convention of visualizing trees with the root at the top and leaves at the bottom. Thus, ancestors are “up,” and descendants are “down.”

With a slight abuse of terminology, we will not distinguish between a code and its corresponding tree (or profile), and will refer to the same object sometimes as a tree and sometimes as a code. Unless noted otherwise, all trees considered in this paper are full, i.e., every node in the tree is either a leaf or the parent of two children (full trees are sometimes referred to in the literature as complete). A tree is balanced (or uniform) if it has $2^k$ leaves, all of them at depth $k$, for some $k \ge 0$. We denote such a tree by $\mathcal{U}_k$. We will restrict the use of the term subtree to refer to whole subtrees of $T$, i.e., subtrees that consist of a node and all of its descendants in $T$.

We call $s(i,j) = i + j$ the signature of $(i,j) \in \mathcal{A}$. For a given value $s = s(i,j)$, there are $s+1$ pairs with signature $s$, all with the same probability, $P(s) = (1-q)^2 q^s$, under the distribution (1). Given a code $C$, symbols of the same signature can be freely permuted without affecting the properties of interest to us (e.g., average code length). Thus, for simplicity, we can also regard the correspondence between leaves and symbols as one between leaves and elements of the multiset

$$\hat{\mathcal{A}} = \{0, 1, 1, 2, 2, 2, \ldots, \underbrace{s, \ldots, s}_{s+1\ \text{times}}, \ldots\}. \tag{2}$$

In constructing the tree, we do not distinguish between different occurrences of a signature $s$; for actual encoding, the $s+1$ leaves labeled with $s$ are mapped to the symbols $(0,s), (1,s-1), \ldots, (s,0)$ in some fixed order. In the sequel, we will often ignore normalization factors for the signature probabilities $P(s)$ (in cases where normalization is inconsequential), and will use instead weights $w(s) = q^s$.

Consider a tree (or code) $T$ for $\mathcal{A}$. Let $U$ be a subtree of $T$, and let $s(x)$ denote the signature associated with a leaf $x$ of $U$. Let $F(U)$ denote the set of leaves of $U$, referred to as its fringe. We define the weight, $w_q(U)$, of $U$ as

$$w_q(U) = \sum_{x \in F(U)} q^{s(x)},$$

and the cost, $L_q(U)$, of $U$ as

$$L_q(U) = \sum_{x \in F(U)} \mathrm{depth}_U(x)\, q^{s(x)}$$

(the subscript $q$ may be omitted when clear from the context). When $U = T$, we have $w_q(T) = (1-q)^{-2}$, and $\bar{L}_q(T) \triangleq (1-q)^2 L_q(T)$ is the average code length of $T$. A tree $T$ is optimal for TDGD($q$) if $L_q(T) \le L_q(T')$ for any tree $T'$.

B. Some basic objects and operations

For $\alpha \ge 1$, we say that a finite source with probabilities $p_1 \ge p_2 \ge \cdots \ge p_N$, $N \ge 2$, is $\alpha$-uniform if $p_1/p_N \le \alpha$. A 2-uniform source is also called quasi-uniform. An optimal code for a quasi-uniform source on $N$ symbols consists of $2^{\lceil \log N \rceil} - N$ codewords of length $\lfloor \log N \rfloor$, and $2N - 2^{\lceil \log N \rceil}$ codewords of length $\lceil \log N \rceil$, the shorter codewords corresponding to the more probable symbols [2]. We refer to such a code (or the associated tree) also as quasi-uniform, denote it by $Q_N$, and denote by $Q_N(i)$ the codeword it assigns to the symbol associated with $p_i$, $1 \le i \le N$. For convenience, we define $Q_1$ as a null code, which assigns code length zero to the single symbol in the alphabet. Clearly, for integers $k \ge 0$, we have $Q_{2^k} = \mathcal{U}_k$. The fringe thickness of a finite tree $T$, denoted $f_T$, is the maximum difference between the depths of any two leaves of $T$. Quasi-uniform trees $T$ have $f_T \le 1$, while uniform trees have $f_T = 0$. In Section IV we present a characterization of optimal codes of fringe thickness two for 4-uniform distributions, which generalizes the quasi-uniform case. This generalization will help in the characterization of the optimal codes for TDGD($q$), $q = 2^{-1/k}$.

The concatenation of two trees $T$ and $U$, denoted $T \cdot U$, is obtained by attaching a copy of $U$ to each leaf of $T$. Regarded as a code, $T \cdot U$ consists of all the possible concatenations $t \cdot u$ of a word $t \in T$ with one $u \in U$. The Golomb code of order $k \ge 1$ [1], denoted $G_k$, encodes an integer $i$ by concatenating $Q_k(i \bmod k)$ with a unary encoding of $\lfloor i/k \rfloor$ (e.g., $\lfloor i/k \rfloor$ ones followed by a zero). The first-order Golomb code $G_1$ is just the unary code, whose corresponding tree consists of a root with one leaf child on the branch labeled ’0’, and, recursively, a copy of $G_1$ attached to the child on the branch labeled ’1’. Thus, we have $G_k = Q_k \cdot G_1$.

C. The Gallager-Van Voorhis method

When proving optimality of infinite codes for TDGDs, we will rely on the method due to Gallager and Van Voorhis [2], which is briefly outlined below, adapted to our setting and terminology.

• Define a sequence of finite reduced sources $(\mathcal{S}_t)_{t=0}^{\infty}$. The alphabet of the reduced source $\mathcal{S}_t$ is a multiset $\mathcal{S}_t = \mathcal{H}_t \cup \mathcal{F}_t$, where $\mathcal{H}_t$ is a multiset comprising the signatures $0, 1, \ldots, s-1$ (with multiplicities as in (2)), and $\mathcal{F}_t$ consists of a finite number of (possibly infinite) subsets of $\hat{\mathcal{A}}$, referred to as virtual symbols, which form a partition of the remaining signatures. We naturally associate with each virtual symbol a weight equal to the sum of the weights of the signatures it contains.
• Verify that the sequence $(\mathcal{S}_t)_{t=0}^{\infty}$ is compatible with the bottom-up Huffman procedure. This means that after a number of merging steps of the Huffman algorithm on the reduced source $\mathcal{S}_t$, one gets $\mathcal{S}_{t-1}$. Proceed recursively, until $\mathcal{S}_0$ is obtained.
• Apply the Huffman algorithm to $\mathcal{S}_0$.

While the sequence of reduced sources $\mathcal{S}_t$ can be seen as evolving “bottom-up,” the infinite code $C$ constructed results from a “top-down” sequence of corresponding finite codes $C_t$, whose size grows with $t$, and which unfold by recursive reversal of the mergers in the Huffman procedure. One shows that the sequence of codes $(C_t)_{t \ge 0}$ converges to an infinite code $C$, in the sense that for every $j \ge 1$, with codewords of $C_t$ consistently sorted, the $j$th codeword of $C_t$ is eventually constant when $t$ grows, and equal to the $j$th codeword of $C$. A corresponding convergence argument on the sequence of average code lengths then establishes the optimality of $C$.

This method was successfully applied to characterize infinite optimal codes in [2] and [3]. While the technique is straightforward once appropriate reduced sources are defined, the difficulty in each case is to guess the structure of these sources. In a sense, this is a self-bootstrapping procedure, where one needs to guess the structure of the codes sought, and use that structure to define the reduced sources, which, in turn, serve to prove that the guess was correct.
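As a concrete illustration of the building blocks of Subsection II-B, the quasi-uniform code $Q_N$ and the Golomb code $G_k = Q_k \cdot G_1$ can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, symbols are indexed from 0 (most probable first) rather than from 1, and the particular bit patterns chosen for $Q_N$ are one of many assignments with the required length profile.

```python
def quasi_uniform_codeword(i: int, n: int) -> str:
    """Codeword of Q_n for symbol index i (assumed 0-based, most probable first)."""
    if not 0 <= i < n:
        raise ValueError("symbol index out of range")
    long_len = (n - 1).bit_length()   # ceil(log2 n); equals 0 for the null code Q_1
    if long_len == 0:
        return ""                     # Q_1 assigns code length zero
    num_short = 2 ** long_len - n     # number of codewords of length floor(log2 n)
    if i < num_short:
        # The more probable symbols get the shorter codewords.
        return format(i, "b").zfill(long_len - 1)
    # Remaining symbols use the longer length; the offset keeps the code prefix-free.
    return format(i + num_short, "b").zfill(long_len)


def golomb_codeword(i: int, k: int) -> str:
    """G_k(i): Q_k(i mod k) followed by floor(i/k) ones and a terminating zero."""
    return quasi_uniform_codeword(i % k, k) + "1" * (i // k) + "0"
```

With this choice of bit patterns, `golomb_codeword(4, 3)` concatenates `Q_3(1) = "10"` with the unary word `"10"`, and `golomb_codeword(i, 1)` reduces to the unary code.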
We will apply the Gallager-Van Voorhis method to prove optimality of codes for certain families of TDGDs in Sections IV and V. In each case, we will emphasize the definition and structure of the reduced sources, and show that they are compatible with the Huffman procedure. We will omit the discussion on convergence, and the formal induction proofs, since the arguments are essentially the same as those in [2] and [3].

III. PARAMETER-SINGULARITY OF OPTIMAL CODES FOR TDGDS

In the case of one-dimensional geometric distributions, the unit interval $(0,1)$ is partitioned into an infinite sequence of semi-open intervals $(q_{k-1}, q_k]$, $k \ge 1$, such that the Golomb code $G_k$ is optimal for all values of the distribution parameter $q$ in $(q_{k-1}, q_k]$. Specifically, for $k \ge 0$, $q_k$ is the (unique) nonnegative root of the equation $q^k + q^{k+1} - 1 = 0$ [2]. Thus, we have $q_0 = 0$, $q_1 = (\sqrt{5}-1)/2 \approx 0.618$, $q_2 \approx 0.755$, etc. A similar property holds in the case of two-sided geometric distributions [3], where the two-dimensional parameter space is partitioned into a countable sequence of patches such that all the distributions with parameter values in a given patch admit the same optimal code. In this section, we prove that, in sharp contrast to these examples, optimal codes for TDGDs are parameter-singular, in the sense that a code that is optimal for a certain value of the parameter $q$ cannot be optimal for any other value of $q$. More formally, we present the following result.

Theorem 1: Let $q$ and $q_1$ be real numbers in the interval $(0,1)$, with $q \ne q_1$, and let $\mathcal{T}_q$ be an optimal tree for TDGD($q$). Then, $\mathcal{T}_q$ is not optimal for TDGD($q_1$).

Remark. It follows from Theorem 1 that any set containing an optimal code for each distribution TDGD($q$), for all values of $q$, must be uncountable. This implies, in turn, that most optimal codes for TDGDs do not have finite descriptions, in sharp contrast with the one-dimensional case. From an algorithmic point of view, then, the key question is for what “interesting” countable sets of values of $q$ a full characterization of optimal codes is possible. In a theoretical sense, perhaps the ultimate such set would be that of all values of $q$ which have finite descriptions (more formally, the set of computable values of $q$ relative to some universal Turing machine; see, e.g., [14]). For this set, the goal would be to obtain a general procedure which, given a finite description of $q$, and a pair $(i,j)$, produces the corresponding codeword in an optimal code for TDGD($q$). A somewhat less ambitious theoretical goal, although probably not less valuable from a practical point of view, would be to characterize optimal codes for a dense countable set of values of $q$, e.g., all rational values of $q$, or all values of $q$ such that $\log q$ is rational. These comprehensive characterizations appear quite challenging, and remain open problems. In Sections IV and V we characterize optimal codes for a “smaller” infinite countable set of TDGDs, namely, the set of distributions TDGD($q$) such that $-\log q$ is either a positive integer or the inverse of one. It will turn out, as will be shown in Section VI, that this set provides good coverage of the interval $0 < q < 1$, in the sense that, given an arbitrary value $q'$ in the interval, encoding TDGD($q'$) with the best available code from the characterized set results in relatively low added redundancy, and yields the expected redundancy gains over optimal symbol-by-symbol encoding with Golomb codes.

We will prove Theorem 1 through a series of lemmas, which will shed more light on the structure of optimal trees for TDGDs. For simplicity, we assume throughout that a fixed optimal tree $\mathcal{T}_q$ is given (for a given value of $q$).

Lemma 1: Leaves with a given signature $s$ are found in at most two consecutive levels of $\mathcal{T}_q$.

Proof: Let $d_0$ and $d_1$ denote, respectively, the minimum and maximum depths of a leaf with signature $s$ in $\mathcal{T}_q$. Assume, contrary to the claim of the lemma, that $d_1 > d_0 + 1$. We transform $\mathcal{T}_q$ into a tree $\mathcal{T}'_q$ as follows. Pick a leaf with signature $s$ at level $d_0$, and one at level $d_1$. Place both signatures $s$ as children of the leaf at level $d_0$, which becomes an internal node. Pick any signature $s'$ from a level strictly deeper than $d_1$, and move it to the vacant leaf at level $d_1$. Tracking changes in the code lengths corresponding to the affected signatures, and their effect on the cost, we have

$$L_q(\mathcal{T}'_q) = L_q(\mathcal{T}_q) + q^{s}(d_0 - d_1 + 2) - q^{s'}\delta, \tag{3}$$

where $\delta$ is a positive integer. By our assumption, the quantity multiplying $q^s$ in (3) is non-positive, and we have $L_q(\mathcal{T}'_q) < L_q(\mathcal{T}_q)$, contradicting the optimality of $\mathcal{T}_q$. Therefore, we must have $d_1 \le d_0 + 1$.

A gap in a tree $T$ is a non-empty set of consecutive levels containing only internal nodes of $T$, and such that both the level immediately above the set (assuming the set does not include level 0) and the level immediately below it contain at least one leaf each. The corresponding gap size is defined as the number of levels in the gap. It follows immediately from Lemma 1 that in an optimal tree, if the largest signature above a gap is $s$, then the smallest signature below the gap is $s+1$.

Lemma 2: Let $k = 1 + \lfloor \log q^{-1} \rfloor$. Then, for all sufficiently large $s$, the size $g$ of any gap between leaves of signature $s$ and leaves of signature $s+1$ in $\mathcal{T}_q$ satisfies $g \le k - 1$.

Proof: We consider the cases $q > \frac{1}{2}$, $q = \frac{1}{2}$, and $q < \frac{1}{2}$ separately.

Case $q > \frac{1}{2}$. In this case, we have $k = 1$, and the claim of the lemma means that there can be no gaps in the tree from a certain level on. Assume that there is a gap between level $d$ with signatures $s$, and level $d'$ with signatures $s+1$, $d' - d \ge 2$. By Lemma 1, all signatures $s+1$ are either in level $d'$ or in level $d'+1$. Without loss of generality, we can assume that there is a subtree of $\mathcal{T}_q$ of height at most two, rooted at a node $v$ of depth $d' - 1 \ge d + 1$, and containing at least two leaves of signature $s+1$. Hence, the weight of the subtree satisfies

$$w(v) \ge 2q^{s+1} > q^{s},$$

and switching a leaf $s$ on level $d$ with node $v$ on level $d' - 1$ decreases the cost of $\mathcal{T}_q$, in contradiction with its optimality (when switching nodes, we carry also any subtrees rooted at them). Therefore, there can be no gap between the level containing signatures $s$ and $s+1$, as claimed. Notice that this holds for all values of $s$, regardless of level.

Case $q = \frac{1}{2}$. In this case, the TDGD is dyadic, the optimal profile is uniquely determined, and it has no gaps (the optimal profile is that of $G_1 \cdot G_1$).

Case $q < \frac{1}{2}$. Assume that $s \ge 2^{k} - 2$, and that there is a gap of size $g$ between signatures $s$ at level $d$, and signatures $s+1$ at level $d+g+1$. Signatures $s+1$ may also be found at level $d+g+2$. Without loss of generality, and by our assumption on $s$, we can assume that there is a subtree of $\mathcal{T}_q$ rooted at a node $v$ at level $d+g+1-k$, and containing at least $2^k$ leaves with signature $s+1$, including some at level $d+g+1$. Thus, we have

$$w(v) \ge 2^{k} q^{s+1} > q^{s} = w(s),$$

the second inequality following from the definition of $k$. Therefore, we must have $d+g+1-k \le d$, or equivalently, $g \le k-1$, for otherwise exchanging $v$ and $s$ would decrease the cost, contradicting the optimality of $\mathcal{T}_q$.

Next, we bound the rate of change of signature magnitudes as a function of depth in an optimal tree. Together with the bound on gap sizes in Lemma 2, this will lead to the proof of Theorem 1. It follows from Lemma 1 that for every signature $s \ge 0$ there is a level of $\mathcal{T}_q$ containing at least one half of the $s+1$ leaves with signature $s$. We denote the depth of this level by $L(s)$ (with some fixed policy for ties), dependence on $\mathcal{T}_q$ being understood from the context.

Lemma 3: Let $s$ be a signature, and $\ell \ge 2$ a positive integer such that $s \ge 2^{\ell+2} - 1$, and such that $L(s') = L(s) + \ell$ for some signature $s' > s$. Then, for $\mathcal{T}_q$, we have

$$\frac{\ell - 2}{\log q^{-1}} \le s' - s \le \frac{\ell + 1}{\log q^{-1}}. \tag{4}$$

Proof: Since $s' > s \ge 2^{\ell+2} - 1 > 2^{\ell-1} - 1$, by the definition of $L(s')$, there are more than $2^{\ell-2}$ leaves with signature $s'$ at level $L(s')$. We perform the following transformation (depicted in Figure 1(A)) on the tree $\mathcal{T}_q$, yielding a modified tree $\mathcal{T}'_q$: Choose a leaf with signature $s$ at level $L(s)$, and graft to it a tree with a left subtree consisting of a leaf with signature $s$ (“moved” from the root of the subtree), and a right subtree that is a balanced tree of height $\ell - 2$ with $2^{\ell-2}$ leaves of signature $s'$. These signatures come from $2^{\ell-2}$ leaves at level $L(s')$ of $\mathcal{T}_q$, which are removed. It is easy to verify that the modified tree $\mathcal{T}'_q$ defines a valid, albeit incomplete, code for the alphabet of a TDGD. Next, we estimate the change, $\Delta$, in cost due to this transformation. We have

$$\Delta = L_q(\mathcal{T}'_q) - L_q(\mathcal{T}_q) = q^{s} - 2^{\ell-2} q^{s'}.$$

The term $q^s$ is due to the increase, by one, in the code length for the signature $s$, which causes an increase in cost, while the term $-2^{\ell-2} q^{s'}$ is due to the decrease in code length for $2^{\ell-2}$ signatures $s'$, which produces a decrease in cost. Since $\mathcal{T}_q$ is optimal, we must have $\Delta \ge 0$, namely,

$$0 \le q^{s} - 2^{\ell-2} q^{s'} = q^{s}\left(1 - 2^{\ell-2} q^{s'-s}\right),$$

and thus, $2^{\ell-2} q^{s'-s} \le 1$, from which the lower bound in (4) follows. (Note: clearly, the condition $s \ge 2^{\ell-1} - 1$ would have sufficed to prove the lower bound; the stricter condition of the lemma will be required for the upper bound, and was adopted here for uniformity.)

To prove the upper bound, we apply a different modification to $\mathcal{T}_q$. Here, we locate $2^{\ell+1}$ signatures $s'$ at level $L(s')$, and assume, without loss of generality, that these signatures are the leaves of a balanced tree of height $\ell+1$, rooted at a node $\nu$ of depth $L(s) - 1$. The availability of the required number of leaves at level $L(s')$ is guaranteed by the conditions of the lemma. We then exchange $\nu$ with a leaf of signature $s$ at level $L(s)$. The situation, after the transformation, is depicted in Figure 1(B). The resulting change in cost is computed as follows.

$$\Delta = L_q(\mathcal{T}'_q) - L_q(\mathcal{T}_q) = -q^{s} + 2^{\ell+1} q^{s'}.$$

As before, we must have $\Delta \ge 0$, from which the upper bound follows.

Fig. 1. Tree transformations.

We are now ready to prove Theorem 1.

Proof of Theorem 1: We assume, without loss of generality, that $q_1 > q$, and we write $q_1 = q(1+\varepsilon)$, $0 < \varepsilon < q^{-1} - 1$. In $\mathcal{T}_q$, choose a sufficiently large signature $s$ (the meaning of “sufficiently large” will be specified in the sequel), and a node of signature $s$ at level $L(s)$. Let $s' > s$ be a signature such that $\ell \triangleq L(s') - L(s) \ge 2$. We apply the transformation of Figure 1(A) to $\mathcal{T}_q$, yielding a modified tree $\mathcal{T}'_q$. We claim that when weights are taken with respect to TDGD($q_1$), and with an appropriate choice of the parameter $\ell$, $\mathcal{T}'_q$ will have strictly lower cost than $\mathcal{T}_q$. Therefore, $\mathcal{T}_q$ is not optimal for TDGD($q_1$). To prove the claim, we compare the costs of $\mathcal{T}_q$ and $\mathcal{T}'_q$ with respect to TDGD($q_1$). Reasoning as in the proof of the lower bound in Lemma 3, we write

$$\Delta = L_{q_1}(\mathcal{T}'_q) - L_{q_1}(\mathcal{T}_q) = q_1^{s} - 2^{\ell-2} q_1^{s'} = q_1^{s}\left(1 - 2^{\ell-2} q_1^{s'-s}\right) \le q_1^{s}\left(1 - 2^{\ell-2} q_1^{\frac{\ell+1}{\log q^{-1}}}\right), \tag{5}$$

where the last inequality follows from the upper bound in Lemma 3. It follows from (5) that we can make $\Delta$ negative if

$$\ell - 2 + \frac{\ell+1}{\log q^{-1}} \log q_1 > 0.$$

Writing $q_1$ in terms of $q$ and $\varepsilon$, and after some algebraic manipulations, the above condition is equivalent to

$$\ell > 3\,\frac{\log q^{-1}}{\log(1+\varepsilon)} - 1. \tag{6}$$
(6) modified tree Tq(cid:48) defines a valid, albeit incomplete, code for log(1+ε) − 6 Hence, choosing a large enough value of (cid:96), we get ∆ < 0, andweconcludethatthetree q isnotoptimalforTDGD(q1), (A) T (B) 2 subject to an appropriate choiTce of s, which we discuss next. (cid:10)(cid:74) (cid:10)(cid:74)Tg (cid:10) (cid:74) (cid:10) (cid:74) The argument above relies strongly on Lemma 3. We recall (cid:114) (cid:114) thatinorderforthislemmatohold,(cid:96)andthesignaturesmust T1 T2 gTg2 gTg1 satisfy the condition s 2(cid:96)+2 1. Now, it could happen that, ≥ − Fig.2. Graphicalrepresentationsfortreeswithassociatedweights. after choosing (cid:96) according to (6) and then s according to the condition of Lemma 3, the level L(s)+(cid:96) does not contain 2(cid:96)−2 signaturess(cid:48) asrequired(e.g.,whenthelevelispartofa 2) NoticethatC (i,j)concatenatesthe“unary”partsofthe gap).Thiswouldforceustoincrease(cid:96),whichcouldthenmake k codewordsforiandj inaGolombcodeoforderk(asif s violate the condition of the lemma. We would then need to encoding i and j separately), but encodes the “binary” increase s, and re-check (cid:96), in a potentially vicious circle. The part jointly by means of T , which, in general, does bound on gap sizes of Lemma 2 allows us to avoid this trap. k not yield the concatenation of the respective “binary” The bound in the lemma depends only on q and thus, for a parts Q (i) and Q (j). However, when k = 1 and given TDGD, it is a constant, say g . Thus, first, we choose a k k q k =2,C isequivalenttothefullconcatenationG G . value(cid:96)0 satisfyingtheconstrainton(cid:96)in(6).Then,wechoose When k k= 1, the code T is void, and C = G k·Gk. s ≥ 2(cid:96)0+gq+4. Now, we try (cid:96) = (cid:96)0,(cid:96)0 + 1,(cid:96)0 + 2,..., in The parameter in this cakse is q = 1, th1e geo1m·etri1c succession,and checkwhetherlevel L(s)+(cid:96) containsenough 2 distribution is dyadic, and the code redundancy is zero. 
oftherequiredsignatures.ByLemmas1and2,anappropriate When k = 2, we have q = 1/√2 and the finite levelL(s(cid:48))willbefoundforsome(cid:96) (cid:96) +g +2.Forsucha ≤ 0 q source ˆ has four symbols with respective weights valueof(cid:96),wehave2(cid:96)+2 1 2(cid:96)0+gq+4 1<s,satisfyingthe Ak − ≤ − 1, √2/2, √2/2, 1/2 . This source is quasi-uniform, conditionofLemma3.Thiscondition,inturn,guaranteesalso { } and, therefore, it admits Q as an optimal tree. This is that there are at least 2(cid:96)−2 signatures s(cid:48) at L(s(cid:48)), as required. 4 a balanced tree of depth two, which can also be written as Q = Q Q . Thus, we have C = G G . Later 4 2 2 2 2 2 · · on in the section, in Corollary 1, we will show that IV. OPTIMALCODESFORTDGDSWITHq =2−1/k this situation will not repeat for larger values of k: the It follows from the results of Section III that it is infeasible “symbol bysymbol” code G G is strictlysuboptimal k k toprovideacompactdescriptionofoptimalcodesforTDGDs for TDGD(2−1/k) when k >2·. covering all values of the parameter q, as can be done with In deriving the proof of Theorem 2 and in subsequent sec- one-dimensional geometric distributions [1], [2] or their two- tions,weshallmakeuseofthefollowingnotationstodescribe sided variants [3]. Instead, we describe optimal prefix codes and operate on some infinite trees with weights associated to for a discrete sequence of values of q, which provide good their leaves. We denote by v the trivial tree consisting of a coverage of the parameter range. In this section, we study single node (leaf) of weight v. Given a tree T and a scalar optimal codes for TDGDs with parameters q = 2−1/k for integers k 1, i.e., q 1, while in Section V we consider g, gT denotes the tree T with all its weights multiplied by ≥ ≥ 2 g. 
Given trees T and T , the graphic notation in Figure 2(A) parameters of the form q = 2−k, k > 1, covering the range 1 2 q < 1 (thetwoparametersequencescoincideatk =1,q = 1, represents a tree T consisting of a root node with T1 as its 2 2 left subtree and T as its right subtree, each contributing its whichwechoosetoassigntothecasecoveredinthissection). 2 respective leaf weights. The multiset of weights associated withT istheunionofthemultisetsassociatedwithT andT . A. Initial characterization of optimal codes for q =2−1/k 1 2 We will also use the notation [T T ] to represent the forest 1 2 The following theorem characterizes optimal codes for consistingoftheseparatetreesT andT ,whichhasthesame 1 2 TDGDs of parameter q = 2−1/k, k 1, in terms of unary associated multiset of weights as the tree T of Figure 2(A), ≥ codes and Huffman codes for certain finite distributions. In but a different underlying graph. We denote by 1 the tree of Tg Subsection IV-C we further refine the characterization by a unary code whose leaf at each depth i 1 has weight gi, ≥ providing explicit descriptions of these Huffman codes. and by 2 the structure in Figure 2(B). It is readily verified q =Th2e−or1e/mk,2k: A1n,oipstgimivaelnpbreyfixcodeCk forTDGD(q),with twhiatthTega2cTchgororfestphoenids to1 tlheeavceosnactatdeenpatthioni of2twoofun2arycacrroydiensg, ≥ − ≥ Tg Ck(i,j)=Tk(imodk,j modk)·G1((cid:4)ki(cid:5))·G1((cid:4)kj(cid:5)), cwoerirgehsptogni.dIsntoparthtiecuolaprt,imasalshtorewenfionrFtihgeurdey3ad,itcheTtDreGeDq−w2Titqh2 where G1 is the unary code, and Tk, referred to as the top q = 21, where each leaf is weighted according to the signature code, is an optimal code for the finite source defined by the of the symbol it encodes. following symbol set and respective weights: The following lemma follows directly from the above ˆ = (i,j) 0 i,j <k , w(i,j)=qi+j. (7) definitions, applying elementary symbolic manipulations on k A { | ≤ } geometric sums. Remarks. 
Lemma 4: For any real number g, 0 < g < 1, we have 1) Theorem2canreadilybegeneralizedtoblocksofd>2 (cid:18) g (cid:19)2 w( 2)=w( 1)2 = . In particular, if q =2−1/k, symbols.For simplicity,we presenttheproof for d=2. Tg Tg 1 g − 7 ContinuingwiththeHuffmanprocedure,eachsymbolqi−k 1 Tqk in canbemergedwithasymbolqi−k 2,furtherleading, S−1 Tqk by the definition of 2 (see Figure 2(B)), to a reduced source Tg (cid:110) ∗ = q−2k 2, q−2k+1 2, q−2k+2 2,... S Tqk Tqk Tqk (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) 1 2 3 time times times (cid:111) ..., q−k−1 2,q−k 2,...,q−3 2,q−2 2 . Tqk Tqk Tqk Tqk (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) k k−1 2 1 times times times time Fig.3. Thetreeq−2Tq2. Wenowtakeacommon“factor”q−2k 2 fromeachsymbolof Tqk ∗. By the discussion of Figures 2 and 3, this factor corre- S we have w(Tq2k)=w(Tq1k)=1. sbpyoqnkdsevtoerayctiompye tohfeGd1ep·tGh1i,ncwrietahsewsebigyh1ts. tAhfattergethtemcuoltmipmlioedn We rely on this observation in the proof of Theorem 2 factor is taken out, the source ∗ becomes the source ˆ below.Intheproof,whendefiningvirtualsymbols,wefurther S Ak of (7), to which the Huffman procedure needs to be applied overload notation and regard trees with associated weights, suchasqr d,alsoasmultisetsofsignatures,withasignature to complete the code construction. Thus, the code described s for eachTlqekaf of the tree with weight qs. in the theorem is optimal. To make the result of Theorem 2 completely explicit, it Proof of Theorem 2: We use the Gallager-Van Voorhis remains to characterize an optimal prefix code for the finite construction [2]. For s 0, define the reduced source ≥ source ˆ of (7). The following lemma presents some basic k = propertiAes of ˆk and its optimal trees. 
Recall the definitions s s s A W H ∪F of α-uniformity and fringe thickness from Section II. where Lemma 5: The source ˆk is 4-uniform, and it has an A Hs ={i∈Aˆ | i<s} optimParlotorfe:eItTfoollfofwrisnfgreomthi(c7k)naensdstfhTer≤ela2t.ionqk = 1 thatthe 2 (asnidgnatures in Hs occur with the same multiplicity as in Aˆ), 4mqa2xi<ma4l.rHateioncbee,twêkenisw4e-iugnhitfsoormfs.yTmhbeoclslaiinmAôknisthqe−o2kp+ti2m=al A k−1 tree holds trivially for k 2, in which case the optimal tree Fs = i(cid:91)=0{(cid:124)qsk+(cid:123)tii(cid:122)mTeqs2k(cid:125),sq(cid:124)+st+ki(cid:123)m+i(cid:122)eTis+q1k(cid:125)1, s(cid:124)s+ti(cid:123)m+i(cid:122)+es(cid:125)i1}. ftshoigernAamˆtukurletiissseiutnnAiˆfô∗kr,m⊆i..eTA.,ôkpcr≤oonvseistthinegcloafimthefolrigkht>est22,(cid:100)cko(nk4s−id1)e(cid:101)r k A The multisets (of signatures) qs+i 1 and qs+i 2 play the ˆ∗ = (cid:8)k,k,...,k,k+1,...,k+1,... roleofvirtualsymbolsinthereducTedqksources,asTdqikscussedin Ak K ∪ (cid:124) (cid:123)(cid:122) (cid:125) (cid:124) (cid:123)(cid:122) (cid:125) k−1times k−2times Subsection II-C (we omit the qualifier ‘virtual’ in the sequel). (cid:9) ..., 2k 3,2k 3, 2k 2 , It is readily verified that all the weights of symbols in Fs are (cid:124) −(cid:123)(cid:122) −(cid:125) (cid:124)(cid:123)−(cid:122)(cid:125) smallerthantheweightsofsignaturesin s.Sinceq =2−1/k, 2times 1time H bTyhuLse,mwmeaca4n, wapephlyavsetewps(qosf+tihTeq2kH)u=ffmwa(qnsp+rioTcq1ekd)u=rewto(s+sii)n. wothheerrewiKse.=The{ksu−m1}ofifthkemtwoods4ma∈lles{t2w,3e}ig,hotsroKf siisgneamtupretys F such way that the s+i+1 signatures s+i are merged with in ˆ∗ satisfies s+i+1symbolsqs+i 1,resultingins+i+1treesqs+i−k 1. Ak The remaining k symTbqokls qs+i 1 can be merged with thTeqkk w(2k 2)+w(2k 3)=q2k−2+q2k−3 =q2k−2(1+q−1) sfryommbokls1qsd+oiwTqn2kt,ore0s.uAltfintegritnhiksTstreqekqeuseqnsc+ei−ofkHTqu2kffwmhaennmierargnegress, − − = 21(1+q−1)qk−2 >w(k−2). 
− s istransformedinto s−k,aslongass k.Startingfrom The sum of the two largest weights in ˆ∗, on the other hand, sW=tk for some t>0,Wthe procedure event≥ually leads to 0. is either q0 if k mod4 0,1 , or A1(k1+q−1) otherwise. Formally, our reduced source Wtk, t ≥ 0, corresponds toWSt Therefore, if the Huffma∈n p{roce}dure is2applied to ˆk, every in our description of the Gallager-Van Voorhis construction in pair of consecutive elements of ˆ∗ will be mergedA, without Section II-C. Thus, the iteration leads to , as called for in Ak S0 involving a previously merged pair. The ratio of the largest theconstruction.Itisreadilyverifiedthatthissourceadmitsan to the smallest weight remaining after these mergers is at additional sequence of Huffman mergers, as described above, most 1(1+q−1)/qk−1 = q+1 < 2. Hence, the resulting 2 leading (with a slight abuse of notation) to source is quasi-uniform and has a quasi-uniform optimal tree. Therefore, completing the Huffman procedure for ˆ results k(cid:91)−1 Ak = qi−k 2,qi−k 1 . in an optimal tree of fringe thickness at most two. S−1 i=0{(cid:124) (cid:123)(cid:122)Tqk(cid:125) (cid:124) (cid:123)(cid:122)Tqk(cid:125)} To complete the explicit description of an optimal tree for k i+1 ˆ , we will rely on a characterization of trees T with f 2 times times Ak T ≤ 8 that are optimal for 4-uniform sources.5 This characterization one). The number of leaves at level M decreases by is presented next. three, and the numbers of leaves at levels M 1 and − M +1 increase by one and two, respectively. B. Optimal trees with f 2 for 4-uniform sources Consider now a distribution on N symbols, with associated T ≤ vector of probabilities (or weights) p = (p ,p ,...,p ), To proceed as directly as possible to the construction of an 1 2 N p p p . 
Let L denote the average code length soupbtismecatliotrneetofoArpApêkn,dwixeAde.fWerealsltatrhtebpyrocohfasraocfterreiszuinltgs ainlltthhies of1T≥σ,c2u≥nd·e·r·p≥(wNith shorteσr,ccodewords naturally assigned to larger weights), and let possible profiles for a tree T with N leaves, and f 2. Let T ≤ T be such a tree, let m = logN , and denote by n the D =L L , σ 0,1 , c <c c . (9) (cid:100) (cid:101) (cid:96) σ,c σ,c− σ,c−1 ∈{ } σ ≤ σ number of leaves at depth (cid:96) in T. It follows from these definitions, and the structure of the Lemma 6: The profile of T satisfies n = 0 for (cid:96) < m 2 (cid:96) − profile (8) (see also Remark 2 above), that for σ 0,1 and (cid:96) > m+1, and either nm−2 = 0 or nm+1 = 0 (or both, and c <c c , we have ∈ { } when fT 1). σ ≤ σ ≤ ItfollowsfromLemma6thatT isfullycharacterizedbythe D =p +p p . (10) σ,c N−2c+1 N−2c+2− 2M−N+c quadruple(n ,n ,n ,n ),witheithern =0or m−2 m−1 m m+1 m−2 A useful interpretation of (10) follows directly from the n =0.WesayT islongifn =0,andthatT isshort m+1 m−2 profile (8): for T , D is the difference between the sum ifn =0.DefiningM =m σ,whereσ =1ifT isshort, σ,c σ,c m+1 − of the two heaviest weights on level M +1 and the lightest or 0 if it is long, a tree with f 2 can be characterized T ≤ weight on level M 1. more compactly by a triple of nonnegative integers NT = Let sg(x) be defi−ned as 1,0, or 1, respectively, for nega- (nM−1,nM,nM+1). We will also refer to this triple as the tive, zero, or positive value−s of x, and consider the following (compact) profile of T, with the associated parameters N,m, sequence (recalling that c =0): andσ understoodfromthecontext.Noticethatwhenn = 0 m−2 nm+1 =0, T is the quasi-uniform tree QN, and (abusing the s=−sg(D1,c1), −sg(D1,c1−1), ..., −sg(D1,c1+1), metaphor), it is considered both long and short (i.e., it has sg(D ), sg(D ), ..., sg(D ). (11) representations with both σ =0 and σ =1). 0,1 0,2 0,c0 Lemma 7: Let T be a tree with f 2. 
For σ 0,1 Lemma 8: The sequence s is non-decreasing. T ≤ ∈ { } and M =m σ, define The definition of the sequence s induces a total ordering of − (cid:22)2N 2M(cid:23) the pairs (σ,c) (and, hence, also of the trees Tσ,c), with pairs c =(N 2M)σ and c = − . withσ =1orderedbydecreasingvalueofc,followedbypairs σ − σ 3 with σ = 0 in increasing order of c. The two subsequences Then, T is equivalent to one of the trees T defined by the “meet” at c , which defines the same tree regardless of the σ,c σ profiles value of σ (in the pairs ordering, we take (1,c ) as identical 1 to(0,c )=(0,0)).Wedenotethistotalorderby .Recalling N =(n , n , n ) 0 (cid:22) Tσ,c (cid:16) M−1 M M+1 (cid:17) that the quantities Dσ,c are differences in average code length = 2M N+c, 2N 2M 3c, 2c , between consecutive codes in this ordering, Lemma 8 tells − − − us that, as we scan the codes in order, we will generally σ 0,1 , c c c . (8) ∈{ } σ ≤ ≤ σ see the average code length decrease monotonically, reach a minimum, and then (possibly after staying at the minimum Remarks. for some number of trees) increase monotonically. In the 1) Equation (8) characterizes all trees with N leaves and followingtheorem,weformalizethisobservation,andidentify f 2 in terms of the parameters σ and c. The T ≤ the trees T that are optimal for p. parameter c has different ranges depending on σ: we σ,c have N 2m−1 c 2N−2m−1 when σ = 1, Theorem 3: Let p be a 4-uniform distribution such that p and 0 −c 2N≤−2m ≤wh(cid:98)en σ3= 0(cid:99). The use of the has an optimal tree T with fT ≤2. Define pairs (σ∗,c∗) and ≤ ≤ (cid:98) 3 (cid:99) (σ∗,c∗) as follows: parametrized quantities M,c , and c will allow us to σ σ treat the two ranges in a unified way in most cases. 
(σ ,c ) = (1,c ) if D 0, ∗ ∗ 1 1,c1 ≥ Also, notice that T1,c and T0,c represent the same (σ∗,c∗) = (0,c ) if D 0; tree,corresponding,res1pectively,to0interpretationsofthe 0 0,c0 ≤ 2) Tquhaesip-aurnaimfoertmertcrereepQreNseanstssthhoertnuomr lboenrgo.finternal(non- (o−th1e)rw(σ−is)es,g(Difσ−D,c1−,c)1is t<he la0s,t nleegtati(vσe−e,nct−r)y inbes, asuncdhdetfihnaet leaf) nodes at level M of T. An increase of c by one (σ , c ) = (σ , c σ ); ∗ ∗ correspondstomovingapairofsiblingleavespreviously − −− − rootedatlevelM−1toanewparentatlevelM (thereby if D0,c0 >0, let (σ+,c+) be such that (−1)(σ+)sg(Dσ+,c+) is increasing the number of internal nodes at that level by the first positive entry in s, and define (σ∗, c∗) = (σ , c 1+ σ ). 5Noticethatnotevery4-uniformsourceadmitsanoptimaltreewithfT ≤2 + +− + (tfraTelteh>ofou2rg.hthtehe4-ounneifsoromf isnoteurrecsetwiniththipsrosbeacbtiiolintiedso)1.10F(o4r,3e,x1am,1p,l1e,)amnuosptthimavael Tophteinm,alalflortrepe.s Tσ,c with (σ∗,c∗) (cid:22) (σ,c) (cid:22) (σ∗,c∗) are 9 TABLEI FINDINGOPTIMALTREESTσ,cFORN =19,p= 419(4,4,3,3,3,3,3,3,3,3,3,2,2,2,2,2,2,1,1)(OPTIMALTREEPARAMETERSEMPHASIZEDINBOLDFACE). (1,3)= (σ,c) (1,7) (1,6) (1,5) (1,4) (0,0) (0,1) (0,2) (n ,n ,n ) (4,1,14) (3,4,12) (2,7,10) (1,10,8) (13,6,0) (14,3,2) (15,0,4) M 1 M M+1 − 49·L 214 211 208 206 206 206 208 σ,c 49·Dσ,c 3 3 2 0 0 2 s -1 -1 -1 0 0 1 (σ ,c ) (σ ,c ) (σ ,c ) (σ ,c ) ∗ ∗ + + − − ∗ ∗ Notice that, by Lemma 8, the range (σ ,c ) (σ,c) so 2m k2 <k2 k(k 1)/2. If c +1>c , then all trees ∗ ∗ (cid:22) (cid:22) − − − 1 1 (σ∗,c∗)iswelldefinedandneverempty,consistentlywiththe T in (8) are long. Otherwise, D is well defined, and σ,c 1,c +1 1 assumptions of the theorem and with Lemma 7. The example we have in Table I lists all the trees T with f 2 for N =19, as σ,c T ≤ D = D characterized in Lemma 7, and shows how Theorem 3 is used − 1,c1+1 − 1,k2−2m−1+1 to find optimal trees for a given 4-uniform distribution on 19 = p1 (p2m−k2−1+p2m−2k) − symbols. 
p 2p =p 2qk−1 =1 q−1 <0, 1 k2−k(k−1)/2 1 ≤ − − − (15) C. The top code wherethefirstandsecondequalitiesfollowfromthedefinition By Lemma 5, Theorem 3 applies to the source ˆk defined of c and from (10), the first inequality from the ordering of A 1 in (7). We will apply the theorem to identify parameters the weights and from (14), the third equality from Lemma 9, (σkF,ocrk)thtehartemyiaeilnddaenr oofptitmhealsetrceteioTn,σkw,cek ftaokreAˆNk. = k2, and wanedctohneclluasdteetqhuaatliotpytifmroaml trteheesrfeolartioˆn qakre=l12o.ngByinLtehmismcaas8e,. k let p=(p1,p2,...,pk2) denote the vector of (unnormalized) Similarly, when M(cid:48) =m 1, we haAve symbolweightsin ˆ ,innon-increasingorder.Thus,wehave − k p = (q0,q1,q1,..A.,qj,qj,...,qj,...,q2k−3,q2k−3,q2k−2). 2m 2Q 2k2 k(k 1)/2 2, (16) ≥ ≥ − − − Here, qj is repeated j + 1 times for 0 j k 1, and ≤ ≤ − so 2m k2 +1 k2 k(k 1)/2 1, and p 2wkhi−ch1fo−lljowtismimesmfeodriakte≤lyjfr≤om2tkh−is2s.trTuhcetufroe,lleoswtainbglislheemsmthae, pk2−k(k−−1)/2−1 =≥qk =−21. If c−0 =c0 =−0, then al2lmt−reke2s+T1σ≤,c in (8) are short. Otherwise, similarly to (15), we have relation between indices and weights in p. Lemma 9: For 0 i < k(k +1)/2, we have p = qj, 1 q−2 1 ≤ i+1 D =p +p p >2q2k−2 = >0, where j is the unique integer in the range 0 j k 1 0,1 k2−1 k2− 2m−k2+1 −2 2 −2 ≤ ≤ − satisfying which implies that optimal trees are short in this case. j(j+1) It follows from Lemma 10 that we can take m M(cid:48) as the i= +r for some r, 0 r j. 
(12) − 2 ≤ ≤ parameterσforalltreesTσ,cthatareoptimalforp.Noticethat F1oqrk−02−≤j(cid:48),i(cid:48)w<herke(kj(cid:48)+is1t)h/e2,uwniequheavinetepgke2r−iin(cid:48) =theq2rakn−g2e−j0(cid:48) = sMlig(cid:48)histlyansatrloicgtoeru,sintoththaet,pinarcaamseetserwMherdeeafinqeudasiin-uLneifmormmat7re,ebuist j2(cid:48) k 1 satisfying ≤ optimal,m−M(cid:48) willassumeadefinitevaluein{0,1}(which ≤ − will vary with k), while, in principle, a representation with i(cid:48) = j(cid:48)(j(cid:48)+1) +r(cid:48) for some r(cid:48), 0 r(cid:48) j(cid:48). (13) eithervalueofσisavailable.Thisveryslightlossofgenerality 2 ≤ ≤ isofnoconsequencetoourderivations,and,inthesequel,we Wedefinesomeauxiliaryquantitiesthatwillbeusefulinthe will identify M with M(cid:48), i.e., we will take M = logQ . It sequel.Letm= logk2 ,Q=k2 k(k 1)/4 ,andM(cid:48) = also follows from Lemma 10 that when applying T(cid:100)heore(cid:101)m 3 (cid:100) (cid:101) −(cid:100) − (cid:101) (cid:100)log2Q(cid:101), with dependence on k understood from the context. tofindoptimaltreesforp,weonlyneedtofocusononeofthe We assume that k > 2, since the optimal codes for k = 1 two segments (corresponding to σ=0 or σ=1) that comprise and k = 2 have already been described in Subsection IV-A. the sequence s in (11), the choice being determined by the It is readily verified that we must have either M(cid:48) = m or value of k. This will simplify the application of the theorem. M(cid:48) =m 1.Thenextlemmashowsthattherelationbetween Lemmas 9 and 10, together with Theorem 3, suggest a − M(cid:48)andmdeterminestheparameterσoftheoptimaltreesTσ,c clear way, at least in principle, for finding an optimal tree forLAeˆmkm. a 10: IfM(cid:48) =m,thentreesTσ,c thatareoptimalfor Tσσ=,c mfor AˆMk. T(rheecapllainragmtehtaetrmσ aisnddeMtermarieneddetiemrmmiendeidatbelyyka)s. Aˆk aPreroloofn:gA(sσsu=m0e);Mo(cid:48)th=erwmi.seT,htehne,ywaerecsahnowrtr(iσte=1). 
Nthoawt ,asrec−cailnlcinregasthese, ethxeprwesesiigohntsfopr Dσ,c inan(1d0p), we obsearlvsoe k2−2c+1 k2−2c+2 2m =2M(cid:48) <21+logQ =2Q increase, while p2M−k2+c, which gets subtracted, decreases. Thus,since,byTheorem3,anoptimalvalueofcoccurswhen =2k2 2 k(k 1)/4 2k2 k(k 1)/2, (14) D changes sign, we need to search for the value of c for − (cid:100) − (cid:101)≤ − − σ,c 10 TABLEII − OPTIMALCODEPARAMETERSANDPROFILESFORAˆk, 3≤k≤10. Vk Vk (cid:54) (cid:0)(cid:64) k2 M2 0j 0r σk0 ck0 (nM−(10,n,4M,0,n)M+1) k(cid:63)(cid:0)(cid:0) U(cid:114)k (cid:64)(cid:64) q0 k−(cid:63)(cid:54)1 (cid:0)(cid:0)Uk(cid:64)(cid:114)−1(cid:64) q0 34 34 01 00 10 11 ((10,,173,,22)) k(cid:63)(cid:54)(cid:0)(cid:0)(cid:0)U(cid:114)(cid:64)(cid:114)k(cid:124)(cid:114)(cid:64)(cid:114)(cid:64)2k(cid:123)−(cid:122)q11(cid:114)(cid:114)(cid:125)(cid:114) k(cid:63)(cid:54)(cid:0)(cid:0)(cid:0)U(cid:114)(cid:64)(cid:114)k(cid:114)(cid:124)(cid:64)(cid:64)2(cid:123)k(cid:122)−q(cid:114)11−(cid:125)(cid:114)1 5 5 3 1 0 0 (7,18,0) (cid:54) (cid:0)(cid:64)(cid:124) (cid:123)(cid:122) (cid:125) (cid:54) (cid:0)(cid:64)(cid:124) (cid:123)(cid:122) (cid:125) 67 56 15 00 10 50 ((11,52,534,1,00)) k(cid:63)(cid:0)(cid:0)(cid:124) U(cid:114)(cid:114)k(cid:123)(cid:122)(cid:114)(cid:64)(cid:114)(cid:64)2(cid:125)k−q12(cid:114)(cid:114)(cid:114) k(cid:63)(cid:0)(cid:0)(cid:124) U(cid:114)(cid:114)k(cid:123)(cid:122)(cid:114)(cid:64)(cid:114)(cid:64)2(cid:125)k−q12(cid:114)(cid:114)(cid:114) 8 6 2 2 0 5 (5,49,10) (cid:114)(cid:114)(cid:114) 2k 1(cid:114)(cid:114)(cid:114) (cid:114)(cid:114)(cid:114) 2k 1(cid:114)(cid:114)(cid:114) − − 9 6 0 0 1 17 (0,47,34) 10 7 7 1 0 1 (29,69,2) Fig.4. TreesVk andVk−. which the increasing sum of the first two terms “crosses” the profile of the optimal tree T defined by the theorem, for value of the decreasing third term. This can be done, at least σk,ck 3 k 10. 
roughly, by using explicit weight values from Lemma 9 with ≤ ≤ i(cid:48) 2c 1, 2c 2 and i = 2m k2 +c, and solving a The tools derived in the proof of Theorem 4 also yield ∈ { − − } − the following result, a proof of which is also presented in quadratic equation, say, for the parameter j (the parameter j(cid:48) will be tied to j by the constraint D 0). A finer Appendix B. σ,c ≈ Corollary 1: Let k > 2 and q = 2−1/k. Then, G G is adjustment of the solution is achieved with the parameters r k · k andr(cid:48),observingthatachangeofsignofD canonlyoccur not optimal for TDGD(q). σ,c near locations where the weights in p change (i.e., “jumps” in either j or j(cid:48)), which occur at intervals of length up to D. Average code length k. At the “jump” locations, either r or r(cid:48) must be close to The following corollary gives explicit formulas for the av- zero. While there is no conceptual difficulty in these steps, eragecodelengthofthecodesC characterizedinTheorem2 k theactualcomputationsaresomewhatinvolved,duetovarious and Theorem 4. The proof is deferred to Appendix C. integer constraints and border cases. Theorem 4 below takes Corollary 2: Let M, ∆(x), j, and r be as defined in these complexities into account and characterizes, explicitly Theorem 4. Then, the average code length (C ) for the q k in terms of k, the parameter pair (σ ,c ) of an optimal code L k k code C under TDGD(q), for arbitrary q, is given by T for ˆ . k σTk,hcekoremA4:k Let q =2−1/k, Q=k2 k(k 1)/4 , m= qjV(q) logk2 , and M = logQ . Define the−fu(cid:100)nction− (cid:101) Lq(Ck)=M +1+ (1 qk)2 , (20) (cid:100) (cid:101) (cid:100) (cid:101) − (k x 2)(k x 1) where ∆(x)=2k2 2M+1+x(x+1) − − − − . − − 2 (17) V(q)=1 qk+1+(1 q)(cid:16)qk+1(k j 1)+j(cid:17) − − − − Let x0 denote the largest real root of ∆(x), and let ξ =(cid:98)x0(cid:99). +(1 q)2(cid:16)qk(cid:0)2r+∆(j)(cid:1) r(cid:17). 
Set − − (cid:16) (cid:106) (cid:107)(cid:17)  ξ, −∆(2j)+1 , if ∆(ξ)≤2ξ, When q =2−1/k, we have (j,r)= (18) (C )=M +1+2qjV∗(q), (21) (cid:0)ξ+1, 0(cid:1), otherwise. Lq k with Then, the tree T , as defined by the profile (8) with σ = σ =m M anσdk,ck V∗(q)=1+(1 q)(cid:0)qk+(2 q)j(cid:1)+(1 q)2(1+∆(j)) . k − − − − j(j+1) c=ck =k2−2M + 2 +r, (19) V. OPTIMALCODESFORTDGDSWITHq =2−k is optimal for ˆ . Furthermore, c is the smallest value of c A. The codes k k for any optimaAl tree T for ˆ . Assume q = 2−k for some integer k > 1. We reuse the σk,c Ak The proof of Theorem 4 is presented in Appendix B. In notation m =Q2m for a uniform tree of depth m, assuming, U the theorem (and its proof), we have chosen to identify the additionally, that its 2m leaves have weight one. The infinite optimal tree T with the smallest possible value of c. It tree(andassociatedmultisetofleafweights) isrecursively σk,c Vk canreadilybeverifiedthatthischoiceminimizesthevariance defined as follows. Start from , and attach to its leftmost k U of the code length among all optimal trees T . With only leaf a copy of q . Thus, has 2k 1 leaves of weight qs at σk,c Vk Vk − minor changes in the construction and proof, one could also depth (s+1)k for all s 0, and no other leaves. The related identify the largest value of c for an optimal tree, and, thus, tree − is defined by sta≥rting from , and attaching to its the full range of values of c yielding optimal trees T . For leftmVokst leaf a copy of q . Thus, U−k−h1as 2k−1 1 leaves of σk,c Vk Vk − conciseness, we have omitted this extension of the proof. weight q0 at depth k 1, and 2k 1 leaves of weight qs at Examples of the application of Theorem 4 are presented in depth (s+1)k 1 fo−r all s > 0.−The trees and − are − Vk Vk Table II, which lists the parameters M, j, r, σ , c , and the illustrated in Figure 4. k k
