Variable-Length Lossy Compression Allowing Positive Overflow and Excess Distortion Probabilities Shota Saito Hideki Yagi Toshiyasu Matsushima Waseda University The University of Electro-Communications Waseda University E-mail: [email protected] E-mail: h,[email protected] E-mail: [email protected] 7 1 Abstract—This paper investigates the problem of variable- entropy-basedquantity1.Inthenon-asymptoticregime,coding 0 length lossy source coding. We deal with the case where both theorems are shown for both stochastic and deterministic 2 the excess distortion probability and the overflow probability of encoders. To show the achievability results, we give an ex- codeword length are less than or equal to positive constants. n plicit code construction instead of using the random coding The infimum of the thresholds on the overflow probability is a characterized by a smooth max entropy-based quantity. Both argument. It turns out that the constructed code satisfies the J non-asymptotic and asymptotic cases are analyzed. To show propertiesoftheoptimalcodeshownin[4].Further,usingthe 7 the achievability results, we do not utilize the random coding results obtained in the non-asymptotic regime, we establish argument but give an explicit code construction. asymptotic coding theorem for general sources. ] T I I. INTRODUCTION II. ONE-SHOTCODING THEOREM . s The problem of variable-length lossy source coding is A. Problem Formulation c one of the fundamental research topics in Shannon theory. [ Let X be a source alphabet and Y be a reproduction For this problem, several studies have adopted the excess alphabet, where both are finite sets. Let X be a random 1 distortion probability as a distortion criterion (e.g., [4], [5], v variabletaking a value in X and x be a realization of X. The [13], [12]). The excess distortion probability is defined as the 0 probability distribution of X is denoted as PX. A distortion probability that the distortion between a source sequence and 0 measure d is defined as d:X ×Y →[0,+∞). 8 its reproduction is greater than a certain threshold. The pair of an encoder and a decoder (f,g) is defined 1 For the problem of variable-length lossy source coding as follows. An encoder f is defined as f : X → {0,1}⋆, 0 . under the excess distortion probability, there are mainly two where {0,1}⋆ denotes the set of all binary strings and the 1 criteria on codeword length: the mean codeword length and empty string λ, i.e., {0,1}⋆ = {λ,0,1,00,...}. An encoder 0 the overflow probability. Kostina et al. [4] have considered f is possibly stochastic and produces a non-prefix code. For 7 the mean codeword length and shown the non-asymptotic 1 x ∈ X, the codeword length of f(x) is denoted as ℓ(f(x)). : characterization on the optimal mean codeword length. They A deterministic decoder g is defined as g : {0,1}⋆ → Y. v also have performed the asymptotic analysis on the optimal Variable-length source coding without the prefix condition is i X mean codeword length for i.i.d. sources. On the other hand, discussed as in, for example, [3] and [9]. Yagi and Nomura [13] and Nomura and Yagi [5] have con- r The performance criteria considered in this paper are the a sidered the overflow probability. In [13], they have treated excess distortion and the overflow probabilities. the case where either the overflow probability or the excess Definition1:GivenD ≥0,theexcessdistortionprobability distortionprobabilityislessthanorequaltoapositiveconstant for a code (f,g) is defined as Pr{d(X,g(f(X)))>D}. asymptoticallyforgeneralsources.In[5],theyhavedealtwith Definition 2: Given R ≥ 0, the overflow probability for a the case where the probability of union of events that the code (f,g) is defined as Pr{ℓ(f(X))>R}. overflowoccursandtheexcessdistortionoccursislessthanor Using these criteria, we define a (D,R,ǫ,δ) code. equaltoapositiveconstantasymptoticallyforgeneralsources. Definition3:GivenD,R≥0andǫ,δ ∈[0,1),acode(f,g) This paper considers the excess distortion probability and satisfying the overflow probability as in [5] and [13]. However, the primary differences are 1) we address the case where both Pr{d(X,g(f(X)))>D}≤ǫ, (1) the excess distortion probability and the overflow probability Pr{ℓ(f(X))>R}≤δ (2) may be positive and 2) we analyze both non-asymptotic and asymptotic cases. is called a (D,R,ǫ,δ) code. The contribution of this paper is the non-asymptotic (one- 1 The smooth max entropy has first introduced by Renner and Wolf [6]. shot)andasymptoticcharacterizationsontheminimumthresh- Recently, several studies have characterized the optimal rate by the smooth old of the overflow probability by using a new smooth max maxentropy (e.g.,[7],[8],[10],[11]). The fundamental limits are the minimum thresholds Definition 8: Given D ≥ 0 and ǫ,δ ∈ [0,1), Gδ (X) is D,ǫ R∗(D,ǫ,δ) and R˜(D,ǫ,δ) for given D, ǫ, and δ. defined as Definition 4: Given D ≥0 and ǫ,δ ∈[0,1), we define Gδ (X):= min Hδ(Y) (4) D,ǫ R∗(D,ǫ,δ):=inf{R:∃ a (D,R,ǫ,δ) code}, PY|X: Pr{d(X,Y)>D}≤ǫ R˜(D,ǫ,δ):=inf{R:∃ a deterministic (D,R,ǫ,δ) code}. = min min log|B|, PY|X: B⊂Y: Remark 1: Consider the special case δ = 0. From a Pr{d(X,Y)>D}≤ǫ Pr{Y∈B}≥1−δ (D,R,ǫ,0)code,wecanconstructafixed-lengthcodeachiev- where P denotes a conditional probability distribution of Y|X ing rate R and the excess distortion probability ≤ ǫ. Thus, Y given X. R∗(D,ǫ,0) or R˜(D,ǫ,0) represents the minimum rate in Remark 2: For a given D ≥0 and ǫ∈[0,1), suppose that fixed-length lossy source coding under the excess distortion criterion. Pr inf d(X,y)>D >ǫ. (5) y∈Y (cid:26) (cid:27) B. Smooth Max Entropy-Based Quantity Then, there are no codes whose excess distortion probability The smooth max entropy, which is also called the smooth is less than or equal to ǫ. Conversely, if such codes do not Re´nyi entropy of order zero, has first introduced by Renner exist for given D and ǫ, (5) holds. In this case, we define andWolf[6].Later,Uyematsu[10]hasshownthatthesmooth R∗(D,ǫ,δ) = +∞ and R˜(D,ǫ,δ) = +∞. Further, if (5) max entropy can be defined in the following form. holds, we also define Gδ (X) = +∞ because there is D,ǫ Definition 5 ([6], [10]): Given δ ∈ [0,1), the smooth max no conditional probability distribution P on Y satisfying Y|X entropy Hδ(X) is defined as2 Pr{d(X,Y)>D}≤ǫ. Hδ(X):= min log|A|, (3) C. One-Shot Coding Theorem for Stochastic Codes A⊂X: Pr{X∈A}≥1−δ The next lemma shows the achievability result on R of a where |·| represents the cardinality of the set. (D,R,ǫ,δ) code. One of the useful properties of the smooth max entropy, Lemma 1: Assume that Gδ (X) < +∞. For any D ≥ 0 D,ǫ which is used in the proof of the achievability result in our and ǫ,δ ∈[0,1), there exists a (D,R,ǫ,δ) code such that main theorem, is Schur concavity. To state the definition R=⌊Gδ (X)⌋. (6) of a Schur concave function, we first review the notion of D,ǫ majorization. Proof: See Section IV-A. Definition6:LetR bethesetofnon-negativerealnumbers + Remark 3: To prove the achievability result, we do not and Rm be the m-th Cartesian product of R , where m is a + + use the random coding argument but give an explicit code positive integer. Suppose that x = (x1,..., xm) ∈ Rm+ and construction. The constructed code satisfies the properties4 of y=(y ,...,y )∈Rm satisfy 1 m + the optimal code shown in [4]. The next lemma shows the converse bound on R of a x ≥x , y ≥y (i=1,2,...,m−1). i i+1 i i+1 (D,R,ǫ,δ) code. If x∈Rm and y∈Rm satisfy, for k=1,...,m−1, Lemma 2: For any D ≥0 and ǫ,δ ∈[0,1), any (D,R,ǫ,δ) + + code satisfies k k m m xi ≤ yi and xi = yi, R>Gδ (X)−1. (7) D,ǫ i=1 i=1 i=1 i=1 X X X X Proof: See Section IV-B. then we say that y majorizes x (it is denoted as x≺y). Combining Lemmas 1 and 2, we can immediately obtain Schur concave functions are defined as follows. Definition 7: We say that a function h(·) : Rm → R is a the following result on R∗(D,ǫ,δ). Schur concave function if h(y) ≤ h(x) for any+x,y ∈ Rm Theorem 1: For any D ≥0 and ǫ,δ ∈[0,1), it holds that + satisfying x≺y. Gδ (X)−1<R∗(D,ǫ,δ)≤⌊Gδ (X)⌋. (8) D,ǫ D,ǫ From the definition of the smooth max entropy and Schur concave functions, it is easy to see that the smooth max By Theorem 1, the minimum threshold R∗(D,ǫ,δ) can entropy is a Schur concave function3. be specified within one bit in the interval not greater than Next, using the smooth max entropy, we introduce a new Gδ (X), regardless of the values D, ǫ, and δ. This result is D,ǫ quantity,whichplaysanimportantroleinproducingourmain mainly due to an explicit construction of good codes, rather results. than the random coding argument, given in Section IV-A. 2Alllogarithms areofbase2throughout thispaper. 4Undertheconstraintoftheexcessdistortionprobability,Kostinaetal.[4] 3 In [2], byusing the notion ofmajorization, it is shown that the smooth havediscussedtheoptimalvariable-lengthcodewhichachievestheminimum Re´nyi entropy oforder αisaSchurconcave function for0≤α<1anda mean codeword length. In [4], several properties of the optimal code are Schurconvex functionforα>1. pointedout. D. One-Shot Coding Theorem for Deterministic Codes Proof: See Section IV-D. The next lemma shows the achievability result on R of a As shown in Theorem 3, we characterize the minimum deterministic (D,R,ǫ,δ) code. threshold on the overflow probability by the quantity related Lemma 3: Assume that Gδ (X) < +∞. For any D ≥ 0 to the entropy, whereas previous studies such as [5] and [13] D,ǫ and ǫ,δ ∈[0,1), there exists a deterministic (D,R,ǫ,δ) code have characterized it by the quantity related to the mutual such that information. Remark 4: In the non-asymptotic (one-shot) regime, the 2loge R= Gδ (X)+ . (9) results on the minimum threshold are different for stochastic (cid:22) D,ǫ 2GδD,ǫ(X)(cid:23) encoders and deterministic encoders as shown in Theorems Proof: See Section IV-C. 1 and 2. In the asymptotic regime, however, the restriction From Lemma 3 and the fact that R∗(D,ǫ,δ)≤R˜(D,ǫ,δ), to only deterministic encoders does not change the minimum the following result on R˜(D,ǫ,δ) is obtained. threshold. Theorem 2: For any D ≥0 and ǫ,δ ∈[0,1), it holds that Remark 5: Instead of (11) and (12), as a generalization of the problem in [13], we can consider the conditions 2loge Gδ (X)−1<R˜(D,ǫ,δ)≤ Gδ (X)+ . (10) D,ǫ (cid:22) D,ǫ 2GδD,ǫ(X)(cid:23) limsupPr{dn(Xn,gn(fn(Xn)))>nD}≤ǫ, (13) n→∞ By Theorem 2, R˜(D,ǫ,δ) can be specified in the interval limsupPr{ℓ(f (Xn))>nR}≤δ (14) n within four bits, which is slightly weaker than the result for n→∞ stochastic codes. and define Rˆ(D,ǫ,δ|X) as the minimum threshold under the III. ASYMPTOTICCODING THEOREM conditions (13) and (14). Then, by almost the same proof of Theorem 3, we have A. Problem Formulation Let Xn and Yn be the n-th Cartesian productof X and Y, Rˆ(D,ǫ,δ|X)=limlimsup 1Gδ+τ (Xn) respectively. Let Xn be a random variable taking a value in τ↓0 n→∞ n D,ǫ+τ Xn andxn bearealizationofXn.Theprobabilitydistribution for any D ≥0 and ǫ,δ ∈[0,1). of Xn is denoted as PXn. In this section, coding problem for general sources X = {Xn}∞n=1 [1] is considered. A IV. PROOFS OF MAIN RESULTS distortionmeasured isdefinedasd :Xn×Yn →[0,+∞). n n A. Proof of Lemma 1 An encoder f : Xn → {0,1}⋆ is possibly stochastic and n produces a non-prefix code. A decoder g : {0,1}⋆ → Yn is First, some notations are defined before the construction of n deterministic. the encoder and decoder is described. We define an (n,D,R,ǫ,δ) code as follows. • For y ∈Y and D ≥0, BD(y) is defined as Definition 9: Given D,R ≥ 0 and ǫ,δ ∈ [0,1), a code (fn,gn) satisfying BD(y):={x∈X :d(x,y)≤D}. (15) Pr{dn(Xn,gn(fn(Xn)))>nD}≤ǫ, (11) • We define yi (i=1,2,···) by the following procedure5. Pr{ℓ(fn(Xn))>nR}≤δ (12) Let y1 be defined as for all n ≥ n0 with some n0 > 0 is called an (n,D,R,ǫ,δ) y1 :=arg maxPr{X ∈BD(y)}, (16) code. y∈Y The fundamentallimitis the followingminimumthreshold. and for i=2,3,···, let y be defined as i i−1 Definition 10: Given D ≥0 and ǫ,δ ∈[0,1), y :=arg maxPr X ∈B (y)\ B (y ) . (17) i D D j R(D,ǫ,δ|X):=inf{R:∃ an (n,D,R,ǫ,δ) code}. y∈Y j[=1 B. Coding Theorem: General Formula • For i = 1,2,...,AD(yi) := BD(yi) \ ij−=11BD(yj). ThenexttheoremcharacterizesR(D,ǫ,δ|X)bythesmooth From the definition, we have S max entropy-based quantity Gδ (Xn). D,ǫ i i Theorem 3: For any D ≥0 and ǫ,δ ∈[0,1), A (y )= B (y ) (i≥1), (18) D j D j 1 j=1 j=1 R(D,ǫ,δ|X)=limsup Gδ (Xn), [ [ n→∞ n D,ǫ AD(yi)∩AD(yj)=∅ (∀i6=j), (19) where Gδ (Xn) is defined as Pr{X ∈AD(y1)}≥Pr{X ∈AD(y2)}≥··· . (20) D,ǫ GδD,ǫ(Xn):= min Hδ(Yn). 5 Inthispaper,weassumethatX andY arefinitesets.However, wecan PYn|Xn: assumecountablyinfiniteX andY ifthisoperationisadmittedforcountably Pr{dn(Xn,Yn)>nD}≤ǫ infiniteX andY. • If ǫ+δ <1, let i∗ ≥1 be the integer satisfying Next, we evaluate the overflow probability. From the con- struction of the encoder, it is easily verified that ℓ(w ) = i i∗−1 ⌊logi⌋ (i=1,...,k∗). Hence, setting R=⌊logi∗⌋, we have Pr{X ∈A (y )}<1−ǫ−δ, (21) D i k∗ i=1 X Pr{ℓ(f(X))>R}≤ Pr{f(X)=w } i∗ i Pr{X ∈AD(yi)}≥1−ǫ−δ. (22) i=Xi∗+1 k∗−1 i=1 X = Pr{X ∈A (y )} D i If ǫ+δ ≥1, we define i∗ =1. i=Xi∗+1 • Let k∗ ≥1 be the integer satisfying +Pr{f(X)=wk∗,X ∈AD(yk∗)} k∗−1 i∗ k∗−1 = Pr{X ∈AD(yi)}− Pr{X ∈AD(yi)}+β Pr{X ∈AD(yi)}<1−ǫ, (23) Xi=1 Xi=1 i=1 ≤α−(1−ǫ−δ)+β =δ, X k∗ Pr{X ∈A (y )}≥1−ǫ. (24) wherethelastinequalityisduetothe definitionofα and(22) D i and the last equality is due to the definition of β. i=1 X Therefore, the code (f,g) is a (D,R,ǫ,δ) code with R = From this definition, it holds that k∗ ≥i∗. ⌊logi∗⌋.Tocompletetheproofofthetheorem,we shallshow • Letαandβ bedefinedasα:= ik=∗−11Pr{X ∈AD(yi)} logi∗ =GδD,ǫ(X). We define Y :=g(f(X)). Notice that and β :=1−ǫ−α. • Letwibethei-thbinarystringinP{0,1}⋆intheincreasing PY(y1)(=a)Pr{X ∈AD(y1)}+Pr{X ∈ AD(yi)} order of the length and ties are arbitrarily broken. For i≥k∗+1 [ example, w = λ,w = 0,w = 1,w = 00,w = 01, 1 2 3 4 5 +Pr{f(X)=w ,X ∈A (y )} 1 D k∗ etc. (b) = Pr{X ∈A (y )}+Pr{d(X,g(f(X)))>D} We construct the following encoder f : X → {0,1}⋆ and D 1 decoder g :{0,1}⋆ →Y. (=c)Pr{X ∈A (y )}+ǫ, (27) D 1 [Encoder] P (y )=Pr{X ∈A (y )} (i=2,...,k∗−1), (28) Y i D i 1) For x∈A (y ) (i=1,...,k∗−1), set f(x)=w . D i i where (a) and (b) follow from the definition of the encoder 2) For x∈A (y ), set6 D k∗ and decoder and (c) is due to (26). Then7, w with prob. β , i∗−1 i∗−1 f(x)= k∗ Pr{X∈AD(yk∗)} (25) P (y )= Pr{X ∈A (y )}+ǫ<1−δ, (w1 with prob. 1− Pr{X∈AβD(yk∗)}. i=1 Y i i=1 D i X X i∗ i∗ 3) For x∈/ ki=∗1AD(yi), set f(x)=w1. PY(yi)= Pr{X ∈AD(yi)}+ǫ≥1−δ, (29) i=1 i=1 [Decoder] SeSt g(wi)=yi (i=1,...,k∗). PX(y )≥P (Xy )≥···≥P (y ), Y 1 Y 2 Y k∗ Now,weevaluatetheexcessdistortionprobability.Wehave which implies that logi∗ =Hδ(Y). Hence, if d(x,g(f(x))) ≤D for x∈A (y ) (i=1,...,k∗−1) since D i g(f(x))=y . Furthermore, we have d(x,g(f(x)))≤D with Hδ(Y)=Gδ (X) (30) i D,ǫ probability β/Pr{X ∈A (y )} for x∈A (y ). Thus, D k∗ D k∗ is shown, the desired equation logi∗ =Gδ (X) is obtained. D,ǫ To show (30), the following lemma is useful. k∗−1 Pr{d(X,g(f(X)))≤D}= Pr{X ∈AD(yi)} Lemma 4: If PY∗ which is induced by PY∗|X satisfying Pr{d(X,Y∗) > D} ≤ ǫ majorizes any P which is induced i=1 Y˜ +Pr{f(X)=wk∗,X ∈AD(Xyk∗)}=α+β =1−ǫ. by PY˜|X satisfying Pr{d(X,Y˜)>D}≤ǫ, then it holds that Hδ(Y∗)=Gδ (X). D,ǫ Proof: The lemma follows from the fact that the smooth Therefore, we have maxentropyis a Schurconcavefunctionand the definitionof Gδ (X). Pr{d(X,g(f(X)))>D}=ǫ. (26) D,ǫ 7Ifi∗=k∗,theequalityin(29)doesnothold.However,Pii∗=1PY(yi)≥ 6 NotethatwehavePr{X ∈AD(yk∗)}≥β from(24). 1−δ istruesincePii=∗1PY(yi)=1. In view of Lemma 4, we shall show that P majorizes any wherethelastinequalityisdueto(31).Thisisacontradiction Y P induced by P satisfying Pr{d(X,Y˜) > D} ≤ ǫ. To to the fact that Pr{d(X,Y˜)>D}≤ǫ. Y˜ Y˜|X show this fact, suppose the following condition: (♠) There exists a P satisfying Pr{d(X,Y˜) > D} ≤ ǫ B. Proof of Lemma 2 Y˜ but not being majorized by P . For any (D,R,ǫ,δ) code (f,g), set Y := g(f(X)). The Y Assuming (♠), we shall show a contradiction. definition of a (D,R,ǫ,δ) code gives Lety givethelargestP (y)inY,y givethelargest π(1) Y˜ π(2) Pr{ℓ(f(X))>R}≤δ, Pr{d(X,Y)>D}≤ǫ. (34) P (y) in Y \{y }, y give the largest P (y) in Y \ Y˜ π(1) π(3) Y˜ {y ,y }, etc. That is, P (y ) ≥ P (y ) ≥ ··· ≥ Let T := {g(f(x))∈Y :x satisfies ℓ(f(x))>R}. Then, π(1) π(2) Y˜ π(1) Y˜ π(2) P (y ) and P (y ) ≥ P (y ) for all i = k∗ + the first inequality in (34) is rewritten as Pr{Y ∈T} ≤ δ. Y˜ π(k∗) Y˜ π(k∗) Y˜ π(i) 1,k∗+2,.... Considering the fact that the support of P is Hence, Y {1,2,...,k∗} and the assumption (♠), we can say that there Pr{Y ∈Tc}≥1−δ, (35) exists a 1≤j ≤k∗−1 satisfying 0 where the superscript “c” represents the complement. From j0 (35) and the definition of the smooth max entropy, (P (y )−P (y ))>0. (31) Y˜ π(i) Y i Xi=1 Hδ(Y)≤log|Tc|. (36) On the other hand, the excess distortion probability under On the other hand, since ℓ(g−1(y))≤⌊R⌋ for y ∈Tc, P P is evaluated as X Y˜|X |Tc|≤1+2+···+2⌊R⌋ =2⌊R⌋+1−1<2R+1. (37) Pr{d(X,Y˜)>D} j0 Combining (36) and (37) yields Hδ(Y)<R+1. Thus, from ≥ PX(x)PY˜|X(yπ(i)|x)I{d(x,yπ(i))>D} the second inequality in (34), we have GδD,ǫ(X)<R+1. xX∈XXi=1 C. Proof of Lemma 3 j0 = P (x)P (y |x) First, some notations are defined. X Y˜|X π(i) x∈X i=1 • Let k∗ ≥1 be the integer satisfying (24) and (23). XXj0 • Define γ as γ := 1− ki=∗1Pr{X ∈ AD(yi)}. Then, it − PX(x)PY˜|X(yπ(i)|x)I{x∈BD(yπ(i))} holds that γ ≤ǫ. P xX∈XXi=1 • Let j∗ ≥1 be the integer satisfying j0 j∗−1 = P (y ) Y˜ π(i) Pr{X ∈A (y )}<1−γ−δ, (38) D i i=1 X j0 Xi=1 j∗ − P (x) P (y |x)I{x∈B (y )} X Y˜|X π(i) D π(i) Pr{X ∈A (y )}≥1−γ−δ. (39) D i x∈X i=1 X X i=1 j0 j0 X ≥ P (y )−Pr X ∈ B (y ) , (32) We construct the following deterministic encoder f :X → Y˜ π(i) D π(i) ( ) {0,1}⋆ and decoder g :{0,1}⋆→Y. i=1 i=1 X [ [Encoder] where I{·} is the indicator function and the last inequal- ity is due to j0 P (y |x)I{x ∈ B (y )} ≤ 1) For x∈AD(yi) (i=1,...,k∗), set f(x)=wi. i=1 Y˜|X π(i) D π(i) 2) For x∈/ k∗ A (y ), set f(x)=w . I x∈ j0 B (y ) for all x∈X. For the second term i=1 D i 1 i=1 DPπ(i) [Decoder] Set g(w )=y (i=1,...,k∗). inn(32), it holds that o S i i S Now, we evaluate the excess distortion probability. From j0 (a) j0 the definition of the encoder and decoder, Pr X ∈ B (y ) ≤ Pr X ∈ B (y ) D π(i) D i ( i=1 ) ( i=1 ) k∗ [ [ j0 j0 Pr{d(X,g(f(X)))≤D}= Pr{X ∈AD(yi)} (b) (c) = Pr{X ∈AD(yi)} = PY(yi)−ǫ, (33) Xi=1 i=1 i=1 ≥1−ǫ. X X where (a) follows from the definition of y , (b) follows from i Therefore, we have Pr{d(X,g(f(X)))>D}≤ǫ. (18) and (19), and (c) follows from (27) and (28). Plugging Next, we evaluate the overflow probability. From the defi- (33) into (32) gives nition of the encoder, we have j0 Pr{f(X)=w }=Pr{X ∈A (y )}+γ, Pr{d(X,Y˜)>D}≥ (P (y )−P (y ))+ǫ>ǫ, 1 D 1 Y˜ π(i) Y i Pr{f(X)=w }=Pr{X ∈A (y )} (i=2,...,k∗). i=1 i D i X Setting R=⌊logmin(j∗,k∗)⌋, it holds that8 D. Proof of Theorem 3 (Direct part) From Lemma 3, there exists a code (f ,g ) j∗ n n satisfying Pr{ℓ(f(X))>R}≤1− Pr{f(X)=w } i Xi=1 Pr{dn(Xn,gn(fn(Xn)))>nD}≤ǫ, (41) j∗ 1 1 2 =1− Pr{X ∈AD(yi)}+γ Pr(cid:26)nℓ(fn(Xn))> nGδD,ǫ(Xn)+ n2GδD,ǫ(Xn)(cid:27)≤δ. (42) i=1 X Fix γ >0 arbitrarily. Then, it holds that ≤1−((1−γ−δ)+γ)=δ, 1 1 Gδ (Xn)≤limsup Gδ (Xn)+γ where the last inequality is due to (39). n D,ǫ n→∞ n D,ǫ Therefore, the code (f,g) is a deterministic (D,R,ǫ,δ) and code with R=⌊logmin(j∗,k∗)⌋. 2 ≤γ Let i∗ be the integer satisfying (22) and (21). Then, from n2GδD,ǫ(Xn) the proof of Lemma 1, it holds that logi∗ =Gδ (X). Since for all n≥n0 with some n0 >0. Then, from (42), we have D,ǫ γ ≤ǫ, it is easily verified that i∗ ≤j∗ and i∗ ≤k∗, meaning 1 1 that i∗ ≤ min(j∗,k∗). If i∗ = min(j∗,k∗), min(j∗,k∗) ≤ Pr nℓ(fn(Xn))>limsupnGδD,ǫ(Xn)+2γ ≤δ (43) i∗+2obviouslyholds.Then,assumingthati∗ <min(j∗,k∗), (cid:26) n→∞ (cid:27) we shall show that for all n ≥ n0. Thus, from (41) and (43), (fn,gn) is an (n,D,R,ǫ,δ) code with min(j∗,k∗)≤i∗+2. 1 R=limsup Gδ (Xn)+2γ n D,ǫ n→∞ This leads to for all n≥n . Since γ >0 is arbitrary, this indicates that 0 logmin(j∗,k∗)≤log(i∗+2)≤logi∗+ 2loge, R(D,ǫ,δ|X)≤limsup 1Gδ (Xn). i∗ n D,ǫ n→∞ where the rightmost inequality is due to Taylor’s expansion, (Conversepart)Forany(n,D,R,ǫ,δ)code,Lemma2gives and we obtain (9). nR>Gδ (Xn)−1. Therefore, it holds that Thefirststeptoshowmin(j∗,k∗)≤i∗+2isthefollowing D,ǫ inequality: R≥limsup 1Gδ (Xn) n D,ǫ n→∞ j∗ i∗ for any (n,D,R,ǫ,δ) code. Hence, we have Pr{X ∈A (y )}− Pr{X ∈A (y )} D i D i 1 Xi=1 Xi=1 R(D,ǫ,δ|X)≥limsup Gδ (Xn). (a) n→∞ n D,ǫ ≤ 1−γ−δ+Pr{X ∈A (y )}−(1−ǫ−δ) D j∗ ACKNOWLEDGMENT =Pr{X ∈A (y )}+ǫ−γ D j∗ Thisworkwassupportedin partbyJSPSKAKENHIGrant (b) ≤ Pr{X ∈A (y )}+Pr{X ∈A (y )}, (40) Numbers 26289119, 16K00195, 16K00417, and 16K06340. D j∗ D i∗+1 REFERENCES where (a) follows from (22) and (38) and (b) follows from [1] T. S. Han, Information-Spectrum Methods in Information Theory, Springer, 2003. k∗−1 [2] H.Koga,“CharacterizationofthesmoothRe´nyientropyusingmajoriza- ǫ−γ ≤ 1− Pr{X ∈A (y )} D i tion,”ITW,Seville, Spain,pp.1–5,Sept.2013. i=1 ! [3] I.KontoyiannisandS.Verdu´,“Optimallosslessdatacompression:Non- X k∗ asymptotics and asymptotics,” IEEE Trans. Inf. Theory, vol.60 no.2, pp.777–795,Feb.2014. − 1− Pr{X ∈A (y )} D i [4] V. Kostina, Y. Polyanskiy, andS. Verdu´, “Variable-length compression ! i=1 allowingerrors,”IEEETrans.Inf.Theory,vol.61,no.8,pp.4316–4330, X =Pr{X ∈A (y )}≤Pr{X ∈A (y )}. Aug.2015. D k∗ D i∗+1 [5] R.NomuraandH.Yagi,“Variable-length lossysourcecodingallowing some probability of union of overflow and excess distortion,” ISIT, Inequality (40) is equivalent to j∗−1Pr{X ∈ A (y )} ≤ Barcelona, Spain,pp.2958–2962, July2016. i=1 D i i∗+1Pr{X ∈ A (y )}. Thus, we obtain j∗ −1 ≤ i∗+1, [6] R.RennerandS.Wolf,“SmoothRe´nyientropyandapplications,”ISIT, i=1 D i P Chicago, USA,page232,June-July2004. implying that min(j∗,k∗)≤i∗+2. [7] S. Saito and T. Matsushima, “Threshold of overflow probability using P smooth max-entropy in lossless fixed-to-variable length source coding for general sources,” IEICE Trans. Fundamentals, vol. E99-A, no. 12, 8 IfR=⌊logk∗⌋,thenPr{ℓ(f(X))>R}=0. pp.2286–2290, Dec.2016. [8] S.SaitoandT.Matsushima,“Thresholdofoverflowprobabilityinterms ofsmoothmax-entropyforvariable-lengthcompressionallowingerrors,” ISITA,Monterey,California, USA,Oct.–Nov.2016. [9] W. Spankowski and S. Verdu´, “Minimum expected length of fixed-to- variable lossless compression without prefix constraints,” IEEE Trans. Inf.Theory,vol.57,no.7,pp.4017–4025, July2011. [10] T. Uyematsu, “A new unified method for fixed-length source coding problemsofgeneralsources,”IEICETrans.Fundamentals,vol.E93-A, no.11,pp.1868–1877, Nov.2010. [11] T. Uyematsu, “Relating source coding and resolvability: a direct ap- proach,” ISIT,Austin,USA,pp.1350–1354,June2010. [12] T.WeissmanandN.Merhav,“Tradeoffsbetweentheexcess-code-length exponent and the excess-distortion exponent in lossy source coding,” IEEETrans.Inf.Theory,vol.48,no.2,pp.396–415, Feb.2002. [13] H. Yagi and R. Nomura, “Variable-length coding with epsilon-fidelity criteria for general sources,” ISIT, Hong Kong, China, pp. 2181–2185, June2015.