Wyner's Common Information: Generalizations and A New Lossy Source Coding Interpretation

Ge Xu, Wei Liu, Student Member, IEEE, and Biao Chen, Senior Member, IEEE

G. Xu and B. Chen are with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244 USA (email: [email protected], [email protected]). W. Liu was with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244 USA. He is now with Bloomberg L.P., New York, NY 10022 USA (email: [email protected]). The material in this paper was presented in part at the Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sept. 2010, and at the Annual Conference on Information Sciences and Systems, Baltimore, MD, March 2011. This work was supported in part by the Army Research Office under Award W911NF-12-1-0383, by the Air Force Office of Scientific Research under Award FA9550-10-1-0458, and by the National Science Foundation under Award 1218289.

Abstract—Wyner's common information was originally defined for a pair of dependent discrete random variables. Its significance is largely reflected in, hence also confined to, several existing interpretations in various source coding problems. This paper attempts both to generalize its definition and to expand its practical significance by providing a new operational interpretation. The generalization is twofold: the number of dependent variables can be arbitrary, and so are the alphabets of those random variables. New properties are determined for the generalized Wyner's common information of N dependent variables. More importantly, a lossy source coding interpretation of Wyner's common information is developed using the Gray-Wyner network. In particular, it is established that the common information equals the smallest common message rate when the total rate is arbitrarily close to the rate distortion function with joint decoding. A surprising observation is that this equality holds independent of the values of the distortion constraints, as long as the distortions lie within some distortion region. Examples of the computation of common information are given, including that of a pair of dependent Gaussian random variables.

Index Terms—Common information, Gray-Wyner network, rate distortion function.

I. INTRODUCTION

Consider a pair of dependent random variables X and Y with joint distribution p(x,y), which denotes either the probability density function if X and Y are continuous or the probability mass function if X and Y are discrete. Quantifying the information that is common between X and Y has been a classical problem both in information theory and in mathematical statistics [1]-[4]. The most widely used notion is Shannon's mutual information, defined as

$$I(X;Y) = E\left[\log\frac{p(x,y)}{p(x)p(y)}\right],$$

where p(x) and p(y) are the marginal distributions of X and Y corresponding to the joint distribution p(x,y), and $E[\cdot]$ denotes expectation with respect to p(x,y). Shannon's mutual information measures the amount of uncertainty reduction in one variable by observing the other. Its significance lies in its applications to a broad range of problems in which concrete operational meanings of I(X;Y) can be established. These include both source and channel coding problems in information and communication theory [5] and hypothesis testing problems in statistical inference [6].
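For concreteness, the following minimal numerical sketch evaluates the mutual information formula above for a small joint probability mass function; the toy distribution (a doubly symmetric binary source, which also reappears in Section V) is only an illustrative choice.

```python
import numpy as np

def mutual_information(p_xy):
    """Shannon mutual information I(X;Y) in bits for a joint pmf given as a 2-D array."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0                         # 0 log 0 = 0 convention
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Toy example: doubly symmetric binary source with crossover probability a0.
a0 = 0.1
p = np.array([[(1 - a0) / 2, a0 / 2],
              [a0 / 2, (1 - a0) / 2]])
print(mutual_information(p))   # I(X;Y) = 1 - h(a0), about 0.531 bits here
```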
Other notions of information have also been defined between a pair of dependent variables. Most notable among them are Gács and Körner's common randomness K(X,Y) [2] and Wyner's common information C(X,Y) [4]. Gács and Körner's common randomness is defined as the maximum number of common bits per symbol that can be independently extracted from X and Y. Quite naturally, K(X,Y) has found extensive applications in secure communications, e.g., for key generation [7]-[9]. More recently, a new interpretation of K(X,Y) using the Gray-Wyner source coding network was given in [10]. It was noted in [2], [11] that the definition of K(X,Y) is rather restrictive in that K(X,Y) equals 0 in most cases, except for the special case when $X = (X',V)$ and $Y = (Y',V)$ with $X'$, $Y'$, $V$ independent variables, or for those (X,Y) pairs that can be converted to such a dependence structure through relabeling of the realizations, i.e., whose distribution is a permutation of the original joint distribution matrix. Notice also that K(X,Y) is defined only for discrete random variables.

Wyner's common information was originally defined for a pair of discrete random variables with finite alphabets as

$$C(X,Y) = \inf_{X-W-Y} I(X,Y;W). \tag{1}$$

Here, the infimum is taken over all auxiliary random variables W such that X, W, and Y form a Markov chain. Clearly, the quantity C(X,Y) in (1) can be defined for any pair of random variables with arbitrary alphabets. However, the operational meanings of C(X,Y) available in the existing literature are largely confined to those for discrete X and Y. These include the minimum common rate for the Gray-Wyner lossless source coding problem under a sum rate constraint, the minimum rate of a common input to two independent random channels for distribution approximation [4], and the strong coordination capacity of a two-node network without common randomness and with actions assigned at one node [12].

This paper intends to generalize Wyner's common information along two directions. The first is to generalize it to that of multiple dependent random variables. The second is to generalize it to that of continuous random variables. For the first direction, Wyner's common information is defined through a conditional independence structure which is equivalent to the Markov chain condition for two dependent variables. Relevant properties related to this generalization are derived. In addition, we prove that Wyner's original interpretations in [4] can be directly extended to those involving multiple variables. Note that both mutual information and common randomness have also been generalized to those of multiple random variables [14]-[16].

For the second direction, we provide a new lossy source coding interpretation using the Gray-Wyner network. Specifically, we show that, for the Gray-Wyner network, Wyner's common information is precisely the smallest common message rate for a certain range of distortion constraints when the total rate is arbitrarily close to the rate distortion function with joint decoding. As the common information is only a function of the joint distribution, this smallest common rate remains constant even if the distortion constraints vary, as long as they are in a specific distortion region. There has also been recent effort in characterizing the common message rate for lossy source coding using the Gray-Wyner network [17]. We establish the equivalence between the characterization in [17] and an alternative characterization presented in the present paper.

Computing Wyner's common information is known to be a challenging problem; C(X,Y) has been resolved only for several special cases described in [4], [13].
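As an illustration of the definition in (1), the minimal sketch below evaluates $I(X,Y;W)$ for one explicit auxiliary W that renders X and Y conditionally independent while preserving a target joint distribution; by (1), any such feasible W yields an upper bound on C(X,Y). The target is again the doubly symmetric binary source, and the particular decomposition (a common bit observed through two independent binary symmetric channels) is used here only as a feasible point, not as a claim of optimality.

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as an array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Target: doubly symmetric binary source with P(X != Y) = a0.
a0 = 0.1
# Candidate decomposition: W ~ Bernoulli(1/2), X = W xor N1, Y = W xor N2,
# with N1, N2 i.i.d. Bernoulli(a1); then P(X != Y) = 2*a1*(1-a1) = a0.
a1 = (1 - np.sqrt(1 - 2 * a0)) / 2

bsc = np.array([[1 - a1, a1], [a1, 1 - a1]])        # p(x|w) = p(y|w)
p_wxy = 0.5 * bsc[:, :, None] * bsc[:, None, :]     # p(w,x,y) = p(w)p(x|w)p(y|w)
p_xy = p_wxy.sum(axis=0)
assert np.allclose(p_xy, [[(1 - a0) / 2, a0 / 2], [a0 / 2, (1 - a0) / 2]])

# I(X,Y;W) = H(X,Y) - H(X,Y|W), with H(X,Y|W) = H(W,X,Y) - H(W).
i_xy_w = entropy(p_xy) - (entropy(p_wxy) - entropy(p_wxy.sum(axis=(1, 2))))
print(i_xy_w)   # an upper bound on C(X,Y); about 0.87 bits for a0 = 0.1
```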
Along with our generalizations of Wyner's common information, we provide two new examples where we can explicitly evaluate the common information of multiple dependent variables. In particular, we derive, through an estimation theoretic approach, C(X,Y) for a bivariate Gaussian source and its extension to the multi-variate case with a certain correlation structure.

The rest of the paper is organized as follows. Section II reviews Wyner's two approaches to the common information of two discrete random variables, the general Gray-Wyner network, and the relations among joint, marginal, and conditional rate distortion functions. Section III gives the definition of Wyner's common information for N dependent random variables with arbitrary alphabets, along with some associated properties. The operational meanings of Wyner's common information developed in [4] are also extended to those of N discrete dependent random variables in Section III. In Section IV, we provide a new interpretation of Wyner's common information using Gray-Wyner's lossy source coding network. Specifically, we prove that, for the Gray-Wyner network, Wyner's common information is precisely the smallest common message rate for a certain range of distortion constraints when the total rate is arbitrarily close to the rate distortion function with joint decoding. In Section V, two examples, the doubly symmetric binary source and the bivariate Gaussian source, are used to illustrate the lossy source coding interpretation of Wyner's common information. The common information for the bivariate Gaussian source and its extension to the multi-variate case is also derived in Section V. Section VI concludes the paper.

Notation: Throughout this paper, we use a calligraphic letter $\mathcal{X}$ to denote the alphabet and p(x) to denote either the point mass function or the probability density function of a random variable X. The boldface capital letter $\mathbf{X}_A$ denotes a vector of random variables $\{X_i\}_{i\in A}$, where A is an index set. $A\setminus B$ denotes set-theoretic subtraction, i.e., $A\setminus B = \{x : x\in A \text{ and } x\notin B\}$. For two real vectors of identical size x and y, $x\le y$ denotes component-wise inequality.

II. EXISTING RESULTS

A. Wyner's result

Wyner defined the common information of two discrete random variables X and Y with distribution p(x,y) in equation (1) and provided two operational meanings for this definition. The first approach is shown in Fig. 1. This model is a source coding network first studied by Gray and Wyner in [18]. In this model, the encoder observes a pair of sequences $(X^n, Y^n)$ and maps them to three messages $W_0, W_1, W_2$, taking values in alphabets of respective sizes $2^{nR_0}$, $2^{nR_1}$, and $2^{nR_2}$. Decoder 1, upon receiving $(W_0, W_1)$, needs to reproduce $X^n$ with high reliability, while decoder 2, upon receiving $(W_0, W_2)$, needs to reproduce $Y^n$ with high reliability. Define

$$\Delta = \frac{1}{2n}\left(E[d_H(X^n,\hat{X}^n)] + E[d_H(Y^n,\hat{Y}^n)]\right),$$

where $d_H(\cdot,\cdot)$ is the Hamming distortion. Let $C_1$ be the infimum of all achievable $R_0$ for the system in Fig. 1 such that for any $\epsilon>0$ there exists, for n sufficiently large, a source code with total rate $R_0+R_1+R_2 \le H(X,Y)+\epsilon$ and $\Delta \le \epsilon$.

Fig. 1. Source coding over a simple network.
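Note that whenever the Markov chain X - W - Y holds, $I(X,Y;W) + H(X|W) + H(Y|W) = H(X,Y)$, since $H(X,Y|W) = H(X|W) + H(Y|W)$ under conditional independence; this is why a total rate of H(X,Y) is consistent with a common rate as small as I(X,Y;W) in the definition of $C_1$. The following minimal sketch checks this identity numerically for the binary decomposition used in the sketch above (again an illustrative choice, not the paper's construction).

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Same decomposition of the doubly symmetric binary source as before:
# W ~ Bernoulli(1/2), X = W xor N1, Y = W xor N2, N1, N2 ~ Bernoulli(a1).
a0 = 0.1
a1 = (1 - np.sqrt(1 - 2 * a0)) / 2
bsc = np.array([[1 - a1, a1], [a1, 1 - a1]])
p_wxy = 0.5 * bsc[:, :, None] * bsc[:, None, :]               # p(w,x,y)

h_w = entropy(p_wxy.sum(axis=(1, 2)))                         # H(W)
h_xy = entropy(p_wxy.sum(axis=0))                             # H(X,Y)
h_xy_given_w = entropy(p_wxy) - h_w                           # H(X,Y|W)
i_xyw = h_xy - h_xy_given_w                                   # I(X,Y;W)
h_x_given_w = entropy(p_wxy.sum(axis=2)) - h_w                # H(X|W)
h_y_given_w = entropy(p_wxy.sum(axis=1)) - h_w                # H(Y|W)

# Whenever X - W - Y holds, I(X,Y;W) + H(X|W) + H(Y|W) = H(X,Y).
print(i_xyw + h_x_given_w + h_y_given_w, h_xy)                # both about 1.469 bits
```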
Fig. 2. Random variable generators.

The second approach is shown in Fig. 2. In this approach, the joint distribution $p(x^n,y^n) = \prod_{i=1}^n p(x_i,y_i)$ is approximated by the output distribution of a pair of random number generators. A common input W, uniformly distributed on $\mathcal{W} = \{1,\cdots,2^{nR_0}\}$, is sent to two separate processors which are independent of each other. These processors (random number generators) generate independent and identically distributed (i.i.d.) sequences according to two distributions $q_1(x^n|w)$ and $q_2(y^n|w)$, respectively. The output sequences of the two processors are denoted by $\tilde{X}^n$ and $\tilde{Y}^n$, respectively, and the joint distribution of the output sequences is given by

$$q(x^n,y^n) = \frac{1}{|\mathcal{W}|}\sum_{w\in\mathcal{W}} q_1(x^n|w)\, q_2(y^n|w).$$

Let

$$D_n(q,p) = \frac{1}{n}\sum_{x^n\in\mathcal{X}^n,\, y^n\in\mathcal{Y}^n} q(x^n,y^n)\log\frac{q(x^n,y^n)}{p(x^n,y^n)}.$$

Let $C_2$ be the infimum of rates $R_0$ for the common input such that for any $\epsilon>0$ there exist a pair of distributions $q_1(x^n|w)$, $q_2(y^n|w)$ and an n such that $D_n(q,p) \le \epsilon$.

Wyner proved in [4] that $C_1 = C_2 = C(X,Y)$.

B. Generalized Gray-Wyner networks

Consider the Gray-Wyner source coding network [18] with one encoder and N decoders, as shown in Fig. 3. The encoder observes an i.i.d. vector source sequence $\{\mathbf{X}_1,\cdots,\mathbf{X}_n\}$, where each $\mathbf{X}_k = \{X_{1k},\cdots,X_{Nk}\}$, $k=1,\cdots,n$, is a length-N vector with joint distribution p(x). Denote by $X_i^n = [X_{i1},\cdots,X_{in}]$ the ith component of the vector sequence. There are a total of N receivers, with the ith receiver only interested in recovering the ith component sequence $X_i^n$. The encoder encodes the source into N+1 messages: one is a public message available at all receivers, while the other N messages are private messages, each available only at the corresponding receiver.

Fig. 3. Generalized Gray-Wyner source coding network.

For $m=1,2,\cdots$, let $I_m = \{0,1,2,\cdots,m-1\}$. An $(n,M_0,M_1,\cdots,M_N)$ code is defined by
• an encoder mapping $f:\ \mathcal{X}_1^n\times\cdots\times\mathcal{X}_N^n \to I_{M_0}\times I_{M_1}\times\cdots\times I_{M_N}$,
• N decoder mappings $g_i:\ I_{M_i}\times I_{M_0} \to \hat{\mathcal{X}}_i^n$, $i=1,2,\cdots,N$.
For an $(n,M_0,M_1,\cdots,M_N)$ code, let $f(\mathbf{X}_1,\cdots,\mathbf{X}_n) = (W_0,W_1,\cdots,W_N)$ and $\hat{X}_i^n = g_i(W_i,W_0)$, $i=1,2,\cdots,N$.

We discuss below the lossless and lossy source coding using the generalized Gray-Wyner network.

1) Lossless Gray-Wyner source coding: Define the probability of error as

$$P_e^{(n)} = \frac{1}{nN}\sum_{i=1}^N E[d_H(X_i^n,\hat{X}_i^n)], \tag{2}$$

where $\hat{X}_i^n = g_i(W_i,W_0)$ for $i=1,\cdots,N$ and $d_H(u^n,\hat{u}^n)$ is the Hamming distance between $u^n$ and $\hat{u}^n$. A rate tuple $(R_0,R_1,\cdots,R_N)$ is said to be achievable if for any $\epsilon>0$ there exists, for n sufficiently large, an $(n,M_0,M_1,\cdots,M_N)$ code such that

$$M_i \le 2^{n(R_i+\epsilon)},\quad i=0,1,\cdots,N, \tag{3}$$
$$P_e^{(n)} \le \epsilon. \tag{4}$$

Denote by $\mathcal{R}_1$ the region of all achievable rate tuples $(R_0,R_1,\cdots,R_N)$.

Theorem 1: $\mathcal{R}_1$ is the union of all rate tuples $(R_0,R_1,\cdots,R_N)$ that satisfy

$$R_0 \ge I(X_1,\cdots,X_N;W), \tag{5}$$
$$R_i \ge H(X_i|W),\quad i=1,2,\cdots,N, \tag{6}$$

for some $W \sim p(w|x_1,\cdots,x_N)$.

2) Lossy Gray-Wyner source coding: Let $d(\mathbf{x},\hat{\mathbf{x}}) \triangleq \{d_1(x_1,\hat{x}_1),\cdots,d_N(x_N,\hat{x}_N)\}$ be a compound distortion measure. Define $\Delta_i$, $i=1,\cdots,N$, to be the average distortion between the ith component sequence of the encoder input and the ith decoder output,

$$\Delta_i = E[d_i(X_i^n,\hat{X}_i^n)] = \frac{1}{n}\sum_{k=1}^n E[d_i(X_{ik},\hat{X}_{ik})]. \tag{7}$$

Define the vector of average distortions to be $\mathbf{\Delta} \triangleq \{\Delta_1,\cdots,\Delta_N\}$. An $(n,M_0,M_1,\cdots,M_N)$ code with an average distortion vector $\mathbf{\Delta}$ is said to be an $(n,M_0,M_1,\cdots,M_N,\mathbf{\Delta})$ rate distortion code. Let $\mathbf{D} \triangleq \{D_1,D_2,\cdots,D_N\} \in \mathbb{R}_+^N$. A rate tuple $(R_0,R_1,\cdots,R_N)$ is said to be D-achievable if for arbitrary $\epsilon>0$ there exists, for n sufficiently large, an $(n,M_0,M_1,\cdots,M_N,\mathbf{\Delta})$ code such that

$$M_i \le 2^{n(R_i+\epsilon)},\quad i=0,1,\cdots,N, \tag{8}$$
$$\mathbf{\Delta} \le \mathbf{D} + \epsilon. \tag{9}$$

Let $\mathcal{R}_2(\mathbf{D})$ be the region of all D-achievable rate tuples $(R_0,R_1,\cdots,R_N)$.
Theorem 2: $\mathcal{R}_2(\mathbf{D})$ is the union of all rate tuples $(R_0,R_1,\cdots,R_N)$ that satisfy

$$R_0 \ge I(X_1,\cdots,X_N;W), \tag{10}$$
$$R_i \ge R_{X_i|W}(D_i),\quad i=1,2,\cdots,N, \tag{11}$$

for some $W \sim p(w|x_1,\cdots,x_N)$. Here, $R_{X_i|W}(D_i)$ is the conditional rate distortion function defined as [21]

$$R_{X_i|W}(D_i) = \min_{p_t(\hat{x}_i|x_i,w):\, E d_i(X_i,\hat{X}_i)\le D_i} I(X_i;\hat{X}_i|W). \tag{12}$$

Theorems 1 and 2 are direct extensions of Theorems 4 and 8 in [18] for the Gray-Wyner network with two receivers. Note that in [18] the authors proved only the discrete case of [18, Theorem 8]; the proof for continuous alphabets can be constructed in a similar fashion.

C. Joint, marginal and conditional rate distortion functions

In this section, we review the joint, marginal and conditional rate distortion functions and their relations. Two-dimensional sources will be considered; the results generalize immediately to N-dimensional vector sources.

Given a two-dimensional source $(X_1,X_2)$ with probability distribution $p(x_1,x_2)$ and two distortion measures $d_1(x_1,\hat{x}_1)$ and $d_2(x_2,\hat{x}_2)$ defined on $\mathcal{X}_1\times\hat{\mathcal{X}}_1$ and $\mathcal{X}_2\times\hat{\mathcal{X}}_2$, the joint rate distortion function is given by

$$R_{X_1X_2}(D_1,D_2) = \min I(X_1X_2;\hat{X}_1\hat{X}_2), \tag{13}$$

where the minimum is taken over all test channels $p_t(\hat{x}_1\hat{x}_2|x_1x_2)$ such that $E d_1(X_1,\hat{X}_1)\le D_1$ and $E d_2(X_2,\hat{X}_2)\le D_2$. The conditional rate distortion function is defined in (12). The joint, marginal and conditional rate distortion functions satisfy the following inequalities.

Lemma 1: [19], [20] Given a two-dimensional source $(X_1,X_2)$ with joint distribution $p(x_1,x_2)$ and two distortion measures $d_1(x_1,\hat{x}_1)$, $d_2(x_2,\hat{x}_2)$ defined respectively on $\mathcal{X}_1\times\hat{\mathcal{X}}_1$ and $\mathcal{X}_2\times\hat{\mathcal{X}}_2$, the rate distortion functions satisfy the following inequalities:

$$R_{X_1X_2}(D_1,D_2) \ge R_{X_1|X_2}(D_1) + R_{X_2}(D_2), \tag{14a}$$
$$R_{X_1|X_2}(D_1) \ge R_{X_1}(D_1) - I(X_1;X_2), \tag{14b}$$
$$R_{X_1X_2}(D_1,D_2) \ge R_{X_1}(D_1) + R_{X_2}(D_2) - I(X_1;X_2), \tag{14c}$$

$$R_{X_1}(D_1) \ge R_{X_1|X_2}(D_1), \tag{15a}$$
$$R_{X_1}(D_1) + R_{X_2}(D_2) \ge R_{X_1X_2}(D_1,D_2). \tag{15b}$$

Sufficient conditions for equality in (14) are that the optimum backward test channels for the functions on the left side of each equation factor appropriately, i.e., for (14a) $p_b(x_1x_2|\hat{x}_1\hat{x}_2) = p(x_1|\hat{x}_1 x_2)\,p(x_2|\hat{x}_2)$, for (14b) $p_b(x_1|\hat{x}_1 x_2) = p(x_1|\hat{x}_1)$, and for (14c) $p_b(x_1x_2|\hat{x}_1\hat{x}_2) = p(x_1|\hat{x}_1)\,p(x_2|\hat{x}_2)$. Equalities hold in (15) if and only if $X_1$ and $X_2$ are independent.
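In general, the rate distortion functions in Lemma 1 do not admit closed forms and are evaluated numerically. As a minimal illustration (the binary source, the Hamming distortion matrix, and the slope values below are arbitrary choices), the following Blahut-Arimoto style sketch computes points on a marginal rate distortion curve $R_X(D)$; each slope parameter s yields one point (D, R), and for the binary uniform source under Hamming distortion the points should fall on $R(D) = 1 - h(D)$.

```python
import numpy as np

def rd_point(p_x, dist, s, iters=500):
    """One point (D, R) on the rate-distortion curve via Blahut-Arimoto.

    p_x  : source pmf, shape (|X|,)
    dist : distortion matrix d(x, xhat), shape (|X|, |Xhat|)
    s    : positive slope parameter; larger s gives smaller distortion
    """
    p_x = np.asarray(p_x, dtype=float)
    dist = np.asarray(dist, dtype=float)
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])      # output marginal q(xhat)
    for _ in range(iters):
        w = q[None, :] * np.exp(-s * dist)               # unnormalized test channel
        cond = w / w.sum(axis=1, keepdims=True)          # p(xhat | x)
        q = p_x @ cond                                   # update output marginal
    D = float((p_x[:, None] * cond * dist).sum())
    R = float((p_x[:, None] * cond * np.log2(cond / q[None, :])).sum())
    return D, R

# Binary uniform source, Hamming distortion: R(D) = 1 - h(D) for 0 <= D <= 1/2.
p = [0.5, 0.5]
d = np.array([[0.0, 1.0], [1.0, 0.0]])
for s in (1.0, 2.0, 4.0):
    print(rd_point(p, d, s))
```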
Furthermore, Gray has shown that, under quite general conditions, equalities hold in (14) for small values of distortion. This is because the marginal, joint and conditional rate distortion functions equal their Extended Shannon Lower Bounds (ESLB) [19], [21] under suitable conditions. These ESLB, denoted by $R_X^{(L)}(D)$ for a rate distortion function $R_X(D)$, satisfy the following property. Denote by $\mathcal{D}$ a surface in the m-dimensional space; the inequality $\mathbf{\Delta} \le \mathcal{D}$ means that there exists a vector $\beta \in \mathcal{D}$ such that $\mathbf{\Delta} \le \beta$. If there is no such vector, $\mathbf{\Delta} > \mathcal{D}$. Likewise, $\mathcal{D}_1 \le \mathcal{D}_2$ means that $\beta \le \mathcal{D}_2$ for any $\beta \in \mathcal{D}_1$ [19].

Lemma 2: [19] Given a two-dimensional source $(X_1,X_2)$ with joint distribution $p(x_1,x_2)$ such that $p(x_1|x_2) > 0$ for all $x_1\in\mathcal{X}_1$, $x_2\in\mathcal{X}_2$, reproduction alphabets $\hat{\mathcal{X}}_1 = \mathcal{X}_1$, $\hat{\mathcal{X}}_2 = \mathcal{X}_2$, and two per-letter distortion measures $d_1(x_1,\hat{x}_1)$ and $d_2(x_2,\hat{x}_2)$ that satisfy

$$d_i(x_i,\hat{x}_i) > d_i(x_i,x_i) = 0,\quad x_i \ne \hat{x}_i,\ i=1,2, \tag{16}$$

there exist strictly positive surfaces $\mathcal{D}(X_1X_2)$, $\mathcal{D}(X_1|X_2)$, $\mathcal{D}(X_1)$ and $\mathcal{D}(X_2)$ such that

$$R_{X_1X_2}(D_1,D_2) = R_{X_1X_2}^{(L)}(D_1,D_2), \quad \text{if } (D_1,D_2) \le \mathcal{D}(X_1X_2),$$
$$R_{X_1|X_2}(D_1) = R_{X_1|X_2}^{(L)}(D_1), \quad \text{if } D_1 \le \mathcal{D}(X_1|X_2),$$
$$R_{X_1}(D_1) = R_{X_1}^{(L)}(D_1), \quad \text{if } D_1 \le \mathcal{D}(X_1),$$
$$R_{X_2}(D_2) = R_{X_2}^{(L)}(D_2), \quad \text{if } D_2 \le \mathcal{D}(X_2),$$

and

$$\mathcal{D}(X_1|X_2) \le \mathcal{D}(X_1),$$
$$\mathcal{D}(X_1X_2) \le (\mathcal{D}(X_1|X_2),\mathcal{D}(X_2)) \le (\mathcal{D}(X_1),\mathcal{D}(X_2)).$$

Finally,

$$R_{X_1X_2}^{(L)}(D_1,D_2) = R_{X_1|X_2}^{(L)}(D_1) + R_{X_2}^{(L)}(D_2) \tag{17}$$
$$= R_{X_1}^{(L)}(D_1) + R_{X_2}^{(L)}(D_2) - I(X_1;X_2). \tag{18}$$

It is apparent that when the rate distortion functions equal their corresponding ESLB, equations (17) and (18) imply equalities in (14a), (14b) and (14c).

III. THE COMMON INFORMATION OF N DEPENDENT DISCRETE RANDOM VARIABLES

A. Definition

Wyner's original definition of the common information in (1) assumes a Markov chain between the random variables X, Y and the auxiliary variable W, i.e., $X - W - Y$. This Markov chain is equivalent to stating that X and Y are conditionally independent given W. This conditional independence structure can be naturally generalized to that of N dependent random variables. Let $\mathbf{X} = \{X_1,\cdots,X_N\}$ be N dependent random variables that take values in some arbitrary (finite, countable, or continuous) spaces $\mathcal{X}_1\times\mathcal{X}_2\times\cdots\times\mathcal{X}_N$. The joint distribution of $\mathbf{X}$ is denoted by p(x), which is either a probability mass function or a probability density function. We now give the definition of the common information for N dependent random variables.

Definition 1: Let $\mathbf{X}$ be a random vector with joint distribution p(x). The common information of $\mathbf{X}$ is defined as

$$C(\mathbf{X}) \triangleq \inf I(\mathbf{X};W), \tag{19}$$

where the infimum is taken over all the joint distributions of $(\mathbf{X},W)$ such that
1) the marginal distribution for $\mathbf{X}$ is p(x),
2) $\mathbf{X}$ are conditionally independent given W, i.e.,

$$p(\mathbf{x}|w) = \prod_{i=1}^N p(x_i|w). \tag{20}$$

We now discuss several properties associated with the definition given in (19). Wyner's common information of two random variables $(X_1,X_2)$ satisfies the following inequality:

$$I(X_1;X_2) \le C(X_1,X_2) \le \min\{H(X_1),H(X_2)\}.$$

A similar inequality for the common information of N random variables can be derived. Let $A \subseteq \mathcal{N} = \{1,2,\cdots,N\}$ and $\bar{A} = \mathcal{N}\setminus A$. We have

$$\max_A\{I(\mathbf{X}_A;\mathbf{X}_{\bar{A}})\} \le C(\mathbf{X}) \le \min_j\{H(\mathbf{X}_{-j})\}, \tag{21}$$

where $\mathbf{X}_{-j} \triangleq \mathbf{X}_{\mathcal{N}\setminus\{j\}} = \{X_1,\cdots,X_{j-1},X_{j+1},\cdots,X_N\}$ for $j\in\mathcal{N}$.

To verify the upper bound, for any $j\in\mathcal{N}$, let $W_j = \mathbf{X}_{-j}$. Then $X_1,\cdots,X_N$ are conditionally independent given $W_j$, and

$$I(\mathbf{X};W_j) = I(\mathbf{X};\mathbf{X}_{-j}) = H(\mathbf{X}_{-j}).$$

Thus $C(\mathbf{X}) \le H(\mathbf{X}_{-j})$ for all $j\in\mathcal{N}$.

For the lower bound, since $X_1,\cdots,X_N$ are conditionally independent given W, we have the Markov chain $\mathbf{X}_A - W - \mathbf{X}_{\bar{A}}$ for any subset $A\subseteq\mathcal{N}$. Hence,

$$I(\mathbf{X};W) \ge I(\mathbf{X}_A;W) \ge I(\mathbf{X}_A;\mathbf{X}_{\bar{A}}),$$

where the second inequality is by the data processing inequality. Therefore,

$$I(\mathbf{X};W) \ge \max_A\{I(\mathbf{X}_A;\mathbf{X}_{\bar{A}})\}. \tag{22}$$
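The two sides of (21) can be evaluated directly from the joint distribution. The minimal sketch below does so for a toy example of N = 3 binary variables (a common bit observed through three independent binary symmetric channels, an arbitrary illustrative choice); by (21), C(X) lies between the two printed values.

```python
import itertools
import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Toy joint pmf for N = 3 binary variables: W ~ Bernoulli(1/2) observed
# through three independent BSC(0.1) channels.
a = 0.1
bsc = np.array([[1 - a, a], [a, 1 - a]])
p = sum(0.5 * np.einsum('i,j,k->ijk', bsc[w], bsc[w], bsc[w]) for w in (0, 1))

N = p.ndim
full = set(range(N))
h_all = entropy(p)

def marginal_entropy(idx):
    """Entropy of the marginal over the coordinates in idx."""
    drop = tuple(sorted(full - set(idx)))
    return entropy(p.sum(axis=drop)) if drop else h_all

# Lower bound of (21): max over nonempty proper subsets A of I(X_A; X_Abar).
lower = max(marginal_entropy(A) + marginal_entropy(full - set(A)) - h_all
            for r in range(1, N)
            for A in itertools.combinations(range(N), r))

# Upper bound of (21): min over j of H(X_{-j}).
upper = min(marginal_entropy(full - {j}) for j in range(N))

print(lower, upper)   # C(X) lies between these two values
```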
The common information defined in (19) also satisfies the following monotone property.

Lemma 3: Let $\mathbf{X} \sim p(\mathbf{x})$. For any two sets A, B that satisfy $A \subseteq B \subseteq \mathcal{N} = \{1,2,\cdots,N\}$, we have

$$C(\mathbf{X}_A) \le C(\mathbf{X}_B). \tag{23}$$

Proof: Let $W'$ be the auxiliary variable that achieves $C(\mathbf{X}_B)$, i.e., $I(\mathbf{X}_B;W') = \inf I(\mathbf{X}_B;W)$. Since $A \subseteq B$, $\mathbf{X}_B$ being conditionally independent given $W'$ implies that $\mathbf{X}_A$ are conditionally independent given $W'$. Thus

$$I(\mathbf{X}_B;W') \ge I(\mathbf{X}_A;W') \ge \inf I(\mathbf{X}_A;W),$$

where the infimum is taken over all W such that $\mathbf{X}_A$ are conditionally independent given W.

The above monotone property of the common information is contrary to what the name implies: conceptually, the information in common ought to decrease when new variables are included in the set of random variables. Such is the case for Gács and Körner's common randomness, i.e., $K(\mathbf{X}_A) \ge K(\mathbf{X}_B)$. As a consequence, we have that for any N random variables, $C(\mathbf{X}) \ge K(\mathbf{X})$. The fact that the common information $C(\mathbf{X})$ increases as more variables are involved suggests that it may have potential applications in statistical inference problems. This was explored in [22].

B. Coding theorems for the common information of N discrete random variables

Section II-A describes two operational interpretations of Wyner's common information for two discrete random variables, based on the Gray-Wyner network and on distribution approximation. These operational interpretations can also be extended to the common information of N dependent random variables.

For the first approach, we consider the lossless Gray-Wyner network with N terminals. For the Gray-Wyner source coding network, a number $R_0$ is said to be achievable if for any $\epsilon>0$ there exists, for n sufficiently large, an $(n,M_0,M_1,\cdots,M_N)$ code with

$$M_0 \le 2^{nR_0}, \tag{24}$$
$$\frac{1}{n}\sum_{i=0}^N \log M_i \le H(\mathbf{X}) + \epsilon, \tag{25}$$
$$P_e^{(n)} \le \epsilon. \tag{26}$$

Define $C_1$ as the infimum of all achievable $R_0$.

Theorem 3: For N discrete random variables $\mathbf{X}$ with joint distribution p(x),

$$C_1 = C(\mathbf{X}). \tag{27}$$

The proof of Theorem 3 is a direct extension of the proof for two discrete random variables in [4] and is hence omitted.

The second approach to interpreting the common information of discrete random variables uses distribution approximation. Let $\mathbf{X}_1,\cdots,\mathbf{X}_n$ be i.i.d. copies of $\mathbf{X}$ with distribution p(x), i.e., the joint distribution for $\mathbf{X}_1,\cdots,\mathbf{X}_n$ is

$$p^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n) = \prod_{k=1}^n p(\mathbf{x}_k). \tag{28}$$

An $(n,M,\Delta)$ generator consists of the following:
• a message set $\mathcal{W}$ with cardinality M;
• for all $w\in\mathcal{W}$, probability distributions $q_i^{(n)}(x_i^n|w)$, for $i=1,2,\cdots,N$.
Define the probability distribution on $\mathcal{X}_1^n\times\mathcal{X}_2^n\times\cdots\times\mathcal{X}_N^n$

$$q^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n) = \frac{1}{M}\sum_{w\in\mathcal{W}}\prod_{i=1}^N q_i^{(n)}(x_i^n|w). \tag{29}$$

Let

$$\Delta = D_n\!\left(q^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n);\, p^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n)\right) = \frac{1}{n}\sum_{\mathbf{x}_1,\cdots,\mathbf{x}_n} q^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n)\log\frac{q^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n)}{p^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n)}, \tag{30}$$

where $p^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n)$ is defined in (28) and $q^{(n)}(\mathbf{x}_1,\cdots,\mathbf{x}_n)$ is defined in (29).

A number R is said to be achievable if for all $\epsilon>0$ there exists, for n sufficiently large, an $(n,M,\Delta)$ generator with $M \le 2^{nR}$ and $\Delta \le \epsilon$. Define $C_2$ as the infimum of all achievable R.

Theorem 4: For N discrete random variables $\mathbf{X}$ with joint distribution p(x),

$$C_2 = C(\mathbf{X}). \tag{31}$$

The proof can be constructed in the same way as that of [4, Theorems 5.2 and 6.2] and is hence omitted.

IV. THE LOSSY SOURCE CODING INTERPRETATION OF WYNER'S COMMON INFORMATION

The common information defined in (1) and (19) equally applies to continuous random variables. However, such definitions are only meaningful when they are associated with concrete operational interpretations. In this section, we develop a lossy source coding interpretation of Wyner's common information using the Gray-Wyner network. While this new interpretation holds for the general case of N dependent random variables, we elect to present the coding theorems involving only a pair of dependent variables for ease of notation and presentation.
