ebook img

Polarization of the Rényi Information Dimension for Single and Multi Terminal Analog Compression PDF

0.51 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Polarization of the Rényi Information Dimension for Single and Multi Terminal Analog Compression

Polarization of the Re´nyi Information Dimension for Single and Multi Terminal Analog Compression Saeid Haghighatshoar∗, Emmanuel Abbe† ∗EPFL, Lausanne, Switzerland, saeid.haghighatshoar@epfl.ch †Princeton University, Princeton, NJ, USA, [email protected] Abstract—This paper shows that the Re´nyi information di- over the past years. The vast majority of the research has mension(RID)ofani.i.d.sequenceofmixturerandomvariables however capitalized on a common sparsity model. polarizes to the extremal values of 0 and 1 (fully discrete and Several works have explored connections between infor- continuous distributions) when transformed by an Hadamard 3 matrix. This provides a natural counter-part over the reals mation theory and compressed sensing1, in particular [6]– 1 of the entropy polarization phenomenon over finite fields. It [11], however it is only recently [12] that a foundation of 0 is further shown that the polarization pattern of the RID is A2A compression has been developed, shifting the attention 2 equivalenttotheBECpolarizationpattern,whichadmitsaclosed to probabilistic signal models beyond the sparsity structure. form expression. These results are used to construct universal n It is shown in [12] that under linear encoding and Lipschitz- and deterministic partial Hadamard matrices for analog to a continuous decoding, the fundamental limit of A2A compres- J analog (A2A) compression of random i.i.d. signals. In addition, a framework for the A2A compression of multiple correlated sion is the Re´nyi information dimension (RID), a measure 7 signalsisdeveloped,providingafirstcounter-partoftheSlepian- whose operational meaning had remained marginal in infor- 2 Wolf coding problem in the A2A setting. mation theory until [12]. In the case of a nonsingular mixture ] Index Terms—Re´nyi information dimension, Polarization, In- distribution, the RID is given by the mass on the continuous T formation preserving matrices, Analog compression, Distributed part, and for the specific case of sparse mixture distributions, analog compression, Compressed sensing. I thisgivesadimensionalityreductionoforderk/n.Itisnatural . s to ask whether this improvement on compressed sensing is c [ I. INTRODUCTION due to potentially complex or non-robust coding strategies. [13] shows that robustness to noise is not a limitation of the 1 A. Analog to analog compression framework in [12]. Two other works [14], [15] have corrobo- v 8 Analogtoanalog(A2A)compressionofsignalshasrecently rated the fact that complexity may not be a limitation either. 8 gatheredinterestininformationtheory[12]–[15].InA2Acom- In [14] spatially-coupled matrices are used for the encoding 3 pression,ahighdimensionalanalogsignalxn ∈Rnisencoded of the signal, leveraging on the analytical ground of spatially- 6 into a lower dimensional analog signal ym = fn(xn) ∈ Rm. coupledcodesandpredictionsof[17].Inparticular,[14]shows . 1 The goal is to design the encoding so as to preserve in ym all that the RID is achieved using approximate message passing 0 the information about xn, and to obtain successful decoding algorithmwithblockdiagonalGaussianmeasurementmatrices 3 for a given distortion measure like MSE or error probability. measurement matrices. However, the size of the blocks are 1 : In particular, the encoding may be corrupted by noise. It is increasing as the measurement rate approaches the RID. In v worth mentioning that when the alphabet of x and y is finite, [15], using a new entropy power inequality (EPI) for integer- i X this framework falls into traditional topics of information valuedrandomvariablesthatwasfurtherdevelopedin[16],the r theory such as lossless and lossy data compression, or joint polarization technique was used to deterministically construct a source-channelcoding.ThenoveltyofA2Acompressionisto partial Hadamard matrices for encoding discrete signals over consider x and y to be real valued and to impose regularity the reals. This provides a way to achieve a measurement rate constrains on the encoder, in particular linearity, as motivated of o(n) for signals with a zero RID along with a stable by compressed sensing [1], [2]. low complexity recovery algorithm. The case of mixture The challenge and practicality of A2A compression is to distributions was however left open in [15]. obtain dimensionality reduction, i.e., m/n(cid:28)1, by exploiting This paper proposes a new approach to A2A compression a prior knowledge on the signal. This may be sparsity as in by means of a polarization theory over the reals. The use of compressedsensing.Fork-sparsesignals,andwithoutanysta- polarization techniques for sparse recovery was proposed in bilityorcomplexityconsiderations,itisnothardtoseethatthe [18]fordiscretesignals,relyingoncodingstrategiesoverfinite dimensionalityreductioncanbeoforderk/n.Ameasurement fields.Inthispaper,itisshownthatusingtheRID,oneobtains rate of order k/nlog(n/k) has been shown to be sufficient anaturalcounter-partovertherealsoftheentropypolarization to obtain stable recovery by solving tractable optimization phenomenon [19], [20]. Specifically, the entropy (or source) algorithms like convex programming (l1 minimization). This polarizationphenomenon[20]showsthattransformingani.i.d. remarkable achievement has gathered tremendous amount of attentionwithalargevarietyofalgorithmicsolutionsdeployed 1 [3]–[5]investigateLDPCcodingtechniquesforcompressedsensing sequence of discrete random variables using an Hadamard distributionofX canbedecomposedasp =δp +(1−δ)p , X c d matrixpolarizestheconditionalentropiestotheextremevalues where p and p are the continuous and the discrete part c d of0and1(deterministicandmaximallyrandomdistributions). of the distribution and 0 ≤ δ ≤ 1 is the weight of the We show in this paper that the RID of an i.i.d. sequence of continuous part. Thus, δ = 0 and δ = 1 corresponds to the mixture random variables also polarizes to the two extreme fully discrete and fully continuous case respectively. For such values0and1(discreteandcontinuousdistributions).Togetto a probability distribution, the Re´nyi information dimension is this result, properties of the RID in vector settings and related interchangeablydenotedbyd(p )ord(X)andisequaltothe X informationmeasuresarefirstdeveloped.Itisthenshownthat weight of the continuous part δ. theRIDpolarizationis,asopposedtotheentropypolarization, ThereisanotherrepresentationforarandomvariableX that obtainedwithananalyticalpattern.Inotherwords,thereisno wewillrepeatedlyuseinthepaper.AssumeU isacontinuous need to rely on algorithms to compute the set of components random variable with probability distribution p and V is a c which tend to 0 or 1, as this is given by a known pattern discrete random variable with probability distribution p and d equivalent to the BEC channel polarization [19]. This is then U and V are independent. Let Θ∈{0,1} be a binary valued used to obtain universal A2A compression schemes based on randomvariable,independentofU andV withP(Θ=1)=δ. explicit partial Hadamard matrices. The current paper focuses It is easy to see that we can represent X as X =ΘU +Θ¯V, on the encoding strategies and on extracting the RID without where Θ¯ = 1−Θ. In this case, the random variable X will specifying the decoding strategy. Numerical simulations pro- have the distribution p = δp +(1−δ)p . Also, if Xn is X c d 1 vide evidence that efficient message passing algorithms may a sequence of such random variables with the corresponding be used in conjunction to the obtained encoders. binary random variables Θn, C = {i ∈ [n] : Θ = 1} 1 Θ i Finally,thepaperextendstherealmofA2Acompressionto is a random set consisting of the position of the continuous a multi signal settings. Techniques of distributed compressed components of the signal. Similarly, C¯ =[n]\C is defined Θ Θ sensing were introduced in [23] for specific classes of sparse to be the position of the discrete components. signal models. We provide here an information theoretic For a matrix Φ of a given dimension m × n and a set framework for general multi signal A2A compression, as a S ⊂[n],Φ isasub-matrixofdimensionm×|S|consistingof S counter part of the Slepian&Wolf coding problem in source those columns of Φ having index in S. Similarly, for a vector compression [24]. A measurement rate region to extract the of random variables Xn, the vector X = {X : i ∈ S} is a 1 S i RIDofcorrelatedsignalsisobtainedandisshowntobetight. sub-vectorofXn consistingofthoserandomvariableshaving 1 index in S. For two matrices A and B of dimensions m ×n B. Notations and preliminaries 1 andm ×n,[A;B]denotesthe(m +m )×nmatrixobtained 2 1 2 The set of reals, integers and positive integers will be by vertically concatenating A and B. ddeennootteedthbey sRe,t ZofasntdricZtl+y rpeosspieticvteiveilnyt.egNers=. FZo+r \n{0}∈wNill, quaFnotrizaantixon∈oRf xanbdyainqte∈rsNpa,c[ixn]gq =1. S(cid:98)qiqxm(cid:99)ildaerlnyo,tfeosrthaevuenctioforromf [n] = {1,2,...,n} denotes the sequence of integers from 1 q random variables Xn, [Xn] will denote the component-wise to n. For a set S, the cardinality of the set will be denoted by 1 1 q uniform quantization of Xn. |S|, thus |[n]|=n. 1 For a(q) and b(q) two functions of q, a(q) (cid:22) b(q) or Allrandomvariablesaredenotedbycapitallettersandtheir equivalently b(q)(cid:23)a(q) will be used for realizationbylowercaseletter(xisarealizationoftherandom variableX).Theexpectedvalueandthevarianceofarandom b(q)−a(q) XvajriaibsleaXcolaurmendevneocttoedr cboynsEis{tiXng} oafndtheσX2ra.nFdoormi,vjari∈ablZes, ql→im∞ log2(q) ≥0. i . {X ,X ,...,X } and for i>j, we set Xj equal to null. Similarly, a(q) = b(q) is equivalent to a(q) (cid:22) b(q),a(q) (cid:23) i i+1 j i For a discrete random variable X with a distribution p , b(q). X H(X) = H(p ) denotes the discrete entropy of X. For An ensemble of single terminal measurement matrices will X the continuous case, h(X) = h(pX) denotes the differential be denoted by {ΦN}, where N is the labeling sequence and entropy of X. Throughout the paper, we assume that all of can be any subsequence of N. The dimension of the family discrete and continuous random variables have well-defined will be denoted by mN ×N, where mN is the number of discrete entropy and differential entropy respectively. For measurements taken by ΦN. The asymptotic measurement randomelementsX,Y andZ,I(X;Y)andI(X;Y|Z)denote rate of the ensemble is defined by limsupN→∞ mNN. We will themutualinformationofX andY andtheconditionalmutual also work with an ensemble of multi terminal measurement information of X and Y given Z. I(X;Y|z) denotes the matrices. We will focus to the two terminal case and the mutual information of X and Y given a specific realization extension to more than two terminals will be straightforward. Z =z. Hence, I(X;Y|Z)=E {I(X;Y|z)}. For simplicity, We will denote these two terminals by x,y and the cor- Z we also assume that all of the random variables (discrete, responding ensemble by {Φx ,Φy } with the corresponding N N continuous or mixture) have finite second order moments. dimension mx × N and my × N. The measurement rate N N All probability distributions are assumed to be nonsingular. vector for this ensemble will be denoted by (ρx,ρy), where Hence, in the general case for a random variable X, the ρx =limsupN→∞ mNxN,ρy =limsupN→∞ mNyN. II. RE´NYIINFORMATIONDIMENSION Definition 2. LetXn bearandomvectorinL.ThejointRID 1 of Xn provided that it exists, is defined as Let X be a random variable with a probability distribution 1 pX over R. The upper and the lower RID of this random d(Xn)= lim H([X1n]q). variable are defined as follows: 1 q→∞ log (q) 2 H([X] ) Definition 3. Let (Xn,Ym) be a random vector in L. The d¯(X)=limsup q , 1 1 q→∞ log2(q) conditional RID of X1n given Y1m and Re´nyi information of Yn about Xn, provided they exist, are defined as follows: H([X] ) 1 1 d(X)=liminf q . q→∞ log2(q) d(Xn|Ym)= lim H([X1n]q|Y1m) 1 1 q→∞ log (q) ByLebesquedecompositionorJordandecompositiontheorem, 2 any probability distribution over R like p can be written as IR(X1n;Y1n)=d(X1n)−d(X1n|Y1m). X aconvexcombinationofadiscretepart,acontinuouspartand Generally, it is difficult to give a characterization of RID a singular part, namely, for a general multi-dimensional distribution because it can contain probability mass over complicated subsets or sub- p =α p +α p +α p , X d d c c s s manifolds of lower dimension. However, we will show that the vector Re´nyi information dimension is well-defined for where p , p and p denote the discrete, continuous and the d c s the space L. In order to give the characterization of RID over singular part of the distribution and α ,α ,α ≥0 and α + d c s d L,wealsoneedtodefinesomeconceptsfromlinearalgebraof α +α = 1. In [27], Re´nyi showed that if α = 0, namely, c s s matrices, namely, for two matrices of appropriate dimensions, there is no singular part in the distribution and p = (1− X we propose the following definition of the “influence” of one δ)p +δp for some δ ∈[0,1], then the RID is well-defined d c and d(X) = d¯(X) = d(X) = δ. Moreover, he proved that if matrix on another matrix and “residual” of one matrix given X1n is a continuous random vector then limq→∞ Hlo([gX(1nq])q) = another matrix. n, implying the RID of n for the n-dimensional con2tinuous Definition 4. Let A and B be two arbitrary matrices of random vector. dimensionm ×nandm ×n.AlsoletK ⊂[n].Theinfluence 1 2 OurobjectiveistoextendthedefinitionofRIDforarbitrary ofthematrixB onthematrixAandtheresidualofthematrix vectorrandomvariables,whicharenotnecessarilycontinuous. A given B over the column set K are defined to be To do so, we first restrict ourselves to a rich space of random I(A;B)[K]=rank([A;B] )−rank(A ), variables with well-defined RID. Over this space, it will be K K possible to give a full characterization of the RID as we will R(A;B)[K]=rank([A;B]K)−rank(BK). see in a moment. Remark 2. It is easy to check that I(A;B)[K] is the amount Definition 1. Let (Ω,F,P) be a standard probability space. of increase of the rank of the matrix A by adding rows of K ThespaceL(Ω,F,P)isdefinedasL=∪∞ L ,whereL is the matrix B and R(A;B)[K] is the residual rank of the n=1 n 1 K thesetofallnonsingularrandomvariablesandforn∈N\{1}, matrix A knowing the rows of the matrix B . Moreover, K K L is the space of n-dimensional random vectors defined as one can easily check that I(A;B)[K]=R(B;A)[K]. n L ={Xn : there exist k ∈N,A∈Rn×k and Zk Theorem 1. Let (X1n,Y1m) be a random vectors in the space n 1 1 L, namely, there are i.i.d. nonsingular random variables Zk independent and nonsingular such that Xn =AZk}. 1 1 1 andtwomatricesAandB ofdimensionn×k andm×k such that Xn =AZk and Ym =BZk. Let Z =Θ U +Θ¯ V be Remark1. Itisnotdifficulttoseethatalln-dimensionalvector 1 1 1 1 i i i i i the representation for Z , i∈[k]. Then, we have i randomvariables,singularornonsingular,canbewellapprox- 1) d(Xn)=E{rank(A )}, imated in the space L, for example in (cid:96)2-sense. However, this 2) d(X1n|Ym)=E{R(ACΘ;B)[C ]}, is not sufficient to fully characterize the RID. Specially, the 1 1 Θ RID is discontinuous in (cid:96)p topology, p≥1. For example, we where CΘ = {i ∈ [k] : Θi = 1} is the random set consisting can construct a sequence of fully discrete random variables of the position of continuous components. in L converging to a fully continuous random variable in (cid:96) , p Remark 3. Notice that the results intuitively make sense, whereastheRIDofthesequenceis0anddoesnotconvergeto namely, for a specific realization θk if θ = 0 we can 1 i 1. Although we have such a mathematical difficulty in giving neglect Z because it is fully discrete and does not affect the i a characterization of the RID, we think that the space L is RID. Moreover, over the continuous components the resulting rich enough for modeling most of the cases that we encounter contribution to the RID is equal to the rank of the matrix in applications. A , which is the effective dimension of the space over Cθ Over L, we will generalize the definition of the RID to which the continuous random variable A U is distributed. Cθ Cθ include joint RID, conditional RID and Re´nyi information Finally,allofthesecontributionsareaveragedoverallpossible defined as follows. realizations of Θk. 1 Using Theorem 1, it is possible to prove a list of properties Let{B }∞ beasequenceofi.i.d.uniform{+,−}-valued n n=1 of the RID. random variables. By replacing Bn for {+,−}-labeling bn in 1 1 the definition of the erasure process, we obtain a stochastic Theorem 2. Let (Xn,Ym) be a random vector in L as in Theorem 1. Then, we1hav1e the following properties: process en = eB1B2...Bn. Let Fn be the σ-field generated by Bn. Using the BEC polarization [19], [21], we have the 1) d(X1n) = d(MX1n) for any arbitrary invertible matrix follow1ing results: M of dimension n×n. 1) (e ,F ) is a positive bounded martingale. 2) d(Xn,Ym)=d(Xn)+d(Ym|Xn). n n 1 1 1 1 1 2) e converges to e ∈{0,1} with P(e =1)=α. 3) I (Xn;Ym)=I (Ym;Xn). n ∞ ∞ R 1 1 R 1 1 3) For any 0 < β < 1, liminf P(e ≤ 2−Nβ) = 4) I (Xn;Ym) ≥ 0 and I (Xn;Ym) = 0 if and only 2 n→∞ n R 1 1 R 1 1 1 − α, where N = 2n is the number of all possible if Xn and Ym are independent after removing discrete 1 1 cases that e can take. common parts, namely, those Z ,i ∈ [k] that are fully n i Let n∈N and N =2n. Assume that XN is a sequence of discrete. 1 i.i.d. nonsingular random variables with a RID equal to d(X) Further investigation also shows that we have a very nice and let ZN = H XN, where H is the Hadamard matrix 1 N 1 N duality between the discrete entropy and the RID as depicted of order N. For i ∈ [N], let us define I (i) = d(Z |Zi−1). n i 1 in Table I. As we will see in Subsection III-B and III-C, this Assume that bn is the binary expansion of i−1. By replacing 1 duality can be generalized to include some of the theorems 0 by + and 1 by −, we can equivalently represent I (i) be a n in classical information theory like single terminal and multi sequenceof{+,−}values,namely,In(i)=Ib1b2...bn.Similar terminal (Slepian & Wolf) source coding problems. to the erasure process, we can convert I to a stochastic n process In = IB1B2...Bn by using i.i.d. uniform {+,−}- Discreterandomvariables RandomvariablesinL valued random variables Bn. We have the following theorem. DiscreteentropyH RIDd 1 Conditionalentropy ConditionalRID Theorem3(SingleterminalRIDpolarization). (I ,F )isan Mutualinformation Re´nyimutualinformation n n erasure stochastic process with initial value d(X) polarizing Deterministic Discrete Chainrule Chainrule to {0,1}. Singleterminalsourcecoding SingleterminalA2Acompression Multiterminalsourcecoding MultiterminalA2Acompression For n ∈ N and N = 2n, let {(Xi,Yi)} be a sequences of random vectors in the space L, with joint and conditional TABLE I: Duality between H and d RID d(X,Y), d(X|Y) and d(Y|X). Let ZN = H XN and 1 N 1 assume that WN = H YN. Let us define two processes I 1 N 1 n and J as follows. III. MAINRESULTS n I (i)=d(Z |Zi−1),i∈[N], In this section, we will give a brief overview of the n i 1 results proved in the paper. Subsection III-A is devoted to the J (i)=d(W |Wi−1,ZN),i∈[N]. n i 1 1 results obtained for the polarization of the Re´nyi information Similarly, we can label I and J by a sequence of bn dimension. These results are used in Subsections III-B and n n 1 III-C to study A2A compression problem from an information and convert them to stochastic processes In =IB1B2...Bn and theoretic point of view. Subsection III-B considers the single Jn = JB1B2...Bn. By this definition, we have the following theorem. terminalcasewhereasSubsectionIII-Cisdevotedtothemulti terminal case. Theorem 4 (Multi terminal RID polarization). (I ,F ) and n n (J ,F ) are erasure stochastic processes with initial value n n A. Polarization of the Re´nyi information dimension d(X) and d(Y|X), both polarizing to {0,1}. BeforestatingthepolarizationresultfortheRID,wedefine Remark 5. In the t terminal case t > 2 for a t termi- the m-dimensional erasure process as follows. nal source (X ,X ,...X ), using a similar method it is 1 2 t Definition 5. Let α∈[0,1]. An “erasure process” with initial possible to construct erasure processes with initial values value α is defined as follows. d(X1),d(X2|X1),...,d(Xt|X1t−1), polarizing to {0,1}. 1) e∅ =α. e+ =2α−α2 and e− =α2. B. Single terminal A2A compression 2) Let en = eb1b2...bn, for some arbitrary {+,−}-valued sequence bn. Define In this subsection, we will use the properties of the RID 1 developed in Section II to study the A2A compression of e+ =eb1b2...bn+ =2e −e2, memoryless sources. We assume that we have a memoryless n n n e− =eb1b2...bn− =e2. sourcewithsomegivenprobabilitydistribution.Theideaisto n n capture the information of the source, to be made clearer in Remark 4. Notice that using the {+,−} labeling, we can a moment, by taking some linear measurements. As is usual construct a binary tree where each leaf of the tree is assigned in information theory, we are mostly interested in asymptotic a specific {+,−}-valued sequence. regime for large block lengths. To do so, we will use an ensemble of measurement matrices to analyze the asymptotic in the expression (1) by log (q), take the limit as q tends to 2 behavior.WewillalsodefinethenotionofREP(restrictediso- infinityandusethedefinitionoftheRID,wegettheequivalent entropy property) for an ensemble of measurement matrices. form d(XN|Φ XN) This subsection is devoted to the single terminal case. The 1 N 1 ≤(cid:15). results for the multi terminal case will be given in Subsection d(XN) 1 III-C. We are mostly interested to the the measurement rate Interestingly, this implies that in the high resolution regime region of the problem in order to successfully capture the that we are considering for analysis, the information isometry source. (keeping more than 1 − (cid:15) ratio of the information of the Definition6. LetXN beasequenceofi.i.d.randomvariables signal) is equivalent to the Re´nyi isometry. Moreover, from 1 with a probability distribution p (discrete, mixture or con- the properties of the RID, it is easy to see that this REP X tinuous) over R, and let DN =[XN] for q ∈N. The family measure meets some of the invariance requirements that we 1 1 q of measurement matrices {Φ }, indexed with a subsequence expect. For example, it is scale invariant and any invertible N of N and with dimension mN ×N, is (cid:15)-REP(pX) with the linear transformation of the input signal X1N keeps the (cid:15)-REP measurement rate ρ if measure unchanged. H(DN|Φ XN) We can also extend the definition when the probability limsup 1 N 1 ≤(cid:15), (1) distributionofthesourceisnotknownexactlybutitisknown H(DN) q→∞ 1 to belong to a given collection of distributions Π. m limsup N =ρ. N Definition 7. Assume Π = {π : π ∈ Π} is a class of N→∞ To give some intuitive justification for the REP definition, nonsingular probability distributions over R. The family of let us assume that all of the measurements are captured with measurement matrices {ΦN}, indexed with a subsequence of acasdee,vaicltehowuigthhtfihneitpeotpernetciaisliionnfoqr1m0 aftoiornsoomfteheq0sig∈naNl,.inIntetrhmast rNataenρdiwfiitthidsim(cid:15)-eRnEsPio(nπ)mfNor×eNve,ryisπ(cid:15)-∈REΠP.(Π)formeasurement of bits, can be very large, but what we effectively observe Now that we have the required tools and definitions, we throughthefiniteprecisiondeviceisonlyH([XN] ).Insuch 1 q0 give a characterization of the required measurement rate in a setting, the ratio of the information we lose after taking the ordertokeeptheinformationisometry.Similartoalltheorems measurements, assuming that some genie gives us the infinite in information theory, we do this using the “converse” and precision measurement captured from the signal, is exactly “achievability” parts. what we have in the definition of REP, namely, Theorem 5 (Converse result). Let XN be a sequence of H(DN|Φ XN) 1 1 N 1 , (2) i.i.d. random variables in L. Suppose {ΦN} is a family of (cid:15)- H(DN) 1 REP(pX) measurement matrices of dimension mN ×N, then where we assume that D1N = [X1N]q0. This might be a ρ≥d(X1)(1−(cid:15)). reasonable model for application because pretty much this Remark 8. This result implies that to capture the information is what happens in reality. The problem with this model is of the signal the asymptotic measurement rate must be ap- that it is not invariant under some obvious transformations proximately greater then the RID of the source. This in some like scaling. For example, assume that we are scaling the sense is similar to the single terminal source coding problem signalbysomerealnumber.Inthiscase,throughsomesimple in which the encoding rate must be grater then the entropy of examplesitispossibletoshowthattheratioin(2)canchange the source.This again theemphasizes theanalogy between H considerably. There are two approaches to cope with this and d. Moreover, in the discrete case, d(X)=0, the result is problem. One is to scale the signal with a desired factor to trivial. match it to the finite precision quantizer, which in its own Remark9. Itwasprovedin[12]thatunderlinearencodingand can be very interesting to analyze but probably will be two block error probability distortion condition, the measurement complicated. The other way, is to take our approach and rate must be higher than the RID of the source, ρ ≥ d(X). develop a theory for the case in which the resolution is high Theorem 5 strengthen this result stating that ρ ≥ d(X) enough so that the quality measure proposed in (2) is not must hold even under the milder (cid:15)-REP restriction on the affected by the shape of the distribution of the signal. measurement ensemble. Remark 6. Notice that in the fully discrete case, the REP Theorem 5 puts a lower bound on the measurement rate in definition is simplified to the equivalent form order to keep the (cid:15)-REP property. However, it might happen H(XN|Φ XN) 1 N 1 ≤(cid:15), that there is no measurement family to achieve this bound. H(X1N) Fortunately, as we will see, it is possible to deterministically m limsup N ≤ρ. truncatethefamilyofHadamardmatricestoobtainameasure- N→∞ N mentfamilywith(cid:15)-REPpropertyandmeasurementrated(X). Remark7. ForanondiscretesourcewithstrictlypositiveRID, Thisissummarizedinthefollowingtwotheorems.Noticethat d(X) > 0, if we divide the numerator and the denumerator inthefullycontinuouscaseasTheorem5implies,thefeasible measurement rate is approximately 1 which for example can that we need from different terminals in order to capture be achieved with any complete orthonormal family, thus no the signal faithfully. We will analyze the problem for two explicitconstructionisnecessary.Forthenoncontinuouscase, terminal case. The extension to more than two terminals is we will distinguish between the fully discrete case and the straightforward. mixture case because they need different proof techniques. Definition8. Let{(X ,Y )}N beatwoterminalmemoryless Theorem 6 and 7 summarize the results. i i i=1 source with (X ,Y ) being in L. The family of distributed 1 1 Theorem 6 (Achievability result). Let XN be a sequence measurementmatrices{Φx ,Φy },indexedwithasubsequence 1 N N of i.i.d. discrete integer2-valued random variables. Then, for of N, is (cid:15)-REP(p ) for the measurement rate (ρ ,ρ ) if X,Y x y any (cid:15)>0, there is a family of (cid:15)-REP(p ) partial Hadamard matrices of dimension mN ×N, for NX=2n with ρ=0. limsupH([X1N]q,[Y1N]q|ΦxNX1N,ΦyNY1N) ≤(cid:15), (3) H([XN] ,[YN] ) Theorem 7 (Achievability result). Let XN be a sequence of q→∞ 1 q 1 q 1 mx my i.i.d. random variables in L. Then, for any (cid:15) > 0, there is a limsup N ≤ρ , limsup N ≤ρ . N x N y familyof(cid:15)-REP(p )partialHadamardmatricesofdimension N→∞ N→∞ X m ×N, for N =2n with ρ=d(X ). N 1 Remark11. If(X,Y)isarandomvectorinLwithd(X,Y)> WehavealsothegeneralresultinTheorem8whichimplies 0, similar to what did in the single terminal case, dividing thatwecanconstructafamilyoftruncatedHadamardmatrices the numerator and the denumerator in the expression (3) by which is (cid:15)-REP for a class of distributions. log (q) and taking the limit as q tends to infinity, we get the 2 equivalent definition Theorem 8 (Achievability result). Let Π be a family of probability distributions with strictly positive RID. Then, for d(XN,YN|Φx XN,Φy YN) any (cid:15) > 0, there is a family of (cid:15)-REP(Π) partial Hadamard 1 1 N 1 N 1 ≤(cid:15), d(XN,YN) matrices of dimension m × N, for N = 2n, with ρ = 1 1 N sup d(π). whichimpliestheequivalenceoftheinformationisometryand π∈Π the Re´nyi isometry. Remark 10. Theorem 8 implies that there is a fixed ensemble ofmeasurementmatricescapableofcapturingtheinformation Remark12. Noticethatinthefullydiscretecase,thedefinition of the all of the distributions in the family Π. This is very above is simplified to the equivalent form usefulinapplicationsbecauseusuallytakingthemeasurements is costly and most of the time we do not have the exact H(X1N,Y1N|ΦxNX1N,ΦyNY1N) ≤(cid:15), distribution of the signal. If each distribution needs its own H(XN,YN) 1 1 specific measurement matrix, we have to do several rounds of mx my limsup N ≤ρ , limsup N ≤ρ . the measurement each time taking the measurements compat- N x N y N→∞ N→∞ iblewithonespecificdistributionanddotherecoveryprocess for that specific distribution. The benefit of Theorem 8 is that We can also extend the definition to a class of probability onemeasurementensembleworksforallofdistributions.Itis distributions. also good to notice that although the measurement ensemble Definition 9. Assume that Π = {π : π ∈ Π} is a class of is fixed, the recovery (decoding) process might need to know nonsingularprobabilitydistributionsinL.Thefamilyofmea- the exact distribution of the signal in order to have successful surement matrices {Φx ,Φy } is (cid:15)-REP(Π) for measurement N N recovery. rate (ρ ,ρ ) if it is (cid:15)-REP(π) for every π ∈Π. x y C. Multi terminal A2A compression Definition 10. Let (X,Y) be a two dimensional random vector in L with a distribution p . The Re´nyi information In this section, our goal is to extend the A2A compression X,Y region of p is the set of all (ρ ,ρ )∈[0,1]2 satisfying theoryfromthesingleterminalcasetothemultiterminalcase. X,Y x y In the multi terminal setting, we have a memoryless source ρ ≥d(X|Y), ρ ≥d(Y|X), ρ +ρ ≥d(X,Y). which is distributed in more than one terminal and we are x y x y going to take linear measurements from different terminals in Definition 11. Assume that Π is a class of two dimensional order to capture the information of the source. We are again random vectors from L. The Re´nyi information region of the interested in an asymptotic regime for large block lengths. To class Π is the intersection of the Re´nyi information regions of do so, we will use an ensemble of distributed measurement the distributions in Π. matrices that we will introduce in a moment. Similar to the singleterminalcase,weareinterestedinthemeasurementrate Similar to the single terminal case, we are interested in the region of the problem, namely, the number of measurements rate region of the problem. We have the following converse and achievability results. 2WeprovedthistheoremusingtheEPIresultwedevelopedin[16],where Theorem 9 (Converse result). Let {(X ,Y )}N be a two- weprovedtheresultforlatticediscreterandomvariables.However,webelieve i i i=1 thatsucharesultisalsotruefornon-latticediscretedistributions. terminalmemorylesssourcewith(X1,Y1)beinginL.Assume thatthedistributedfamilyofmeasurementmatrices{Φx ,Φy } Taking the expectation over Θk, we will get the result. To N N 1 is (cid:15)-REP with a measurement rate (ρ ,ρ ). Then, prove (4), notice that x y ρx+ρy ≥d(X,Y)(1−(cid:15)), H([X1n]q|θ1k)=. H([ACθUCθ +AC¯θVC¯θ]q) ρx ≥d(X|Y)−(cid:15)d(X,Y), ρy ≥d(Y|X)−(cid:15)d(X,Y). =. H([ACθUCθ +AC¯θVC¯θ]q|VC¯θ) (5) =H([A U ] ), (6) Remark 13. This rate region is very similar to the rate region Cθ Cθ q . otbhfyetthoheneldyRisIdtDriif,bfeuwrteehndiccsheoatuhgracateintchoeemddipnishgca(rsSeitzleeepseinathtnreo&paynWaholaolsfg)ybpebreoenbtwlreeempelnawctiehtdhe wfaarhceteerteqhuawtaleknuuopswetdiongHfinV(iVC¯teCθ¯,θu)[nAc≤CeθrtNUaiCHnθty](q.V1aS)npde=c[iAfi0Cc.aθWlUlyeC,θas+luspoApouC¯ssθeeVdLC¯tθh]ieqs discrete entropy and the RID. Similar to the Slepian&Wolf the minimum number of lattices of size 1 required to cover q pmreoabsluemrem, wenetcraaltleρrxe+gioρny.=d(X,Y) the dominant face of the AC¯θ ×[0,2q]|C¯θ|, which is a finite number. Then Theorem 10 (Acievability result). Let {(Xi,Yi)}Ni=1 be a H([ACθUCθ]q|VC¯θ,[ACθUCθ +AC¯θVC¯θ]q)≤log2(L), discrete two-terminal memoryless source. Then there is a which implies (5) and (6). family of (cid:15)-REP partial Hadamard matrices {Φx ,Φy } with GenerallyA isnotfullrank.AssumethattherankofA N N Cθ Cθ (ρx,ρy)=(0,0). is equal to m and let Am be a subset of linearly independent rows. It is not difficult to see that knowing [A U ] there Theorem 11 (Achievability result). Let {(Xi,Yi)}Ni=1 be a is only finite uncertainty in the remaining commpCoθneqnts of two-terminal memoryless source with (X ,Y ) belonging to 1 1 [A U ] , which is negligible compared with log (q) as q L. Given any (ρx,ρy) satisfying tenCdθs toCθinqfinity. Therefore, we obtain 2 ρ +ρ ≥d(X ,Y ),ρ ≥d(X |Y ),ρ ≥d(Y |X ), . x y 1 1 x 1 1 y 1 1 H([Xn] |θk)=H([A U ] ) 1 q 1 . Cθ Cθ q there is a family of (cid:15)-REP partial Hadamard matrices with =H([A U ] ) measurement rate (ρ ,ρ ). . m Cθ q x y =mlog (q). 2 We have also the general result in Theorem 12 which Thus, taking the limit as q tends to infinity, we obtain implies that we can construct a family of truncated Hadamard matrices which is (cid:15)-REP for a class of distributions. H([Xn]|θk) lim 1 1 =rank(A ). q→∞ log (q) Cθ Theorem 12 (Achievability result). Let Π be a family of 2 two dimensional probability distributions in L. Then, for any Also, taking the expectation with respect to Θk, we obtain 1 (ρx,ρy) in the measurement region of Π, there is a family d(X1n)=E{rank(ACΘ)}, which is the desired result. of partial Hadamard matrices which is (cid:15)-REP(Π) with a To prove the second part of the theorem, notice that measurement rate (ρx,ρy). H([Xn] |Ym)=. H([Xn] |Ym,Θk). 1 q 1 1 q 1 1 IV. PROOFTECHNIQUES For a specific realization θk we have 1 In this section, we will give a brief overview of the techniquesusedtoprovetheresults.Wewilldividethissection H([X1n]q|Y1m,θ1k) itnhteoptrhoroefetseucbhsneicqtuioenssf.oIrnthSeubRsIeDct.ioSnubIVse-cAti,owneIVw-CillaonvderIvVie-Dw =. H([ACθUCθ +AC¯θVC¯θ]q|BCθUCθ +BC¯θVC¯θ) will be devoted to proof ideas and intuitions about the A2A =. H([ACθUCθ +AC¯θVC¯θ]q|BCθUCθ +BC¯θVC¯θ,VC¯θ) =H([A U ] |B U ). compression problem in the single and multi terminal case. Cθ Cθ q Cθ Cθ Generally, A is not full-rank. Let A be the set of all A. Re´nyi information dimension Cθ m linearly independent rows of A of size m. Then Cθ in this section we will prove Theorem 1 and 2 and we will . give further intuitions about the RID over the space L. H([ACθUCθ]q|BCθUCθ)=H([AmUCθ]q|BCθUCθ). ProofofTheorem1:Toprovethefirstpartofthetheorem, It may happen that some of the rows of A can be written as m notice that alinearcombinationofrowsofB .LetA betheremaining H([Xn] )=. H([Xn] ,Θk)=. H([Xn] |Θk), matrix after dropping m−r predCicθtable rorws of Am. Given, 1 q 1 q 1 1 q 1 B U , A U has a continuous distribution thus because H(Θk1)≤k =. 0. As Θk1 ∈{0,1}k and takes finitely Cθ Cθ r Cθ . H([A U ] |B U )=rlog (q). many values, it is sufficient to show that for any realization r Cθ q Cθ Cθ 2 θk, 1 It is easy to check that r is exactly R(A;B)[Cθ]. Therefore, lim H([X1n]q|θ1k) =rank(A ). (4) taking the expectation with respect to Θk1, we get q→∞ log2(q) Cθ d(X1n|Y1m)=E{R(A;B)[CΘ]}. We also get the following corollary, which shows the and Y˜m are independent. As we have dropped all of the dis- 1 additive property of the RID for the independent random crete components, the resulting Θ , i∈[r] are 1 with strictly i variables from L. positive probability. This implies that for any realization of θn and the corresponding C , R(A ;B )[C ]=rank(A ). Corollary 1. Let Xn be independent random variables from 1 θ r r θ r,Cθ L. Then d(XN)=(cid:80)1 N d(X ). In particular, this holds for any Cθ of size 1, namely, for any 1 i=1 i columnofA andB ,whichimpliesthatifA hasanon-zero r r r Proof:NoticethatwecansimplywriteX1N =IN×X1N, column the corresponding column in Br must be zero and if where IN is the identity matrix of order N. Therefore, by the Br has a non-zero column then the corresponding column in rank characterization for the RID, we have Ar must be zero. This implies that X˜1n and Y˜1m depend on disjoint subsets of the random variables Zr. Therefore, they N N 1 d(XN)=E{rank(I [C ])}=E{(cid:88)Θ }=(cid:88)d(X ), must be independent. 1 N Θ i i i=1 i=1 B. Polarization of the RID where we used the fact that the columns of IN are linearly Inthis section,we willprove thepolarization ofthe RIDin independent thus adding a column increases the rank by 1. the single and multi terminal case as stated in Theorem 3 and Therefore, the rank of IN(CΘ) is equal to the number of 1’s Theorem 4. The main idea is to use the recursive structure of is ΘN1 , namely, (cid:80)Ni=1Θi. the Hadamard matrices and the rank characterization of the Using the results of Theorem 1, we can prove Theorem 2. RID in the space L. Proof of Theorem 2: For part 1, the proof is simple by Proof of Theorem 3: For the initial value, we have considering the rank characterization. We know that Xn = I (1)=d(X ).Letn∈NandN =2n.Tosimplifytheproof, 1 0 1 AZk and d(Xn) = E{rank(A )}. Moreover, MXn = instead of the Hadamard matrices, H, we will use shuffled 1 1 CΘ 1 MAZk thus d(Xn)=E{rank(MA )}. As M is invertible Hadamard matrices, H˜, constructed as follows: H˜ =H and 1 1 CΘ 1 1 rank(A )=rank(MA ), thus we get the result. H˜ is constructed from H˜ as follows CΘ CΘ 2N N For part 2, notice that for any realization θk and the corresponding set C , 1 h˜1 , h˜1  rank([A;B]Cθθ)==rraannkk((ABCCθθ))++RR((BA;;BA))[[CCθθ]].  h˜h˜N...1 → h˜hh˜˜...1ii ,,,, −−h˜h˜...h˜i1i , Taking the expectation over Θk, we get the desired result  . .  1 . . . , . d(X1n,Y1m)=d(X1n)+d(Y1m|X1n)=d(Y1m)+d(X1n|Y1m). where h˜ , i ∈ [N] denotes the i-th row of the H˜ . Let Xn i N 1 For part 3, using the chain rule result from part 2 and be as in Theorem 3 and let Z˜1N = H˜NX1N, where HN is applying the definition of IR(X1n;Y1m), we get rfierpsltapcreodvebythHa˜tNI˜.iAslasols,oleatnI˜enr(ais)u=re dp(roZ˜cie|Zs˜s1i−w1i)th, iin∈iti[aNl]v.aWluee IR(X1n;Y1m)=d(X1n)+d(Y1m)−d(X1n,Y1m), d(X1) and evolves as follows whichshowsthesymmetryofI withrespecttoXn andYm. I˜ (i)+ =I˜ (2i−1)=2I˜ (i)−I˜ (i)2 R 1 1 n n+1 n n For part 4, notice that for a specific realization θ1k, a simple I˜n(i)− =I˜n+1(2i)=I˜n(i)2, rankcheckshowsthatR(A;B)[C ]≤rank(A ).Takingthe expIfecXtantioanndovYermΘak1r,ewinedgeeptendd(eXn1θnt,|Yth1me)eq≤uadl(itXCyθ1nfo).llows from wAhlseor,eleit∈H˜i[−N1]awnidthH˜thiedecnoorrteespthoendfiirnsgt i{−+,1−a}n-ldabtehleinfigrsbtn1i. thedefi1nition.F1ortheconversepart,noticethatifXm isfully rowsofH˜N.Also,leth˜i denotethei-throwofH˜N.Thus,we idsisfcurelltye dthisecnredte(Xth1ne|nY1dm()Y1m≤|Xd(1nX)1n≤) d=(Y01m. )Si=mil0arla1yn,difusYin1mg hnaovnesinZ˜g1iul=arHr˜ainXdo1NmavnadriaZ˜b1il−es1,=it rHe˜siu−l1tsXt1hNa.tAZ˜s1i Xbe1Nlonagretoi.it.hde. the identity d(X1n)−d(X1n|Y1m)=d(Y1m)−d(Y1m|X1n), we space L generated by the X1N random variables. Notice that get the equality. This case is fine because after removing the using the rank characterization for the RID over L, we have discrete Zi,i∈[k], either X1n or Y1m is equal to 0, namely, a d(Z˜i|Z˜1i−1)=E{I(H˜i−1;h˜i)[CΘ]}, deterministic value, and the independence holds. Assume that none of X1n or Y1m is fully discrete. Without where I(H˜i−1;h˜i)[CΘ]∈{0,1} is the amount of increase of laomssonogfgZenkeraanlidty,lelettXZ˜1rnbaentdheY˜nmon-bdeisctrheeterreasnudltoimngvararinadbolems wrahnekreofwHe˜Chi−aΘv1ebtyheadshduinfgfleh˜di.HNadoawm,acrodnmsidaterrixthHe˜ sta.gCeonns+ide1r, 1 1 1 2N vectors after dropping the discrete constituents, namely, we therowi+ whichcorrespondstotherow2i−1ofH˜ .Now, 2N have X˜n = A Zr and Y˜m = B Zr, where A and B are ifwelookatthefirstblockofthenewmatrix,wesimplynotice 1 r 1 1 r 1 r r the matrices consisting of the first r columns of A and B that adding h˜ has the same effect in increasing the rank of i respectively. It is easy to check that d(Xn) = d(X˜n) and this block as it had in H˜ . A similar argument holds for the 1 1 N d(X˜n|Y˜m) = d(Xn|Ym). Thus it remains to show that X˜n second block. Moreover, adding h˜ increases the rank of the 1 1 1 1 1 i matrix if it increases the rank of either the first or the second where ⊗ denotes the Kronecker product and (ak)t,(bk)t are 1 1 block or both. Let 1 (ΘN) ∈ {0,1} denote the random rank the transpose of the column vectors ak and bk. Let i 1 1 1 increase in H˜i−1 by adding h˜ , then we have i Γ={Θ ,Θ ,...,Θ } (8) 1 2 N 1 (Θ2N)=1 (ΘN)+1 (Θ2N )−1 (ΘN)1 (Θ2N ). 2i−1 1 i 1 i N+1 i 1 i N+1 be the random element corresponding to the Θ pattern of ΘN1 and Θ2NN+1 are i.i.d. random variables and a simple E1k(j),j ∈ [N], where Θj ∈ {0,1}k,j ∈ [N]. Using the checkshowsthat1i(ΘN1 )and1i(Θ2NN+1)arealsoi.i.d..Taking rank result developed for the RID, it is easy to see that for the expectation value, we obtain every j ∈[N] I˜n(i)+ =2I˜n(i)−I˜n(i)2. (7) J (j)=d(W |Wj−1,ZN) n j 1 1 Moreover, if we denote W˜N = H˜ X2N , then by the =E{I([Hj−1⊗(bk)t;H ⊗(ak)t];h ⊗(bk)t)[C ]}. 1 N N+1 1 1 j 1 Γ structure of H˜ it is easy to see that I˜ (i)+ and I˜ (i)− can N n n For i ∈ [N], let 1 (ΘN) ∈ {0,1} denote the random be written as follows: i 1 increase of rank of [Hi−1 ⊗(ak)t] by adding h ⊗(ak)t. I˜n(i)+ =d(Z˜i+W˜i|Z˜1i−1,W˜1i−1), Now,considerthestagen+1,w1hereCγwearegoingtoi comb1ine I˜n(i)− =d(Z˜i−W˜i|Z˜i+W˜i,Z˜1i−1,W˜1i−1). two copies of H˜N to construct the matrix H˜2N. The the row i corresponding to W is split into two new rows i+ and i− i Using the chain rule for the RID, we have whichcorrespondtotherownumber2i−1andtherownumber I˜n(i)++I˜n(i)− = 1d(Z˜ −W˜ ,Z˜ +W˜ |Z˜i−1,W˜i−1) 2i of H˜2N. 2 2 i i i i 1 1 H˜ ⊗(ak)t , H˜ ⊗(ak)t  = 12d(Z˜i,W˜i|Z˜1i−1,W˜1i−1) H˜NN ⊗(a1k1)t , −H˜NN ⊗(a1k1)t  . .  =d(Z˜i,|Z˜1i−1)=I˜n(i),  .. , ..  Iw˜heivcohlvaelosnlgikweiathn(e7r)a,siumreplpieroscthesastI˜wni(tih)−ini=tiaI˜lnv(ai)lu2e.Tdh(eXre)f.ore, hh˜˜ii−−11⊗⊗((bbk1k1))tt ,, −h˜h˜i−i−11⊗⊗(b(bk1k1)t)t Now,noticethattheonlydifferencebetweenHN andH˜N is h˜i⊗(bk1)t , h˜i⊗(bk1)t the permutation of the rows, namely, there is a row shuffling matrix B such that H˜ = B H . It was proved in [20] Similar to the single terminal case, we see that adding h˜i⊗ that B Nand H commNute, whNichNimplies that H˜ XN = (bk1)t increases the rank of the matrix if it increases the rank N N N 1 of the either the first or the second block. In other words, H B XN.However,noticethatXN isani.i.d.sequenceand N N 1 1 BNX1N is again an i.i.d. sequence with the same distribution 12i−1(Θ21N)=1i(ΘN1 )+1i(Θ2NN+1)−1i(ΘN1 )1i(Θ2NN+1), asXN.Inparticular,addingorremovingB doesnotchange 1 N the RID values, which implies that for Z1N = HNX1N and where 1i(ΘN1 ),1i(Θ2NN+1) ∈ {0,1} are the corresponding I (i) = d(Z |Zi−1), I (i) = I˜ (i). Therefore, I is also be amount of increase of the rank of the first and second block ann erasure proice1ss withninitial vanlue d(X), which polarizes to by adding the i-the row. In particular, ΘN1 and Θ˜N1 are i.i.d. {0,1}. soare1i(ΘN1 )and1i(Θ2NN+1).Takingtheexpectation,similar Using a similar technique, we can prove Theorem 4. The to what did in the single terminal case, we obtain that main idea is that (X,Y) are correlated random variables in J (i)+ =2J (i)−J (i)2. (9) the space L and they can be written as a linear combination n n n of i.i.d. nonsingular random variables. Moreover, one can also show that for i∈[N], Proof of Theorem 4: For the initial value, we have Ia0(m1e)m=orydl(eXss1)soaunrdceJ,0s(i1m)il=ardt(oY1th|Xe1s)in.gAles t{e(rXmii,nYali)c}aNi=se1, iist Jn(i)++2 Jn(i)− =Jn(i), is easy to see that I is an erasure process with initial value which together with (9), implies that J (i)− = J (i)2. n n d(X1)anditremainstoshowthatJ isalsoanerasureprocess Therefore, J is also an erasure process with initial value but with initial value d(Y1|X1). d(Y|X).Similartothesingleterminalcase,onecanalsoshow Let H˜i−1, H˜i and h˜i denote the first i−1 rows, the first i thatthepermutationmatrixBN isnotnecessary,thustheproof rows,andthei-throwH˜N.AsX1,Y1 ∈Lthereisasequence is complete. of i.i.d. random variables Ek and two vectors ak and bk such 1 1 1 thatX =(cid:80)k a E andY =(cid:80)k b E .As{(X ,Y )}N C. Single terminal A2A compression 1 i=1 i i 1 i=1 i i i i i=1 is memoryless, there is a concatenation of sequence of i.i.d. In this part, we will overview the techniques used to prove copies of E1k, E ={E1k(1),E1k(2),...,E1k(N)}, such that the achievability part. The converse part, given in Theorem ZN =H˜ XN =[(B H )⊗(ak)t]E, 5, has been proved in Appendix A. We will give separate 1 N 1 N N 1 constructions for the fully discrete case and the mixture case WN =H˜ YN =[(B H )⊗(bk)t]E, 1 N 1 N N 1 although the proof techniques used are very similar. Achievability proof for the mixture case:Wewillgivean Proof of Theorem 6: By a similar procedure, it is easy explicitconstructofthethemeasurementensembleasfollows. to show that {Φ } has zero measurement rate. N Li.ie.td.nn∈onNsinagnudlalertrNand=om2nv.aArisasbulmesewthiatht XR1INDiseqausaelqtuoendc(eXo)f. limsupmNN =limsupP(In ≥(cid:15)H(X1)) Let ZN = H XN, where H is the Hadamard matrix of N→∞ n→∞ 1 N 1 N ≤P(limsupI ≥(cid:15)H(X )) order N. Also assume that I (i) = d(Z |Zi−1), i ∈ [N]. As n 1 n i 1 n→∞ we proved in Theorem 3, I is an erasure process with initial =P(I ≥(cid:15)H(X ))=0. ∞ 1 value d(X). We will construct the measurement matrix Φ N by selecting all of the rows of HN with the corresponding Moreover, assuming that S = {i ∈ [N] : In(i) ≥ (cid:15)H(X1)} In value greater than (cid:15)d(X). Therefore, we can construct the and Bi =S∩[i−1], we have measurement ensemble {Φ } labelled with all N that are a N H(XN|Z )=H(ZN|Z )=H(Z |Z ) power of 2. Assume that the dimension of Φ is m ×N. 1 S 1 S Sc S N N (cid:88) It remains to prove that the ensemble {Φ } is (cid:15)-REP with = H(Z |Z ,Z ) N i Bi S measurement rate d(X). This will complete the proof of i∈Sc Theorem 7. ≤ (cid:88) H(Z |Zi−1) i 1 Proof of Theorem 7: We first show that the family i∈Sc (cid:88) {Φ } has measurement rate d(X). Notice that the process I = I (i)≤N(cid:15)H(X )=(cid:15)H(XN), N n n 1 1 convergesalmostsurely.Thus,italsoconvergesinprobability. i∈Sc Specifically, considering the uniform probability assumption, which show the (cid:15)-REP property for {Φ }. N this implies that The last step is to prove Theorem 8, namely, to show that m #{i∈[N]:I (i)≥(cid:15)d(X)} limsup N =limsup n for a family of mixture distributions Π with strictly positive N N N→∞ N→∞ RID, there is a fixed measurement family {Φ } which is (cid:15)- N =limsupP(In ≥(cid:15)d(X)) REP for all of the distributions in Π with a measurement rate n→∞ vector lying in the Re´nyi information region of of the family. =P(I ≥(cid:15)d(X))=d(X). ∞ Proof of Theorem 8: The proof is simple considering It remains to prove that {ΦN} is (cid:15)-REP. Let S = {i ∈ [N] : the fact that the construction of the family {ΦN} in the proof I (i) ≥ (cid:15)d(X)} denote the selected rows to construct Φ of Theorem 7 depends only on the erasure pattern. Also, the n N and let ZN =H XN be the full measurements. It is easy to erasure pattern is independent of the shape of the distribution 1 N 1 check that Φ XN =Z . Also let B =S∩[i−1] denote all and only depends on its RID. Moreover, it can be shown that N 1 S i of the indices in S before i. We have the erasure patterns for different value of δ are embedded in one another, namely, for δ > δ(cid:48), Iδ(i) ≥ Iδ(cid:48)(i),i ∈ [N]. d(X1N|ZS)=d(Z1N|ZS)=d(ZSc|ZS) Consideringthemethodweusetoconnstructthenfamily{ΦN}, (cid:88) = d(Z |Z ,Z ) this implies that an (cid:15)-REP measurement family designed for i Bi S a specific RID δ is (cid:15)-REP for any distribution with RID less i∈Sc (cid:88) than δ. Thus, if we design {Φ } for sup d(π), it will be ≤ d(Z |Zi−1) N π∈Π i 1 (cid:15)-REP for any distribution in the family. i∈Sc (cid:88) Figure 1 shows the absorption phenomenon for a binary = In(i)≤N(cid:15)d(X)=(cid:15)d(X1N), random variable with P(1) = p = 0.05. Figure 2 shows the i∈Sc polarization of the RID for a random variable with RID 0.5. which shows the (cid:15)-REP property for {Φ }. N Absorption Scheme for N=512, p=0.05 Achievability proof for the discrete case: For the discrete case, the construction of the measurement family is very 3 similartothemixturecasewiththeonlydifferencethatinstead 2.5 ofusingtheerasureprocesscorrespondingtotheRID,weuse ptchareosecd,eisassscsrfueomtreiine∈gntrt[hNoapt]y,ZIf1nuN(nic=)ti=oHn.HNMX(Zo1Nir|e,Zw1ie−xe1ad)c.telfiIyn,nei[n1t5ht]eh,efuosdlilinosgwcritenhtgee Conditional Entropy1.25 conditional EPI result [16], the following was proved. 1 Lemma 1 (“Absorption phenomenon”). (I ,F ,P) is a pos- 0.5 n n itive martingale converging to 0 almost surely. 0 50 100 150 200 250 300 350 400 450 500 Output Number Similar to the mixture case, we again construct the family Fig. 1: Absorption pattern for N =512, p=0.05 {Φ }byselectingthoserowsoftheshuffledHadamardmatrix N with I value greater than (cid:15)H(X ). 1

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.