1 State Amplification Sn∼Qni=1p(si) Young-Han Kim, Arak Sutivong, and Thomas M. Cover W ∈[2nR] Xn(W,Sn) p(y|x,s) Yn (Wˆ(Yn),Ln(Yn)) Pr(W 6=Wˆ(Yn))→0 Pr(Sn∈/Ln(Yn))→0 Abstract—WeconsidertheproblemoftransmittingdataatrateRover astatedependentchannelp(y|x,s)withstateinformationavailableatthe Fig.1. Pureinformation transmissionversusstateuncertainty reduction. senderandatthesametimeconveyingtheinformationaboutthechannel state itself to the receiver. The amount of state information that can be learnedatthereceiveriscapturedbythemutualinformationI(Sn;Yn) betweenthestatesequenceSn andthechanneloutputYn.Theoptimal state uncertainty and the residual state uncertainty after observing tradeoff is characterized between the information transmission rate R the channel output. Later in Section III we will draw a connection andthestateuncertaintyreductionrate∆,whenthestateinformationis eithercausallyornoncausallyavailableatthesender.Inparticular,when between the list size reduction and the conventional information state transmission is the only goal, the maximum uncertainty reduction measureI(Sn;Yn)thatalsocapturestheamountofinformationYn rateisgivenby∆∗=maxp(x|s)I(X,S;Y).Thisresultiscloselyrelated learns about Sn. 8 andinasensedualtoarecentstudybyMerhavandShamai,whichsolves Moreformally,wedefinea(2nR,2n∆,n)codeastheencodermap 0 the problem of masking the state information from the receiver rather 0 than conveying it. Xn :[2nR] n n 2 ×S →X and decoder maps n I. INTRODUCTION a Wˆ : n [2nR] 3 J hasAccahpaancniteylp(y|x,s)withnoncausalstateinformationatthesender Ln :YYn →→2Sn C = max (I(U;Y) I(U;S)) (1) 2 p(u,x|s) − with list size as shown by Gelfand and Pinsker [13]. Transmitting at capacity, Ln =2n(H(S)−∆). ] | | IT hYonw.eIvners,oombsecuinrestsatnhceesstawteeiwnfioshrmtaoticoonnSvenyatshreecsetaivteedinbfyortmheatrieocneiSvenr oTfheaplirsotbdaebciloitdyinogfearmroersPsa(gne)daerceoddeinfigneedrrorersPpee(,cnwt)ivaenldytahseprobability s. itself, which could be time-varying fading parameters or an original e,s c imagethatwewishtoenhance. Forexample, astageactorwithface 2nR 1 [ S usesmakeupX tocommunicatetothebackrowaudienceY.Here Pe(,nw) = 2nR Pr(Wˆ 6=w|W =w), 2 X is used to enhance and exaggerate S rather than to communicate wX=1 v new information. Another motivation comes from cognitive radio Pe(,ns) =Pr(Sn ∈/ Ln(Yn)) 5 systems [12], [22], [8], [17] with the additional assumption that the where the message index W is chosen uniformly over [2nR] and 0 secondary user Xn communicates its own message and at the same the state sequence Sn is drawn i.i.d. p(s), independent of W. A 0 timefacilitatesthetransmissionoftheprimaryuser’ssignalSn.How ∼ 3 should the transmitter communicate over the channel to “amplify” pair (R,∆) is said to be achievable if there exists a sequence of 0 his knowledge of the state information to the receiver? What is (2nR,2n∆,n) codes with Pe(,nw) → 0 and Pe(,ns) → 0 as n → ∞. 7 Finally,wedefinetheoptimal(R,∆)tradeoffregion,orthetradeoff the optimal tradeoff between state amplification and independent 0 region inshort, tobe theclosure of all achievable (R,∆) pairs, and information transmission? / denote it by ∗. s To answer these questions, we study the communication problem R c Thispapershowsthatthetradeoffregion ∗ canbecharacterized : depicted in Figure 1. Here the sender has access to the channel as the union of all (R,∆) pairs satisfying R v state sequence Sn = (S1,S2,...,Sn), independent and identically Xi distributed(i.i.d.)accordingtop(s),andwishestotransmitamessage R I(U;Y) I(U;S) index W [2nR] := 1,2,...,2nR , independent of Sn, as well ≤ − r ∈ { } ∆ H(S) a astohelpthereceiver reducetheuncertaintyabout thechannel state ≤ R+∆ I(X,S;Y) innusesof astatedependent channel ( ,p(yx,s), ). Based ≤ on the message W and the channel staXte×SSn, the|sendeYr chooses for some joint distribution of the form p(s)p(u,xs)p(yx,s). Xn(W,Sn)andtransmitsitacrossthechannel. Uponobserving the As a special case, if the encoder’s sole goal is| to “a|mplify” the channel output Yn, the receiver guesses Wˆ [2nR] and forms a state information (R=0), then the maximum uncertainty reduction list Ln(Yn) n that contains likely candida∈tes of the actual state rate sequence Sn.⊆S Without any observation Yn, the receiver would know only that ∆∗ =sup ∆:(R,∆) is achievable for some R 0 { ≥ } thechannelstateSn isoneof2nH(S) typicalsequences(withalmost is given by certainty) and we can say the uncertainty about Sn is H(Sn). Now upon observing Yn and forming a list Ln(Yn) of likely candidates ∆∗ =min H(S), maxI(X,S;Y) . (2) for Sn, the receiver’s list size is reduced from nH(S) to log Ln . { p(x|s) } Thus we define the channel state uncertainty reduction rate to|be | Themaximumuncertaintyreductionrate∆∗isachievedbydesigning the signal Xn to enhance the receiver’s estimation of the state Sn 1 1 ∆= (H(Sn) log Ln )=H(S) log Ln while using the remaining pure information bearing freedom in Xn n − | | − n | | toprovidemore informationabout thestate.More specifically, there asanaturalmeasurefortheamountofinformationthereceiverlearns are three different components involved in reducing the receiver’s aboutthechannelstate.Inotherwords,theuncertaintyreductionrate uncertainty about the state: ∆ [0,H(S)] captures the difference between the original channel ∈ 1) The transmitter uses the channel capacity to convey the state Email:[email protected], arak [email protected], [email protected] information. In Section II, we study the classical setup [19], 2 [15]ofcodingformemorywithdefectivecells(Example1)and paperchannel)underquadraticdistortion[29].Inthisparticularcase, show that this “source-channel separation” scheme is optimal optimality was shown for a simple power-sharing scheme between when the memory defects are symmetric. pure information transmission via Costa’s original coding scheme 2) Thetransmittergetsoutofthewayofthereceiver’sviewofthe and state amplification via simple scaling. state.Forinstance, themaximumuncertainty reduction for the Recently, Merhav and Shamai [21] considered a related problem binarymultiplyingchannelY =X S(Example2inSectionII) of transmitting pure information, but this time under the additional · with binary input X 0,1 and binary state S 0,1 is requirement of minimizing the amount of information the receiver ∈ { } ∈ { } achieved by sending X 1. can learn about the channel state. In this interesting work, the ≡ 3) The transmitter actively amplifies the state. In Example 3 in optimal tradeoff between pure information rate R and the amount SectionIII,weconsidertheGaussianchannelY =X+S+Z ofstateinformationE ischaracterizedforbothcausalandnoncausal withGaussianstateS andGaussiannoiseZ.Heretheoptimal setups.Furthermore,fortheGaussiannoncausalcase(writingondirty transmitter amplifies the state as X = αS under the given paper), the optimal rate-distortion tradeoff is given under quadratic power constraint EX2 P. distortion. (This may well be called “writing dirty on paper”.) ≤ Itisinterestingtonotethatthemaximumuncertaintyreductionrate The current paper thus complements [21] in a dual manner. It is ∆∗ istheinformationrateI(X,S;Y)thatcouldbeachievedifboth refreshing to note that our notion of uncertainty reduction rate ∆ thestateS andthesignalX couldbefreelydesigned, insteadofthe is essentially equivalent to Merhav and Shamai’s notion of E; both stateS being generated bynature. Thisratealsoappears inthe sum notions capture the normalized mutual information I(Sn;Yn). (See rate of the capacity region expression for the cooperative multiple the discussion in Section III.) The crucial difference is that ∆ is to access channel [7, Problem 15.1] and the multiple access channel be maximized while E is to be minimized. Both problems admit with cribbing encoders by Willemsand van der Meulen [32]. single-letter optimal solutions. The rest of thispaper isorganized as follows. Inthenext section, When the state information is only causally available at the transmitter, that is, when the channel input Xi depends on only the weestablishtheoptimal(R,∆)tradeoffregionforthecaseinwhich past and the current channel channel stateSi,we willshow that the the state information Sn is noncausally available at the transmitter tradeoffregion ∗ isgivenastheunionofall(R,∆)pairssatisfying before the actual communication. Section III extends the notion of R stateuncertaintyreductiontocontinuousalphabets,byidentifyingthe ∆R≤IH(U(S;)Y) lriasttede1cIo(dSinng;Yrenq)u.irIenmpeanrttiScnula∈r,Lwne(Ychna)rawcittehritzheetmheutoupatliminafolr(mRa,ti∆on) n ≤ tradeoffregionforCosta’s“writingondirtypaper”channel.Sincethe R+∆ I(X,S;Y) ≤ intuition gained from the study of the noncausal setup carries over over all joint distributions of the form p(s)p(u)p(xu,s)p(yx,s). when the transmitter has causal knowledge of the state sequence, | | Interestingly, the maximum uncertainty reduction rate ∆∗ stays the the causal case is treated only briefly in Section IV, followed by same as in the noncausal case (2). That causality incurs no cost on concluding remarks in Section V. the (sum) rate is again reminiscent of the multiple access channel with cribbing encoders [32]. II. OPTIMAL(R,∆)TRADEOFF:NONCAUSALCASE Theproblemofcommunicationoverstate-dependentchannelswith Inthissection,wecharacterizetheoptimaltradeoffregionbetween state information known at the sender has attracted a great deal of the pure information rate R and the state uncertainty reduction rate attention. This research area was first pioneered by Shannon [27], ∆ with state information noncausally available at the transmitter, as KuznetsovandTsybakov[19],andGelfandandPinsker[13].Several formulated in Section I. advancements in both theory and practice have been made over the Theorem 1: Thetradeoffregion ∗ forastate-dependent channel years. For instance, Heegard and El Gamal [15], [14] characterized ( ,p(yx,s), ) withstateinfRormation Sn noncausally known X ×S | Y the channel capacity and devised practical coding techniques for at the transmitter is the union of all (R,∆) pairs satisfying computer memory with defective cells. Costa [5] studied the now R I(U;Y) I(U;S) (3) famous“writingondirtypaper”problemandshowedthatthecapacity ≤ − of an additive white Gaussian noise channel is not affected by ∆ H(S) (4) ≤ additional interference, as long as the entire interference sequence R+∆ I(X,S;Y) (5) ≤ is available at the sender prior to the transmission. This fascinat- forsomejointdistributionoftheformp(s)p(u,xs)p(yx,s),where ing result has been further extended with strong motivations from | | the auxiliary random variable U has cardinality bounded by applications in digital watermarking (see, for example, Moulin and |U| ≤ . O’Sullivan[24],ChenandWornell[3],andCohenandLapidoth[4]) |X|·|S| Aswillbeclearfromtheproofoftheconverse,theregiongivenby and multi-antenna broadcast channels (see, for example, Caire and (3)–(5) is convex. (We can merge the time-sharing random variable Shamai [2], Weingarten, Steinberg, and Shamai [31], and Mohseni into U.) Since the auxiliary random variable U affects the first and Cioffi [23]). Readers are referred to Caire and Shamai [1], inequality (3) only, the cardinality bound on follows directly LapidothandNarayan[20],andJafar[16]formorecompletereviews U fromthe usual technique; see Gelfand and Pinsker [13] or a general on the theoretical development of the field. On the practical side, treatment by Salehi [26]. Finally, we can take X as a deterministic Erez,Shamai,andZamir[10],[34]proposedefficientcodingschemes function of (U,S) without reducing the region, but at the cost of based on lattice strategies for binning. More recently, Erez and ten increasingthecardinalityboundofU;refertotheproofofLemma2 Brink[11] report efficient coding techniques that almost achieve the below. capacity of Costa’s dirty paper channel. It iseasytoseethat wecan recover theGelfand–Pinsker capacity In[29], [30], weformulated the problem of simultaneously trans- formula mittingpureinformationandhelpingthereceiverestimatethechannel stateunderadistortionmeasure.Althoughthecharacterizationofthe C =max R:(R,∆) ∗ for some ∆ 0 { ∈R ≥ } optimal rate-distortion tradeoff is still open in general (cf. [28]), a = max (I(U;Y) I(U;S)). completesolutionisgivenfortheGaussiancase(thewritingondirty p(x,u|s) − 3 For the other extreme case of pure state amplification, we have the Proof: Since U (X,S) Y forms a Markov chain, it is → → following result. trivial to check that . (8) 0 Corollary 1: Under the condition of Theorem 1, the maximum R ⊆R uncertainty reduction rate ∆∗ = max ∆ : (R,∆) ∗ for some For the other direction of inclusion, we need some notation. Let { ∈ R R 0 is given by be the set of all distributions of the form p(s)p(x,us)p(yx,s) ≥ } P | | consistent with the given p(s) and p(yx,s), where the auxiliary ∆∗ =min H(S), maxI(X,S;Y) . (6) random variable U is defined on an arbi|trary finite set. Further let { p(x|s) } ′ betherestrictionof suchthatX =f(U,S) forsomefunction Thus the receiver can learn about the state Sn essentially at the fP, i.e., p(xu,s) takes vPalues 0 or 1 only. maximal cut-set rate I(X,S;Y). Ifwedefi|ne todenotetheclosureofall(R,∆)pairssatisfying 1 Before we prove Theorem 1, we need the following two lemmas. (3), (4), and (7R) over ′, or equivalently, if is defined to be the 1 The first one extends Fano’s inequality [7, Lemma 7.9.1] to list restriction of overPa smaller set of distribRutions ′, then clearly 0 decoding. R P . (9) Lemma 1: For a sequence of list decoders Ln : n 2Sn, R1 ⊆R0 Yn Ln(Yn) with list size Ln fixed for each nY, let→Pe(,ns) = Let R2 be defined as the closure of (R,∆) pairs satisfying (3)–(5). Pr(S7→n / Ln(Yn)) be the sequ|ence| of corresponding probabilities Since X →(U,S)→Y forms a Markov chain on P′, we have of list d∈ecoding error. If Pe(,ns) 0, then . (10) → R2 ⊆R1 H(SnYn) log Ln +nǫn To complete the proof, it now suffices to show that | ≤ | | where ǫn →0 as n→∞. R⊆R2. (11) Proof: Define an error random variable E as Toseethis,werestrict tothedistributionsoftheformU =(V,U˜) 2 E = 01,, iiff SSnn ∈/ LLnn,. with V independent ofR(U˜,S), namely, ∈ p(x,us)=p(x,v,u˜s)=p(v)p(u˜s)p(xv,u˜,s) (12) We can then expand | | | | with deterministic p(xv,u˜,s), i.e., x is a function of (v,u˜,s), H(E,SnYn)=H(SnYn)+H(E Yn,Sn) and call this restriction| . Since X is a deterministic function of | =H(E Y| n)+H(Sn|Yn,E). (V,U˜,S) and at the saRme3 time (V,U˜) (X,S) Y form a | | Markov chain, can be written as the c→losure of all→(R,∆) pairs 3 NotethatH(E Yn) 1andH(E Yn,Sn)=0.Wecanalsobound satisfying R H(SnYn,E)|as ≤ | | R I(V,U˜;Y) I(V,U˜;S) H(SnE,Yn)=H(SnYn,E =0)Pr(E =0) ≤ − | | ∆ H(S) +H(Sn|Yn,E =1)Pr(E =1) R+∆≤I(V,U˜,S;Y)=I(X,S;Y) ≤log|Ln|(1−Pe(,ns))+nlog|S|Pe(,ns) forsomedistributiono≤ftheformp(s)p(x,v,u˜s)p(yx,s)satisfying | | where the inequality follows because when there is no error, the (12). But we have remaininguncertaintyisatmostlog|Ln|,andwhenthereisanerror, I(V,U˜;Y) I(V,U˜;S) I(U˜;Y) I(V,U˜;S) the uncertainty is at most nlog . This implies that − ≥ − |S| =I(U˜;Y) I(U˜;S) H(Sn|Yn)≤1+log|Ln|(1−Pe(,ns))+nlog|S|Pe(,ns) andthesetofconditional distributionson(U˜,−X)givenS satisfying =log|Ln|+1+(nlog|S|−log|Ln|)Pe(,ns). (12)isasrichasanyp(u˜,xs).(Indeed, anyconditional distribution | Takingǫn= n1+(log|S|−n1 log|Ln|)Pe(,ns)provesthedesiredresult. pch(aos|be)n pca(nc)baendredperteesremnitneidsticasdistrcibpu(tcio)pn(pa(|ba,cb,)c)fowritahppcraorpdriniaatleitlyy P | of C upper bounded by ( 1) +1; see also [32, Eq. (44)].) The second lemma is crucial to the proof of Theorem 1 and |A|− |B| Therefore, we have contains a more interesting technique than Lemma 1. This lemma (13) shows that the third inequality (5) can be replaced by a tighter R⊆R3 ⊆R2 inequality (7) below (recall that I(U,S;Y) I(X,S;Y) since which completes the proof. ≤ U (X,S) Y), which becomes crucial for the achievability Now we are ready to prove Theorem 1. → → proof of Theorem 1. Proofof Theorem 1: Fortheproof ofachievability, inthelight of Lemma 2, it suffices toprove that any pair (R,∆) satisfying (3), Lemma 2: Let be the union of all (R,∆) pairs satisfying (3)– R (4),(7)forsomep(u,xs)isachievable. Sincethecodingtechnique (5).LetR0 betheclosureoftheunionofall(R,∆)pairssatisfying isquitestandard, weon|lysketchtheproof here.Forfixedp(u,xs), | R I(U;Y) I(U;S) (3) theresultofGelfand–Pinsker[13]showsthatthetransmittercansend ≤ − I(U;Y) I(U;S)bitsreliablyacrossthechannel.Nowweallocate ∆ H(S) (4) − ≤ 0 R I(U;Y) I(U;S) bits for sending the pure information R+∆ I(U,S;Y) (7) ≤ ≤ − ≤ and usethe remaining Γ=I(U;Y) I(U;S) Rbitsfor sending − − forsomejointdistributionp(s)p(x,us)p(yx,s),wheretheauxiliary thestateinformationbyrandombinning.Morespecifically,weassign random variable U has finitecardina|lity. Th|en typicalSnsequencesto2nΓbinsatrandomandsendthebinindexof the observed Sn using nΓ bits. At the receiving end, the receiver is = 0. able to decode the codeword Un from Yn with high probability. R R 4 Using joint typicality of (Yn,Un,Sn), the state uncertainty can noting that Ui (Xi,Si) Yi form a Markov chain, we have → → be first reduced from H(S) to H(S Y,U). Indeed, the number of n typical Sn sequences jointly typical |with (Yn,Un) is bounded by nR (I(Ui;Yi) I(Ui;Si))+nǫn. (14) 2n(H(S|Y,U)+ǫ).Inaddition, usingΓ=I(U;Y) I(U;S) Rbits ≤ i=1 − of independent refinement information from the−hash index−of Sn, Ontheotherhand,Xsincelog Ln =n(H(S) ∆),wecantrivially we can further reduce the state uncertainty by Γ. Hence, by taking bound ∆ by Lemma 1 as | | − the listof all Sn sequences jointly typical with(Yn,Un) satisfying the hash check, we have the total state uncertainty reduction rate n∆≤nH(S)−H(Sn|Yn)+nǫ′n nH(S)+nǫ′. ∆=I(U,Y;S)+Γ ≤ n Similarly, we can bound R+∆ as =I(U,Y;S)+I(U;Y) I(U;S) R − − =I(U,S;Y)−R. n(R+∆)≤I(W;Yn)+I(Sn;Yn)+nǫ′n′ (a) aBlyl (vRar,y∆in)gp0ai≤rsRsat≤isfIy(inUg;Y)−I(U;S), itcan be readilyseen that ≤II((WW,;SYnn;|YSnn))++In(ǫS′′n;Yn)+nǫ′n′ ≤ n R I(U;Y) I(U;S) =(b)I(Xn,Sn;Yn)+nǫ′n′ ≤ − n R+∆∆≤HI((US,)S;Y) ≤(c) n1 I(Xi,Si;Yi)+ǫ′n′ (15) ≤ Xi=1 where (a) follows since W is independent of Sn and conditioning for any fixed p(x,us) are achievable. | reducesentropy,(b)followsfromthedataprocessinginequality(both Fortheproofofconverse,wehavetoshowthatgivenanysequence of(2nR,2n∆,n)codeswithPe(,nw),Pe(,ns) →0,the(R,∆)pairsmust dirWecetionnosw),ainndtro(cd)ufcoellothwesfursoumalthteimme-esmhaorriynlgessrnaensdsoomfthvaericahbalneneQl. satisfy uniform over 1,...,n , independent of everything else. Then (14) { } R I(U;Y) I(U;S) implies ≤ − ∆≤H(S) R≤I(UQ;YQ|Q)−I(UQ;SQ|Q)+ǫn R+∆≤I(X,S;Y) =I(UQ,Q;YQ)−I(UQ,Q;SQ)+ǫn. for some joint distribution p(s)p(x,us)p(yx,s). On the other hand, (15) implies | | preTvhioeuspuwreoriknfobrymGateiolfnanrdateanRd Pcainnskbeer r[e1a3d,ilPyrobpoousnidtieodn f3r]o.mHtehree R+∆≤I(XQ,SQ;YQ|Q)+ǫ′n′ we repeat a simpler proof given in Heegard [14, Appendix 2] ≤I(XQ,SQ,Q;YQ)+ǫ′n′ for completeness; see also [9, Lecture 13]. Starting with Fano’s =I(XQ,SQ;YQ)+ǫ′n′ inequality, we have the following chain of inequalities: where the last equality follows since Q (XQ,SQ) YQ form a → → nR I(W;Yn)+nǫn Markov chain. ≤ n Finally,werecognizeU =(UQ,Q),X =XQ,S =SQ,Y =YQ, = I(W;YiYi−1)+nǫn and note that S ∼p(s),Pr(Y =y|X =x,S =s)=p(y|x,s), and | U (X,S) Y, which completes the proof of the converse. Xi=1 → → n ≤ I(W,Yi−1;Yi)+nǫn senRdoinugghltyhespceoadkeiwngo,rdtheUnoptriemliaalblcyodaintgthsechGemelefanisd–ePqiunisvkaelerntrattoe Xi=n1 n R′=I(U;Y) I(U;S) and reducing the receiver’s uncertainty by = i=1I(W,Yi−1,Sin+1;Yi)−i=1I(Yi;Sin+1|W,Yi−1)+nǫn b∆e′n=otIed(St;hUat,Y(−R)′f,r∆om′)YhansatnhdetshaemdeecfoordmedacsodtheewoacrdhiUevnab.lIetsrhegoiuolnd X X n n for the dual tradeoff problem between pure information rate R and =(a) I(W,Yi−1,Sin+1;Yi)− I(Yi−1;Si|W,Sin+1)+nǫn (minimum) normalized mutual information rate E = n1I(Sn;Yn) Xi=1 Xi=1 studied in [21]. But we can reduce the uncertainty about Sn further n n =(b) I(W,Yi−1,Sin+1;Yi)− I(W,Yi−1,Sin+1;Si)+nǫn ibnydeapleloncdaetnintgrefipanretmΓentofinftohrempatuiroeni(nhfaosrhmiantdioenx orafteSnR).′Btyo vcaornyvineyg i=1 i=1 X X Γ [0,R′]wecantracetheentiretradeoffregion(R′ Γ,∆′+Γ). where (a) follows from the Csisza´r sum formula ∈ − It turns out an alternative coding scheme based on Wyner–Ziv n n n sourcecodingwithsideinformation[33],insteadofrandombinning, I(Yi;Sin+1|W,Yi−1)= I(Yi;Sj|W,Sjn+1,Yi−1) also achieves the tradeoff region R∗. To see this, fix any p(u,x|s) i=1 i=1j=i+1 and p(vs) satisfying X X X | n j−1 = I(Yi;Sj|W,Sjn+1,Yi−1) Γ:=I(V;S|U,Y)≤I(U;Y)−I(U;S) Xj=1 Xi=1 andconsidertheWyner–ZivencodingofSn withcoveringcodeword n Vn andsideinformation(Un,Yn)atthedecoder.Morespecifically, = I(Yj−1;Sj|W,Sjn+1) we can generate 2nI(V;S) Vn codewords and assign them into 2nΓ j=1 X bins.AsbeforeweusetheGelfand–Pinskercodingtoconveyames- aongdniz(ibn)gfothlleowausxibleiacrayusrean(dWom,Svin+ar1i)abilseiUndie=pen(dWen,tYoi−f1S,Si.in+B1y)raencd- sraatgeeΓof=raIt(eVI;(SU|;UY,Y)−)iIs(sUu;ffiSc)iernetlitaoblryecoovnesrtrtuhcetcVhnanantetlh.eSirnecceeivtheer 5 0 0 0 0 0 0 ∆ ∆∆ 1 11 1 1 1 1 1 1 2 S=0 S=1 S=2 3 HH``3311´´−− 1133 p q r Fig.2. Memorywithdefective cells. 0 00 0 13 R 00 3311 RR with side information Yn and Un, we can allocate the rate Γ for (a)(p,q,r)=(1/3,1/3,1/3) (b)(p,q,r)=(1/2,1/6,1/3) conveyingVnandusetheremainingrateR=I(U;Y) I(U;S) Γ for extra pure information. Forming a list of Sn jointly−typical w−ith Fig.3. Theoptimal (R,∆)tradeoff formemorywithdefective cells. (Yn,Un,Vn) results in the uncertainty reduction rate ∆ given by ∆=I(S;Y,U,V) (a) If p = q, then the choice of α∗ = 1/2 maximizes both (20) and (22), and hence achieves the entire tradeoff region =I(S;Y,U)+Γ ∗. The optimal transmitter splits the full channel capacity =I(S;U,Y)+I(U;Y)−I(U;S)−R RC = rH(α∗) = r to send both the pure information and =I(U,S;Y) R. the state information. (See Figure 3(a) for the case (p,q,r)= − (1/3,1/3,1/3).) Thus the tradeoff region ∗ can be achieved via the combination R (b) On the other hand, when p = q, there is a clear tradeoff in of two fundamental resultsincommunication withside information: 6 our choice of α. For example, consider the case (p,q,r) = channelcodingwithsideinformationbyGelfandandPinsker[13]and (1/2,1/6,1/3).Ifthegoalistocommunicatepureinformation ratedistortionwithsideinformationbyWynerandZiv[33].Itisalso over the channel, we should take α∗ = 1/2 to maximize the interesting to note that the information about Sn can be transmitted number of distinguishable input preparations. This gives the in a manner completely independent of geometry (random binning) channelcapacityC =rH(α)=1/3.Ifthegoalis,however,to orcompletelydependentongeometry(randomcovering);referto[6] helpthereceiver reduce thestateuncertainty, wetakeα∗ =0, for a similar phenomenon in a relay channel problem. i.e.,wetransmitafixedsignalX 0.Thisway,thetransmitter ≡ When Y isa function of (X,S), it is optimal to identify U =Y, can minimize his interference with the receiver’s view of the and Theorem 1 simplifies to the following corollary. state S. The entire tradeoff region is given in Figure 3(b). Corollary 2: The tradeoff region ∗ for a deterministic state- dependent channel Y = f(X,S) wiRth state information Sn non- Example 2: Consider the binary multiplying channel Y =X·S, where the output Y is the product of the input X 0,1 and the causally known at the transmitter is the union of all (R,∆) pairs state S 0,1 . We assume that the state sequen∈ce{Sn}is drawn satisfying ∈ { } i.i.d. according to Bern(γ). It can be easily shown that the optimal R H(Y S) (16) tradeoff region is given by ≤ | ∆ H(S) (17) R γH(α) (23) ≤ ≤ R+∆ H(Y) (18) ∆ H(γ) (24) ≤ ≤ for some joint distribution of the form p(s)p(xs)p(yx,s). In R+∆ H(αγ). (25) | | ≤ particular, the maximum uncertainty reduction rate is given by This is achieved by p(x) Bern(α), independent of S. ∼ ∆∗ =min H(S), maxH(Y) . (19) As in Example 1(b), there is a tension between the pure informa- { p(x|s) } tion transmission and the state amplification. When the goal is to The next two examples show different flavors of optimal state maximize the pure information rate, we should choose α∗ =1/2 to uncertainty reduction. achieve the capacity C = γ. But when the goal is to maximize the state uncertainty reduction rate, we should choose α∗ =1 (X 1) Example 1: Considertheproblemofconveyinginformationusing to achieve ∆∗ =H(γ).In words, to maximize the state uncerta≡inty a write-once memory device with stuck-at defective cells [19], [15] reductionrate,thetransmittersimplyclearsthereceiver’sviewofthe asdepicted in Figure2. Hereeach memory cell hasprobability p of state. being stuckat 0, probability q of being stuckat 1, and probabilityr of being a good cell, with p+q+r = 1. It is easy to see that the III. EXTENSIONTOCONTINUOUSSTATESPACE channel output Y is a simple deterministic function of the channel input X and the state S. Theprevioussectioncharacterizedthetradeoffregion ∗ between R Now it is easy to verify that the tradeoff region ∗ is given by the pure information rate R and the state uncertainty reduction rate R rH(α) R (20) r∆ed=uctHio(nSr)a−te n∆1 loigs|mLena(nYinng)f|u.lAopnplyarewnhtleynththeencohtiaonnneolfsutnatceerStaihnatys ≤ ∆ H(p,q,r) (21) finite cardinality (i.e., < ), or at least when H(S)< . ≤ |S| ∞ ∞ However, from the proof of Theorem 1 (the generalized Fano’s R+∆ H(p+αr,q+(1 α)r) (22) ≤ − inequality in Lemma 1), along with the fact that the optimal region where α can be chosen arbitrarily (0 α 1). This region is issingle-letterizable,wecantakeanalternativelookatthenotionof ≤ ≤ achieved by choosing p(x) Bern(α). Without loss of generality, state uncertainty reduction as reducing the list size from 2nH(S) to ∼ we can choose X Bern(α) independent of S, because the input Ln(Yn). Wewillshow shortlyinProposition 1 that thedifference ∼ | | XTahffeercetsarYe townolycawsheesntoSc=ons2i.der. ∆equ=ivaHlen(tSt)o−then1nloorgm|aLlinz|edofmtuhteuanloirnmfoarlmizaetdiolnis∆t sIiz=e ni1sIe(sSsnen;tYianll)y, 6 Sn∼N(0,QI) Zn∼N(0,NI) W ∈[2nR] Xn(W,Sn) Yn Wˆ(Yn) Pni=1EXi2≤nP Pr(W 6=Wˆ(Yn))→0 1 n n lim I(S ;Y )≥∆I n→∞n Fig.4. Writingondirtypaper. whichiswell-definedfor anarbitrarystatespace and capturesthe More specifically, consider a finite partition1 to quantize the state amount of information the receiver Yn can learnSabout thestateSn random variable S into [S]. Under this partition, let ∗∗ be the set R[S] (or lack thereof [21]). Hence, the physically motivated notion ∆ of of all (R,∆) pairs satisfying list size reduction is consistent with the mathematical information measure ∆I, and both notions of state uncertainty reduction can be R≤I(U;Y)−I(U;[S]) used interchangeably, especially when is finite. ∆ H([S]) To be more precise, we define a (2SnR,n) code by an encoding ≤ R+∆ I(X,[S];Y) function ≤ Xn :[2nR] n n for some joint distribution of the form p([s])p(u,x[s])p(yx,[s]) ×S →X | | with auxiliary random variable U. Consider the original list size and a decoding function reduction problem with state information [S] and let ∗ denote Wˆ : n [2nR]. the tradeoff region. Then Theorem 1 shows that ∗∗ R=[S] ∗ . In Y → R[S] R[S] particular,foranyǫ>0and(R,∆) ∗∗,thereexistsasequence Then theassociated stateuncertainty reduction ratefor the (2nR,n) ∈R[S] code is defined as of(2n(R−ǫ),2n(∆−ǫ),n)codesXn(W),Wˆ(Yn),Ln(Yn)suchthat ∆I = 1I(Sn;Yn) Pe(,nw) =Pr(W 6=Wˆ)→0 and Pe(,ns) =Pr([S]n 6=Ln(Yn))→0. n Now from the generalized Fano’s inequality (Lemma 1), the wherethemutualinformationiswithrespecttothejointdistribution achievable list size reduction rate ∆ ǫ should satisfy − n p(xn,sn,yn)=p(xnsn) p(si)p(yixi,si) n(∆−ǫ)≤I([S]n;Yn)+nǫn ≤I(Sn;Yn)+nǫn | | i=1 induced byXn(W,Sn)withmessageYW distributeduniformly over hwaivtheǫfrnom→th0eadsenfin→itio∞n.ofHen∗cethbayt letting n→∞ and ǫ→0, we [2nR],independentofSn.Similarly,theprobabilityoferrorisdefined RI as ∗∗ = ∗ ∗. P(n) =Pr(W =Wˆ(Yn)). R[S] R[S]⊆RI e 6 Also it follows trivially from repeating the intermediate steps in the A pair (R,∆) is said to be achievable if there exists a sequence of converse proof of Theorem 1 that ∗ ∗∗. (2nR,n) codes with Pe(n) 0 and RI ⊆R → Finally taking a sequence of partitions with mesh 0 and hence nl→im∞n1I(Sn;Yn)≥∆. letting R∗[S∗]→R∗∗, we have the desired result. → Theclosureofallachievable(R,∆)pairsiscalledthetradeoffregion Sincebothnotionsofstateuncertaintyreduction,thelistsizereduc- ∗. (Here we use the notation ∗ instead of ∗ to temporarily tionnH(S) log Ln and the mutual information I(Sn;Yn),lead dRisItinguishthisfromtheoriginalpRroIblemformulaRtedintermsofthe to the same −answer|, w|e will subsequently use them interchangeably list size reduction.) and denote the tradeoff region by the same symbol ∗. R Wenowshowthattheoptimaltradeoff ∗betweentheinformation RI Example 3: Consider Costa’s writing on dirty paper model de- transmissionrateRandthemutual informationrate∆hasthesame solution as the optimal tradeoff ∗ between R and the list size picted in Figure 4 as the canonical example of a continuous state- R dependent channel. Here the channel output is given by Yn = reduction rate ∆. Xn+Sn+Zn,whereXn(W,Sn)isthechannel inputsubject toa nelPr(oposition,p1(:yTxh,es)t,rad)eowffitrhegsiotanteRi∗Inffoorrmaasttiaotne-dSenpennodnecnatucshaalnly- pwohwiteerGcoanusstsriaainntstatnie=,1aEndXZi2n≤nPN,(S0n,N∼IN) (is0,tQheI)wihsitteheGaadudsistiiavne knowXn a×t thSetran|smitterYistheclosure of all (R,∆) pairs satisfying noise. We assumePthat Sn and Z∼n are independent. For the writing on dirty paper model, we have the following R I(U;Y) I(U;S) (3) ≤ − tradeoff between the pure information transmission and the state ∆ H(S) (4) ≤ uncertainty reduction. R+∆ I(X,S;Y) (5) ≤ Proposition 2: The tradeoff region ∗ for the Gaussian chan- for some joint distribution of the form p(s)p(u,xs)p(yx,s) with R | | nel depicted in Figure 4 is characterized by the boundary points auxiliary random variableU. Hence, ∗ has theidentical character- ization as ∗ in Theorem 1. RI Proof:RLet ∗∗betheregiondescribedby(3)–(5).Weprovidea 1 Recall that the mutual information between arbitrary random variables sandwich proofRR∗∗ =R∗ ⊆R∗I ⊆R∗∗, which is given implicitly sXupraenmdumY iissodveefirnaeldl fiansitIe(Xpa;rtYiti)on=s PsuapnPd,QQ;I(s[eXe]KPo;l[mYo]gQo)r,ovw[h1e8re] atnhde in the proof of Theorem 1. Pinsker[25]. 7 (R(γ),∆(γ)), 0 γ 1, where inwhichthetransmitterlearnsthestatesequence on thefly,i.e.,the ≤ ≤ encoding function 1 γP R(γ)= log 1+ (26) 2 N Xi :[2nR] i , i=1,2,...,n, „ « ×S →X 2 √Q+ (1 γ)P depends causally on the state sequence. ∆(γ)= 12log01+ “ γpP +N− ” 1. (27) We state our main theorem. Theorem 2: Thetradeoffregion ∗ forastate-dependent channel B@ CA ( ,p(yx,s), ) with state inRformation Sn causally known at Proofsketch: TheachievabilityfollowsfromProposition1with X ×S | Y the transmitter is the union of all (R,∆) pairs satisfying trivial extension to the input power constraint. In particular, we use thesimplepower sharing scheme proposed in[29],where afraction R I(U;Y) (29) ≤ γ of the input power is used to transmit the pure information using ∆ H(S) (30) Costa’swritingondirtypapercodingtechnique,whiletheremaining ≤ R+∆ I(X,S;Y) (31) (1 γ) fraction of the power is used to amplify the state. In other ≤ − words, for some joint distribution of the form p(s)p(u)p(xu,s)p(yx,s), | | P where the auxiliary random variable U has cardinality bounded by X =V + (1 γ) S (28) − Q . r |U|≤|X|·|S| Asinthenoncausal case,theregionisconvex.Sincetheauxiliary with V N(0,γP) independent of S, and ∼ randomvariableU affectsthefirstinequality(29)only,thecardinality U =V +αS bound followsagainfromthe standard argument. (A |U|≤|X|·|S| looser bound can be given by counting the number of functions f : with ;seeShannon[27].)Finally,wecantakeXasadeterministic S →X γP (1 γ)P +Q function of (U,S) without decreasing the region. α= − . γP +Ns Q Compared tothenoncausal tradeoff regionR∗nc inTheorem1,the causal tradeoff region ∗ in Theorem 2 issmaller in general. More Evaluating R = I(U;Y) I(U;S) and ∆ = I(S;Y) for each γ, precisely, ∗ ischaracRtercizedbythesamesetofinequalities(3)–(5) we recover (26) and (27).− asin ∗,Rbuct thesetof joint distributionsisrestrictedtothosewith The proof of converse is essentially the same as that of [29, auxiliRaryncvariableU independentofS.Indeed,fromtheindependence Theorem 2], which we do not repeat here. between U and S, we can rewrite(29) as Asanextremepoint ofthe(R,∆),werecover Costa’swritingon dirty paper result R I(U;Y)=I(U;Y) I(U;S) (29′) ≤ − 1 P which is exactly the same as (3). Thus the inability to use the C = log 1+ future state sequence decreases the tradeoff region. However, only 2 N „ « the inequality (29), or equivalently, the inequality (3), is affected by by taking γ =1. On theother hand, if stateuncertainty reduction is the causality, and the sum rate (31) does not change from (5). thegoal,thenallofthepowershouldbeusedforstateamplification. Since the proof of Theorem 2 is essentially identical to that of The maximum uncertainty reduction rate Theorem1,weskipmostofthesteps.Theleaststraightforwardpart 1 √P +√Q 2 is the following lemma. ∆∗ = log 1+ 2 N Lemma 3: Let betheunionofall(R,∆)pairssatisfying(29)– „ ` ´ « R (31).Let betheclosureoftheunionofall(R,∆)pairssatisfying 0 is achieved with X = PS and α=0. (29), (30)R, and Q In[29,Theorem2],tqheoptimaltradeoffwascharacterizedbetween R+∆ I(U,S;Y) (32) ≤ the pure information rate R and the receiver’s state estimation error D= 1E Sn Sˆn(Yn) 2. Although thenotion of stateestimation for some joint distribution p(s)p(u)p(x|u,s)p(y|x,s) where the n || − || auxiliary random variable U has finite cardinality. Then error D in [29] and our notion of the uncertainty reduction rate ∆ appear tobedistinctobjectivesatfirstsight,theoptimalsolutionsto = . 0 both problems are identical, as shown in the proof of Proposition 2. R R Proof sketch: The proof is a verbatim copy of the proof of There is no surprise here. Because of the quadratic Gaussian nature ofbothproblems,minimizingthemeansquarederrorE(S Sˆ(Y))2 Lemma 2, except that here U is independent of S, i.e., p(x,u|s)= − p(u)p(xu,s).Thefinalstep(13)followssincethesetofconditional vcaicnebveerrseac.asAtlsinotothmeaoxpitmimizailngstathteeumnuceturtaalinintyforremdautciotinonI(rSat;eY∆),∗a(nodr distributi|ons on X,U =(V,U˜) given S of the form equivalently, the minimum state estimation error D∗ is achieved by p(x,us)=p(v)p(u˜)p(xv,u˜,s) (12′) the symbol-by-symbol amplification Xi = (P/Q)Si. | | Finally,itinterestingtocomparetheoptimalcodingscheme(28)to with deterministic p(xv,u˜,s) is as rich as any p(u˜)p(xu˜,s), and p | | theoptimal coding scheme when thegoal istominimize(instead of I(V,U˜;Y) I(U˜;Y). (13′) maximizing)theuncertaintyreduction[21],whichisessentiallybased ≥ on coherent subtraction of X and S with possible randomization. Withthisreplacement,thedesiredprooffollowsalongthesamelines as the proof of Lemma 2. As one extreme point of the tradeoff region ∗, we recover IV. OPTIMAL(R,∆)TRADEOFF:CAUSALCASE R the Shannon capacity formula [27] for channels with causal side The previous two sections considered the case in which the information at the transmitter as follows: transmitterhascompleteknowledgeofthestatesequenceSnpriorto C = max I(U;Y). (33) theactualcommunication.Inthissection,weconsideranothermodel p(u)p(x|u,s) 8 On the other hand, the maximum uncertainty reduction rate ∆∗ for There are three surprises here. First, the receiver can learn about pure state amplification is identical to that for the noncausal case thechannelstateandtheindependent messageatamaximumcut-set given in Corollary 1. rateI(X,S;Y)overalljointdistributionsp(x,s)consistentwiththe given statedistributionp(s).Second, tohelp the receiver reduce the Corollary 3: Under the condition of Theorem 2, the maximum uncertaintyintheinitialestimateofthestate(namely,toincreasethe uncertainty reduction rate ∆∗ is given by mutualinformationfromI(S;Y)toI(X,S;Y)),thetransmittercan ∆∗ =min H(S), maxI(X,S;Y) . (34) allocate the achievable information rate I(U;Y) I(U;S) in two { p(x|s) } alternativemethods—random binninganditsdual,−randomcovering. Thusthereceivercanlearnaboutthestateessentiallyatthemaximum Thirdly,asfar asthesumrateR+∆ andthemaximum uncertainty cut-set rate, even under the causality constraint. For example, the reduction rate ∆∗ are concerned, there is no cost associated with symbol-by-symbol amplification strategy X = PS is optimal for restricting the encoder to learn the state sequence on the fly. Q the Gaussian channel (Example 3) for both cqausal and noncausal cases. REFERENCES Finally, we compare the tradeoff regions ∗ and ∗ with a Rc Rnc [1] G.CaireandS.Shamai,“Onthecapacityofsomechannelswithchannel communication problem that has a totally different motivation, yet stateinformation,”IEEETrans.Inf.Theory,vol.IT-45,no.6,pp.2007– hasasimilarcapacityexpression.In[32,Situations3and4],Willems 2019,1999. and van der Meulen studied the multiple access channel with crib- [2] ——,“OntheachievablethroughputofamultiantennaGaussianbroad- bing encoders. In this communication problem, the multiple access castchannel,”IEEETrans.Inf.Theory,vol.IT-49,no.7,pp.1691–1706, 2003. channel ( ,p(yx,s), ) has two inputs and one output. The X ×S | Y [3] B. Chen and G. W. Wornell, “Quantization index modulation: a class primarytransmitter S and thesecondary transmitter X wishtosend of provably good methods for digital watermarking and information independent messages Ws [2n∆] and Wx [2nR] respectively to embedding,”IEEETrans.Inf.Theory,vol.IT-47,no.4,pp.1423–1443, ∈ ∈ the common receiver Y. The difference from the classical multiple 2001. access channel is that either the secondary transmitter X learns the [4] A.S.CohenandA.Lapidoth,“TheGaussianwatermarkinggame,”IEEE primarytransmitter’ssignal S on thefly(Xi(Wx,Si)[32,Situation [5] TMra.nHs..MIn.f.CTohsetao,ry“,Wvroilt.inIgT-o4n8,dniroty.6p,appepr.,”16IE3E9–E1T6r6a7n,s2.0I0n2f..Theory,vol. 3]) or X knows the entire signal Sn ahead of time (Xi(Wx,Sn) IT-29,no.3,pp.439–441, 1983. [32, Situation 4]). The capacity region for both cases is given by [6] T.M.Cover andY.-H.Kim,“Capacity ofaclass ofdeterminstic relay C all (R,∆) pairs satisfying channels,”inProc.IEEEInt.Symp.Inf.Theory,Nice,France,June2007, pp.591–595. R I(X;Y S) (35) [7] T.M.CoverandJ.A.Thomas,ElementsofInformationTheory,2nded. ≤ | NewYork:Wiley,2006. ∆ H(S) (36) ≤ [8] N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive R+∆ I(X,S;Y) (37) radio channels,” IEEE Trans. Inf. Theory, vol. IT-52, no. 5, pp. 1813– ≤ 1827,2006. for some joint distribution p(x,s)p(yx,s). [9] A. El Gamal, “Multiple user information theory,” unpublished course Thiscapacityregion looksalmost|identicaltothetradeoffregions notes,StanfordUniversity, 2006. ∗ and ∗ in TheorCems 1 and 2, except for the first inequality [10] U. Erez, S. Shamai, and R. Zamir, “Capacity and lattice strategies for Rnc Rc canceling known interference,” IEEE Trans. Inf. Theory, vol. IT-51, (35). Moreover, (35) has the same form as the capacity expression no.11,pp.3820–3833,2005. for channels with state information available at both the encoder [11] U. Erez and S. ten Brink, “A close-to-capacity dirty paper coding and decoder, either causally or noncausally. (The causality has no scheme,” IEEE Trans. Inf. Theory, vol. IT-51, no. 10, pp. 3417–3432, cost when both the transmitter and the receiver share the same side 2005. [12] Federal Communications Commission, Cognitive Radio Technologies information; see, for example, Caireand Shamai [1,Proposition1].) Proceeding (CRTP), ET Docket, no. 03-108. [Online]. Available: Itshouldbestressed,however, thattheproblemofcribbingmulti- http://www.fcc.gov/oet/cognitiveradio/ pleaccesschannelsandourstateuncertaintyreductionproblemhave [13] S. I. Gelfand and M. S. Pinsker, “Coding for channel with random afundamentallydifferentnature.Theformerdealswithencodingand parameters,” Problems Control Inform. Theory, vol. 9, no. 1, pp. 19– decoding of the signal Sn, while the latter deals with uncertainty 31,1980. [14] C.Heegard,“Capacity andcodingforcomputermemorywithdefects,” reductioninanuncodedsequenceSn specifiedbynature.Inasense, Ph.D.Thesis,StanfordUniversity, Nov.1981. the cribbing multiple access channel is a detection problem, while [15] C.Heegard andA.ElGamal, “Onthecapacity ofcomputer memories the state uncertainty reduction is an estimation problem. withdefects,” IEEETrans. Inf.Theory, vol.IT-29,no.5,pp.731–739, 1983. [16] S. A. Jafar, “Capacity with causal and noncausal side information: a V. CONCLUDINGREMARKS unified view,” IEEE Trans. Inf. Theory, vol. IT-52, no. 12, pp. 5468– Because the channel is state dependent, the receiver is able to 5474,2006. learn something about the channel state from directly observing [17] A.Jovicˇic´ andP.Viswanath, “Cognitive radio:aninformation-theoretic perspective,” submitted to IEEE Trans. Inf. Theory, 2006. [Online]. the channel output. Thus, to help the receiver narrow down the Available: http://arxiv.org/abs/cs.IT/0604107/ uncertainty about the channel state at the highest rate possible, the [18] A.N.Kolmogorov,“Logicalbasisforinformationtheoryandprobability sender must jointly optimize between facilitating state estimation theory,”IRETrans.Inf.Theory,vol.IT-2,no.4,pp.102–108,Dec.1956. and transmitting refinement information, rather than merely using [19] A. V. Kuznetsov and B. S. Tsybakov, “Coding in a memory with defectivecells,”ProblemyPeredachiInformatsii,vol.10,no.2,pp.52– the channel capacity to send the state description. In particular, the 60,1974. transmitter should summarize the state information in such a way [20] A. Lapidoth and P. Narayan, “Reliable communication under channel that the summary information results in the maximum uncertainty uncertainty,”IEEETrans.Inf.Theory,vol.IT-44,no.6,pp.2148–2177, reduction when coupled with the receiver’s initial estimate of the 1998. state. More generally, by taking away some resources used to help [21] N.MerhavandS.Shamai,“Informationratessubjecttostatemasking,” IEEETrans.Inf.Theory,vol.IT-53,no.6,pp.2254–2261,June2007. the receiver reduce the state uncertainty, the transmitter can send [22] J.Mitolla,III,“Cognitiveradio:anintegratedagentarchitectureforsoft- additionalpureinformationtothereceiverandtracetheentire(R,∆) waredefinedradio,”Ph.D.Thesis,KTHRoyal Institute ofTechnology, tradeoff region. Stockholm,Sweden, 2000. 9 [23] M.MohseniandJ.M.Cioffi,“Aproofoftheconverseforthecapacity ofGaussian MIMObroadcast channels,” submitted toIEEETrans.Inf. Theory,2006. [24] P. Moulin and J. A. O’Sullivan, “Information-theoretic analysis of informationhiding,”IEEETrans.Inf.Theory,vol.IT-49,no.3,pp.563– 593,2003. [25] M. S. Pinsker, Information and Information Stability of Random Vari- ablesandProcesses. SanFrancisco: Holden-Day, 1964. [26] M. Salehi, “Cardinality bounds on auxiliary variables in multiple- user theory via the method of Ahlswede and Korner,” Department of Statistics, StanfordUniversity, Technical Report33,Aug.1978. [27] C.E.Shannon,“Channelswithsideinformationatthetransmitter,”IBM J.Res.Develop., vol.2,pp.289–293,1958. [28] A.Sutivong,“Channelcapacityandstateestimationforstate-dependent channels,” Ph.D.Thesis,StanfordUniversity, Mar.2003. [29] A.Sutivong,M.Chiang,T.M.Cover,andY.-H.Kim,“Channelcapacity andstateestimationforstate-dependentGaussianchannels,”IEEETrans. Inf.Theory,vol.IT-51,no.4,pp.1486–1495,2005. [30] A.Sutivong,T.M.Cover,M.Chiang,andY.-H.Kim,“Ratevs.distortion trade-offforchannelswithstateinformation,”inProc.IEEEInt.Symp. Inform.Theory,Lausanne,Switzerland, June/July2002,p.226. [31] H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of the Gaussian multiple-input multiple-output broadcast channel,” IEEE Trans.Inf.Theory,vol.IT-52,no.9,pp.3936–3964, Sept.2006. [32] F. M. J. Willems and E. C. van der Meulen, “The discrete memory- less multiple-access channel with cribbing encoders,” IEEE Trans. Inf. Theory,vol.IT-31,no.3,pp.313–327,1985. [33] A.D.WynerandJ.Ziv,“Therate-distortion functionforsourcecoding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. IT-22,no.1,pp.1–10,1976. [34] R. Zamir, S. Shamai, and U. Erez, “Nested linear/lattice codes for structured multiterminal binning,” IEEETrans. Inf. Theory, vol. IT-48, no.6,pp.1250–1276,2002.