arXiv:1701.08731v2 [cs.IT] 31 Jan 2017

ON THE COMPUTATION OF THE SHANNON CAPACITY OF A DISCRETE CHANNEL WITH NOISE

SIMON ROBIN COWELL

Abstract. Muroga [M52] showed how to express the Shannon channel capacity of a discrete channel with noise [S49] as an explicit function of the transition probabilities. His method accommodates channels with any finite number of input symbols, any finite number of output symbols and any transition probability matrix. Silverman [S55] carried out Muroga's method in the special case of a binary channel (and went on to analyse "cascades" of several such binary channels). This article is a note on the resulting formula for the capacity C(a,c) of a single binary channel. We aim to clarify some of the arguments and correct a small error. In service of this aim, we first formulate several of Shannon's definitions and proofs in terms of discrete measure-theoretic probability theory. We provide an alternate proof to Silverman's of the feasibility of the optimal input distribution for a binary channel. For convenience, we also express C(a,c) in a single expression explicitly dependent on a and c only, which Silverman stopped short of doing.

Keywords and phrases: Information Theory, Shannon Capacity, Mutual Information, Shannon Entropy.

This research was funded by the European Union through the European Regional Development Fund under the Competitiveness Operational Program (BioCell-NanoART = Novel Bio-inspired Cellular Nano-architectures, POC-A1-A1.1.4-E nr. 30/2016).

1. Introduction

We recommend the beautifully written [S49] to the reader wanting to understand the information theory discussed in the present paper. We begin by recalling a few definitions and theorems from that book.

In section 6 of chapter I of [S49], Shannon represents a discrete source of information by a discrete random variable X, taking values in {x_1, x_2, ..., x_n} with probabilities {p_1, p_2, ..., p_n}, respectively. He proposes to find a way to measure the amount of choice, uncertainty, information or "entropy" involved in a single sampling of X. This measurement should be a function H(p_1, p_2, ..., p_n) of the distribution of X, and, Shannon suggests, should obey a certain set of 3 axioms. We begin by reformulating his definitions and axioms in terms of modern, i.e. measure-theoretic, probability theory (albeit that we use only the discrete measure).

From here on, unless stated otherwise, all probability spaces will be assumed to be equipped with their discrete σ-algebra. For example, in the probability space (Ω, Σ, P), Σ will simply be P(Ω), the power set of Ω. Therefore we will suppress the notation Σ, and in place of (Ω, Σ, P) we will simply write (Ω, P).

We want H to be a function associating with every finite, discrete probability space (Ω, P) a non-negative real number H(Ω, P), and we want H to obey certain axioms. We will state those axioms, and then, closely following Shannon's proof sketch, we will prove that such an H exists and is unique, up to a multiplicative constant. One of our axioms will be that H must be invariant with respect to probability-preserving bijections between the possible outcomes (i.e. those having strictly positive probability) of finite, discrete probability spaces. In other words, H will depend only on the multiset of probabilities of all outcomes in the space having positive probability. The reader might ask why, then, we don't describe H only as a function of n ∈ ℕ and P⃗ ∈ ∆^{n-1} which is invariant with respect to permutations of the coordinates of P⃗, without introducing the redundant measure space?
The answer is just that we find the measure space convenient in reformulating Shannon's 3rd axiom precisely (as our 4th axiom), see below. We also find the 4th axiom below to be convenient, as it yields immediately the fact that H(X,Y) = H(X) + H(Y|X); see section 2, below.

Let (Ω, P) be a finite, discrete probability space with |Ω| = n ∈ ℕ and label the outcomes, that is, the elements of Ω, as Ω = {ω_1, ω_2, ..., ω_n}, so that the vector of probabilities P⃗ = (P({ω_1}), P({ω_2}), ..., P({ω_n})) takes values in ∆^{n-1}, the standard (n-1)-simplex.

Let Ω_+ denote the set {ω ∈ Ω : P({ω}) > 0} of outcomes having strictly positive probability. Whenever we have a partition E of Ω, let us define a probability measure P_E on the discrete σ-algebra on E by

P_E({E_i ∈ E : i ∈ I}) = P(⊔_{i∈I} E_i) = ∑_{i∈I} P(E_i),   (1)

where I is any index set, and ⊔ denotes disjoint union. Thus (E, P_E) is a discrete probability space. Also, for each E ∈ E such that P(E) > 0, and for each subset F ⊆ E, let us denote by Q_E(F) the conditional probability

Q_E(F) = P(F | E),   (2)

so that (E, Q_E) is a discrete probability space.

We are now ready to reformulate Shannon's axioms in these terms:

(1) Whenever (Ω, P) and (Ω′, P′) are finite, discrete probability spaces and there is a bijection f : Ω_+ → Ω′_+ such that P′({f(ω)}) = P({ω}) for all ω ∈ Ω_+, we must have that H(Ω, P) = H(Ω′, P′).

(2) H(Ω, P) must be continuous in the probability vector P⃗ ∈ ∆^{n-1} with respect to the topology which ∆^{n-1} inherits from ℝ^n.

(3) For all n ∈ ℕ, for any discrete probability spaces (Ω, P) and (Ω′, P′) such that |Ω| = n, |Ω′| = n+1, P({ω}) = 1/n for all ω ∈ Ω and P′({ω′}) = 1/(n+1) for all ω′ ∈ Ω′, we must have

H(Ω, P) < H(Ω′, P′).   (3)

(4) For any partition E of Ω, we must have that

H(Ω, P) = H(E, P_E) + ∑_{E∈E, P(E)>0} P(E) H(E, Q_E).   (4)

We claim that the first and second axioms are natural. Note that it follows from the first axiom that, if it is convenient for the computation of H, we may delete from Ω any outcomes ω having zero probability. Such outcomes exist precisely when n ≥ 2 and P⃗ belongs to the boundary of ∆^{n-1}. Deleting k outcomes having zero probability in effect replaces ∆^{n-1} by a copy of ∆^{n-k-1} which is isomorphic to the part of the boundary of ∆^{n-1} in question. In the extreme case k = n-1, we reduce ∆^{n-1} to a copy of ∆^0, that is, the singleton set {1}. The idea of the third axiom is that, if all outcomes are equally likely, then the amount of choice, or uncertainty, should be greater when there are more possible outcomes. In Shannon's words, the idea of the fourth axiom is that "If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H." Shannon illustrates his meaning with an example (fig. 6 of section 6 of chapter I in [S49]), of which he writes: "At the left we have three possibilities with probabilities p_1 = 1/2, p_2 = 1/3 and p_3 = 1/6. On the right we first choose between two possibilities, each with probability 1/2, and if the second occurs make another choice between two possibilities with probabilities 2/3 and 1/3. The final results have the same probabilities as before. We require, in this special case, that H(1/2, 1/3, 1/6) = H(1/2, 1/2) + (1/2) H(2/3, 1/3). The coefficient 1/2 on the right-hand side is because this second choice only occurs half the time."

Translating this example to our terminology, the partition E has two parts, E_1 = {x_1} and E_2 = {x_2, x_3}. Our equation (4) becomes

H({x_1, x_2, x_3}, (1/2, 1/3, 1/6)) = H({E_1, E_2}, (1/2, 1/2)) + [ P(E_1) H(E_1, (1)) + P(E_2) H(E_2, (2/3, 1/3)) ]   (5)
                                    = H({E_1, E_2}, (1/2, 1/2)) + [ (1/2) H(E_1, (1)) + (1/2) H(E_2, (2/3, 1/3)) ],   (6)

where we represent the various probability functions P by their corresponding probability vectors P⃗. Later we will see that, in this example, the first term in the square brackets vanishes, because the entropy of the certain event is zero; no information is contained in an experiment whose outcome is known in advance. We have introduced the condition P(E) > 0 in axiom 4. One effect of this will be to introduce a similar condition in the formula for H in Theorem 1.1. This allows our probability vector P⃗ to remain in the standard simplex ∆^{n-1}, whereas Shannon instead must delete any events with zero probability, reducing to a lesser n and replacing the standard simplex ∆^{n-1} by a simplex of lesser dimension. Our solution is less elegant than Shannon's, but we like its comparative precision.
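As a quick numerical check of Shannon's example (anticipating the explicit formula of Theorem 1.1 below, with K = 1 and base b = 2), the following minimal Python sketch evaluates both sides of (5)-(6); the helper name entropy is ours, not Shannon's or Muroga's:

import math

def entropy(probs, b=2.0):
    # H = -sum of p*log_b(p) over outcomes with p > 0, taking K = 1 (Theorem 1.1 below).
    return -sum(p * math.log(p, b) for p in probs if p > 0)

lhs = entropy([1/2, 1/3, 1/6])
rhs = entropy([1/2, 1/2]) + (1/2) * entropy([1.0]) + (1/2) * entropy([2/3, 1/3])
# The middle term, (1/2)*entropy([1.0]), is the vanishing first term in the square brackets.
print(lhs, rhs)                  # both are approximately 1.459 bits
assert abs(lhs - rhs) < 1e-9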
For simplicity, and without risk of confusion, we will write P(ω) for P({ω}).

Theorem 1.1 (Existence and uniqueness of Entropy). Let (Ω, P) be a finite, discrete probability space. Then the functions

H(Ω, P) = −K ∑_{ω∈Ω, P(ω)>0} P(ω) log_b P(ω),   (7)

where K is a strictly positive constant and b > 1, satisfy axioms 1-4. These are the only functions satisfying those axioms.

Proof. We reproduce Shannon's proof of his theorem, filling in some details. For each n ∈ ℕ let (Ω_n, P_n) be a finite, discrete probability space with |Ω_n| = n and with P_n the uniform probability measure on Ω_n, that is, P_n(ω) = 1/n for all ω ∈ Ω_n. Suppose for the sake of argument that a function H exists which obeys axioms 1-4. Let A(n) = H(Ω_n, P_n). Then by axiom (1), A : ℕ → ℝ is well-defined. It follows from axiom (4) that

A(st) = A(s) + A(t) for all s, t ∈ ℕ.   (8)

We also have, by axiom (3), that A(n) is strictly increasing in n. Fix s, t ∈ ℕ with s, t ≥ 2, and let n ∈ ℕ. Then provided n is large enough, there exists a unique m ∈ ℕ such that

s^m ≤ t^n < s^{m+1}.   (9)

Since A(n) is strictly increasing, we have

A(s^m) ≤ A(t^n) < A(s^{m+1}),   (10)

hence by (8),

m A(s) ≤ n A(t) < (m+1) A(s),   (11)

and

m/n ≤ A(t)/A(s) < (m+1)/n.   (12)

In this last step we have used the fact that A(s) must be positive. Indeed, for any t ∈ ℕ we have A(t) = A(1·t) = A(1) + A(t), hence A(1) = 0. But A(n) is strictly increasing in n, therefore A(n) > 0 for all n ≥ 2. Let b > 1. The function log_b(x) is also strictly increasing in x, therefore (9) also implies that

log_b s^m ≤ log_b t^n < log_b s^{m+1},   (13)

so

m log_b s ≤ n log_b t < (m+1) log_b s,   (14)

and

m/n ≤ log_b t / log_b s < (m+1)/n.   (15)

Together, (12) and (15) imply that

| A(t)/A(s) − log_b(t)/log_b(s) | < 1/n.   (16)

Since n is arbitrary, we have

A(t)/A(s) = log_b(t)/log_b(s) for all s, t ∈ ℕ, s, t ≥ 2.   (17)

Fixing s ∈ ℕ with s ≥ 2, we have

A(t) = (A(s)/log_b(s)) log_b(t) for all t ∈ ℕ with t ≥ 2,   (18)

hence

A(t) = K log_b(t) for all t ∈ ℕ with t ≥ 2,   (19)

where K > 0 is a strictly positive constant depending on b. Since A(1) = 0, we even have

A(t) = K log_b(t) for all t ∈ ℕ.   (20)

So far we have found a formula for H(Ω, P) in the case that P is the uniform distribution on Ω, i.e. when all outcomes are equally likely. We need to be able to relax this condition. In fact, let (Ω, P) be a discrete probability space with n outcomes, not necessarily equally likely, but having commensurable probabilities P(ω_i). Since the probabilities sum to 1, this commensurability is equivalent to the P(ω_i) all being rational. Assuming for simplicity that the P(ω_i) are all strictly positive, we can write

P(ω_i) = s_i/m for all i,   (21)

where m ∈ ℕ satisfies m > n, and the s_i ∈ ℕ satisfy s_1 + ··· + s_n = m. Now let (Ω′, P′) be a discrete probability space with m equally likely outcomes, and let E be a partition of Ω′ into n nonempty parts E_1, ..., E_n with sizes s_1, ..., s_n, respectively. By axiom (4) we have

A(m) = H(Ω′, P′) = H(E, P′_E) + ∑_i P′(E_i) H(E_i, Q_{E_i}) = H(Ω, P) + ∑_i P(ω_i) A(s_i),   (22)

hence

H(Ω, P) = A(m) − ∑_i P(ω_i) A(s_i)   (23)
        = ∑_i P(ω_i) [ A(m) − A(s_i) ]   (24)
        = −K ∑_i P(ω_i) [ log_b(s_i) − log_b(m) ]   (25)
        = −K ∑_i P(ω_i) log_b(s_i/m)   (26)
        = −K ∑_i P(ω_i) log_b P(ω_i).   (27)

Note that in the case of equally likely outcomes, we recover A(n) from this formula. Now by axiom (2) and by the density of ℚ^n in ℝ^n we can extend this formula even to the case of irrational probabilities, to obtain

H(Ω, P) = −K ∑_i P(ω_i) log_b P(ω_i)   (28)

for any finite probability space whose outcomes have strictly positive probability. Having thus shown that this form for H is necessary, if the 4 axioms are to hold, we claim that it is also sufficient. □

Note that by changing our choice of base b > 1, without any loss of generality we can omit the constant K > 0, i.e. assume that K = 1. Indeed, if b, b′ > 1 and K > 0, then K log_b x = K log_{b′} x / log_{b′} b = (K / log_{b′} b) log_{b′} x, where K / log_{b′} b > 0 is also a positive constant. With the convention that K = 1, if we set b = 2 then the units of entropy H are known as "bits", a contraction of "binary digits", as explained in [S49]. In fact, with K = 1 and b = 2, the entropy of a probability space having 2^n equally likely outcomes will be A(2^n) = 1 · log_2(2^n) = n bits. This makes sense, since n binary digits can represent 2^n possible states.

Whenever X is a random variable with finite range {x_1, x_2, ..., x_n}, we will write H(X) to mean H({x_1, x_2, ..., x_n}, P), where P is the discrete probability measure on the range of X defined by P(x_i) = P(X = x_i) for all i ∈ {1, 2, ..., n}. So in the case of a random variable X with finite range {x_1, x_2, ..., x_n}, the formula in Theorem 1.1 becomes

H(X) = −K ∑_{1≤i≤n, P(x_i)>0} P(x_i) log_b P(x_i).   (29)
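The closing remark about bits, and the multiplicativity (8) used in the proof, are easy to confirm numerically. A minimal, self-contained Python sketch (with K = 1, b = 2, and helper names of our own choosing):

import math

def H(probs, b=2.0):
    # Entropy of a finite distribution with K = 1: -sum of p*log_b(p) over p > 0.
    return -sum(p * math.log(p, b) for p in probs if p > 0)

def A(n):
    # Entropy, in bits, of the uniform distribution on n outcomes.
    return H([1.0 / n] * n)

# With K = 1 and b = 2, a space of 2**n equally likely outcomes has entropy n bits.
for n in range(1, 11):
    assert abs(A(2 ** n) - n) < 1e-9

# The multiplicativity relation (8): A(s*t) = A(s) + A(t).
assert abs(A(6 * 35) - (A(6) + A(35))) < 1e-9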
2. Muroga's explicit solution of Shannon's implicit equation for C

Suppose that the input to a discrete channel is represented by a random variable X with range {x_1, ..., x_n} and that the output is represented by a random variable Y with range {y_1, ..., y_m}. Throughout this section we will assume that the message to be transmitted comprises a sequence of symbols being independent random samplings of X, and that the message is perturbed by noise in transmission, each symbol being perturbed independently.

Let p_i = P(X = x_i), r_j = P(Y = y_j) and p_{i,j} = P(X = x_i ∧ Y = y_j). Also let

q_{i,j} = P(Y = y_j | X = x_i) if p_i ≠ 0, and q_{i,j} = 0 if p_i = 0.   (30)

Note that

p_{i,j} = p_i q_{i,j}, for all i, j.   (31)

Fix b > 1 to act as the base for any logarithms and exponentials. In case b = 2 the units of entropy will be bits, which will of course be particularly appropriate when we study binary channels. Also fix an arbitrary constant w ∈ ℝ. Define

log* x = log_b x if x > 0, and log* 0 = w.   (32)

Then the function x log* x is continuous on the open interval (0, ∞), and since the indeterminate limit lim_{x→0+} x log* x = lim_{x→0+} x log_b x exists and is equal to 0 = 0 · w = 0 · log* 0, it follows that x log* x is actually continuous even on the closed interval [0, ∞), and in particular on [0, 1].
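Since the convention 0 · log* 0 = 0 recurs in every entropy sum below, it may help to see it spelled out in code. A minimal Python sketch (the helper name xlogstar is ours; the later sketches reuse it in spirit):

import math

def xlogstar(x, b=2.0):
    # x * log*(x): equal to x*log_b(x) for x > 0 and to 0 at x = 0,
    # i.e. the continuous extension of x*log_b(x) to [0, 1].
    return x * math.log(x, b) if x > 0 else 0.0

# The extension really is continuous at 0: x*log_b(x) -> 0 as x -> 0+.
print([xlogstar(x) for x in (1e-2, 1e-4, 1e-8, 1e-16)])   # magnitudes shrink towards 0
print(xlogstar(0.0))                                      # exactly 0, whatever the value of w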
The function x log* x is differentiable on (0, ∞), with (x log* x)′ = (x log_b x)′ = log_b x + 1/ln b = log_b x + log_b e = log* x + log* e for all x > 0. Hence (x log* x)′ tends to −∞ as x tends to 0 from the right. Also the right-hand derivative of x log* x at x = 0 is

lim_{h→0+} [ (0+h) log*(0+h) − 0 · log* 0 ] / h = lim_{h→0+} log* h = lim_{h→0+} log_b h = −∞.   (33)

We have that

H(X) = −∑_i p_i log* p_i,   (34)

H(Y) = −∑_j r_j log* r_j,   (35)

and

H(X,Y) = −∑_{i,j} p_{i,j} log* p_{i,j},   (36)

where by H(X,Y) we mean the entropy of the joint distribution of X and Y. We also define

H(Y | X = x_i) = −∑_j q_{i,j} log* q_{i,j},   (37)

the conditional entropy of Y given that X = x_i. Note that in case P(X = x_i) = 0, then the conditional probabilities P(Y = y_j | X = x_i) are undefined, but by our definitions of q_{i,j} and log* x, H(Y | X = x_i) will in this case be equal to 0. We also define the expectation of this last defined quantity with respect to X as the conditional entropy of Y given X:

H(Y|X) = ∑_i p_i H(Y | X = x_i).   (38)

From this definition, and from axioms (1) and (4) for H, it follows that

H(X,Y) = H(X) + H(Y|X),   (39)

and by the symmetry of the left-hand side, also that

H(X,Y) = H(Y) + H(X|Y).   (40)

We also have

H(Y|X) = ∑_i p_i H(Y | X = x_i)   (41)
       = −∑_{i,j} p_i q_{i,j} log* q_{i,j}   (42)
       = −∑_{i,j} p_{i,j} log* q_{i,j}.   (43)

Theorem 2.1. If X and Y are random variables with finite range, then H(X,Y) ≤ H(X) + H(Y), with equality if, and only if, X and Y are independent.

Proof.

H(X) + H(Y) − H(X,Y)
  = −∑_i p_i log* p_i − ∑_j r_j log* r_j − ( −∑_{i,j} p_{i,j} log* p_{i,j} )
  = −∑_{i,j} p_{i,j} log* p_i − ∑_{i,j} p_{i,j} log* r_j − ( −∑_{i,j} p_{i,j} log* p_{i,j} )
  = −∑_{i,j} p_{i,j} ( log* p_i + log* r_j − log* p_{i,j} )
  = −∑_{i,j : p_{i,j}>0} p_{i,j} log_b ( p_i r_j / p_{i,j} ).

So, by the strict concavity of the logarithm,

H(X) + H(Y) − H(X,Y)
  ≥ −log_b ( ∑_{i,j : p_{i,j}>0} p_{i,j} (p_i r_j / p_{i,j}) )
  = −log_b ( ∑_{i,j : p_{i,j}>0} p_i r_j )
  ≥ −log_b ( ∑_{i,j} p_i r_j )
  = −log_b ( (∑_i p_i)(∑_j r_j) )
  = −log_b (1 · 1)
  = 0,

with equality if and only if the following two conditions hold:

(1) p_i r_j / p_{i,j} = c, a constant, for all i and j such that p_{i,j} > 0.
(2) For all i, j, (p_i > 0 ∧ r_j > 0) ⇒ p_{i,j} > 0.

Assuming that conditions (1) and (2) hold, from condition (1) we have that p_i r_j = c p_{i,j} for all i and j such that p_{i,j} > 0, hence

∑_{i,j : p_{i,j}>0} p_i r_j = c ∑_{i,j : p_{i,j}>0} p_{i,j} = c ∑_{i,j} p_{i,j} = c.   (44)

But it follows from condition (2) that p_{i,j} > 0 if and only if both p_i and r_j are strictly positive, hence

c = ∑_{i,j : p_{i,j}>0} p_i r_j = ∑_{i,j : p_i>0 ∧ r_j>0} p_i r_j = ∑_{i,j} p_i r_j = (∑_i p_i)(∑_j r_j) = 1 · 1 = 1.   (45)

Therefore by condition (1) we have that p_i r_j = p_{i,j} for all i and j such that p_{i,j} > 0, that is, by condition (2), for all i and j such that p_i > 0 and r_j > 0. It follows that

p_{i,j} = p_i r_j for all i, j,   (46)

and therefore, X and Y are independent. Conversely, if we assume that X and Y are independent, then conditions (1) and (2) follow immediately. □

Now define the "mutual information of X and Y" as the deficit appearing in Theorem 2.1, i.e.

I(X,Y) = H(X) + H(Y) − H(X,Y).   (47)

Then I(X,Y) is non-negative. Moreover I(X,Y) = 0 if, and only if, X and Y are independent. By equations (39) and (40) we also have the relations

I(X,Y) = H(Y) − H(Y|X)   (48)

and

I(X,Y) = H(X) − H(X|Y).   (49)

The conditional entropy H(X|Y) of the input X given the output Y is known as the "equivocation". Ideally, the equivocation would be zero, in which case I(X,Y) would equal H(X). In case H(X|Y) = H(X), that is, in case knowledge of the output Y makes no difference to our degree of uncertainty as to the input X, we have that I(X,Y) = 0. Thus the mutual information I(X,Y) measures the rate of transmission of information per symbol over the channel. Looked at another way, I(X,Y) measures a type of correlation between X and Y.
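The identities above are easy to confirm numerically. The following self-contained Python sketch (the joint distribution is a small, arbitrary example of ours) computes H(X), H(Y), H(X,Y), H(Y|X) and I(X,Y) from a joint probability matrix and checks relations (39), (47) and (48), together with the non-negativity guaranteed by Theorem 2.1:

import math

def xlogstar(x, b=2.0):
    # x * log*(x), with the convention 0 * log*(0) = 0.
    return x * math.log(x, b) if x > 0 else 0.0

# Joint distribution p[i][j] = p_{i,j} = P(X = x_i and Y = y_j); a small arbitrary example.
p = [[0.30, 0.10],
     [0.05, 0.25],
     [0.10, 0.20]]

px = [sum(row) for row in p]                                  # marginals p_i
py = [sum(p[i][j] for i in range(3)) for j in range(2)]       # marginals r_j

HX  = -sum(xlogstar(x) for x in px)                           # (34)
HY  = -sum(xlogstar(y) for y in py)                           # (35)
HXY = -sum(xlogstar(pij) for row in p for pij in row)         # (36)
HY_given_X = -sum(pij * math.log(pij / px[i], 2)              # (43); every p_{i,j} > 0 here
                  for i, row in enumerate(p) for pij in row)

info = HX + HY - HXY                                          # definition (47)
assert abs(HXY - (HX + HY_given_X)) < 1e-9                    # relation (39)
assert abs(info - (HY - HY_given_X)) < 1e-9                   # relation (48)
assert info >= 0                                              # Theorem 2.1
print(HX, HY, HXY, info)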
Shannon defines the "capacity" C of the channel in this case as the maximum mutual information over all possible distributions of the input X. That is,

C = max_{p⃗ ∈ ∆^{n-1}} I(X,Y),   (50)

where p⃗ = (p_1, ..., p_n). Following Shannon and Muroga, we try to compute this maximum using the method of Lagrange multipliers. One of Muroga's many contributions is to account for the case in which the resulting maximising p⃗ is unfeasible, i.e. is outside ∆^{n-1}. Muroga also accounts completely for the case that the transition matrix Q = (q_{i,j}) is non-square, and even for the case where Q is less than full rank. He shows exactly what must be done to compute C in this general setting. Shannon seems to have overlooked these points.
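Before carrying out the Lagrange-multiplier computation, we note that definition (50) can also be evaluated directly for a small channel. The following Python sketch (the binary transition matrix is an arbitrary example of ours, not taken from Silverman) maximises I(X,Y) over a fine grid on the input simplex; it provides a baseline against which the explicit expression sketched at the end of this section can be checked:

import math

def xlogstar(x, b=2.0):
    # x * log*(x), with the convention 0 * log*(0) = 0.
    return x * math.log(x, b) if x > 0 else 0.0

# Row-stochastic transition matrix Q[i][j] = q_{i,j} = P(Y = y_j | X = x_i);
# an arbitrary (asymmetric) binary channel chosen for illustration.
Q = [[0.9, 0.1],
     [0.2, 0.8]]

def mutual_information(p, Q):
    # I(X,Y) = H(Y) - H(Y|X), as in (48).
    r = [sum(p[i] * Q[i][j] for i in range(len(p))) for j in range(len(Q[0]))]
    HY = -sum(xlogstar(rj) for rj in r)
    HY_given_X = -sum(p[i] * xlogstar(Q[i][j])
                      for i in range(len(p)) for j in range(len(Q[0])))
    return HY - HY_given_X

# Brute-force search for the capacity (50) over the input simplex (p_1, 1 - p_1).
C_grid, p1_best = max((mutual_information([t / 10000, 1 - t / 10000], Q), t / 10000)
                      for t in range(10001))
print(C_grid, p1_best)   # approximately 0.398 bits, attained near p_1 = 0.52, for this example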
By (42) we have

I(X,Y) = H(Y) − H(Y|X)   (51)
       = −∑_j r_j log* r_j + ∑_{i,j} q_{i,j} p_i log* q_{i,j}   (52)
       = −∑_j ( ∑_i q_{i,j} p_i ) log* ( ∑_i q_{i,j} p_i ) + ∑_{i,j} q_{i,j} p_i log* q_{i,j}   (53)
       = −∑_{i,j} q_{i,j} p_i log* ( ∑_i q_{i,j} p_i ) + ∑_{i,j} q_{i,j} p_i log* q_{i,j}   (54)
       = ∑_{i,j} q_{i,j} p_i [ log* q_{i,j} − log* ( ∑_i q_{i,j} p_i ) ],   (55)

where we point out that the index i in the second summation is not bound by the first summation.

We are interested to know the partial derivative of I(X,Y) at a fixed point p⃗ ∈ ∆^{n-1}, with respect to some particular p_k. For the time being we will suppose that p_k ∉ {0,1}. We have

∂I(X,Y)/∂p_k = ∂/∂p_k ∑_{i,j} q_{i,j} p_i [ log* q_{i,j} − log* ( ∑_i q_{i,j} p_i ) ]   (56)
 = ∑_{i,j} ∂/∂p_k { q_{i,j} p_i [ log* q_{i,j} − log* ( ∑_i q_{i,j} p_i ) ] }   (57)
 = ∑_{i,j} q_{i,j} { δ_{i,k} log* q_{k,j} − ∂/∂p_k [ p_i log* ( ∑_i q_{i,j} p_i ) ] }   (58)
 = ∑_{i,j : r_j>0} q_{i,j} { δ_{i,k} log* q_{k,j} − ∂/∂p_k [ p_i log_b ( ∑_i q_{i,j} p_i ) ] }   (59)
 = ∑_{i,j : r_j>0} q_{i,j} { δ_{i,k} log* q_{k,j} − [ δ_{i,k} log_b ( ∑_i q_{i,j} p_i ) + p_i q_{k,j} / ( ln b · ∑_i q_{i,j} p_i ) ] }   (60)
 = ∑_{j : r_j>0} q_{k,j} log* q_{k,j} − ∑_{j : r_j>0} q_{k,j} log_b ( ∑_i q_{i,j} p_i )   (61)
   − (1/ln b) ∑_{j : r_j>0} ∑_i q_{i,j} p_i q_{k,j} / ( ∑_i q_{i,j} p_i )   (62)
 = ∑_{j : r_j>0} q_{k,j} log* q_{k,j} − ∑_{j : r_j>0} q_{k,j} log_b ( ∑_i q_{i,j} p_i ) − (1/ln b) ∑_{j : r_j>0} q_{k,j}   (63)
 = ∑_j q_{k,j} log* q_{k,j} − ∑_j q_{k,j} log* ( ∑_i q_{i,j} p_i ) − 1/ln b   (64)
 = ∑_j q_{k,j} [ log* q_{k,j} − log* ( ∑_i q_{i,j} p_i ) ] − 1/ln b,   (65)

where at (59) we have used the fact that, by definition, q_{i,j} = 0 whenever r_j = 0, whether or not p_i = 0, and also the fact that ∑_i q_{i,j} p_i = r_j. Since (1, ..., 1) is normal to ∆^{n-1}, the method of Lagrange multipliers dictates that we want ∇I(X,Y) to be parallel to (1, ..., 1). It follows that we want

∑_j q_{k,j} [ log* q_{k,j} − log* ( ∑_i q_{i,j} p_i ) ] = μ for all k,   (66)

where μ is some constant. Multiplying by p_k and summing over k,

∑_{j,k} q_{k,j} p_k [ log* q_{k,j} − log* ( ∑_i q_{i,j} p_i ) ] = μ ∑_k p_k = μ.   (67)

Given that the optimal values of p⃗ must obey this last equation (ignoring questions of feasibility for now), it follows from (55) that

μ = C   (68)

(not μ = −C, a mistake in Shannon which Muroga corrects). We would like to isolate the p_i in (66). We have that

∑_j q_{k,j} log* ( ∑_i q_{i,j} p_i ) = ∑_j q_{k,j} log* q_{k,j} − C for all k,   (69)

or, as a matrix equation,

Q ( log*(∑_i q_{i,1} p_i), ..., log*(∑_i q_{i,m} p_i) )^T = ( ∑_j q_{1,j} log* q_{1,j} − C, ..., ∑_j q_{n,j} log* q_{n,j} − C )^T,   (70)

where Q = (q_{i,j}) is the so-called transition matrix. From this point, Shannon attempts to solve for C in terms of Q alone by inverting Q, in the special case in which Q is square and invertible. However, he is unable to eliminate the p_i from his equation, meaning that his is an implicit rather than an explicit expression for C. Muroga's analysis begins where Shannon left off, and successfully eliminates the p_i. Here Shannon mistakenly calls Q what is actually Q^T. Muroga misses the opportunity to correct that mistake, instead simply writing that the order of suffices in a certain matrix product is different from the usual expression. Although Muroga's conclusions are not harmed by this oversight, we take the chance to correct it here, and also to simplify slightly the argument which Muroga uses to eliminate the p_i from Shannon's equation and isolate C. Let us suppose, then, that Q is square and invertible, with inverse F = (f_{i,j}). We have

( log*(∑_i q_{i,1} p_i), ..., log*(∑_i q_{i,m} p_i) )^T = F ( ∑_j q_{1,j} log* q_{1,j} − C, ..., ∑_j q_{n,j} log* q_{n,j} − C )^T.   (71)

Making the simplifying assumption that p_i, r_j > 0 for all i and j, we exponentiate with base b (componentwise), obtaining

( ∑_i q_{i,1} p_i, ..., ∑_i q_{i,m} p_i )^T = exp_b [ F ( ∑_j q_{1,j} log* q_{1,j} − C, ..., ∑_j q_{n,j} log* q_{n,j} − C )^T ],   (72)

that is,

Q^T p⃗ = exp_b [ F ( ∑_j q_{1,j} log* q_{1,j} − C, ..., ∑_j q_{n,j} log* q_{n,j} − C )^T ],   (73)

hence

p⃗ = F^T exp_b [ F ( ∑_j q_{1,j} log* q_{1,j} − C, ..., ∑_j q_{n,j} log* q_{n,j} − C )^T ].   (74)

So for all i ∈ {1, ..., n}, we have

p_i = ∑_k (F^T)_{i,k} exp_b ( ∑_l f_{k,l} ( ∑_j q_{l,j} log* q_{l,j} − C ) )   (75)
    = ∑_k f_{k,i} exp_b ( −C ∑_l f_{k,l} + ∑_{l,j} f_{k,l} q_{l,j} log* q_{l,j} ).   (76)
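The remaining steps of Muroga's elimination are not reproduced in this note, but under the same simplifying assumptions (Q square, invertible and row-stochastic, with the optimal p⃗ and r⃗ strictly positive, i.e. feasible) the computation can be carried to a numerical conclusion. Writing h_k = ∑_j q_{k,j} log* q_{k,j} and d = F h⃗, equations (69)-(70) read Q (log* r⃗) = h⃗ − C 1⃗, hence log* r_j = d_j − C (using Q 1⃗ = 1⃗ and therefore F 1⃗ = 1⃗), and ∑_j r_j = 1 then gives C = log_b ∑_j b^{d_j}, after which p⃗ is recovered from (74). The following Python sketch of this route uses the same example matrix as the brute-force sketch above, so the two values of C can be compared; it is only an illustration under the stated assumptions, not a substitute for Muroga's general treatment:

import math

b = 2.0   # base of logarithms; capacity comes out in bits

def xlogstar(x):
    # x * log*(x), with the convention 0 * log*(0) = 0.
    return x * math.log(x, b) if x > 0 else 0.0

def invert2(M):
    # Inverse of a 2x2 matrix; enough for the binary example below.
    (m11, m12), (m21, m22) = M
    det = m11 * m22 - m12 * m21
    return [[ m22 / det, -m12 / det],
            [-m21 / det,  m11 / det]]

# Same example transition matrix Q[i][j] = q_{i,j} as in the brute-force sketch above.
Q = [[0.9, 0.1],
     [0.2, 0.8]]
F = invert2(Q)                                                  # F = Q^{-1}

h = [sum(xlogstar(q) for q in row) for row in Q]                # h_k = sum_j q_{k,j} log* q_{k,j}
d = [sum(F[k][l] * h[l] for l in range(2)) for k in range(2)]   # d = F h

C = math.log(sum(b ** dj for dj in d), b)                       # C = log_b sum_j b^{d_j}
r = [b ** (dj - C) for dj in d]                                 # optimal output distribution r_j
p = [sum(F[l][k] * r[l] for l in range(2)) for k in range(2)]   # p = F^T r, as in (74)

print(C)      # roughly 0.398 bits, agreeing with the brute-force estimate above
print(p, r)   # both lie strictly inside the simplex, so the solution is feasible here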
