ebook img

Geometrical interpretation and improvements of the Blahut-Arimoto's algorithm PDF

0.11 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Geometrical interpretation and improvements of the Blahut-Arimoto's algorithm

GEOMETRICALINTERPRETATION ANDIMPROVEMENTS OFTHE BLAHUT-ARIMOTO’SALGORITHM Ziad NAJA(1), FlorenceALBERGE(1),PierreDUHAMEL(2) Laboratoiredessignauxet syste`mes(L2S) UnivParis-Sud(1), CNRS(2) Supelec, 3 rueJoliot-Curie91192Gif-sur-Yvettecedex(France) E-mails: {naja,alberge, pierre.duhamel}@lss.supelec.fr 0 1 0 2 ABSTRACT 2. TOOLS n ThepaperfirstrecallstheBlahutArimotoalgorithmforcomputing 2.1. Kullback-LeiblerdivergenceandMutualInformation a thecapacityofarbitrarydiscretememorylesschannels,asanexam- J pleof aniterativealgorithmworking withprobability densityesti- The Kullback-Leibler divergence (KLD) [7, 8] is defined for two 2 mates. Then, a geometrical interpretation of this algorithm based probabilitydistributionsp={p(x),x∈X}andq={q(x),x∈X} 1 onprojectionsontolinearandexponential familiesofprobabilities ofadiscreterandomvariableXtakingtheirvaluesxinadiscreteset is provided. Finally, this understanding allows also to propose to Xby: ] writetheBlahut-Arimotoalgorithm, asatrueproximal pointalgo- p(x) T rithm. itisshownthat thecorresponding version hasanimproved D(p||q)= p(x)log q(x) .I convergence rate, compared to the initial algorithm, as well as in Xx∈X s comparisonwithotherimprovedversions. TheKLD(alsocalledrelativeentropy)hassomeofthepropertiesof c ametric: D(p||q)isalwaysnon-negative,andiszeroifandonlyif [ IndexTerms— Iterativealgorithm,Blahut-Arimotoalgorithm, p=q.However,itisnotatruedistancebetweendistributionssince 1 Geometricalinterpretation,Convergencespeed,Proximalpointmethod. it is not symmetric (D(p||q) 6= D(q||p)) and does not satisfy the v triangleinequalityingeneral.Nonetheless,itisoftenusefultothink 5 1. INTRODUCTION ofrelativeentropyasadistancebetweendistributions. 1 Thechannelcapacityisgivenby: In1972, R.BlahutandS.Arimoto[1,2]receivedtheInformation 9 1 TheoryPaperAwardfortheirTransactionsPapersonhow tocom- . pute numerically the capacity of memoryless channels with finite 1 inputandoutputalphabets. Fig.1. Channelmodel 0 TheBlahut-Arimotoalgorithm wasrecently extended tochan- 0 nelswithmemoryandfiniteinputalphabetsandstatespaces[3]. 1 Recently,analgorithmwasproposedforcomputingthecapacity C =maxI(X,Y) v: ofmemorylesschannelswithcontinuousinputand/oroutputalpha- p(x) i betswheretheBlahut-Arimotoalgorithmisnotdirectlyapplied[4]. Wherethemutualinformationofthetwodiscreterandomvariables X In[5],informationgeometricinterpretationoftheBlahut-Arimoto XandYisgivenby: r algorithm in terms of alternating information projection was pro- a vided. Basedon thislastapproach, Matz[6]proposed amodified I(X,Y)=Ep{D(p(y|x)||p(y))} Blahut-Arimoto algorithm that converges significantly faster than thestandardone. 2.2. Linearandexponentialfamiliesofprobability The algorithm proposed by Matz is based on an approximation of a proximal point algorithm. Instead, we propose a true proximal Alinearfamilyofprobabilityisdefinedas[5]: pointreformulationthatpermitstoacceleratetheconvergencespeed ∀f1,f2,...,fK ∈Xand∀α1,α2,...,αK comparedtotheclassicalBlahut-Arimotoalgorithmandalsotothe approachin[6]. L={p:Ep(fi(x))=αi,1≤i≤K} Our contributions regarding capacity computation for discrete TheexpectedvalueEp(fi(x))oftherandomvariablexwithrespect memorylesschannels(DMCs)inthispaperare: tothedistributionp(x)isrestrictedtoαi. Alinearfamilyofproba- • Geometrical interpretation of Blahut-Arimoto algorithm in bilityischaracterizedby{fi(x)}1≤i≤K and{αi}1≤i≤K. terms of projection onto linear and exponential families of Thevector α = [α1,...,αk]servesasacoordinate systeminthe probability. manifoldofthelinearfamily. These4coordinatesarecalled”mix- turecoordinates”. • Trueproximalpointinterpretation. An exponential family [5] of discrete probability distributions • Improvementoftheconvergenceratebasedontheproximal p(x)onanalphabetXistheset pointformulation. ThankstoNewcom++WPR4forfunding. E ={p:p(x)= PxQ((Qx()xe)xepxPpPKi=Ki1=(1θ(iθfiif(ix()x))))} TheexponentialfamilyE iscompletelydefinedbyfi(x)andQ(x) speedoftheclassicalBlahut-Arimotoalgorithminwhichλ1=1. andparameterizedbyθi. SotheBlahut-ArimotoAlgorithmcanbeinterpretedastheprojec- ThedistributionQ(x)isitselfanelementoftheexponentialfamily. tion of p(k)(x) onto a linear family of probability L at the point AnyelementofE couldplaytheroleofQ(x),butifitisnecessary p(k+1)(x)whereLisdefinedbyf (x)=Dk =D(p(y/x)||p(k)(y)) 1 x toemphasizethedependenceofEonQ(x),wewillwriteEQ. andαk1 suchasEp(Dxk)=αk1. Bychoosingincreasingαk,wewouldensurethatthemutualin- 1 3. BLAHUT-ARIMOTO-TYPEALGORITHM formationincreasesfromoneiterationtotheother(I(k+1)(p(x))≥ I(k)(p(x))). However,thisquantityisonlyimplicitlydefinedinthe 3.1. TheoriginalBlahut-Arimotoalgorithm algorithmandanappropriatechoiceisnotavailable.Inthefollowing, weshowthatthisproblemwillbesolvedbasedonaproximalpoint Letconsider thecase of adiscrete memoryless channel withinput interpretationthatensuresthatthemutualinformationincreasesdur- symbolXtakingitsvaluesintheset{x0,...,xM}andoutputsym- ingiterations. bol Y taking its values in the set {y0,...,yN}. This channel is Notethatthislinearfamilyofprobabilityischangingfromone defined by its transition probabilities channel matrix Q as [Q]ij = iterationtotheother. Qi|j = Pr(Y = yi|X = xj). Wealsodefinepj = Pr(X = xj) Ontheotherhand,theBlahut-Arimotoalgorithmcanbeinterpreted andqi =Pr(Y =yi). astheprojectionofaprobabilitydensityfunction(pdf)ontoanexpo- Themutualinformationisgivenby: nentialfamilyofprobabilityEdefinedbyQ(x)=p(k)(x),f(k)(x)= 1 I(X,Y)=I(p,Q)= M N pjQi|jlogQqii|j = M pjD(Qj||q) DxkaTnoddpoatrhaims,ewtreizsehdowulidthsθo1(lvk)eathtitsheprpoobilnetmp:(k+1)(x). j=0i=0 j=0 XX X minD(R(x)||p(x,θ)) Andthechannelcapacityby: θ ( p(x,θ)= PxQQ(x()xe)xepx(pθ(fθ1f(1x()x)) C =maxI(p,Q) p where R(x) is a certain pdf. We try now to find some interest- ingcharacteristicsof R(x). Todo this, letsolve theminimization Bysolvingthismaximizationproblemandtakingintoconsider- problem given above. ∂(R(x)logp(x)) =0 with logp(x,θ) = ationthenormalizationcondition: p(x)=1,wefind: x ∂θ x logQ(x)+θf (x)−log( Q(x)exp(θf (x))) 1 P x 1 pj = Pjpjpjex[epx(pD(D(P(P(YP(Y|X|X==xjx)j||)P||P(Y(Y))))))] So xR(x)f1(x)− PxRP(xP)PxQx(Qx()xe)xfp1((θxf)1e(xxp)()θf1(x)) =0 Hence,theClassicalBlahut-Arimotoalgorithm[1,2]isaniterative HenPce xR(x)f1(x)− PxPQx(Qx)(fx1)(exx)pe(xθpf(1θ(fx1)()x)) xR(x)=0lead- ingto (R(x)−p(x,θ))f (x) = 0havingthat R(x) = 1 procedure: Px 1 P x andp(x,θ)= Q(x)exp(θf1(x)) . p(k+1)(x)= p(k)(x)exp(Dxk) (1) WeobtPain PxQ(x)exp(θf1(x)) P Mp(k)(x)exp(Dk) x x (R(x)−p(k+1)(x))Dk =0 x x withDxk =D(p(Y =y|X =Px)||p(Y =y(k))). WhichcanberefoPrmulatedas 3.2. GeometricalInterpretationofBlahut-ArimotoAlgorithm I(R,Q)=ER(Dxk)=E(pk+1)(Dxk)=I(p(k+1)(x),Q)≥ I(p(k)(x)) TheBlahut-Arimotoalgorithmin(1)canberecalculatedasamini- HencetheBlahut-Arimotoalgorithmcanbeinterpretedasthepro- mizationproblem: jectionofpdfsR(x)withhighermutualinformationthanI(p(k)(x)) minp D(p(x)||p(k)(x)) ontoanexponentialfamilyEdefinedbyQ(x)=p(k)(x),f1(k)(x)=  s.c I(k)(p(x))=α Dxkandparameterizedbyθ1(k) =1/λkatthepointp(k+1)(x).Note  s.c xp(x)=1 thatthisexponentialfamilyisalsochangingfromiterationtoanother sinceQ(x)andf(k)(x)dependsontheiteration. Hereagain,anap- whereI(k)(p(x))=Ep{D(p(Py|x)||pk(y))}isthecurrentcapacity propriatechoice1oftheparameterforincreasingconvergencerateis estimateattheiterationkandαisrelatedtotheLagrangianmulti- difficult, because of the implicit definition of the family. Thus, a plierofthisminimizationproblem. proximal point interpretation maximizing explicitly the mutual in- TheLagrangiancorrespondingtothisminimizationproblemcanbe formationisconsideredwithagivenpenaltyterm. writtenasfollow: L=D(p(x)||p(k)(x))−λ (I(k)(p(x))−α)−λ ( p(x)−1) 3.3. Proximal pointinterpretation of B.A.and amelioration in 1 2 x termsofconvergencespeed ∂L =0⇒log(p(x))+1−log(p(k)(x))−λ DkP−λ =0and ∂p(x) 1 x 2 Following the results above, and based on a proximal point inter- p(x)=p(k)(x)exp(λ −1)exp(λ Dk) pretation,wecansolvetheproblemstatedbytheimplicitdefinition 2 1 x Takingintoconsiderationthenormalizationconstraint,wecaneasily ofthefamilies. Infact,weproposeaclearequivalencewithatrue obtainthatexp(λ −1) = 1 andp(k+1)(x) = proximal point interpretation, in which all constants are explicitly Ppx(pk)(k(x)()xe)xepx(pλ(1λD1xkD2)xk) Pxp(k)(x)exp(λ1Dxk) idseefianseidly,tshhuoswanlltohwatinthgetoBplarhoupto-sAericmonovtoeraglegnocreithramteisimeqpuroivvaelmenetntto. It Inthefollowing,wewillseethatthisparameterλ isastepsizepa- rameterwhich,forconvenientvalues,canaccelerat1etheconvergence p(k+1)(x)=argmax{I(k)(p(x))−D(p(x)||p(k)(x))} (2) p Infact,byderivingthisexpressionoverp(x)andsetitequaltozero, I(p(x))−λ (D(p(x)||p(k)(x))−D(q(y)||q(k)(y))) k wefindexactlytheiterativeexpressionoftheBlahut-Arimotoalgo- andsetitequaltozerowefind: rithm. But till now we cannot say that the Blahut-Arimoto algorithm p(k+1)(x) = p(k)(x)exp p(y|x)log q(y) − 1 canbeinterpretedasaproximal point methodsincethecostfunc- y q(k)(y) λk tionI(k)(p(x))dependsontheiterations,justlikethefamilieswere + 1 p(ny|Px)logp(y|x) dependingontheiterations. Infact,atrueproximalpointalgorithm λk y q(y) o canbewrittenforamaximizationproblem[9]asfollow: Here,itisimportanttonotePthatwecanobtaintheclassicalcaseby simplyreplacingλ by1. θ(k+1) =argmax{ξ(θ)−βkkθ−θ(k)k2} (3) Moreover, we cankalso obtain the approach proposed by Matz [6] θ byintuitivelyreplacingtheprobabilitydistributionq(y)intheright in which ξ(θ), the cost function to be maximized, is independent handoftheequationbythesamedistributioncalculatedatthepre- fromtheiterations,kθ−θ(k)k2isapenaltytermwhichensuresthat viousiteration(q(k)(y)).Namely: theupdateθ(k+1)remainsinthevicinityofθ(k)andβ isasequence ofpositiveparameters. In[10],Rockafellarshowedthkatsuperlinear p(k+1)(x) = p(k)(x)exp p(y|x)logq(k)(y) − 1 y q(k)(y) λk convergenceofthismethodisobtainedwhenthesequenceβ con- k + 1 p(ny|Px)log p(y|x) vergestowardszero. λk y q(k)(y) Thedefinitionoftheproximalpointalgorithmin(3)canbegeneral- Afternormalization,wegetPp(k+1)(x)=p(k)(x)eoxp(Dk/λ )which izedtoawiderangeofpenaltytermsleadingtothisgeneralformu- x k istheexpressionofMatz’sapproach. Thisisgloballysimilartothe lation: One-Step-LatealgorithmsuggestedbyGreen[11] θ(k+1) =argmax{ξ(θ)−β f(θ,θ(k))} k θ WeconcludethatMatz’sapproachisbasedonanapproximationof wheref(θ,θ(k))isalwaysnonnegativeandf(θ(k),θ(k))=0. theproximalpointmethod,butwhatislostincomparisonwiththe ThemutualinformationI(p(x))canbeexpressedas: true proximal point method is the guarantee that the method con- verges,sinceconvergenceconditionsmustbereviewedagain. I(p(x))=I(k)(p(x))−D(q(y)||q(k)(y)) (4) Wecanwriteaccordingto(7): Introducing(4)in(2)leadsto I(p(k+1)(x))− p(k+1)(x)= λ (D(p(k+1)(x)||p(k)(x))−D(q(k+1)(y)||q(k)(y)))≥ k argmaxp{I(p(x))−(D(p(x)||p(k)(x))−D(q(y)||q(k)(y)))} I(p(k)(x))−λk(D(p(k)(x)||p(k)(x))−D(q(k)(y)||q(k)(y))) Thisnewformulationestablishesaclearlinkwiththedefinitionof Hence thecapacitybasedonthemutual information. However, foratrue proximalpintformulation,weneedtoshowthat: I(p(k+1)(x))≥ I(p(k)(x))+λ (D(p(k+1)(x)||p(k)(x))−D(q(k+1)(y)||q(k)(y))) D(p(x)||p(k)(x))−D(q(y)||q(k)(y))≥0 k Toensuretheincreasingofthemutualinformationduringiterations, with equality iff p(x) = p(k)(x) and q(y) = q(k)(y) in order to wemusthave: provethattheBlahut-Arimotoisaproximalpointalgorithm. ThepenaltytermD(p(x)||p(k)(x))−D(q(y)||q(k)(y))canberewrit- I(p(k+1)(x))≥I(p(k)(x)) tenasEp(x,y)[log pp((kx))(Px)x˜Pp(x˜yp|x˜(y)p|x˜(k)p)((xx˜˜))]. Sothat λk(D(p(k+1)(x)||p(k)(x))−D(q(k+1)(y)||q(k)(y))) ≥ 0 WecanalsowriteaccordingtoJensen’sinequality[7]: which is true, from (5) for every λ ≥ 0 which is not true in the k approach proposed by Matz. In our method, we choose λ such k E [−logp(k)(x) x˜p(y|x˜)p(x˜)] (5) that: (p(x,y) p(x) p(y|x˜)p(k)(x˜) x˜P max λ (D(p(k+1)(x)||p(k)(x))−D(q(k+1)(y)||q(k)(y))) λk k ≥−log( P p(x,y))=0 (6) inwhichp(k+1)(x)andq(k+1)(y)dependonλ . y x k XX ThisensuresthatthedifferencebetweenI(p(k+1)(x))andI(p(k)(x)) This proves that the Blahut-Arimoto algorithm can be interpreted isasmaximumaspossiblefromoneiterationtotheotherone.Note asatrueproximalpointmethodwherethecostfunctionisthetrue thatthismaximizationproblemissolvedbytheconjuguategradient mutualinformationandthepenaltytermreads method which gives themost convenient value of the stepsizeλ k comparingtotheapproachproposedbyMatz. D(p(x)||p(k)(x))−D(q(y)||q(k)(y)) Notethat,intermsofalgorithmiccomplexity,theupdatedvalueof Thecorrespondingproximalpointalgorithmreads: λkineachiterationrequires: (N+M+1)divisionsand(N+M)multiplicationsinMatz’sapproach. p(k+1)(x) = argmax I(p(x))−λ (D(p(x)||p(k)(x))) (2N+M+1)divisions,(2N+M+2)multiplicationsand2additionsin p(x) k ourcasebasedontheproximalpointmethod. −D(q(y)||q(nk)(y))} Hence,ourmethodrequireslessthantwiceoperationsperiteration o (7) comparedtotheapproachproposedbyMatz,however,itconverges whereλ isthestepsizeintroducedinordertoacceleratethecon- faster(aswecanseeinthesimulationresultsshowedbelow,theit- k vergencerateoftheclassicalBlahut-Arimotoalgorithm. erationnumberisdividedbytwointheworstcase). Acompromise Byderivingthisfunction mustbeestablisheddependingonourinterests. 4. SIMULATIONRESULTS First,wetestthe3versionsoftheBlahut-Arimotoiterativealgorithm on a Discrete Binary Symmetric Channel (DBSC) defined by the transitionmatrix: 0.7 0.2 0.1 Q= 0.1 0.2 0.7 (cid:26) (cid:27) Theresults(fig.2)showthatthechannelcapacityisachievedafter20 iterationsintheclassicalcase,7iterationsinMatz’sapproachand4 Fig. 3. Comparisionbetweenthe2approaches inthecaseofaGaussian iterationsinourcase(withaprecisionof10−11). Bernouilli-Gaussianchannel 6. REFERENCES [1] S.Arimoto,“Analgorithmforcomputingthecapacityofarbi- trarydiscretememorylesschannels,”IEEETrans.Inf.Theory, vol.18,pp.14–20,1972. [2] R. E. Blahut, “Computation of channel capacity and rate- distortion functions,” IEEE Trans. Inf. Theory, vol. 18, pp. 460–473,1972. [3] F.Dupuis,W.Yu,andF.Willems,“Arimoto-Blahutalgorithms Fig.2.Comparisionbetweenthe3approachesinthecaseofaDBSCchan- forcomputing channel capacityandrate-distortionwithside- nel information,”inISIT,2004. Asecondexampleintendstocharacterizebettertheefficiencyof [4] J. Dauwels, “On graphical models for communications and ourmethodincomparisonwiththeonebyMatz.Inordertodosowe machinelearning: Algorithms,bounds,andanalogimplemen- needahigherdimensionproblem.Wehavechosenthediscretization tation,”Ph.D.dissertation,May2006. ofsomecontinuousGaussianBernouilli-Gaussianchannelinorder [5] I. Csisza´r and G. Tusna´dy, “Information geometry and alter- toformatransitionchannelmatrixQwithhigherdimensions. Such natingminimizationprocedure,”StatisticsandDecisions,vol. achannelisdefinedasfollows: supplementissue1,pp.205–237,1984. yk =xk+bk+γk [6] G.MatzandP.Duhamel,“Informationgeometricformulation and interpretation of accelerated Blahut-Arimoto-Type algo- where rithms,”inProc.InformationTheoryWorkshop,2004. • b∼N(0,σ2) b [7] T.M.CoverandJ.A.Thomas,ElementsofInformationThe- • γ =e g with e:Bernouilli(p)sequence k k k ory. Wiley,NewYork,1991. • g∼N(0,σ2) with σ2 ≪σ2 g b g [8] R.G.Gallager,InformationTheoryandReliableCommunica- Hence tion. Wiley,NewYork,1968. y =x +n k k k [9] G.Vige, “Proximal-point algorithmforminimizingquadratic with functions,”INRIA,RR-2610,Tech.Rep.,1995. p(n )=(1−p)N(0,σ2)+pN(0,σ2+σ2) [10] R. T. Rockafellar, “Monotone operators and the proximal k b b g pointalgorithm,”SIAMJournalonControlandOptimization, Theoutput y hasbeendiscretizedon40values, andtheinputx k k vol.14,pp.877–898,1976. on 10 values. The results plotted on (fig.3) for parameters (p = 0.3,σb=0.01,σg =1)showtheaccelerationoftheBlahut-Arimoto [11] P.J.Green, “Onuseof theEMalgorithmfor penelized like- algorithmfrom14iterationsinMatz’sapproachto7iterationsinour lihood estimation,” Journal of the Royal Statistical Society, method. 1990. 5. CONCLUSIONS Wehaveproposedgeometricalinterpretationsandimprovementson the Blahut-Arimoto (BA) algorithm for computing the capacity of discretememoryless channels (DMC). Based on the trueproximal pointapproachandsolvingthemaximizationproblemwiththecon- jugategradientmethod,wehaveacceleratedtheconvergencerateof thisiterativealgorithmcomparedtotheaproachproposedbyMatz whichisbasedonanapproximationoftheproximalpointmethod. Wearecurrentlyinvestigatingtheuseofsimilartechniquesforim- provingtheconvergencerateofotheriterativealgorithms.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.