TOWARDSANALYTICALCONVERGENCEANALYSISOFPROPORTIONATE-TYPENLMS ALGORITHMS KevinT.Wagner Milosˇ I.Doroslovacˇki NavalResearchLaboratory TheGeorgeWashingtonUniversity RadarDivision DepartmentofElectricalandComputerEngineering Washington,DC20375,USA Washington,DC20052,USA ABSTRACT Table1 NLMSAlgorithmwithArbitraryStepsizeMatrix Todatenotheoreticalresultshavebeendevelopedtopre- dict the performance of the proportionate normalized least x(k) = [x(k)x(k−1)...x(k−L+1)]T mean square (PNLMS) algorithm or any of its cousin algo- yˆ(k) = xT(k)wˆ(k) rithms such as the μ-law PNLMS (MPNLMS), and the (cid:2)- e(k) = d(k)−yˆ(k) law PNLMS (EPNLMS). In this paper we develop an ana- G(k+1) = diag{g1(k+1),...,gL(k+1)} lyticapproachtopredictingtheperformanceofthesimplified wˆ(k+1) = wˆ(k)+ xTβ(Gk()kG+(k1)+x1()kx)(ek(k)+)δ PNLMSalgorithmwhichiscloselyrelatedtothePNLMSal- gorithm. In particular we demonstrate the ability to predict the Mean Square Output Error of the simplified PNLMS al- R = σ2I, and β is so small that the LMS coefficient esti- x gorithmusingourtheory. matoractsasalowpassfilter, thenwecanrewritetheMSE inthefollowingform[4]: Index Terms— Adaptive filtering, convergence, propor- tionate-type normalized least mean square (PtNLMS) algo- (cid:2)L rithm,sparseimpulseresponse. J(k)=J +σ2 E{z2(k)} min x i i=1 1. INTRODUCTION wherethefirsttermJ isequaltothevarianceofthenoise, min We begin by assuming there is some input signal denoted σ2, and z (k) are the elements of z(k). Hence in order to as x(k) fortime k thatexcites an unknownsystem with im- v i calculatetheMSEweneedtofindtheexpectedvalueofthe pulseresponsewopt. Lettheoutputofthesystembey(k) = squareweightdeviationsz2(k). wT x(k)wherex(k)=[x(k),x(k−1),...,x(k−L+1)]T. i opt Themeasuredoutputofthesystem,d(k),containszero-mean AtthisstageweproceedbyconsideringtheMSEforspe- stationary measurement noise v(k) and is equal to the sum cificproportionatetypeNLMSalgorithms.Manyproportion- ofy(k)andv(k). Theimpulseresponseofthesystemises- atetypeNLMSalgorithms,suchasthePNLMS[3],MPNLMS timatedwiththeadaptivefiltercoefficientvector,wˆ(k). The [1], and EPNLMS [2] imply highly non-linear (threshold- errorsignale(k)betweentheoutputoftheadaptivefilteryˆ(k) based)operations. Inordertosimplifythederivationofana- and d(k) drives the adaptive algorithm. The weight devia- lyticalresultsweexamineinthispaperasimplifiedPNLMS tion (WD) vector is given by z(k) = w − wˆ(k). The algorithm. ThecalculationofthegainforthesimplifiedPN- opt LMS algorithm is given in Table 2. The simplified PNLMS normalizedleastmeansquare(NLMS)algorithmforanarbi- algorithm avoids the usage of the maximum function which trary time-varying stepsize control matrix is shown in Table isemployedinthePNLMS,MPNLMS,andEPNLMSalgo- 1, as given in [1]. Here, β is the fixed stepsize parameter, G(k +1) = diag {g1(k +1),...,gL(k +1)} is the time- rithms. varying stepsize control matrix, and L is the length of the adaptive filter. The constant δ is typically a small positive Table2 numberusedtoavoidoverflowing. SimplifiedPNLMSAlgorithm Next,weseektherepresentationoftheMeanSquareOut- F (k)=ρ+|wˆ (k)|, i=1,...,L, ρ>0 i i putError(MSE)(LearningCurve)fortheproportionate-type F(k)=[F1(k),...,FL(k)]T nMoSrmEailsizgedivelenasbtymJea(nk)sq=uarEe{(P|etN(kL)M|2}S.)aBlgyoreixthpman[d2in].gTthhee g(k+1)= 1/L(cid:0)F(ikF)i(k) e(k) term and assuming that the input signal is white, i.e. 1-4244-1484-9/08/$25.00 ©2008 IEEE 3825 ICASSP 2008 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 2009 2. REPORT TYPE 00-00-2009 to 00-00-2009 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Towards Analytical Convergence Analysis of Proportionate-Type NLMS 5b. GRANT NUMBER Algorithms 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION Naval Research Laboratory,Radar Division,Washington,DC,20375 REPORT NUMBER 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES See also ADM002091. Presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Held in Las Vegas, Nevada on March 30-April 4, 2008. Government or Federal Purpose Rights License. 14. ABSTRACT 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF ABSTRACT OF PAGES RESPONSIBLE PERSON a. REPORT b. ABSTRACT c. THIS PAGE Same as 4 unclassified unclassified unclassified Report (SAR) Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 2. RECURSIVECALCULATIONOFTHEMEANWD Simulationshaveconfirmedthatthisassumptionholdsinthe ANDMEANSQUAREWD situationsdiscussedinthispaper. Also,whenρisverysmall (ρ < 10−4 )theexperimentsshowthattheassumptiondoes WecanrepresenttheWDattimek+1intermsoftheprior not hold. However most real world applications use larger WD at time k using the recursion for the estimated optimal valuesfortheρparameterandthereforethisisnotanissue. coefficientvector. Usingtheconventionthatxi(k) = x(k− AssumptionIV:Theexpectationofthedenominatorterm i+1),thisrecursionincomponent-wiseformisgivenby squared is equal to the square of the expectation of the de- nominator. Thisassumptionleadsto z (k+1) = z (k) i i (cid:3) βg (k+1)x (k) L x (k)z (k) E{(xT(k)G(k+1)x(k)+δ)2}=(σ2L+δ)2. − i i j=1 j j x xT(k)G(k+1)x(k)+δ Itholdsifthedenominatorisnearlyconstant. βg (k+1)x (k)ν(k) − i i . (1) Therefore we can write that the expectation of the WD xT(k)G(k+1)x(k)+δ canbefoundrecursivelyfromthepriortimestepby Thecomponent-wiseformoftherecursionforthesquare E{z (k+1)}=E{z (k)}−β E{g (k+1)z (k)} (4) oftheWDisgivenby i i o i i z2(k+1)=z2(k) whereβ = βσx2 . i i o σx2L+δ −2βgi(k+1)xi(k)(cid:0)Lj=1xj(k)zj(k)zi(k) Similarlybaseduponourassumptions,theexpectedvalue xT(k)G(k+1)x(k)+δ ofthesquareWDisgivenby −2βgi(k+1)xi(k)ν(k)zi(k) xT(k)G(k+1)x(k)+δ E{z2(k+1)}=E{z2(k)}−2β E{g (k+1)z2(k)} i i o i i −β2gi2(k+1)x2i(k)(cid:0)j(cid:0)mxj(k)xm(k)zj(k)zm(k) (2) (cid:2)L 2 2 (xT(k)G(k+1)x(k)+δ)2 +β2E{g2(k+1) z2(k)}+ βoσvE{g2(k+1)}. (5) + β2gi2(k+1)x2i(k)ν2(k) o i j=1 j σx2 i (xT(k)G(k+1)x(k)+δ)2 +β2gi2(k+1)x2i(k)(cid:0)jxj(k)zj(k)ν(k). Atthispointwehavethepotentialtorecursivelyestimate (xT(k)G(k+1)x(k)+δ)2 the expected value of the WD and the square WD vectors. Oneissueremainingisthecalculationoftermssuchas NextwetaketheexpectedvalueoftheWDandthesquare WD.Inordertodosowemakethefollowingsetofassump- E{gn(k+1)zm(k)} (6) tions. i j AssumptionI:Theadaptationstepsizeparameterβissuf- forn∈{1,2}, m∈{0,1,2}andi,j ∈{1,2,...,L}. ficientlysmallandtheLMScoefficientestimatoractsasalow Weassume,that passfilter. Hence,z (k)changesslowlyrelativetox (k). i i AssumptionII:Theinputsignalandobservationnoiseare E{gn(k+1)zm(k)}=E{g (k+1)}nE{zm(k)}ifi(cid:3)=j. i j i j uncorrelated. This assumption is justified provided that the useofthelinearunknownsystemmodelisapplicableandthe Now, we can take two approaches when calculating the lengthoftheWieneroptimalsolutionfortheadaptivefilteris expectation for i = j. In the first approach we assume that exactlyequaltotheorderoftheunknownsystem. theexpectationoftheproductofgn(k+1)andzm(k)issep- i i AssumptionIII:Theexpectationofaratiooftworandom arable. Inadditiontothis,weassumethattheexpectationof variablesisequaltotheratiooftheexpectationsofeachran- theproductofthegainsisequaltotheproductoftheexpec- domvariable. Inourcasethedenominatorofinterestistyp- tationsofthegains(thisassumptionholdswheng (k+1)is i ically the term x(k)TG(k + 1)x(k) + δ. This assumption slowvarying),thatis holdsifthedenominat(cid:4)orisnearlyconstantorifwehavethe (cid:3) E{gn(k+1)}=E{g (k+1)}n. (7) condition that L >> 2 L E{g2(k+1)}, [5]. We can i i i=1 i derivetheexpectationofthedenominatortermbylookingat Thereforewehave itincomponent-wiseformandapplyingAssumptionI,[5]: E{gn(k+1)zm(k)}=E{g (k+1)}nE{zm(k)}. i i i i (cid:2)L E{ x2(k)g (k+1)+δ} Thisapproachhasbeendubbedthe‘SeparableApproach’. j j j=1 Alternatively,wecancalculateexplicitlytheexpectations (cid:2)L in (6). We refer to this approach as the ‘Non-Separable Ap- =E{ E{x2(k)}g (k+1)+δ}=σ2L+δ (3) proach’.Inthenextsectionwedeveloptheneededprobability j j x distributionsandexpressionsforthetwoapproaches. j=1 3826 3. RECURSIVECALCULATIONOFEXPECTATIONS E{z2(k+1)}=E{z2(k)}−2β E{g (k+1)}E{z2(k)} i i o i i (cid:2)L 2 2 We begin by assuming that the ith component of the weight +β2E{g (k+1)}2 E{z2(k)}+ βoσvE{g (k+1)}2 deviationattimekhasanormaldistributionwithmeanμi(k) o i j=1 j σx2 i andvarianceσ2(k)i.e. i (15) zi(k)∼N(μi(k),σi2(k)). respectively. Noteσi2(k) = E{zi2(k)}−E2{zi(k)}. Atthis pointwearelefttofindE{g (k+1)}.Thistermcanbefound Thisassumptionisbasedonapossibilityofapplyingthecen- i as trallimittheoremtotherecursionfortheweightdeviationin (1),aswellassimulations. Giventhisassumptioneachcom- E{g (k+1)}=E{ F(cid:3)i(k) } ponent of the estimated optimal weight vector is distributed i 1/L F (k) j j as ρ+E{|wˆ (k)|} wˆ (k)=w −z (k)∼N(m (k),σ2(k)) ≈ (cid:3) i . (16) i i i i i 1/L (ρ+E{|wˆ (k)|}) j j wherem (k)=w −μ (k). Thep.d.f. of|wˆ (k)|isgivenby i i i i This algorithm is initialized by setting E{z (0)} = w and i i 1 −(|wˆi(k)|−mi(k))2 E{z2(0)}=w2. f(|wˆi(k)|)= (cid:5)2πσ2(k)[e 2σi2(k) i i i −(|wˆi(k)|+mi(k))2 3.2. Non-SeparableExpectationCalculations +e 2σi2(k) ]U(wˆi(k)) (8) InordertocalculatethemeanWDandthemeansquareWD whereU(x)istheunitstepfunction[6]. wefind: (cid:6) (cid:7) We now take advantage of the form of this p.d.f. and calculate several expectations which will be useful in future E{gi(k+1)zi(k)}=E{1/Lρ+(cid:0)|wji(−ρ+zi|(wkj)−|zzji((kk))|)} (17) derivations. Webeginbyfindingthemeanofthisdistribution ≈ ρE{zi(k)}+E{|wˆi(k)|(wi−wˆi(k))}. whichisgivenby 1/L(cid:0)j(ρ+E{|wˆj(k)|}) E{|wˆi(k)|}=mi(k)erf(cid:6)(cid:5)m2σi(2k()k)(cid:7)+(cid:8)π2σi(k)e−2mσ2ii2((kk)). ≈E{ρgEi({kzi2+(k)1})+zEi2({k|wˆ)i}(k=)|(Ewi{−1wˆ/(iLρ(+k(cid:0))|)wj2i}(−ρ.+zi|(wkj)−|)zzi2j((kk))|)} (18) i 1/L(cid:0)j(ρ+E{|wˆj(k)|}) (9) (cid:6) (cid:7) E{g2(k+1)z2(k)}=E{(cid:6) ρ+|wi−zi(k)| 2zi2(k)(cid:7) } Additionally,thesecondmomentisgivenby (cid:11)i i 1/L(cid:0)j(ρ+|wj−zj(k)|) 2 E{|wˆ (k)|2}=m2(k)+σ2(k). (10) ≈ ρ2E{z2(k)}+2ρE{|wˆ (k)|(w −wˆ (k))2} i i i i i (cid:12) i i (cid:6) (cid:3) (cid:7) Wecanalsocalculatethefollowingexpectations: +E{|wˆ (k)|2(w −wˆ (k))2} / 1 (ρ+E{|wˆ (k)|}) 2. (cid:6) (cid:7) i i i L j j E{|wˆ(cid:9)i(k)|(wi−(cid:10)wˆi(k))}= wiμi(k)−σi2(k)−μ2i(k) (cid:6) (cid:7) (19) ×erf √m2σi(i2k()k) + 2σi(√k)2μπi(k)e−2mσ2ii2((kk)) E{gi2(k+1)}=E{(cid:6)1/L(cid:0)ρ+j|(wρ+i−|wzij(−kz)|j(2k)|)(cid:7)2} (20) (cid:6) (11) ≈ ρ2(cid:6)+2ρE{|wˆi(k)|}+E{|wˆi(k(cid:7))|2}. E{|wˆi(k)(|wi−wˆi(k))2(cid:7)}=(cid:9)wiμ2i(k)(cid:10)+wiσi2(k) 1/L(cid:0)j(ρ+E{|wˆj(k)|}) 2 −3μ (k)σ2(k)−μ3(k) erf √mi(k) Usingequations(9)-(13)thesetermscanbecalculated. i i i 2σi2(k) (12) +(cid:6)2μ2(k)+4σ2(k)(cid:7)σ√i(k)e−2mσ2ii2((kk)) 4. RESULTS i i 2π (cid:6) (cid:7) E{|wˆi(cid:6)(k)|2(wi−wˆi(k))2}=(cid:7) wi2 μ2i(k)+σi2(k) Now we compare the theory derived to actual results from −2w μ3(k)+3μ (k)σ2(k) (13) MonteCarlosimulations. Inthesimulationsandfiguresthat i i i i +μ4(k)+6μ2(k)σ2(k)+3σ4(k) areshownthefollowingparametershavebeenchosenunless i i i i specified otherwise, L = 512, σ2 = 10−2 σ2 = 10−6, and x v δ =10−4.Wehavedevelopedametrictoquantitativelymea- 3.1. SeparableExpectationCalculations sure how well the theory fits the ensemble averaged results. IntheseparablecasetheexpectationoftheWDandthesquare Themetricisgivenby WDaregivenby (cid:3) |e2(k)−e2 (k)| E{zi(k+1)}=E{zi(k)}−βoE{gi(k+1)}E{zi(k)}(14) C = k (cid:3)T e2 (Mk)C k MC 3827 wheree2(k)isthesquaredoutputerrorgeneratedbythethe- T ENSEMBLE−AVERAGED SE VS. THEORY MSE ory at time k and e2 (k) is the squared output error gen- MC ENSEMBLE AVERAGED erated by the ensemble average at time k. The term in the −20 THEORY denominatorhasbeenaddedinanattempttomakethemetric independentoftheinputsignalpower. −30 Wecomparetheperformanceofthe‘SeparableApproach’ −40 theoryversusthe‘NonseparableApproach’theorywhenus- ingtheecho-pathimpulseresponsepresentedin[7].Thisim- B d−50 pulse is sparse because very few coefficients have non-zero values. Theperformanceofthe‘SeparableApproach’theory for ρ = 10−2 is shown in Figure 1. The results when using −60 β = 0.1 ρ = 0.01 the‘NonseparableApproach’theoryforρ = 10−2 isshown δ = 0.0001 −70 Random Seed = 50 in Figure 2. The ‘Nonseparable Approach’ theory performs Monte Carlo = 100 slightly better than the ‘Separable Approach’ theory. This C = 0.11011 −80 improvement is reflected in the metric C where it has been 0 0.5 1 1.5 2 reduced from a value of 0.14631 to 0.11011 after applying ITERATIONS x 104 the‘NonseparableApproach’theory. Fig.2. LearningcurveofsimplifiedPNLMSalgorithmρ = 10−2using‘NonseparableApproach’theory ENSEMBLE−AVERAGED SE VS. THEORY MSE AS A FUNCTION OF TIME ENSEMBLE AVERAGED −20 THEORY 6. REFERENCES −30 [1] H. Deng and M. Doroslovacˇki, ”Improving Conver- −40 genceofthePNLMSAlgorithmforSparseImpulseRe- B sponseIdentification,”IEEESignalProcessingLetters, d−50 vol.12,no.3,pp.181-184,Mar.2005. −60 β = 0.1 [2] K. Wagner, M. Doroslovacˇki, and H. Deng ”Con- ρ = 0.01 vergecne of proportionate-type NLMS adaptive filters δ = 0.0001 −70 Random Seed = 50 andchoiceofgainmatrix,”Proc.40thAsilomaronSig- Monte Carlo = 100 nals,Systems,andComputers,PacificGrove,CA,Oct. C = 0.14631 −80 29-Nov.1,2006. 0 0.5 1 1.5 2 ITERATIONS x 104 [3] D. Duttweiler, ”Proportionate normalized least-mean- squares adaptation in echo cancellers,” IEEE Trans. Fig.1. LearningcurveofsimplifiedPNLMSalgorithmρ = Speech Audio Processing, vol. 8, pp. 508-518, Sept. 10−2using‘SeparableApproach’theory 2000. 5. CONCLUSIONS [4] S.Haykin,AdaptiveFilterTheory,fourthedition,Pren- ticeHall2002. Wehavedevelopedtwoanalyticalmethodstopredicttheper- [5] H. Deng and M. Doroslovacˇki, ”On Convergence of formanceofthesimplifiedPNLMSalgorithmbydeveloping Proportionate-TypeNLMSAdaptiveAlgorithms,”Proc. recursions for the mean weight deviation and mean square IEEE International Conference on Acoustics, Speech, weight deviation. The weight deviation is assumed to have and Signal Processing, vol. 3, pp. 105-108, Toulouse, a Gaussian distribution. In the first method the expectation France,May2006. oftheproductofthegainandweightdeviationisconsidered tobeseparable. Inthesecondmethodtheexpectationofthe [6] A. Oppenheim and A. Willsky with S. Nawab, Signals productofthegainandweightdeviationisderivedwithoutas- andSystems,fourthedition,PrenticeHall,1997. sumingtheseparability.Thesecondmethodwhilemorecom- putationallyintensiveofferssomeimprovementintheability [7] K. Wagner and M. Doroslovacˇki, ”Proportionate-Type to predict the performance of the simplified PNLMS algo- Steepest Descent and NLMS Algorithms,” Proc. 41st rithm. Further analysis shows that the improvement comes Conference on Information Sciences and Systems, Bal- mainlyfromthedirectcalculationoftheE{g2(k)}insteadof timore,MD,Mar.14-16,2007. i theassumptionin(7). 3828