A WARPED FILTER IMPLEMENTATION FOR THE LOUDNESS ENHANCEMENT OF SPEECH

By

MARC ANDRE BOILLOT

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

MAY 2002

ACKNOWLEDGMENTS

The completion of a dissertation is a considerable undertaking, and one which is tempered by discipline and patience. The achievement is the degree of Doctor of Philosophy, the most noble of academic rewards. In the course of this journey there have been a few individuals who have changed not only the way I think as an engineer but also the way I think as a person. To these people, I am grateful. I sincerely thank Dr. John Harris, who has been my mentor, advisor, and friend. He has compelled me to think with such enthusiasm and clarity that I have come to examine the very origin of my thoughts. He has enlightened me, and I will have the rest of my life to look back favorably upon this experience. I thank Dr. Principe, whose ability to capture thought through expression, and whose approach to conveying this understanding through engineering and math, is the gift of a true composer. I thank Dr. Taylor, who encouraged me to pursue these advanced studies; I listened. And to Dr. Bedenbaugh: my admiration and respect for the study of neuroscience is what brought me to graduate academics.

This research commitment would not have been possible without the support and dedication of Chin Wong and Scott Koenigsman, my managers at Motorola, who have given me an unprecedented opportunity to pursue this research. I would like to express my sincere gratitude to V.P. Jaime Borras and Zaffer Merchant for not only funding this research but also proposing the topic that has become this dissertation. Jaime's vision, that if you can dream it, you can do it, is the genesis of this commitment. I gratefully thank these individuals for always placing their confidence and trust in me. They have allowed me to develop professionally, both in my career and as an individual.

I humbly thank my father, whose eccentric philosophies and examinations of life are the inspiration for my achievements.
I also cannot thank my mother, sister, fiancée, and family enough for their love and never-ending support. And to all my colleagues: it has been a wonderful journey in intellect, understanding, and friendship.

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ii
LIST OF TABLES vi
LIST OF FIGURES viii
ABSTRACT xii

CHAPTER
1 INTRODUCTION 1
1.1 Background 2
1.2 Speech Enhancement 5
1.3 Contributions and Chapter Organization 7
2 MODELS OF LOUDNESS 11
2.1 Loudness 12
2.1.1 Critical Bands 14
2.1.2 Auditory Filters 16
2.1.3 Excitation 18
2.2 Measuring Loudness 21
2.2.1 Power Law of Hearing 22
2.2.2 Loudness and Bandwidth 25
2.2.3 Outer to Middle Ear Filter 27
2.3 Calculating ISO-532B Loudness 30
2.3.1 Specific Loudness 30
2.3.2 Slope Excitation 32
2.3.3 Discussion 33
2.4 Simplifying the Loudness Model 34
2.4.1 PLP Technique 35
2.4.2 Extending PLP for Loudness 36
2.4.3 The Loudness Approximation 38
2.4.4 Model Discussion 42
3 VOWEL POWER 46
3.1 Vowels 47
3.1.1 Synthetic Model 48
3.1.2 Masking Effects 50
3.1.3 TIMIT 54
3.2 Identification 59
3.2.1 Loudness Adaption and Auditory Fatigue 60
3.2.2 Formant Expansion 62
3.2.3 Modulation Depth 64
3.2.4 Synthetic Vowel Loudness 65
4 WARPED LINEAR PREDICTION 68
4.1 Linear Prediction Model 69
4.2 Bandwidth Expansion 71
4.3 Vocoders 73
4.3.1 Perceptual Noise Weighting 74
4.3.2 Adaptive Post-filtering 75
4.4 Warped Filtering 78
4.5 Warped Filter Structures 83
4.5.1 Analysis Filter 83
4.5.2 Synthesis Filter 88
4.5.3 Direct Form Filter 89
4.5.4 Warped Bandwidth Expansion 92
4.5.5 Filter Structure 94
4.6 Auditory Modelling 102
4.7 The Gamma Filter 105
5 OBJECTIVE EVALUATIONS 112
5.1 ISO-532B Analysis 113
5.1.1 The Optimal Warping Factor 119
5.1.2 Warped Filter Loudness 121
5.1.3 Equating Energy to Loudness 125
5.1.4 Results 129
5.2 Speech Recognition 137
5.2.1 Spectral Distortion Measures 137
5.2.2 A Measure of Loudness Distortion 139
5.3 Recognition Results 144
5.3.1 DTW Results 147
5.3.2 HMM Results 148
6 SUBJECTIVE EVALUATIONS 152
6.1 Measures of Speech Intelligibility 152
6.2 Intelligibility Test 155
6.2.1 Procedure 156
6.2.2 Intelligibility Results 157
6.3 Loudness Test 158
6.3.1 Procedure 158
6.3.2 Sensitivity Screening 160
6.3.3 Loudness Results 161
6.4 Acceptability Test 163
6.4.1 Procedure 163
6.4.2 Acceptability Results 164
7 CONCLUSIONS 166

APPENDICES
A FILTER COEFFICIENT TRANSFORMATION 171
B WARPED PHASE 176
C HMM TRAINING 177

REFERENCES 184
BIOGRAPHICAL SKETCH 194

LIST OF TABLES

Table page
3.1 TIMIT TEST phoneme occurrences (N), power (P%), accessory loudness (aL%), masked power (mP%), and sone loudness approximation error (E%) 55
3.2 Relative occurrence (N), total average power (P), average masked power (mP), average contribution of accessory loudness (aL), and approximation error (E) for all phoneme categories of the TIMIT test set 56
5.1 Dialect regions and number of speakers in each region 120
5.2 Phoneme categories and the loudness gain difference between the warped and linear bandwidth expansion filters 123
5.3 TIMIT TEST phoneme occurrences (N), power (P%), linear (a = 0) loudness increase (number of times louder) Ny/Nx, and warped (a = 0.5) loudness increase Ny/Nx 124
5.4 Equal energy phoneme gains for linear expansion a = 0 with the true ISO-532B for the 1,681 sentences of the TIMIT test set. The ratio Ny/Nx is the loudness increase of the enhanced phoneme to the original (number of times louder), dBgain is the gain from Eq (5.13) required to scale up the original to achieve equal loudness, Ny/Ngx is the loudness increase of the enhanced to the scaled original, and E = |1 - Ny/Ngx| is the approximation error of dBgain 131
5.5 Equal energy phoneme gains for linear expansion a = 0 with the warped approximation of the ISO-532B for the 1,681 sentences of the TIMIT test set with SPL levels between 50 and 80 dB. The ratio Ny/Nx is the loudness increase of the enhanced phoneme to the original (number of times louder), dBgain is the gain from Eq (5.13) required to scale up the original to achieve equal loudness, Ny/Ngx is the loudness increase of the enhanced to the scaled original, and E = |1 - Ny/Ngx| is the approximation error of dBgain 132
5.6 Equal energy phoneme gains for warped expansion a = 0.5 with the true ISO-532B for the 1,681 sentences of the TIMIT test set with SPL levels between 50 and 80 dB.
The ratio Ny/Nx is the loudness increase of the enhanced phoneme to the original (number of times louder), dBgain is the gain from Eq (5.13) required to scale up the original to achieve equal loudness, Ny/Ngx is the loudness increase of the enhanced to the scaled original, and E = |1 - Ny/Ngx| is the approximation error of dBgain 134
5.7 Equal energy phoneme gains for warped expansion a = 0.5 with the approximation of the ISO-532B for the 1,681 sentences of the TIMIT test set with SPL levels between 50 and 80 dB. The ratio Ny/Nx is the loudness increase of the enhanced phoneme to the original (number of times louder), dBgain is the gain from Eq (5.13) required to scale up the original to achieve equal loudness, Ny/Ngx is the loudness increase of the enhanced to the scaled original, and E = |1 - Ny/Ngx| is the approximation error of dBgain 135
5.8 Average loudness increase, equivalent dB gain, and approximation error for phoneme categories using the true ISO-532B for a = 0 and a = 0.5 from Table (5.6) for TIMIT test sentences with SPL levels between 50 and 80 dB 136
5.9 Comparison of the average correlation coefficient ρ between objective and subjective speech quality [21] 138
5.10 DTW results for original and warped speech templates: number of vocabulary words correctly recognized for speaker 901 in Motorola stars database. 20 words vs 6 enumerated conditions; Train in: cond 1 rep 12, Test in: cond 123456 rep all 149
5.11 HMM discrete results for vocabulary words correctly recognized for speaker 901 in Motorola stars database. 20 words vs 6 conditions for original and warped speech templates. Train in: cond 123456 rep 12, Test in: cond 123456 rep all 150
5.12 HMM continuous results for vocabulary words correctly recognized for speaker 901 in Motorola stars database.
20 words vs 6 conditions for original and warped speech templates. Train in: cond 123456 rep 12 151
6.1 Vocabulary of words used for Rhyming Test of Intelligibility, subdivided into confusable sets I-III 156
6.2 Average intelligibility results of the rhyme test for 16 listeners hearing 60 words with 0 dB SNR. Results are displayed as the percent correct Pn population mean with ±95% confidence level 157
6.3 Vocabulary of words used for Loudness Test 158
6.4 Loudness listening test for warped filter with a = 0.5: total number of times the processed word was selected over the original word for all 16 listeners 162
6.5 Sentence acceptability results for original sentences (A) and processed sentences (B) with warped filter for a = 0.5. Twenty random sentences from the TIMIT dataset were presented to each of 16 listeners. The quality rating, 1 (excellent) to 3 (fair), is their mean response for the 20 sentences, and the #L column is the number of times a sentence was selected as being louder. It is given as a percentage in the last column 165

LIST OF FIGURES

Figure page
2.1 Equal loudness curves 14
2.2 Mapping of the linear frequency scale to the critical band scale given by Eqs (2.2) and (2.3) 16
2.3 Roex auditory filters for input levels 50 to 90 dB at center frequencies 100 Hz, 1 kHz, and 3 kHz 18
2.4 Example of pure tone masking threshold generated by a narrowband masker 19
2.5 Example of noise notch method to trace out auditory filter shapes 20
2.6 Generation of excitation function: a) individual auditory filter responses from a 1 kHz sinusoid input, and b) resulting excitation pattern 20
2.7 Excitation level versus critical band pattern for 1 kHz tone. Threshold in quiet indicated by dashed line 21
2.8 Relation between loudness and bandwidth: a) input narrowband noise centered at 1 kHz with bandwidths 40, 80, 160, 320, 640 and 1280 Hz, all at constant 60 dB SPL, b) corresponding excitation patterns, and c) resulting loudness pattern 26
2.9 Loudness of tones separated by a critical band 27
2.10 Outer to middle ear filter given by Eq (2.20) for various values of i? 29
2.11 16 weighting functions used to compute θ(Ω(m)) 37
2.12 Linear approximation to excitation slopes generated by roex auditory filters 38
2.13 Frequency warping using Oppenheim recursion on autocorrelation sequence 39
2.14 Outer to middle sensitivity characteristics 40
2.15 Determination of maximum interim excitations 41
2.16 Absolute threshold of hearing 42
2.17 Loudness predictions of the ISO-532B (dotted) and the warped loudness approximation (solid) 43
2.18 Loudness prediction of ISO-532B (dotted) and approximation (solid) 44
3.1 Average formant locations for vowels in American English (Peterson and Barney, 1952) 46
3.2 Average formant locations and bandwidths for vowels in American English with corresponding dB drop of formant amplitude from 60 dB reference [21] 48
3.3 Five pole formant synthesis of 10 American English vowel spectra (y-axis in dB, x-axis in kHz) 49
3.4 ISO-532B vowel loudness patterns with accessory loudness due to masking in shaded regions 50
3.5 a) Tone and b) narrowband masking thresholds 52
3.6 Auditory masking threshold 54
3.7 Percent of masked power in vowel regions (darkened) of TIMIT speech sentence 57
3.8 Accessory loudness of TIMIT speech sentence (vowel regions darkened) 58
3.9 Formant bandwidth expansion on synthetic vowel /a/: a) LPC pole displacement broadens bandwidth by reducing formant pole peaks, and b) elevation of spectrum to restore energy 64
3.10 Perceived equal loudness time functions (sentence and noise) 66
3.11 The increase of loudness as a function of vowel bandwidth 67
4.1 Pole displacement model used to demonstrate that an evaluation off the unit circle with r > 1 results in a broadened pole response (shaded region) 71
4.2 Relation of pole distance from jω-axis to pole bandwidth in Laplace space 72
4.3 General CELP coder block diagram 74
4.4 Perceptual noise weighting: a) vocal tract 1/A(z), b) coding noise A(z/β)/A(z), and c) excitation 75
4.5 General CELP decoder block diagram 76
4.6 Response of 1/A(z/β) for various values of β 77
4.7 Critical band frequency warping using Oppenheim recursion on autocorrelation sequence 82
4.8 Analysis filter element 83
4.9 Unit delay replacement with all-pass 84
4.10 Frequency warping characteristics of the all-pass element described by Eq (4.23) for values of the warping factor -0.8 < a < 0.8 85
4.11 Frequency warping characteristics of all-pass (dotted, from Eq (4.23)) compared to critical band scale (solid, from Eq (2.3)) 86
4.12 Direct substitution of all-pass elements in FIR 87
4.13 Synthesis filter element 88
4.14 Modified synthesis filter element after coefficient transformation 90
4.15 Modified analysis filter 90
4.16 WLPC vocoder cited in [130] 92
4.17 Changing implementation order of WLPC vocoder for use as a WIIR filter 94
4.18 Formant bandwidth expansion filter with frequency scale set by locally recurrent a parameter 95
4.19 Family of curves showing frequency dependent bandwidth expansion for a particular evaluation radius from the warped filter 96
4.20 Warped filter output gain curves (normalized for unity at a = 0) for a sinusoidal chirp signal on an evaluation radius of 1.02 97
4.21 Family of gain compensation curves given by Eq (4.37) 99
4.22 Spectral envelope of a synthetic vowel for warped bandwidth expansion 1/A(z/β) (solid) and original 1/A(z) (dotted). Demonstrates that an evaluation off the unit circle with warping factor a = 0 results in a uniform bandwidth change for all formants: a) time response, b) frequency response, and c) spectral envelope. One slider is used to set the warping factor -0.6 < a < 0.6. Another is used to set the evaluation radius, and a third slider allows first order low-pass or high-pass filtering 1 ± μz^-1 to adjust for spectral tilt. Loudness levels using the ISO-532B are given for the original and processed vowel, and original formant bandwidths (dotted) are all 50 Hz 100
4.23 Spectral envelope of a synthetic vowel for warped bandwidth expansion 1/A(z/β) (solid) and original 1/A(z) (dotted).
Demonstrates that an evaluation off the unit circle with warping factor a = 0.34 results in a non-uniform bandwidth change for all formants: a) time response, b) frequency response, and c) spectral envelope. One slider is used to set the warping factor -0.6 < a < 0.6. Another is used to set the evaluation radius, and a third slider allows first order low-pass or high-pass filtering 1 ± μz^-1 to adjust for spectral tilt. Loudness levels using the ISO-532B are given for the original and processed vowel, and original formant bandwidths (dotted) are all 50 Hz 101
4.24 WLPC gain adjustment 103
4.25 Model of the synthetic vowel /a/ with LPC and WLPC envelope on linear frequency scale (top) and warped scale (bottom) 104
4.26 Pole radii scaling necessary to achieve bandwidth effects of Fig (4.19) 105
4.27 Gamma bases given by Eq (4.45) 107
4.28 Relation between the z and γ domains 109
4.29 Stable higher order WIIR filters 110
4.30 Substitution of locally recurrent feedback loop with gamma kernel 111