
Speech Processing in Mobile Environments PDF

129 Pages·2014·4.726 MB·English


SPRINGER BRIEFS IN ELECTRICAL AND COMPUTER ENGINEERING · SPEECH TECHNOLOGY

K. Sreenivasa Rao, Anil Kumar Vuppala
Speech Processing in Mobile Environments

SpringerBriefs in Electrical and Computer Engineering
SpringerBriefs in Speech Technology
Series Editor: Amy Neustein
For further volumes: http://www.springer.com/series/10043

Editor's Note

The authors of this series have been hand selected. They comprise some of the most outstanding scientists, drawn from academia and private industry, whose research is marked by its novelty, applicability, and practicality in providing broad-based speech solutions. The SpringerBriefs in Speech Technology series provides the latest findings in speech technology gleaned from comprehensive literature reviews and empirical investigations that are performed in both laboratory and real-life settings. Some of the topics covered in this series include the presentation of real-life commercial deployment of spoken dialog systems, contemporary methods of speech parameterization, developments in information security for automated speech, forensic speaker recognition, use of sophisticated speech analytics in call centers, and an exploration of new methods of soft computing for improving human-computer interaction. Those in academia, the private sector, the self-service industry, law enforcement, and government intelligence are among the principal audience for this series, which is designed to serve as an important and essential reference guide for speech developers, system designers, speech engineers, linguists, and others. In particular, a major audience of readers will consist of researchers and technical experts in the automated call center industry, where speech processing is a key component to the functioning of customer care contact centers.

Amy Neustein, Ph.D., serves as editor in chief of the International Journal of Speech Technology (Springer).
She edited the recently published book Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics (Springer 2010), and serves as guest columnist on speech processing for Women's eNews. Dr. Neustein is the founder and CEO of Linguistic Technology Systems, an NJ-based think tank for intelligent design of advanced natural language-based emotion detection software to improve human response in monitoring recorded conversations of terror suspects and helpline calls. Dr. Neustein's work appears in the peer-reviewed literature and in industry and mass media publications. Her academic books, which cover a range of political, social, and legal topics, have been cited in the Chronicle of Higher Education and have won her a Pro Humanitate Literary Award. She serves on the visiting faculty of the National Judicial College and as a plenary speaker at conferences in artificial intelligence and computing. Dr. Neustein is a member of MIR (machine intelligence research) Labs, which does advanced work in computer technology to assist underdeveloped countries in improving their ability to cope with famine, disease/illness, and political and social affliction. She is a founding member of the New York City Speech Processing Consortium, a newly formed group of NY-based companies, publishing houses, and researchers dedicated to advancing speech technology research and development.

K. Sreenivasa Rao
Indian Institute of Technology
Kharagpur, West Bengal, India

Anil Kumar Vuppala
International Institute of Information Technology
Hyderabad, Gachibowli, India

ISSN 2191-8112    ISSN 2191-8120 (electronic)
ISBN 978-3-319-03115-6    ISBN 978-3-319-03116-3 (eBook)
DOI 10.1007/978-3-319-03116-3
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013956607

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher's location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Robust speech systems for mobile environments have attracted particular interest in recent years because they enable access to remote voice-activated services. In this context, three major challenges need to be considered: varying background conditions, speech coding, and transmission channel errors. In this book, we focus on improving the recognition performance of speech systems in the presence of speech coding and background noise by using vowel onset points (VOPs) as anchor points. The VOP is an important event in speech production; it is defined as the instant at which the onset of a vowel takes place. The speech coders considered in this work are GSM full rate (ETSI 06.10), GSM enhanced full rate (ETSI 06.60), CELP (FS-1016), and MELP (TI 2.4 kbps).

The major contributions presented in this book are:

• Methods are proposed for the detection of VOPs in the presence of speech coding and background noise.
• A two-stage hybrid approach based on hidden Markov models (HMMs) and support vector machines (SVMs) is proposed for improving the performance of the consonant-vowel (CV) recognition system.
• A two-stage VOP detection method is proposed for spotting CV units in continuous speech.
• Combined temporal and spectral preprocessing methods are explored to improve the performance of the CV recognition system under background noise.
• A method based on VOPs is proposed to improve the performance of a speaker identification (SI) system in the presence of coding.
• A method is proposed for nonuniform time scale modification using VOPs and instants of significant excitation.
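To make the notion of a VOP as a detectable instant more concrete, here is a minimal Python sketch of a generic energy-based baseline. It is not the method proposed in this book, which computes spectral energy specifically in the glottal closure region and is designed for coded and noisy speech. The sketch computes short-time spectral energy in a fixed frequency band, enhances rising edges by convolving that contour with a first-order Gaussian difference (FOGD) operator, and reports peaks of the resulting evidence as hypothesized VOPs. All function names and parameter values (20 ms frames, 5 ms hop, 500–2500 Hz band, sigma of 4 frames, 0.3 relative threshold, 50 ms minimum gap) are illustrative assumptions, not values taken from the book.

```python
import numpy as np


def band_energy_contour(x, fs, frame_ms=20.0, hop_ms=5.0, band=(500.0, 2500.0)):
    """Short-time spectral energy inside `band`, one value per frame (assumed defaults)."""
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    window = np.hamming(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    n_frames = 1 + (len(x) - frame) // hop
    energy = np.empty(n_frames)
    for i in range(n_frames):
        segment = x[i * hop:i * hop + frame] * window
        power = np.abs(np.fft.rfft(segment)) ** 2
        energy[i] = power[in_band].sum()
    # Normalise so contours from different recordings are on a comparable scale.
    return energy / (energy.max() + 1e-12), frame, hop


def fogd_kernel(sigma):
    """First-order Gaussian difference operator, sampled over +/- 3 sigma."""
    half = int(3 * sigma)
    n = np.arange(-half, half + 1)
    # With np.convolve this shape weights future frames positively and past
    # frames negatively, so a rising energy edge produces a positive peak.
    return -n * np.exp(-n ** 2 / (2.0 * sigma ** 2))


def detect_vops(x, fs, sigma_frames=4.0, rel_threshold=0.3, min_gap_s=0.05):
    """Hypothesised VOP times (seconds) from peaks of the enhanced energy contour."""
    energy, frame, hop = band_energy_contour(x, fs)
    evidence = np.convolve(energy, fogd_kernel(sigma_frames), mode="same")
    if evidence.max() <= 0:
        return []  # no rising energy anywhere in the signal
    threshold = rel_threshold * evidence.max()
    # Local maxima of the evidence that exceed the relative threshold.
    peaks = [i for i in range(1, len(evidence) - 1)
             if evidence[i] > evidence[i - 1]
             and evidence[i] >= evidence[i + 1]
             and evidence[i] > threshold]
    # Greedily keep the strongest peaks, dropping any within min_gap_s of one already kept.
    min_gap = int(min_gap_s * fs / hop)
    kept = []
    for i in sorted(peaks, key=lambda k: evidence[k], reverse=True):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    # Report the centre of each selected frame, in seconds.
    return [(i * hop + frame / 2) / fs for i in sorted(kept)]


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs                          # 1 s time axis
    rng = np.random.default_rng(0)
    noise = 0.001 * rng.standard_normal(fs)         # low-level background
    # Synthetic "vowel": two steady tones switched on at 0.4 s.
    vowel = (t >= 0.4) * (np.sin(2 * np.pi * 1000 * t)
                          + 0.5 * np.sin(2 * np.pi * 1500 * t))
    print(detect_vops(noise + vowel, fs))
```

On this synthetic signal the sketch should report a single VOP close to 0.4 s. The FOGD step is what turns a gradual energy rise into one sharp peak; on real, coded, or noisy speech the band, smoothing width, and threshold would all need tuning, which is exactly where more robust evidence such as energy in the glottal closure region becomes important.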
Some important conclusions drawn from this work are:

(i) The proposed VOP detection method, based on spectral energy in the glottal closure region, performs better than existing methods under clean, coded, and noisy conditions.
(ii) The proposed two-stage hybrid CV recognition approach shows significant improvement over other approaches under clean, coded, and noisy conditions.
(iii) The performance of the CV recognition system under background noise is improved by a preprocessing method that combines temporal and spectral processing.
(iv) The proposed two-stage VOP detection method, used for spotting CV segments in continuous speech, is found to be effective in minimizing missed and spurious VOPs.
(v) In the presence of coding, the performance of the SI system is improved by using features extracted from steady vowel segments. This improvement is mainly due to the crucial speaker-specific information that the steady vowel segments of speech retain even after coding.
(vi) The proposed time scale modification method performs better than existing methods. Its superior performance is due to the nonuniform modification of different speech segments and the accurate detection of those segments with the help of instants of significant excitation and VOPs.

This book is mainly intended for researchers working on robust speech systems for mobile environments. It is also useful for young researchers who want to pursue research in speech processing, and it may be recommended as a text or reference book for a postgraduate-level advanced speech processing course.

Many people have helped us during the preparation of this book. We would especially like to thank all the professors of the G.S. Sanyal School of Telecommunication and the School of Information Technology, IIT Kharagpur, for their moral encouragement and technical discussions during the editing and organization of the book. Special thanks to our colleagues at IIT Kharagpur for their cooperation and coordination in carrying out the work. Finally, we thank all our friends and well-wishers.

West Bengal, India    K. Sreenivasa Rao
Hyderabad, India    Anil Kumar Vuppala

Contents

1 Introduction
  1.1 Introduction
  1.2 Objective of the Book
  1.3 Organization of the Book
2 Background and Literature Review
  2.1 Approaches for Detection of Vowel Onset Points
    2.1.1 VOP Detection Using Excitation Source, Spectral Peaks, Modulation Spectrum, and Their Combination
  2.2 Speech Processing in Mobile Environment
    2.2.1 Speech and Speaker Recognition Under Coding
    2.2.2 Speech Recognition Under Background Noise
  2.3 Recognition of CV Units of Speech in Indian Languages
  2.4 Time Scale Modification
  2.5 Summary
3 Vowel Onset Point Detection from Coded and Noisy Speech
  3.1 Speech Databases for VOP Detection
    3.1.1 TIMIT Database
    3.1.2 Broadcast News Database
  3.2 VOP Detection Method for Coded Speech
    3.2.1 Extraction of Glottal Closure Instants Using ZFF Method
    3.2.2 Sequence of Steps in the Proposed VOP Detection Method
    3.2.3 Choice of Frame Size
    3.2.4 Choice of Frequency Band
  3.3 Performance of the VOP Detection Method in the Presence of Speech Coding
    3.3.1 VOP Detection from Continuous Speech Under Coding
    3.3.2 VOP Detection from CV Units Under Coding
  3.4 VOP Detection Method for Noisy Speech
    3.4.1 Formant Extraction Using Group Delay Function
    3.4.2 Sequence of Steps in the Proposed VOP Detection Method for Noisy Speech
  3.5 Performance of the VOP Detection Method in the Presence of Background Noise
    3.5.1 VOP Detection from Continuous Speech Under Noise
    3.5.2 VOP Detection from CV Units Under Noise
  3.6 Summary
4 Consonant–Vowel Recognition in the Presence of Coding and Background Noise
  4.1 Consonant–Vowel Unit Databases
  4.2 Two-Stage CV Recognition System
    4.2.1 Motivations for the Proposed CV Recognition Approach
    4.2.2 Proposed CV Recognition Approach
    4.2.3 Framework
    4.2.4 Performance of the CV Recognition System
  4.3 Impact of Accuracy in VOP Detection on CV Recognition
  4.4 Performance of CV Recognition System Under Coding
    4.4.1 Isolated CV Units Recognition Under Coding
    4.4.2 CV Units Recognition from Continuous Speech in the Presence of Coding
  4.5 Performance of CV Recognition System in the Presence of Background Noise
  4.6 Application of Combined Temporal and Spectral Processing Methods for CV Units Recognition Under Background Noise
    4.6.1 Combined TSP Method for Enhancement of Noisy Speech
    4.6.2 CV Units Recognition Under Different Background Noise Cases Using Temporal and Spectral Preprocessing Techniques
  4.7 Summary
5 Spotting and Recognition of Consonant–Vowel Units from Continuous Speech
  5.1 Two-Stage Approach for Detection of Vowel Onset Points
    5.1.1 Sequence of Steps in the Proposed VOP Detection Method
    5.1.2 Choice of Deviation Threshold for Determining the Uniformity in the Epoch Intervals
    5.1.3 Performance of the Proposed Two-Stage VOP Detection Method
  5.2 Performance of Spotting and Recognition of CV Units in Continuous Speech
  5.3 Spotting and Recognition of CV Units from Coded Speech
  5.4 Spotting and Recognition of CV Units from Noisy Speech
  5.5 Summary
6 Speaker Identification and Time Scale Modification Using VOPs
  6.1 Speaker Identification in the Presence of Coding Using Vowel Onset Points
    6.1.1 Speech Databases
    6.1.2 SI System Using AANN Models
    6.1.3 Effect of Speech Coding on Speaker Identification
    6.1.4 Proposed Speaker Identification Method
    6.1.5 Performance of the Speaker Identification System Using Features Extracted from Steady Vowel Regions
  6.2 Nonuniform Time Scale Modification Using Instants of Significant Excitation and Vowel Onset Points
    6.2.1 Duration Analysis of Vowels in Fast and Slow Speech
    6.2.2 Determination of Different Speech Segments
    6.2.3 Proposed Method for Time Scale Modification
    6.2.4 Evaluation of the Proposed Nonuniform Time Scale Modification Method
  6.3 Summary
7 Summary and Conclusions
  7.1 Summary of the Present Work
  7.2 Contributions of the Present Work
  7.3 Directions for Future Work
Appendix A MFCC Features
Appendix B Speech Coders
  B.1 Global System for Mobile Full Rate Coder (ETSI GSM 06.10)
  B.2 GSM Enhanced Full Rate Coder (ETSI GSM 06.60)
  B.3 Codebook Excited Linear Prediction (CELP FS-1016)
  B.4 Mixed Excited Linear Prediction (MELP TI 2.4 kbps)
  B.5 Degradation Measures
Appendix C Pattern Recognition Models
  C.1 Hidden Markov Models
  C.2 Support Vector Machines
  C.3 Auto-Associative Neural Network Models
References
