Machine Learning for Speaker Recognition


INTERSPEECH 2016 Tutorial: Machine Learning for Speaker Recognition
Man-Wai Mak† and Jen-Tzung Chien‡
† The Hong Kong Polytechnic University, Hong Kong
‡ National Chiao Tung University, Taiwan
September 8, 2016

Table of Contents
1 Introduction
2 Learning Algorithms
3 Learning Models
4 Deep Learning
5 Case Studies
6 Future Direction

Outline
1 Introduction
   1.1. Fundamentals of speaker recognition
   1.2. Feature extraction and scoring
   1.3. Modern speaker recognition approaches
2 Learning Algorithms
3 Learning Models
4 Deep Learning
5 Case Studies
6 Future Direction

Fundamentals of speaker recognition
Speaker recognition is a technique to recognize the identity of a speaker from a speech utterance.
Taxonomy diagram labels: speaker identification, speaker verification, speaker diarization; text-dependent, text-independent; open set, closed set.

Speaker identification
•  Determine whether an unknown speaker matches one of a set of known speakers
•  One-to-many mapping
•  It is often assumed that the unknown voice must come from a set of known speakers, referred to as closed-set identification
•  Adding a "none of the above" option to closed-set identification gives open-set identification

Speaker verification
•  Determine whether an unknown speaker matches a specific speaker
•  One-to-one mapping
•  Closed-set verification: the population of clients is fixed
•  Open-set verification: new clients can be added without having to redesign the system
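The difference between one-to-one verification, one-to-many closed-set identification, and open-set identification can be sketched in a few lines. This is a minimal illustration, not the tutorial's method: it assumes each speaker is already summarized by a fixed-length vector, that cosine similarity is the score, and that a single hand-picked threshold decides acceptance. The names `verify`, `identify_closed_set`, `identify_open_set`, the toy vectors, and the threshold value are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def verify(test_vec, claimed_model, threshold):
    """One-to-one: accept or reject a single claimed identity."""
    return cosine(test_vec, claimed_model) >= threshold


def identify_closed_set(test_vec, models):
    """One-to-many: always return the best-matching enrolled speaker."""
    return max(models, key=lambda name: cosine(test_vec, models[name]))


def identify_open_set(test_vec, models, threshold):
    """Closed-set identification plus a "none of the above" outcome (None)."""
    best = identify_closed_set(test_vec, models)
    return best if cosine(test_vec, models[best]) >= threshold else None


# Toy enrolled speakers and test vectors (illustrative values, not real data).
models = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
known = [0.9, 0.1, 0.2]     # close to alice's model
stranger = [0.0, 0.0, 1.0]  # far from every enrolled model
threshold = 0.5             # assumed decision threshold

print(verify(known, models["alice"], threshold))       # True
print(verify(known, models["bob"], threshold))         # False
print(identify_closed_set(stranger, models))           # alice (forced choice)
print(identify_open_set(stranger, models, threshold))  # None
```

The closed-set call is forced to pick an enrolled speaker even for the stranger, which is exactly the case the "none of the above" option is meant to handle.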
Speaker diarization
•  Determine when a speaker change has occurred in the speech signal (segmentation)
•  Group together speech segments corresponding to the same speaker (clustering)
•  Prior speaker information may or may not be available

Input mode
Text-dependent
•  Recognition system knows the text spoken by the person
•  Fixed phrases or prompted phrases
•  Used for applications with strong control over user input, e.g., biometric authentication
•  Speech recognition can be used for checking spoken text to improve system performance
•  Sentences typically very short
Text-independent
•  No restriction on the text, typically conversational speech
•  Used for applications with less control over user input, e.g., forensic speaker ID
•  More flexible but recognition is more difficult
•  Speech recognition can be used for extracting high-level features to boost performance
•  Sentences typically very long

Feature extraction
•  Speech is a time-varying signal conveying multiple layers of information: words, speaker, language, emotion
•  Information in speech is observed in the time and frequency domains

Acoustic Features
•  Speech is a continuous evolution of the vocal tract
•  Need to extract a sequence of spectra or sequence of spectral coefficients
•  Use a sliding window: 25 ms window, 10 ms shift
Diagram labels: log|X(ω)|, DCT, MFCC
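The sliding-window step and the DCT of log|X(ω)| can be illustrated with a small standard-library sketch. It is a simplification under stated assumptions: a Hamming window and a naive DFT are used, and the mel filterbank that a full MFCC pipeline applies before the logarithm is omitted, so the output is plain cepstral coefficients rather than true MFCCs. The function names, the 13-coefficient default, and the synthetic test tone are hypothetical choices for illustration.

```python
import cmath
import math


def frame_signal(samples, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Split samples into overlapping frames (25 ms window, 10 ms shift by default)."""
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    frames = []
    start = 0
    while start + win <= len(samples):
        frames.append(samples[start:start + win])
        start += shift
    return frames


def log_magnitude_spectrum(frame):
    """Hamming-window a frame and return log|X(k)| for k = 0 .. N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(math.log(abs(acc) + 1e-10))  # small floor avoids log(0)
    return spectrum


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II, keeping the first num_coeffs coefficients."""
    m = len(values)
    return [sum(v * math.cos(math.pi * c * (j + 0.5) / m)
                for j, v in enumerate(values))
            for c in range(num_coeffs)]


def cepstral_features(samples, sample_rate, num_coeffs=13):
    """One vector of cepstral coefficients per frame (no mel filterbank)."""
    return [dct_ii(log_magnitude_spectrum(f), num_coeffs)
            for f in frame_signal(samples, sample_rate)]


# Usage: 100 ms of a 440 Hz tone sampled at 8 kHz (synthetic, illustrative input).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
features = cepstral_features(tone, sample_rate)
print(len(features), len(features[0]))  # 8 13
```

At 8 kHz the 25 ms window is 200 samples and the 10 ms shift is 80 samples, so 800 samples yield 8 complete frames. In practice an FFT and a mel filterbank would replace the naive DFT used here.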
