
Single Channel Overlapped-Speech Detection and Separation of Spontaneous Conversations PDF

193 Pages·2017·8.68 MB·English

Preview Single Channel Overlapped-Speech Detection and Separation of Spontaneous Conversations

University of Newcastle
Faculty of Science, Agriculture and Engineering
School of Electrical & Electronic Engineering

Single Channel Overlapped-Speech Detection and Separation of Spontaneous Conversations

Hasan Mohammad-Ali Kadhim

A thesis submitted to the University of Newcastle for the degree of Doctor of Philosophy

March 2017

I present my PhD thesis to:
▪ My country IRAQ
▪ My Family in Iraq and the UK

Table of Contents

Abstract
Acknowledgment
List of Figures
List of Tables
List of Symbols, Abbreviations and Acronyms
List of Publications
List of Hibernating

Chapter 1. Introduction
  1.1 Structure of the Thesis
  1.2 Speech versus Audio
  1.3 Pitch of Speech Signal
  1.4 Spontaneous Conversation, Dialog Speech and Mixture Speech
  1.5 Overlapped-Speech Detection, Speaker Diarization and Speech Separation
  1.6 Samples, Window-Frame and Hopping period
  1.7 Supervised, Semi-Supervised and Unsupervised Machine Learning
  1.8 Blind Speech Separation versus Informed Speech Separation
  1.9 Overall-System
  1.10 Subjective Test versus Objective Test
  1.11 Masking
  1.12 Objectives and Aims of the Research
  1.13 Contributions

Chapter 2. Literature Reviews
  2.1 Introduction
  2.2 Literature Review of Overlapped-Speech Detection
  2.3 Literature Review of Speech Separation by Non-negative Matrix Factorization
  2.4 Literature Review of Informed Speech Separation
    2.4.1 Video-Assisted Source Separation
    2.4.2 Spatial Audio Object Coding
    2.4.3 Reverberant Models for Source Separation
    2.4.4 Score-Informed Source Separation
    2.4.5 Language-Informed Speech Separation
    2.4.6 User-Guided Source Separation
    2.4.7 Dictionary-Based Methods
    2.4.8 NMF-Based Informed Speech Separation

Chapter 3. Overlapped-Speech Detection based-on Stochastic Properties
  3.1 Introduction
  3.2 Functional Block Diagram and Illustrative Waveforms
  3.3 An Algorithm of Overlapped-Speech Detection
    3.3.1 Framing and Overlapping-Window of the Input Signal
    3.3.2 Extraction of Audio Features by RASTA-PLPC
    3.3.3 k-means Clustering of the Features
    3.3.4 Groups and Statistical Variances
    3.3.5 Optimizing the Groups
    3.3.6 Re-clustering
    3.3.7 Hierarchical Clustering Scenarios
  3.4 Experiments
  3.5 Result and Test
  3.6 Comparison
  3.7 Summary

Chapter 4. Blind Speech Separation by Filter-Bank, Non-negative Matrix Factorization and Speaker Clustering
  4.1 Introduction
  4.2 Source Separation
  4.3 Functional Block Diagrams and Waveforms
    4.3.1 Preparation of the Required Resources
    4.3.2 Filter-Bank Analysis Technique
    4.3.3 Non-negative Matrix Factorization NMF
    4.3.4 Speaker Clustering
  4.4 Experiments
  4.5 Result and Test
  4.6 Comparison
  4.7 Summary

Chapter 5. Informed Speech Separation by Semi-Supervised Non-negative Matrix Factorization
  5.1 Introduction
  5.2 Functional Block Diagrams and Waveforms
  5.3 Informed Speech Separation Procedure
    5.3.1 Preparation of the Required Resources
    5.3.2 Training the Virtual Speech Signals
    5.3.3 The Virtual Assists the Real Speech Signals
    5.3.4 Soft and Binary Masking
    5.3.5 Exploiting both Masks
  5.4 Experiments
  5.5 Results and Tests
  5.6 Comparison
  5.7 Summary

Chapter 6. Notes, Conclusions and Future Works
  6.1 Notes and Conclusions
  6.2 Future Works

Appendix A. Historical Overviews
  A.1 Historical Overview of Filter-Bank
  A.2 Historical Overview of k-means
  A.3 Historical Overview of Overlapped-Speech Detection
  A.4 Historical Overview of Speech Separation
  A.5 Historical Overview of NMF and NMF-based Speech Separation
Appendix B. Software
Appendix C. k-means Flowchart
Appendix D. NMF Procedure
References

Abstract

This thesis considers spontaneous conversation containing both speech mixture and speech dialogue. Speech mixture refers to speakers speaking simultaneously (i.e. overlapped speech). Speech dialogue refers to segments in which only one speaker is actively speaking and the other is silent. The input conversation is first processed by overlapped-speech detection, which segregates it into two output signals: dialogue and mixture. The dialogue is processed by speaker diarization, whose outputs are the individual speech of each speaker. The mixture is processed by speech separation, whose outputs are the independent separated speech signals of the speakers. When the separation input contains only the mixture, a blind speech separation approach is used. When the separation is assisted by the outputs of the speaker diarization, it is informed speech separation.

The research presents a novel overlapped-speech detection algorithm and two novel speech separation algorithms.

The proposed overlapped-speech detection is an algorithm to estimate the switching instants of the input. An optimization loop is adapted to adopt the best encapsulated audio features and to avoid the worst. The optimization depends on principles of pattern recognition and on k-means clustering. For 300 simulated conversations, the average False-Alarm Error is 1.9%, the average Missed-Speech Error is 0.4%, and the average Overlap-Speaker Error is 1%. These errors are approximately equal to those reported for the best recent, reliable speaker diarization corpora.

The proposed blind speech separation algorithm consists of four sequential techniques: filter-bank analysis, Non-negative Matrix Factorization (NMF), speaker clustering and filter-bank synthesis. Instead of the speaker segmentation that is normally required, effective standard framing is contributed. The average objective test scores (SAR, SDR and SIR) obtained for 51 simulated conversations are 5.06 dB, 4.87 dB and 12.47 dB respectively.

For the proposed informed speech separation algorithm, the outputs of the speaker diarization form a generated database. The database assists the speech separation by creating virtual targeted speech and a virtual mixture. The contributed virtual signals are trained to facilitate the separation by homogenising them with the NMF-matrix elements of the real mixture. Contributed masking optimizes the resulting speech. The average SAR, SDR and SIR obtained for 341 simulated conversations are 9.55 dB, 1.12 dB and 2.97 dB respectively.

According to the objective tests, the two speech separation algorithms fall in the mid-range of the well-known NMF-based audio and speech separation methods.
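The abstract names NMF and soft/binary masking without showing either, so the following is a minimal generic sketch and not the thesis's implementation. It assumes a non-negative magnitude matrix `V` (frequency by time), uses the standard Lee-Seung multiplicative updates for the Euclidean cost, and groups the NMF components by hand, where the thesis uses speaker clustering. The function names `nmf` and `masks`, the rank, and the random toy input are all hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x time) as W @ H using
    Lee-Seung multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_time)) + eps   # time activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def masks(W, H, groups, eps=1e-9):
    """Build one soft (ratio) mask and one binary mask per group of
    NMF components; each group stands in for one speaker (assumption)."""
    parts = [W[:, g] @ H[g, :] for g in groups]   # per-speaker estimates
    total = sum(parts) + eps
    soft = [p / total for p in parts]             # values in [0, 1]
    winner = np.stack(parts).argmax(axis=0)       # dominant speaker per bin
    binary = [(winner == i).astype(float) for i in range(len(parts))]
    return soft, binary

# Toy input standing in for a mixture magnitude spectrogram (assumption).
V = np.random.default_rng(1).random((16, 40))
W, H = nmf(V, rank=4)
# Hand-picked grouping: components 0-1 for speaker A, 2-3 for speaker B.
soft, binary = masks(W, H, groups=[[0, 1], [2, 3]])
speaker_a_soft = soft[0] * V      # soft-masked estimate for speaker A
speaker_a_binary = binary[0] * V  # binary-masked estimate for speaker A
```

A soft mask shares each time-frequency bin between speakers in proportion to their estimated energy, while a binary mask assigns each bin wholly to the dominant speaker. How the thesis combines the two masks is not stated in this preview.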

Description:
Jordi Luque Serrano (2011) investigated the environments of multi-sensors
