
Exploration of Behavioral, Physiological, and Computational Approaches to Auditory Scene Analysis

129 Pages·2004·0.8 MB·English
by Peter Sou-Kong Chang


EXPLORATION OF BEHAVIORAL, PHYSIOLOGICAL, AND COMPUTATIONAL APPROACHES TO AUDITORY SCENE ANALYSIS

A THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By Peter Sou-Kong Chang, B.S., B.A.

The Ohio State University
2004

Master’s Examination Committee:
Professor DeLiang Wang, Advisor
Professor Eric Fosler-Lussier
Dr. Douglas Brungart

Approved by: Advisor, Graduate Program in Computer and Information Science

ABSTRACT

We present an overview of the study of auditory perception and scene analysis through the three main approaches researchers have used to study perception in general: behavioral, physiological, and computational. At the behavioral level, we discuss the principles and origins of auditory scene analysis, and establish the relationship between auditory scene analysis and auditory masking. Within auditory masking, we note the coexistence of informational and energetic masking, and use ideal time-frequency binary masks in a series of speech intelligibility experiments to isolate the energetic component of speech-on-speech masking. At the physiological level, we propose the adoption of the two-dimensional time-frequency oscillatory correlation representation as a main representation in auditory perception, after reviewing several theories and experiments in neurophysiology in an effort to find support for it. Finally, at the computational level, we extend an existing implementation of oscillatory correlation, LEGION [144], to simulate the major behavioral principles in alternating-tone sequences. Most notably, the decision boundaries of the temporal coherence boundary (TCB) and fission boundary (FB) first observed by Van Noorden [135] are automatically generated by the model. The results are compared to several existing implementations designed to simulate alternating-tone sequences [11, 104, 139].
Throughout this thesis, we use the three levels of analysis proposed by Marr in vision [89]. We emphasize the importance of balance at each level of analysis, and their relationship with the three approaches in the study of auditory perception.

Dedicated to my Parents and to bb

ACKNOWLEDGMENTS

I want to thank my research advisor, Prof. DeLiang Wang, for his constant guidance throughout my graduate study at Ohio State. His knack for explaining the most difficult scientific concepts in a clear and concise manner is unparalleled. His extensive knowledge in so many scientific fields and his patience in helping me understand them have not only driven me to complete my research projects but also fueled my passion for pursuing future scientific endeavors in a variety of subjects.

I am also grateful to have had the opportunity to work with Dr. Douglas Brungart, who has built for me the bridge between behavioral and engineering methodologies through psychophysics. I will always be awed by his amazingly sharp mind, which constantly generates new and creative ideas for experiments, and by his remarkable capability in sorting through and analyzing the most complicated experimental data. I gain a great deal of insight every time I speak with him, and I am indebted to his patience in always taking the time to explain a problem in detail so that I really understand each issue.

Thanks are due to Brian Simpson, who gave me much assistance while I was working at the Air Force Research Laboratory. He has shown me a great deal about the methods for conducting psychophysical experiments. He comes from an interesting combination of backgrounds in psychology and music, and I have enjoyed many interesting discussions with him.

I would also like to thank Prof. Eric Fosler-Lussier for being on my committee and providing insightful and objective critiques of my thesis. His advice and opinions have allowed me to think of my research from different points of view, and have helped me make my work relevant to more people.

I also want to thank my undergraduate research advisor, Prof. Vera Maljkovic. She not only introduced me to the wonderful field of psychology and perception, but also always gives me words of encouragement and valuable advice. Her belief in my ability has given me the confidence to pursue graduate school and tackle challenging problems, and will continue to inspire me in the future.

I wish to thank my lab mates from the Perception and Neurodynamics Lab, who are always around to help and discuss interesting topics. I am grateful to Soundarajan Srinivasan, who often puts his own work aside to help others, and always provides insightful answers regardless of the subject matter of my concerns. I will certainly miss his good-natured and approachable personality and the breadth of knowledge he possesses. I want to thank Yipeng Li, whom I also frequently approach with my academic dilemmas, and who always patiently explains to me the fundamental issues that boggle my mind. Guoning Hu continues to impress me with his strong scientific knowledge, self-discipline, and balanced lifestyle. I thank Yang Shao for his technical assistance with a variety of software and hardware that I depend on dearly. I have enjoyed my conversations on life and family with Nicoleta Roman, and also want to thank her for the help she gave me in getting started with the ideal binary mask. Zhaozhang Jin made an immediate impact on me in the short time that I have known him, by gladly answering the questions on signal processing fundamentals that I had always been afraid to ask. Lab alumnus Mingyang Wu has shown me the importance of maintaining social networks and good communication skills even in academic pursuits; his advice on careers will long be remembered.
I would like to thank my family and friends because they have supported me all these years and have kept life interesting, especially my parents, whom I will forever look to as my role models.

Last but most importantly, I am grateful to have met my girlfriend Jessica Yi Jin during my graduate study, because the endless hours of hard work sifting through hundreds of textbook pages, journal articles, astronomical equations, and problem sets are quickly forgotten when I remember that we have always been side by side throughout this amazing journey, and the bond we have built is in itself worth the most novel scientific discovery.

On a final note, I want to give credit to the financial support from the Air Force Research Laboratory that has made my research possible.

VITA

December 17, 1980: Born in Taipei, Taiwan
September 1998 to June 2002: B.S., Computer Science; B.A., Psychology, The University of Chicago, Chicago, IL
September 2002 to Present: Graduate Teaching Associate and Graduate Research Associate, The Ohio State University, Columbus, OH

FIELDS OF STUDY

Major Field: Computer and Information Science

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. Introduction
    1.1 Motivation: From Sensation to Perception
    1.2 Linking the Stimulus and Perception: Behavioral, Physiological, and Computational Approaches
    1.3 A Closer Look at Audition
    1.4 Thesis Overview
2. Auditory Scene Analysis
    2.1 Introduction
    2.2 Primitive Auditory Scene Analysis
    2.3 Schema-Based Integration
    2.4 Speech Scene Analysis
    2.5 Computational Auditory Scene Analysis
    2.6 Summary
3. On the Ideal Binary Mask and its Effects on Masking in Multitalker Speech Mixtures
    3.1 Introduction
        3.1.1 Energetic and Informational Masking
        3.1.2 Isolating the Informational Component of Speech-on-Speech Masking
        3.1.3 Isolating the Energetic Component of Speech-on-Speech Masking
    3.2 The Ideal Binary Mask
        3.2.1 Background
        3.2.2 Implementation
    3.3 Experiment 1: Effects of the Ideal Binary Mask on Speech Intelligibility
        3.3.1 Methods
        3.3.2 Results and Discussion
    3.4 Experiment 2: Effects of Sex and Characteristics of Interfering Speakers with Ideal Binary Masking
        3.4.1 Methods
        3.4.2 Results and Discussion
    3.5 Experiment 3: Effects of Number of Competing Speakers with Ideal Binary Masking
        3.5.1 Methods
        3.5.2 Results and Discussion
    3.6 Experiment 4: Effects of TMR on Resynthesis of Mixture Signal
    3.7 Summary
        3.7.1 Conclusions from the Experiments
        3.7.2 Limitations and Future Research
        3.7.3 Ideal Binary Mask as a Computational Goal of CASA
4. An Oscillatory Correlation Approach to ASA and the Computational Segregation of Alternating-Tone Sequences
    4.1 Introduction
    4.2 Neurophysiological Mechanisms for Auditory Streaming
        4.2.1 Carrying the Sound from the Ear to the Brain
        4.2.2 Theories of Auditory Neural Coding and Representation

