
Intelligent Audio Analysis PDF

358 Pages·2013·5.842 MB·English

Preview Intelligent Audio Analysis

Signals and Communication Technology

Björn W. Schuller

Intelligent Audio Analysis

For further volumes: http://www.springer.com/series/4748

Björn W. Schuller
LS für Mensch-Maschine-Kommunikation
TU München
München
Germany

ISSN 1860-4862
ISBN 978-3-642-36805-9
ISBN 978-3-642-36806-6 (eBook)
DOI 10.1007/978-3-642-36806-6
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013933575
© Springer-Verlag Berlin Heidelberg 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

For Thorben Amadeus Bryan & Benno Olav Sylvain

Foreword

Intelligent Audio Analysis unites methods of audio signal processing and machine learning. Other terms exist for this field or its sub-fields and might have been used instead, such as Computer Audition or Machine Listening. Each of these is used by partly different research communities with a slightly different understanding of the core application field and the inventory of methods.

Besides Automatic Speech Recognition, which has been researched for more than half a century, an increasing number of further speech and speaker characterisation tasks have recently been pursued in the literature. In addition, the younger field of Music Information Retrieval is growing, and there is emerging interest in the computationally ‘intelligent’ analysis of general sound events. Fields of application comprise audio coding, editing, interaction, search and surveillance, as well as coaching and entertainment applications.

This book first advocates a unified view of the multiplicity of resulting tasks. It further provides a broad overview of the field, enriched by extensive recent research application examples mostly based on the author’s latest work. The focus thereby lies on realistic conditions and on standardisation by open-source software implementations and comparative evaluations.
The main goal is to increase robustness by temporary and innovative methods such as automated data acquisition by semi-supervised learning, audio signal enhancement by non-negative matrix factorisation, systematic feature brute-forcing and application of memory-enhanced learning algorithms, for example in combination with graphical model structures.

Machine-based recognition of speech, non-linguistic vocalisations and para-linguistic speaker states and traits serve as examples of application in the domain of speech processing. As for music processing, examples include blind separation of instruments, determination of tempo, metre and ballroom dance style, as well as analysis of musical key, chord progression and structure, next to estimation of music mood and singer traits. Finally, the examples are complemented by the recognition of general sound events along with their emotional connotation.

In the outlook, avenues towards evolutionary, unsupervised and holistic audio signal analysis are shown.

It is thus hoped that the book may be of interest to the very broad and interdisciplinary range of researchers and practitioners in academia and industry, reaching from engineering and computer science to the fields of speech, language, music and general audio science with their manifold sub-fields. It further addresses levels from early to very advanced; obviously, though, not all details can be provided at any time, and further reading will be of help where the reader finds it most helpful.

Preface

This book is based on my habilitation thesis and thus on selected essential research application examples, complemented by explanatory chapters, produced during the period of my habilitation at the Institute for Human–Machine Communication of the Technische Universität München (TUM) in Munich, Germany, to obtain the German state doctorate (fakultas docendi) and private lectureship (venia legendi, German PD) in the subject area of Signal Processing and Machine Intelligence.
A representative selection of application examples was made based on coverage of the broader field, scientific relevance and recency. The book further includes knowledge and findings of research conducted and lectures held during this period at TUM, the CNRS LIMSI’s Spoken Language Processing Group in Orsay, France, the Imperial College London’s Department of Computing in London, UK, the Università Politecnica delle Marche in Ancona, Italy and the National ICT Australia in Sydney, Australia.

The aim is to provide a handbook that can be read from beginning to end, structured into methods and examples of their application. Reference to the original research is made repeatedly throughout, so that the interested reader is referred to it, as well as to further reading from myself and my colleagues or further research in the field. In this way, the book introduces a broader view on, and new avenues towards, the computational and ‘intelligent’ analysis of audio, aiming at the higher goal of lending machines the ability to listen to and understand arbitrary and complex compounds of speech, music and sound.

Gilching, December 2012
Björn W. Schuller

Acknowledgments

Great discoveries and improvements invariably involve the co-operation of many minds. I may be given credit for having blazed the trail, but when I look at the subsequent developments I feel the credit is due to others rather than to myself.
(Alexander Graham Bell)

I would first like to express my deepest gratitude to my habilitation mentor and year-long advisor Prof. Gerhard Rigoll of Technische Universität München in Munich, Germany, for his immeasurable technical inspiration, guidance and consultancy along the academic path. Next, for their most valuable preceptorship and the chance of common scientific work, I thank my further habilitation mentors Prof. Kristian Kroschel of the Karlsruhe Institute of Technology and the Fraunhofer IOSB and Prof. Andreas Wendemuth of the Otto-von-Guericke-University in Magdeburg.
For guidance along this path up to the doctoral level, I thank my former advisor Prof. Manfred K. Lang again at this point. Next, I would particularly like to thank my precursor Prof. Günther Ruske for handing over to me his lectures on Digital Processing of Speech Signals, Automatic Pattern Recognition in Speech Processing and Data Analysis and Information Reduction, and of course for his countless advice.

A special thank you is also dedicated to my superiors at the CNRS-LIMSI and its Spoken Language Processing Group in Orsay, France, Prof. Laurence Devillers, Dr. Jean-Luc Gauvain and his wife Dr. Lori Lamel and Dr. Joseph-Jean Mariani, as well as to my colleagues at the time, Dr. Matthias Brendel and Dr. Riccardo Zaccarelli, for the common work and to Clement Chastagnol, Agnes Delaborde, Nicolas Foucault, Marie Tahon and Christophe Vaudable for the manifold discussions: merci à tous.

For welcoming me at the Imperial College London’s Department of Computing in London, UK, and for our constant co-operation at the time of writing, I would also like to particularly thank Prof. Maja Pantic and her excellent team of the Intelligent Behaviour Understanding group (i-bug), in particular Dr. Hatice Gunes (Queen Mary University of London) and Dr. Michel Valstar (University of Nottingham) for our common research.

For hosting me as guest researcher at the National ICT Australia (NICTA) in Sydney, Australia, and for their infinite hospitality, I would next like to thank Dr. Julien Epps and Dr. Fang Chen.

For hosting me as guest lecturer and researcher at the Harbin Institute of Technology in Harbin, China, and for endless care during these days, I want to further express my thanks to Prof. Haifeng Li.

Further, for inviting me as lecturer in Intelligent Audio Analysis and hosting me at the Università Politecnica delle Marche in Ancona, Italy, as well as for manifold common work and organisation with the A3Lab, I would like to thank Dr. Stefano Squartini and Prof. Francesco Piazza and, from their team, in particular Dr. Emanuele Principi and Dr. Rudy Rotili: grazie mille tutti.

For their outstanding contributions, I would further like to thank my team of post-doctoral and doctoral students at the Institute for Human–Machine Communication, namely Dr. Silvia Monica Feraru, Dr. Cyril Joder, Dr. Fabien Ringeval and Jun Deng, Florian Eyben, Erik Marchi, Felix Weninger, Martin Wöllmer and Zixing Zhang. Next, I would like to thank the external doctoral student Raymond Brückner, and those who visited my working group in ‘sandwich doctoral programs’, namely Wenjing Han (Harbin Institute of Technology, China) and Masao Yamagichi (Tokyo Institute of Technology, Japan).

Also, I would like to thank the research assistants for their contributions in this period, in temporal order from the beginning: Thomas Hook, Claudia Tiddia, Fabian Bross, Erik Marchi, Jordi Feliu, Simone Hantke, Thomas Knauer, Otto-Thadäus Jandl and Youssef Adel, as well as the student trainees Sun Yang, Judith Köppe and Salma Zouari.

A very warm and special thank you is also dedicated to all the master’s students at the Institute for Human–Machine Communication of this period who pursued their theses under my guidance, in temporal order: Thomas Mikschl, Ricardo Minguez, Naijiang Lu, Florian Dibiasi, Hermann Anton Karl, Jin Yao, Martin Wöllmer, Florian Eyben, Yuan Ma, Josef Zwack, Sebastian Kopp, Xiaodi Wang, Cong Hui, Peter Grosche, Christoph Scheuermann, Tobias Knaup, Fabian Heinemann, Yang Sun, Haroon Jacob, Yunpeng Sun, Felix Weninger, Benedikt Gollan, Xiaohua Zhang, Christian Landsiedel, Erik Marchi and Duc Bui Tran; as well as the bachelor students, also in temporal order: Moritz Dausinger, Florian Eyben, Qianqian Xu, Benedikt Gollan, Xiaohua Zhang, Ugur Özcan, Kayssar Chaabane, Chi He, Haitao Liu, Jian Li, Clemens Hage, Johannes Dorfner, Jürgen Glatz, Christoph Kozielski, Judith Köppe and Felix Friedmann; and the interdisciplinary project students Damir Ismailovic, Kevin Tunstall, Maximilian Wöhrl, Christian Kern, Lorenz Mösenlechner, Tayfur Coskun, Alexander Lehmann, Felix Weninger and Marcel Knapp.
Among the many colleagues at the Institute in Munich, I will mention only a few from these years, but thank them all for unforgettable years and exchanges:
