Towards Automatic Rhythmic Accompaniment

Thesis submitted in partial fulfilment of the requirements of the University of London for the Degree of Doctor of Philosophy

Matthew E. P. Davies

Submitted: May 2007
Corrections: August 2007

Department of Electronic Engineering, Queen Mary, University of London

I certify that this thesis, and the research to which it refers, are the product of my own work, and that any ideas or quotations from the work of other people, published or otherwise, are fully acknowledged in accordance with the standard referencing practices of the discipline. I acknowledge the helpful guidance and support of my supervisor, Dr Mark Plumbley.

Abstract

In this thesis we investigate the automatic extraction of rhythmic and metrical information from audio signals. Primarily we address three analysis tasks: the extraction of beat times, equivalent to the human ability of foot-tapping in time to music; finding bar boundaries, which can be considered analogous to counting the beats of the bar; and thirdly the extraction of a predominant rhythmic pattern to characterise the distribution of note onsets within the bars.

We extract beat times from an onset detection function using a two-state beat tracking model. The first state is used to discover an initial tempo and track tempo changes, while the second state maintains contextual continuity within consistent tempo hypotheses. The bar boundaries are recovered by finding the spectral difference between beat-synchronous analysis frames, and the predominant rhythmic pattern by clustering bar-length onset detection function frames.

In addition to the new techniques presented for extracting rhythmic information, we also address the problem of evaluation: how to quantify the extent to which the analysis has been successful. To this end we propose a new formulation for beat tracking evaluation, where accuracy is measured in terms of the entropy of a beat error histogram.

To illustrate the combination of all three layers of analysis we present this research in the context of automatic musical accompaniment, such that the resulting rhythmic information can be realised as an automatic percussive accompaniment to a given input audio signal.

For Kenneth and Vera

Acknowledgements

I’d like to take this opportunity to thank those people who have had a profound effect on me and on the work that has gone into this thesis.

First, I would like to thank my supervisor, Mark Plumbley. The fact that this thesis is finished at all is in no small part down to his endless enthusiasm for talking about my work and for commenting on the numerous drafts of this document. I am extremely grateful for the effort he put in to make it so easy to join the group back in the Summer of 2003, and during my time here, for always making funding available for travel to conferences. I must also acknowledge the support of a College Studentship from Queen Mary and the kind (anonymous) person who gave up their studentship just before I started.

Special thanks also to Juan Bello and Chris Harte, not just for outstanding insight into my work, but for always being available to talk, even when it had nothing to do with beat tracking.

I owe a great deal to the work put in by Stephen Hainsworth. It’s hard to quantify the impact of being able to use his annotated beat tracking database – without it, I would have no results. Thanks to Nick Collins for sending the data through to me.
I must also thank Anssi Klapuri for lending me his source code so that I could run my comparative tests, and Nico Chétry for actually getting the code to compile.

There are of course many more people I want to thank. In no particular order: Andrew N, Andrew R, Katy, Samer, Paul, Mark S, Mark L, Adam, Yves, Steve, Dan, Beiming, Hamish and all the people who have been part of C4DM while I’ve been here.

Finally I would like to thank my wife Sarah for being so patient and understanding (especially towards the end) and my family for their continued support.

Contents

Abstract  3

1 Introduction  16
  1.1 Objectives and Motivation  16
  1.2 Outline of Thesis  17
  1.3 Publications  22
  1.4 Thesis Contributions  23

2 Background  25
  2.1 Rhythm and Metre  25
    2.1.1 Foot-tapping  26
    2.1.2 Metrical Structure  29
  2.2 Relevance to Automatic Accompaniment Systems  30
  2.3 Towards a Rhythmic Representation  36
    2.3.1 Onset Detection  36
    2.3.2 Derivation of Onset Detection Functions  37
  2.4 Summary  43

3 Context-dependent Beat Tracking  44
  3.1 Introduction  45
  3.2 Previous Approaches  45
  3.3 The State of the Art in Beat Tracking  50
  3.4 Proposed Model  50
  3.5 Onset Detection Function  52
  3.6 General State  54
    3.6.1 Beat Period Induction  54
    3.6.2 Beat Alignment Induction  58
  3.7 Context-dependent State  61
    3.7.1 Beat Period  63
    3.7.2 Beat Alignment  65
  3.8 Two State Model  68
  3.9 Summary  70

4 Beat Tracking Evaluation Methods  71
  4.1 Evaluation Issues  71
    4.1.1 Subjective Approaches  73
    4.1.2 Objective Approaches  75
    4.1.3 Comparison of Objective Evaluation Methods  85
  4.2 Entropy-Based Evaluation  88
    4.2.1 Beat Error  88
    4.2.2 Beat Error Histogram  90
    4.2.3 Beat Accuracy  91
  4.3 Summary  93

5 Beat Tracking Results  94
  5.1 Scope of Evaluation  94
  5.2 Results  96
    5.2.1 MIREX 2006 Audio Beat Tracking Results  96
    5.2.2 Continuity-based Results  97
    5.2.3 Entropy-based Results  102
    5.2.4 Entropy vs Continuity-based Evaluation  108
  5.3 Summary  109

6 Towards Rhythmic Accompaniment  110
  6.1 Background  111
    6.1.1 Time-signature Estimation  112
    6.1.2 Extracting Downbeats  114
    6.1.3 Downbeat Applications  116
  6.2 Downbeat Extraction  117
    6.2.1 A Beat-synchronous Spectral Representation  118
    6.2.2 Measuring Spectral Change  120
    6.2.3 Detecting Downbeats  122
    6.2.4 Time-signature Extraction  123
  6.3 Predominant Rhythmic Pattern  124
  6.4 Downbeat Tracking Evaluation  129
    6.4.1 Semi-automatic Downbeat Tracking Accuracy  130
    6.4.2 Fully Automatic Downbeat Tracking Accuracy  132
  6.5 Beat Tracking with Prior Knowledge  136
    6.5.1 Beat Tracking with Multiple Detection Functions  137
    6.5.2 Beat Tracking with a Count-in Initialisation  139
  6.6 Summary  141

7 Conclusions  142
  7.1 Thesis Contributions  142
  7.2 Future Work  144
  7.3 Perspectives  148

List of Figures

2.1 Metrical Structure for a Samba rhythm. (a) Audio signal. (b) Note onset locations. (c) Lowest metrical level: the Tatum. (d) Beat locations. (e) Bar boundaries.  28

2.2 Onset detection functions for a short excerpt of rock music. Dotted vertical lines indicate beat locations. (a) Audio signal. (b) Energy-based. (c) High Frequency Content. (d) Spectral Difference. (e) Phase Deviation. (f) Complex Spectral Difference.  42

3.1 (a): Flow chart of typical beat tracking process. (b)-(f): Proposed model. (b) input audio signal, (c) onset detection function, (d) autocorrelation function with beat period, (e) detection function with beat alignment, (f) audio signal with extracted beats.  52

3.2 Autocorrelation function with metrically unbiased comb template.  56

3.3 Shift-invariant comb filterbanks for the General State (a) and the Context-dependent State (c), with respective output signals: (b) and (d). The index of the maximum value in the output is used to find the beat period for the current analysis frame.  59

3.4 Comb filter matrices for the General State (a) and Context-dependent State (b), with respective example output signals (c) and (d). The index of the maximum value in the output is selected as the phase of the beats for the current analysis frame.  62

3.5 Two state switching model: Analysis begins in General State S_1. The observation of a consistent beat period generates a Context-dependent state, S_2, which is replaced by S_2′ if a tempo change is observed.  69

4.1 (a) Dixon [2001a] approach to beat tracking evaluation. (b) Size of evaluation window for a fast tempo, and (c) at a slow tempo.  77
4.2 Cemgil et al [2001] approach to beat tracking evaluation. (a) The sequence of beat times (dotted lines) and annotations (bold solid lines). (b) The Gaussian cost function W used to define beat accuracy. The 70ms window refers to Dixon’s allowance window [Dixon, 2001a].  79

4.3 Goto and Muraoka [1997a] approach to beat tracking evaluation. The bold vertical lines are beat annotations a_j, the dotted lines are beats γ_b, the diagonal lines represent the perceptually acceptable region around each annotation where the beat error ζ ≤ 0.35.  82

4.4 (a) Continuity-based beat tracking evaluation metric. Acceptance of beat, γ_b, depends on its proximity to annotation, a_j, and the localisation of the previous beat. θ defines the acceptance window around each annotation. (b) Overview of evaluation metric over several beats.  84

4.5 Extraction of beat error ζ from beats γ_b and annotations a_j.  89

4.6 (a) Beat error histogram, (b) Beat error histogram mapped onto the unit circle.  90

5.1 Comparison of non-causal beat tracking for our approach DP, KEA [Klapuri et al., 2006], Dixon [2006a], Ellis [2006], Hainsworth [2004b] and the Human tapping performance as a function of the acceptance window θ. Correct metrical level with continuity required (a). Correct metrical level without continuity (b). Allowed metrical levels with continuity required (c) and Allowed metrical levels without continuity requirement (d).  100

5.2 Beat error histograms mapped onto the unit circle: (a) DP (non-causal); (b) DP (causal); (c) KEA (non-causal) [Klapuri et al., 2006]; (d) KEA (causal) [Klapuri et al., 2006]; (e) Dixon [2006a]; (f) Hainsworth [2004b]; (g) Ellis [2006]; (h) Human.  105