Perceptual Audio Quality Assessment using a Non-Linear Filter Bank
(Gehörbezogene Qualitätsbewertung von Audiosignalen unter Verwendung einer nichtlinearen Filterbank)

submitted by Dipl.-Ing. Thilo Thiede

Dissertation approved by the Department of Electrical Engineering (Fachbereich Elektrotechnik) of the Technische Universität Berlin for the academic degree of Doktor der Ingenieurwissenschaften (Dr.-Ing.)

Berlin 1999
D 83

Date of submission: 27.10.1998
Date of oral defence: 19.04.1999

Doctoral committee:
Chair: Prof. Dr.-Ing. Heinrich Klar
First reviewer: Prof. Dr.-Ing. Peter Noll
Second reviewer: Prof. Dr.-Ing. Manfred Krause

Abstract

This thesis describes a new method for the objective measurement of perceived audio quality. The method is based on a non-linear filter bank that closely approximates auditory filter shapes and also models the level dependence of these filter characteristics. Unlike other measurement schemes, the quality estimation is not based solely on models for steady-state signals, but also considers the temporal structure of the envelopes of the auditory filter outputs. A further improvement over other measurement methods is the separation between linear and non-linear distortions. This takes into account the fact that imbalances in the frequency response of an audio device are less annoying than the same amount of non-linear distortion, such as quantisation noise. The computational complexity of the filter bank implemented in this method is lower than that of most other filter banks applicable to perceptual measurement. The method has proven to be superior to most other measurement methods used in this field, and a large part of it will be included in the ITU recommendation “method for objective measurements of perceived audio quality”.
In particular, the part of this recommendation that addresses applications requiring the highest possible accuracy (the “advanced version”) is mainly based on this method.

Zusammenfassung (German summary, translated)

Gehörbezogene Qualitätsbewertung von Audiosignalen unter Verwendung einer nichtlinearen Filterbank

This thesis describes a new method for the perceptual quality assessment of audio signals. The method uses a non-linear filter bank that closely reproduces the filter characteristics of the auditory system and also takes into account the level dependence of the ear's frequency resolution. In contrast to other measurement methods, the auditory model used is no longer based exclusively on models for piecewise stationary signals, but also considers the temporal structure of the envelopes of the filtered signals. A further advantage over other methods is the separation between linear and non-linear distortions, which reflects the fact that errors caused by an uneven frequency response are usually far less annoying than non-linear distortions such as quantisation noise. Compared with other filter banks that can be used in perceptual measurement methods, the filter bank employed requires relatively little computation. The new measurement method is clearly superior to most other measurement methods for this field of application, and large parts of it will be incorporated into the ITU recommendation on the objective measurement of perceived audio quality (“method for objective measurement of perceived audio quality”). In particular, the part of the ITU recommendation that addresses applications requiring maximum accuracy consists predominantly of the method presented here.
Contents

ABSTRACT
ZUSAMMENFASSUNG
CONTENTS
1. INTRODUCTION
2. FUNDAMENTAL PRINCIPLES OF PERCEPTUAL MEASUREMENT
  2.1 SOME CHARACTERISTICS OF THE HUMAN AUDITORY SYSTEM
    2.1.1 Masking
    2.1.2 Loudness Perception
    2.1.3 Auditory Frequency Scales
    2.1.4 Other Effects
  2.2 PSYCHOACOUSTICAL MODELS
    2.2.1 Zwicker’s Model for the Calculation of Perceived Loudness
    2.2.2 Moore’s Model for the Calculation of Partial Loudness
    2.2.3 Analytical Expressions for Psychoacoustical Phenomena
    2.2.4 Summary and Conclusions
  2.3 CONCEPTS OF PERCEPTUAL MODELS
    2.3.1 Masked Threshold Concept
    2.3.2 Comparison of Internal Representations
    2.3.3 Analysis of Linear Error Spectra
  2.4 PERCEPTUAL MEASUREMENT METHODS
    2.4.1 NMR
    2.4.2 PAQM
    2.4.3 PERCEVAL
    2.4.4 POM
    2.4.5 OASE
    2.4.6 Comparison of Different Concepts
    2.4.7 Shortcomings of Existing Models
3. APPLICATION OF FILTER BANKS
  3.1 COMPARISON WITH FFT-BASED METHODS
    3.1.1 Temporal Resolution
    3.1.2 Spectral Resolution
    3.1.3 Summary
  3.2 REQUIREMENTS ON THE FILTERS
  3.3 FILTER BANKS USED IN PERCEPTUAL MODELS
    3.3.1 One-Third-Octave Filters
    3.3.2 FIR Filters
    3.3.3 BARK-Transform
    3.3.4 IIR Filters
    3.3.5 Warped Filters
    3.3.6 FTT (Fourier-Time-Transform)
    3.3.7 Gammatone Filter Banks
  3.4 WAVELET-TRANSFORMS
  3.5 DESIGN OF A NON-LINEAR FILTER BANK
    3.5.1 Modelling Level-Dependent Excitation Patterns
    3.5.2 FIR Filters using Recursive Algorithms
  3.6 COMPUTATIONAL EFFICIENCY
  3.7 FLEXIBILITY
  3.8 CONCLUSIONS
4. DIX - A NEW PERCEPTUAL MEASUREMENT METHOD
  4.1 OBJECTIVES AND REQUIREMENTS
  4.2 OVERVIEW
  4.3 PERIPHERAL EAR MODEL
    4.3.1 Structure of the Filter Bank
    4.3.2 Pre-Filtering and Scaling
    4.3.3 Sampling in the Time and Frequency Domain
    4.3.4 Rectification
    4.3.5 Time Domain Smearing
    4.3.6 Subsampling
    4.3.7 Threshold in Quiet
    4.3.8 Characteristics of the Filter Bank
  4.4 ALIGNMENT AND ADAPTATION
    4.4.1 Introduction
    4.4.2 Time Alignment
    4.4.3 Dynamical Level and Pattern Adaptation
  4.5 EVALUATION OF ENVELOPE MODULATIONS
  4.6 MODEL OUTPUT VALUES
    4.6.1 Measures for Non-Linear Distortions
    4.6.2 Measures for Linear Distortions
    4.6.3 Measures for Changes in the Temporal Structure
    4.6.4 Temporal Averaging
    4.6.5 Selection of Valid Sequences of the Audio Signal
  4.7 OPTIMISATION OF THE MODEL
    4.7.1 Auditory Frequency Scales
    4.7.2 Sampling in the Time and Frequency Domain
    4.7.3 Simultaneous Masking
    4.7.4 Temporal Masking
    4.7.5 Dynamical Level and Pattern Adaptation
  4.8 SUMMARY
5. PEAQ - THE NEW STANDARD FOR OBJECTIVE MEASUREMENT OF PERCEIVED AUDIO QUALITY
  5.1 INTRODUCTION: STANDARDISATION WITHIN THE ITU
  5.2 COMBINING OUTPUT VALUES OF DIFFERENT MODELS
    5.2.1 Advanced Version
    5.2.2 Basic Version
  5.3 DESCRIPTION OF THE COMBINED MODEL
    5.3.1 Outline
    5.3.2 FFT-Based Ear Model
    5.3.3 Filterbank-based Ear Model
    5.3.4 Model Output Variables
    5.3.5 Temporal and Spectral Averaging
  5.4 OPTIMISATION OF THE MODEL
    5.4.1 Forward Masking
    5.4.2 Dynamic Level and Pattern Alignment
6. ESTIMATION OF PERCEIVED BASIC AUDIO QUALITY
  6.1 GRADING SCALES FOR SUBJECTIVE AUDIO QUALITY ASSESSMENT
    6.1.1 The Five-Grade Impairment Scale
    6.1.2 SDG (“Subjective Difference Grade”)
    6.1.3 ODG (“Objective Difference Grade”)
  6.2 ACCESSIBLE DATABASES
    6.2.1 Database 1
    6.2.2 Database 2
    6.2.3 Database 3
    6.2.4 Other Databases
  6.3 MAPPING FROM MODEL OUTPUT VALUES TO OBJECTIVE DIFFERENCE GRADES
    6.3.1 Polynomial Mapping Functions
    6.3.2 One-Dimensional Mapping Functions
    6.3.3 Multidimensional Mapping Functions
  6.4 APPLICATION OF AN ARTIFICIAL NEURAL NETWORK
  6.5 DEFINITION OF A DISTORTION INDEX
7. PERFORMANCE OF THE MEASUREMENT METHOD
  7.1 APPLIED CRITERIA
    7.1.1 Correlations between Model Predictions and Subjective Gradings
    7.1.2 Average Prediction Error
    7.1.3 Absolute Error Scores
    7.1.4 Tolerance Scheme
  7.2 PREDICTIONS OF COMPARATIVE TESTS AMONG AUDIO CODECS
  7.3 ITU-R COMPARATIVE TEST 1996
    7.3.1 Results of the First Phase
    7.3.2 Second Phase: Fitting the Models to the Database
    7.3.3 Results of the Third Phase
    7.3.4 Conclusions
  7.4 ITU-R VALIDATION TEST 1997
    7.4.1 Optimisation and Preselection
    7.4.2 Results of the First Phase
    7.4.3 Second Phase: Fitting the Models to the Database
    7.4.4 Results of the Third Phase
  7.5 SUMMARY
8. SIMULATING PSYCHOACOUSTICAL EXPERIMENTS WITH PEAQ
  8.1 PSYCHOACOUSTICAL MODELS AND PERCEPTUAL MEASUREMENT
  8.2 SIMULTANEOUS MASKING
    8.2.1 Masking Properties of Tones and Noises
    8.2.2 Additivity of Masking
  8.3 TEMPORAL MASKING
  8.4 SUMMARY
9. OUTLOOK
10. SUMMARY
11. REFERENCES
12. ACKNOWLEDGEMENT
13. REMARKS
14. APPENDIX
  14.1 INDEX
  14.2 TABLE OF FIGURES
  14.3 ABBREVIATIONS