ebook img

DTIC ADA436659: Time-Variant Least Squares Harmonic Modeling PDF

5 Pages·0.13 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview DTIC ADA436659: Time-Variant Least Squares Harmonic Modeling

Time-Variant Least Squares Harmonic Modeling Qin Li and Les Atlas Department of Electrical Engineering, University of Washington Seattle, WA 98195-2500, USA ABSTRACT noise [7]. We also apply this technique to estimate pitch epochs from linear prediction (LP) residuals of speech signals. An algorithm for harmonic decomposition of time-variant The quantitative modeling accuracy of harmonic estimation is signals is derived from a least squares harmonic (LSH) also available as a reliable indicator to classify voiced versus technique. The estimates of harmonic amplitudes and phases unvoiced sounds. are formulated as the solution of a set of linear equations which minimizing mean square error; the signal frequency is 2. EXTENDED LEAST SQUARES HARMONIC modeled by a linear or quadratic polynomial and obtained via MODEL a local search over polynomial coefficients. An initial estimate of signal frequency is necessary to reduce Suppose we have a harmonic signal that consists of a set of computation time. This method is capable of producing sinusoi∑ds, which is given by accurate and robust harmonic estimation in low SNR M situations. We show applicability to high accuracy speech s(k)= C cos(iω(k)k+φ) , (1) i 0 i pitch and heart sound beat epoch estimation. i=1 where s(k) is a segment of a harmonic signal with length N. k=0,1, … , N-1 is the discrete time index, M is the total 1. INTRODUCTION number of harmonic components, C and φ are the amplitude i i and phase angle for each harmonic component i=1,2, … , M, Harmonic modeling, which is also known as a sinusoidal and ω(k) is the normalized fundamental frequency. For representation, has been widely applied in a number of areas 0 ELSH, our key difference from LSH [1], where ω was such as speech coding, compression, enhancement, synthesis, 0 and pitch estimation. Basically, harmonic modeling can be constant, is that ω0(k) is allowed to vary with time. described as a finite combination of sinusoidal components, and was first introduced by McAulay and Quatieri [4]. The LSH model, with our time-varying extension, assumes that the signal is the sum of a harmonic signal and a noise, so Usually speech signals can be decomposed into two parts: a that the signal is give∑n by quasi-periodic (harmonic component) and a non-periodic part s(k)=h(k)+n(k)= M C cos(iω(k)k+φ)+n(k) . (2) (noise component). A crucial step for harmonic modeling is to i 0 i i=1 find the parameters of harmonic components, e.g. their The harmonic component h(k) in equations (2) can be amplitudes, frequencies, and phases. A variety of techniques rewritten as: ∑ have been proposed for this decomposition of harmonic M signals, such as the high-resolution analysis of the short-time h(k)= C cos(iω(k)k+φ) i 0 i Fourier Transform (STFT) [4], the multiband excitation ∑i=1 (3) model (MBE) [3], and the least squares harmonic model = M C [cos(iω(k)k)cos(φ)−sin(iω(k)k)sin(φ)] (LSH) [1]. The first two techniques have been successfully ∑i=1 i 0 i 0 i used for low bit-rate speech coding; however their = M [Acos(iω(k)k)−B sin(iω(k)k)] , performance degrades at low SNR. The LSH model is capable i 0 i 0 i=1 oevf epnr oadtu cvienrgy mloowre SaNccRu;r atheo wanedv erro, bauss t whiallr mboen isch oawnanl,y siitss, where Ai=Cicos(φi), Bi=Cisin(φi), and psiegrnfoarl mfraenqcuee ndceyg. r ades significantly with rapid changes in φ=tan−1Bi Ai , if Ai ≥0 . (4) In this paper, we propose a harmonic analysis method for i tan−1Bi A +π, if Ai <0 i time-variant signals, which is a substantially modified version of the LSH approach developed by Abu-Shikhah and Deriche The weighted mean square error (MSE) between s(k) and h(k) [1]. The key difference from LSH [1] is that the fundamental is then: frequency of signal is allowed to vary with time within the ∑ data segment. A linear or quadratic polynomial is used to fit E= 1 N−1{s(k)−h(k)}2W(k) (5) the frequency variation in the data segment. The best-fit N∑k=0 ∑  harmonic estimation is then obtained via minimizing mean = 1 N−1s(k)− M [Acos(iω(k)k)−Bsin(iω(k)k)]2W(k) , square error (MSE). As we will show, our extended LSH N i 0 i 0 (ELSH) model has been successfully used to detect heartbeats where kW=0(k) is iw=1eight of the kth sample point for MSE from acoustic signals recorded from body-worn sensors with calculation. presence of very strong body-motion interference and ambient Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 2003 2. REPORT TYPE - 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Time-Variant Least Squares Harmonic Modeling 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION Army Research Laboratory,Adelphi,MD,20783 REPORT NUMBER 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES 14. ABSTRACT An algorithm for harmonic decomposition of time-variant signals is derived from a least squares harmonic (LSH) technique. The estimates of harmonic amplitudes and phases are formulated as the solution of a set of linear equations which minimizing mean square error; the signal frequency is modeled by a linear or quadratic polynomial and obtained via a local search over polynomial coefficients. An initial estimate of signal frequency is necessary to reduce computation time. This method is capable of producing accurate and robust harmonic estimation in low SNR situations. We show applicability to high accuracy speech pitch and heart sound beat epoch estimation. 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF ABSTRACT OF PAGES RESPONSIBLE PERSON a. REPORT b. ABSTRACT c. THIS PAGE 4 unclassified unclassified unclassified Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 For a given sequence of fundamental frequency ω(k), the range of discrete ω. This step is same as the algorithm 0 0 minimum MSE is found by describe in [1]. Then we step up with a linear frequency ∂E ∂E model. We formulate the instantaneous frequency of the =0, =0, for j=1, 2, … , M. (6) ∂Aj ∂Bj signal as ω (k)=a+bk . (10) inst Since the instantaneous frequency is the derivative of the ThYe1 ab=oveQ equRatioin Aend s, u p with a linear equation (7) winh eoqleu aptihoans e(2 t)e irsm t h(eirωe0fo(kre) kw+riφttie)n wasit h respect to time, ω0(k) Y2 S T B 1 where A and B are M×1 unknown vectors to be determined, ω(k)=a+ bk . (11) 0 2 and other matrices are defined as: ∑ For convenience, we set N−1  Q(j,i)=k=∑N0−c1os(iω0k)W(k)cos(jω0k) , k=−N2, ... , -1, 0, 1, ... , N2−1 if N is even (12) R(j,i)=− sin(iω0k)W(k)cos(jω0k) , −N−1, ... , -1, 0, 1, ... , N−1 if N is odd ∑k=0 (8) 2 2 N−1 S(j,i)= cos(iωk)W(k)sin(jωk) , without losing loss of generalization, so that the instantaneous k=∑0N−1 0 0 frequency ωinst(k) has mean of a and slope of b. We use ω0 T(j,i)=− sin(iωk)W(k)sin(jωk) , calculated from the constant frequency model as the initial ∑k=0 0 0 estimate of a. Then we search all combinations of a and b to Y(j)=N−1s(k)W(k)cos(jωk) , find the one with minimum MSE. 1 0 k∑=0 N−1 If necessary, we can continue to step up to a quadratic time- Y(j)= s(k)W(k)sin(jωk) , 2 0 varying frequency model, where we model instantaneous where i, kj= 0=1, 2, … , M, and k=0, 1, … N-1. frequency as ∑ 1 ω (k)=a+bk+c(k2− k2) , (13) Solving equation set (7) for a given ω(k) results in inst N k AB=QS RT−1iYY12 , 0 (9) aωn0d( kth)e= coar+re1s2pboknd+incg( 13pkha2s−e tN∑1erm∑ ωk0k(2k)) .is written as (14) a nd then Ci and φi can be obtained from equation (4). Adding a constant term − 1 k2 conveniently makes a the N k Solving equation (7) only yields amplitudes and phases given mean and b the slope of the instantaneous frequency. a specified fundamental frequency ω(k). There is no closed- 0 Therefore we can directly use a and b calculated from the form analytical solution for the unknown input signal linear model as the initial estimate. frequency. In order to estimate the unknown signal frequency, we need to repetitively solve equation (7) for a range of We can ostensibly continue to step up to obtain high order discrete ω0(k)’s, and select the ω0(k) that gives the minimum frequency model via a similar formulation. However higher MSE with corresponding Ci and φi as the final results. Since order frequency models require substantially more the result of a Fourier decomposition is unique, the true signal computation time and tend to fit noise. Using overlapped data fundamental frequency ω0(k) always yields the minimum windows also helps to obtain a better fit to frequency MSE among all possible ω0(k)’s. variation. In practice, the polynomial order should be determined from data quality, the nature of the unknown signal, and computation efficiency. 3. FUNDAMENTAL FREQUENCY ESTIMATE If we allow ω(k) to vary independently for each k=0,1,…,N-1 0 4. APPLICATIONS AND PERFORMANCE and then search for all possible combinations of ω(k), the 0 number of computations would be impractical. So here we Our extended LSH (ELSH) approach has been successfully choose a polynomial to model ω(k)’s evolution in k (discrete applied to detect acoustic heartbeat signal at very low SNR. 0 time). The polynomial could be zeroth order (constant We also demonstrate its application to pitch estimation. frequency, one free parameter), first order (linear chirp, two free parameters), second order (quadratic frequency, three 4.1 Acoustic Heartbeat Detection parameters), or higher order. Since the computation time For healthy and safety reasons there are needs for, say, an increases exponentially with polynomial order, the army or fire department to monitor soldier or firefighter’s computation time for high order polynomials is considerable. physiological indicators via body-worn acoustic sensors while An accurate initial estimation of frequency is necessary to they are doing their missions [7]. Heart rate is the most narrower down the search range and lower computation time. important physiological indicator for human health. Reliable algorithms are needed to estimate heart rate from acoustic In order to lower computation time, we implemented the heartbeat sounds. In such situations, the acoustic heartbeat polymonial models via a step-up recursion. We start with a signals: (1) have strong periodicity, but usually buried into constant frequency model and search over all candidates of a strong ambient noise and body motion interference; (2) are 4.2 Pitch estimation and voicing detection typically corrupted by additive noise sources that are non- In recent years, harmonic analysis methods have received stationary and diverse in structure; (3) have a rate which, due much attention on speech coding [3, 6] and pitch estimation to varied activity, can change rapidly over a short time period. [2, 5]. We proposed a method to estimate the pitch frequency In other words, we are dealing with a harmonic analysis of speech signals using ELSH. The schematic diagram for problem with potentially abrupt and rapid fundamental pitch estimation is shown in figure 3. First a linear prediction frequency change and low SNR. (LP) analysis was performed for every 100 ms and a LP 2.5 residual are generated. We partitioned the residual signal into 50 ms segments with a 25 ms overlap between segments to 2.48 avoid discontinuities. The ELSH was then performed for each 2.46 residual segment to obtain harmonic parameters and harmonic Hz)2.44 model resynthesis. A cos2 windowing function was applied cy (2.42 for accurate model reconstruction throughout overlapped n e regions of the segments. qu 2.4 e Fr 2.38 Usually for speech coding or pitch estimation, a voiced- 2.36 SCiognnsatla Innts Ftarenqtauneenocuys M Foredqeulency unvoiced (U/V) classifier is also needed. One of the Linear Frequency Model advantages of our approach is that the U/V discrimination 2.34 Quadratic Frequency Model indicator can directly be obtained from harmonic modeling -3 -2 -1 0 1 2 3 Time (sec) accuracy, e.g. a ha∑rmonic-to-noise ratio (HNR) Figure 1. Comparison of modeling changes of instantaneous frequencies. HNR=10log10∑ [s(kkh)(−kh)2(k)]2 , (15) Frequency Model Constant Linear Quadratic k Normalized which is very similar to HNR defined in [2]. Our HNR was 54.0% 53.4% 12.0% Waveform MSE calculated over a much shorter window (10 ms) to quickly follow transitions between voiced and unvoiced sounds. Table 1. Comparison of modeling errors 6 4 de 2 mplitu 0 A-2 -4 Original data ELSH model -6 805 810 815 820 825 Time (sce) 2.9 Figure 3. Schematic diagram of pitch estimation approach. Hz)2.8 quency (2.7 Wapep roteascthe dw tihthe ppoelryfnoormmaianlc et imaned-v arroybinugst nfersesq uoefn ctyh em EodLeSlHs. Fre2.6 SQiugandarla itnics tfarenqtauneenocuys m froedqeulency Speech test data was obtained from the Carnegie Mellon 805 810 815 820 825 University speech group website, Time (sec) htt p://www.festvox.org/dbs/dbs_time.html. We hand marked Figure 2. Results of ELSH approach on low SNR (-10 dB) data. pitch epochs for a comparison reference. ELSH was performed on both original speech and the same speech with We first give an example for ELSH applied to a clean added white noise. For each case, constant, linear and heartbeat signal (SNR = 10 dB) with a rapid heart rate change quadratic time-varying frequency models were used. We also over a short time period. Figure 1 depicts the modeled recorded computation time to compare computational instantaneous frequencies for the constant, linear and efficiency. quadratic model, respectively. Table 1 shows the corresponding normalized mean square error (MSE) between Typical results for clean data (no additive noise) are shown in the ELSH model and true waveform. From the constant to figure 4 and figure 5. In figure 4, the upper panel depicts linear frequency model, there is a minor improvement; yet comparison of pitch estimation results and hand marked from linear to quadratic frequency model, the improvement is pitches; the lower panel depicts the HNR and the subsequent significant. result for U/V classification. The U/V discrimination threshold was set at 3 dB by averaging results of other Another example is given for presence of colored noise with utterances. In figure 5, we can clearly see that the HNR low SNR of -10 dB. In figure 2, the upper panel depicts the performs very well as an indication of U/V classification. original signal and the ELSH model; the lower panel depicts the instantaneous frequency of the signal and our results from The comparisons for different noise levels and frequency quadratic frequency model. The results clearly show that our models are shown in table 2. We used a previously-defined harmonic analysis algorithm is very stable and reliable at low relative accuracy [8] to qualify the performance of the SNR. algorithm, which is defined as  ∑   1 N f(k)−f (k) (16) There are differences between frequency models. At low Relative Accuracy = 1− ELSH ×100% , noise level (SNR≥ -5dB), a higher order model generates N f(k) k=1 better results, but also takes more computation time; at high where f(k) and f (k) are the hand marked pitch and ELSH noise level (SNR≤ -10dB), a higher order model tends to fit estimated pitch from ELSH model at kth pitch period, noise and thus generates worse results. In practice, a proper respectively. The accuracy values show that the ELSH order should be chosen by balancing between accuracy and approach performs extremely well in low SNR situations. computation efficiency. Order is also affected by data quality Going from clean data to noise level of -10dB, there is only and window size. In general, a short window potentially has about a 1.3 percent drop in overall accuracy. It should also be less frequency variation and a low order model may be noted that best results were obtained by using different sufficient; a long window potentially has more frequency window size for different noise levels. variation and thus may need a high order model, yet offers higher tolerance to noise. For the speech tests, noise levels, and window sizes presented in this paper, since the z)160 advantages of the quadratic model were slight, we preferred uency (H111024000 Unvoiced Voiced Unvoiced t he linear frequency model at low noise level. eq 80 Quadratic frequency model Fr 60 Hand-marked pitches 5. CONCLUSION 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 1.15 1.2 Time (sec) (a) In this paper, we presented an approach for harmonic analysis, which is an extension to LSH. It has been demonstrated that 20 this extended LSH approach not only is exceptionally robust B) NR (d10 Unvoiced Voiced Unvoiced aranpdi da cfcreuqrauteen cayt lcohwan SgeN. RT, wbou t aaplpsloi caist iocnasp aobfl et hoifs caappptruorainchg H 0 Threshold = 3 dB were shown in the paper. The application to acoustic heartbeat -10 detection, where the quadratic time-varying model was used, 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 1.15 1.2 Time (sec) (b) has shown success on difficult data. The application to pitch Figure 4. Results of pitch estimation (no additive noise). (a) shows estimation has potential for high resolution pitch estimation. results of linear frequency model and hand-marked pitches. (b) The disadvantage of this method is that the computation shows HNR (solid line) and decisions of U/V classification complexity is high. It is thus not currently efficient to use this (dotted line, high level – voiced; low level – unvoiced) . The approach for U/V classification only. Future work on a large threshold is 3 dB. speech data base is needed to confirm the conclusion of sufficiency of the linear time-varying model. We acknowledge help from technical discussions with Prof. Mari Ostendorf. This work was funded by the Army Research Lab. LP residual 6. REFERENCES Unvoiced Voiced Unvoiced [1] N. Abu-Shikhah and M. Deriche, "A robust technique for harmonic analysis of speech," in Proc. IEEE ICASSP'01, vol. 2, Harmonic synthesis pp. 877-80, Piscataway, NJ, 2001. [2] Ahn R, Holmes Wh, Deriche M, Moody M, and Bennamoun M, 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 1.15 1.2 Time (sec) "Harmonic-plus-noise decomposition and its application in voiced/unvoiced classification," in Proc. IEEE TENCON '97, vol. Figure 5. LP residual and Harmonic Synthesis (no additive noise). 2, pp. 587-90, Brisbane, Australia, 1997. The upper trace is residual signal and lower trace is harmonic [3] D. W. Griffith and J. S. Lim, "Multiband excitation vocoder," synthesis. The dotted line separates estimated voiced and IEEE Transactions on Acoustics, Speech and Signal Processing, unvoiced regions. vol. 36, pp. 1223-1235, 1988. [4] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis Frequency Model Constant Linear Quadratic based on a sinusoidal representation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 34, pp. 744-754, No additive noise 98.9% 99.1% 99.2% 1986. 50 ms data window [5] R. J. McAulay and T. F. Quatieri, "Pitch estimation and voicing White noise (SNR= -5 dB) 98.4% 98.6% 98. 7% detection based on a sinusoidal speech model," in Proc. IEEE 50 ms data window ICASSP'90, vol. 1, pp. 249-252, 1990. White noise (SNR= -10 dB) 98.0% 97.8% 97.9% [6] R. J. McAulay and T. F. Quatieri, "The sinusoidal transform 100 ms data window coder at 2400 b/s," in IEEE MILCOM '92, vol. 1, pp. 378-380, White noise (SNR= -15 dB) 92.7% 92.6% 92.6% 1992. 100 ms data window [7] M. Scanlon, "Acoustic monitoring of first responder's physiology Approx. Computation 110 340 830 for health and performance surveillance," in SPIE 16th Annual Time (sec) International Symposium on Aerospace/Defense Sensing, Table 2. Relative accuracy for different noise levels and frequency Simulation, and Controls, Orlando, Florida, USA, 2002. models. The algorithm is implemented in MATLAB R13 [8] A. Shah, R. P. Ramachandran, and M. A. Lewis, "Robust pitch running on AMD Athlon 1.5GHz system with Windows 2000. estimation using an event based adaptive gaussian derivative The computation time is for 5 second speech sampled at 16 KHz. filter," in IEEE ISCAS'02, vol. 2, pp. 843-846, 2002.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.