Second generation wavelet transform-based pitch period estimation and voiced/unvoiced decision for speech signals

doi:10.1016/S0003-682X(02)00055-5

Applied Acoustics

Volume 64, Issue 1, January 2003, Pages 25-41

https://doi.org/10.1016/S0003-682X(02)00055-5

Abstract

Pitch detection is an important part of speech recognition and speech processing. In this paper, a pitch detection algorithm based on the second generation wavelet transform is developed. The proposed algorithm reduces the computational load relative to algorithms based on the classical wavelet transform. It was tested on both real and synthetic speech signals. Experiments were also carried out under noisy conditions to evaluate its accuracy and robustness. The results showed that the proposed algorithm was robust to noise and provided accurate estimates of the pitch period for both low-pitched and high-pitched speakers. Moreover, different wavelet filters obtained using the second generation wavelet transform were compared to assess their effect on the proposed algorithm. The Haar filter performed well compared with the other wavelet filters.

Introduction

The pitch period is the reciprocal of the fundamental frequency of a voiced speech waveform and is an essential component in various speech processing applications such as speech segregation, speaker identification and verification, diagnosis of hearing impairment, and speech coding. Because pitch detection plays an important role in speech processing, a wide variety of pitch detection algorithms (PDAs) have been proposed in the literature. They can generally be classified as either event detection algorithms or non-event detection algorithms. An event detection algorithm computes the pitch period by finding the instants at which the glottis closes and then measuring the time interval between two successive glottal closures. So far only a few event detection algorithms have been proposed. They are based on the calculation of the autocorrelation function, which displays a fairly prominent peak at the pitch period. The disadvantage of event detection based on the autocorrelation function is that it can estimate the pitch period exactly only in certain vowels. Its performance also degrades when the pitch period is non-stationary. Non-event pitch detectors estimate the pitch period directly over a segment of the speech signal obtained by windowing, and they are therefore computationally simple compared with event detection. Some non-event methods are: (1) the modified autocorrelation method using clipping (AUTOC) [1], (2) the cepstrum method [2], (3) the simplified inverse filtering technique [3], and (4) the average magnitude difference function (AMDF) [4]. However, non-event methods are insensitive to non-stationary variations in the pitch period during the measurement interval. They are also unsuitable for a wide range of speakers.
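As a rough illustration of the autocorrelation idea mentioned above (a prominent peak at the lag equal to the pitch period), the following minimal Python sketch estimates the period of one windowed frame. It is plain autocorrelation, not the clipped AUTOC variant of [1]; the function name, the search range `f_min`/`f_max`, the 8 kHz sampling rate, and the sinusoidal test frame are all assumptions made for the example.

```python
import math

def autocorr_pitch_period(frame, fs, f_min=50.0, f_max=500.0):
    """Return the lag (in seconds) that maximizes the autocorrelation of one frame.

    f_min and f_max bound the pitch search range; both are illustrative defaults.
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Short-time autocorrelation at this lag
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag / fs

# Toy periodic frame: a 125 Hz tone sampled at 8 kHz (period of 64 samples)
fs = 8000
frame = [math.sin(2 * math.pi * 125.0 * n / fs) for n in range(512)]
period = autocorr_pitch_period(frame, fs)
fundamental = 1.0 / period  # the pitch period is the reciprocal of F0
```

Because the estimate is a single value for the whole frame, it cannot follow a pitch period that changes inside the window, which is the limitation of non-event methods noted above.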

Several factors prevent the pitch period of a speech signal from being estimated accurately. First, the formants of the vocal tract significantly affect the structure of the glottal waveform, which makes the pitch period difficult to compute. Second, it is difficult to distinguish reliably between low-level voiced speech and unvoiced speech. Third, it is difficult to determine the beginning and end points of each pitch period during voiced speech segments.

The PDAs based on the classical wavelet transform (CWT) in the literature [5], [6], [7] estimate the pitch period by determining the glottal closure instants (GCIs) and measuring the time between two successive such events, because when a GCI occurs in a speech waveform, maxima occur in the adjacent scales of the wavelet transform. However, construction of the CWT relies on the Fourier transform (FT) and requires cumbersome mathematical operations.

In this work, a novel approach to pitch period estimation is proposed. The proposed method is based on second-generation wavelets constructed by the lifting scheme, a new method for constructing wavelets. The basic idea behind the lifting scheme is very simple. It starts with a trivial wavelet, which does nothing but has the formal properties of a wavelet. The lifting scheme then gradually constructs a new wavelet with improved properties. Classical wavelets, also called first-generation wavelets, are translations and dilations of one fixed function. The Fourier transform is therefore a very important tool for first-generation wavelets. A construction with the lifting scheme, on the other hand, is entirely spatial and is therefore ideally suited for building second-generation wavelets when no Fourier transform is available. In addition, constructing the second-generation wavelet not only takes less computational time than constructing the CWT but is also an easier process.
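The step from a trivial wavelet to an improved one can be sketched in a few lines of Python. In the lifting literature the trivial transform is usually called the lazy wavelet: it only splits the signal into even and odd samples. The sketch below then applies one predict step using the mean of the two neighbouring even samples. The choice of this linear predictor, the edge handling, and all function names are assumptions for illustration and are not taken from the paper.

```python
def lazy_wavelet(x):
    """Trivial 'lazy' transform: split an even-length signal into even and odd samples."""
    return x[0::2], x[1::2]

def predict_step(even, odd):
    """Replace each odd sample by its error against the mean of its two even
    neighbours (the last even sample is repeated at the right edge)."""
    right = even[1:] + even[-1:]
    return [o - (e + r) / 2.0 for e, r, o in zip(even, right, odd)]

def undo_predict_step(even, detail):
    """Invert predict_step exactly by adding the same prediction back."""
    right = even[1:] + even[-1:]
    return [d + (e + r) / 2.0 for e, r, d in zip(even, right, detail)]

ramp = [float(n) for n in range(16)]
even, odd = lazy_wavelet(ramp)           # lazy details are just the odd samples
detail = predict_step(even, odd)         # lifted details vanish on a linear ramp
restored_odd = undo_predict_step(even, detail)
```

On the linear ramp, the lazy wavelet leaves the odd samples unchanged, while the lifted details are zero everywhere except at the right edge, where the repeated neighbour breaks linearity. This is one concrete sense in which a lifting step gives the wavelet improved properties, and the step is inverted in place without any Fourier analysis.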

Performance analysis on real speech signals reveals relatively accurate and reliable pitch detection, and shows that the algorithm improves performance considerably compared with the method based on the CWT.

The rest of the paper is organized as follows: in Section 2, we review the classical wavelet transform. Section 3 is devoted to the description of construction of the second-generation wavelet transform (SGWT). In Section 4, the pitch period determination is discussed. In Section 5 results and the relevant discussion are given. In Section 6, conclusions are given.

Section snippets

Classical wavelet transform

Classical wavelets are functions generated from one single function ψ by dilations and translations [8], [9], [10]:

$\psi_{a,b}(t) = a^{-1/2}\,\psi\left(\frac{t-b}{a}\right)$

where b is real-valued and is called the shift parameter. The function set $\psi_{a,b}(t)$ is called a wavelet family. Since the parameters (a, b) are continuous-valued, the transform is called the continuous wavelet transform. The definition of classical wavelets as dilates of one function means that high frequency wavelets correspond to a<1 or narrow width, while low frequency
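The dilation and translation relation can be checked numerically with a short Python sketch. The mother wavelet used here (an unnormalized Mexican hat) is an assumption chosen only because it has a simple closed form; the snippet does not say which ψ the paper uses. The `energy` helper is likewise a hypothetical addition.

```python
import math

def mexican_hat(t):
    """Example mother wavelet psi(t); an illustrative choice, not the paper's."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def wavelet(a, b, t, mother=mexican_hat):
    """psi_{a,b}(t) = a^(-1/2) * psi((t - b) / a), for a > 0."""
    if a <= 0:
        raise ValueError("scale a must be positive")
    return mother((t - b) / a) / math.sqrt(a)

def energy(a, b, t_min=-50.0, t_max=50.0, dt=0.01):
    """Riemann-sum approximation of the integral of psi_{a,b}(t)^2 over t."""
    n = int(round((t_max - t_min) / dt))
    return sum(wavelet(a, b, t_min + i * dt) ** 2 for i in range(n)) * dt

narrow = energy(0.5, 0.0)   # a < 1: narrow, high-frequency wavelet
wide = energy(4.0, 0.0)     # a > 1: wide, low-frequency wavelet
```

The factor a^(-1/2) keeps the energy of every member of the family equal, so `narrow` and `wide` agree to numerical precision even though the two wavelets differ in width by a factor of eight.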

Construction of the second generation wavelet transform

Wavelets are a versatile tool for representing general functions. They can be considered building blocks for data. Moreover, they capture the essence of a data set with only a small set of coefficients. This is possible because the correlation structure that exists between neighboring points can be exploited to obtain a sparse representation of the signal. The correlation structure is typically local in time and frequency. The SGWT, which is constructed by the lifting scheme, is where wavelets are not necessarily translates and
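A minimal Python sketch of one level of the Haar wavelet built by lifting is given below, since the Haar filter is the one the paper reports as performing well. It uses the common unnormalized form (detail as a pairwise difference, approximation as a pairwise mean). The snippet does not show the paper's normalization or boundary handling, so those details and the function names are assumptions.

```python
def haar_lifting_forward(x):
    """One level of the Haar transform via lifting: split, predict, update."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]                          # split
    detail = [o - e for e, o in zip(even, odd)]           # predict odd from even
    approx = [e + d / 2.0 for e, d in zip(even, detail)]  # update: pairwise mean
    return approx, detail

def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order with the signs flipped."""
    even = [s - d / 2.0 for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    x = [0.0] * (2 * len(even))
    x[0::2], x[1::2] = even, odd
    return x

signal = [2, 4, 6, 8, 10, 4, 3, 1]
approx, detail = haar_lifting_forward(signal)
restored = haar_lifting_inverse(approx, detail)
```

Each lifting step is inverted by running it backwards with the opposite sign, so reconstruction is exact by construction. The details are small where neighbouring samples are strongly correlated, which is the sparsity argument made above.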

Pitch period determination and voiced/unvoiced decision

The proposed pitch determination algorithm is based on the SGWT. Here, we use the assumption that when a GCI occurs in a speech waveform, maxima also occur in the adjacent scales of the wavelet transform. Although we were inspired by the studies of Kadambe and Obaidat [5], [6], [7], there are many differences in the construction of the wavelets. Unlike the pitch determination algorithms based on the CWT, the proposed algorithm utilizes a lifting scheme to construct the SGWT. The SGWT is
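The assumption that a GCI produces maxima at adjacent scales can be illustrated with a simplified Python sketch. It is not the paper's algorithm, whose details are not visible in this snippet. It takes two levels of a Haar lifting transform, keeps detail-coefficient peaks that appear at both scales, and reports the median spacing between them. The relative threshold of 0.5, the rule of returning `None` for a frame with fewer than two confirmed events, the synthetic pulse train, and all function names are assumptions.

```python
import math

def haar_lift(x):
    """One Haar lifting level (predict, then update) on an even-length list."""
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]
    approx = [e + d / 2.0 for e, d in zip(even, detail)]
    return approx, detail

def peak_indices(c, rel_threshold=0.5):
    """Indices where |c| is a local maximum and at least rel_threshold * max|c|."""
    mag = [abs(v) for v in c]
    floor = rel_threshold * max(mag)
    return [k for k in range(1, len(mag) - 1)
            if mag[k] >= floor and mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]

def estimate_pitch_period(x, fs):
    """Median spacing (seconds) of peaks confirmed at two adjacent scales."""
    a1, d1 = haar_lift(x)
    _, d2 = haar_lift(a1)
    coarse = set(peak_indices(d2))
    # Level-1 index k covers the same samples as level-2 index k // 2
    events = [k for k in peak_indices(d1) if k // 2 in coarse]
    gaps = sorted(b - a for a, b in zip(events, events[1:]))
    if not gaps:
        return None  # assumption: no confirmed events means "unvoiced"
    return 2 * gaps[len(gaps) // 2] / fs  # level-1 index step = 2 samples

def pulse_train(n_samples, period, first=33, decay=8.0):
    """Hypothetical voiced-like test signal: a decaying pulse every `period` samples."""
    x = [0.0] * n_samples
    for start in range(first, n_samples, period):
        for n in range(start, n_samples):
            x[n] += math.exp(-(n - start) / decay)
    return x

fs = 8000
voiced = pulse_train(512, period=64)
period_s = estimate_pitch_period(voiced, fs)
silent = estimate_pitch_period([0.0] * 64, fs)
```

For this toy signal the sketch returns 0.008 s (64 samples at 8 kHz) and `None` for the all-zero frame. The decimated transform is shift-variant, so on real speech the fixed relative threshold and the two-scale check would likely need tuning.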

Experimental results

The proposed algorithm was tested on a wide variety of speech data, including both synthetic and real speech signals. The real speech data consisted of two utterances of “should we chase those cowboys”, one spoken by an American female and one by an American male. First, a synthetic voiced signal, which is shown in Fig. 5a, was produced with a pitch frequency of 120 Hz. In Fig. 5b–d, the SGWT coefficients of the synthetic speech were plotted at different scales. As shown in Fig. 5b–d, the SGWT gives local maxima across all

Conclusions

A PDA based on the SGWT was presented for a wide range of pitch periods. It was tested on both real and synthetic speech signals. As can be seen in all illustrations, the pitch periods of the synthetic and real speech signals were estimated accurately. Different wavelet filters such as Haar, Daubechies 4, Daubechies 6, (9–7), and cubic B-splines were considered to assess their effect on the proposed PDA. The algorithm performed better with the Haar filter. The proposed PDA is superior to

References (18)

M.S. Obaidat et al.
Estimation of pitch period of speech signal using a new dyadic wavelet transform
Journal of Information Sciences
(1999)
M.S. Obaidat et al.
A performance evaluation study of four wavelet algorithms for the pitch period estimation of speech signals
Journal of Information Sciences
(1998)
W. Sweldens
The lifting scheme: a custom-design construction of biorthogonal wavelets
Appl. Comput. Harmon. Anal.
(1996)
L.R. Rabiner
On the use of autocorrelation analysis for pitch detection
IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25
(1977)
A.M. Noll
Cepstrum pitch determination
IEEE Trans. Audio Electroacoust., vol. AU-16
(1968)
J.D. Markel
The SIFT algorithm for fundamental frequency estimation
IEEE Trans. Audio Electroacoust., vol. AU-20
(1972)
M.J. Ross et al.
Average magnitude difference function pitch extractor
IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-22
(1974)
S. Kadambe et al.
Application of the wavelet transform for pitch detection of speech signals
IEEE Trans. Inf. Theory
(1992)
O. Rioul et al.
Wavelets and signal processing
IEEE SP Magazine
(1991)

There are more references available in the full text version of this article.
