Stochastic mechanical model of vocal folds for producing jitter and for identifying pathologies through real voices
Introduction
The production of a voiced sound starts when the airflow coming from the lungs passes through the glottis, where the vocal folds are located, and is transformed into the glottal signal, a quasi-periodic signal. The main examples of voiced sounds are vowels, and this paper is based on their production.
The acoustic pressure signal, after passing through the vocal folds, is filtered and amplified by the vocal tract and then radiated from the mouth, producing the voice signal. Because the vocal fold displacements are not exactly symmetric, the time intervals corresponding to the air pulses of the glottal signal show random fluctuations, called jitter.
There are different ways to measure jitter, and its study is important for identifying irregularities in phonation. The values of jitter considered typical of a normal voice are between and, at the maximum, in relation to the mean of the glottal time intervals. Other acoustic measures, such as shimmer and HNR (harmonics-to-noise ratio), can also be used to help identify pathologies of the vocal folds and vocal aging, or to help with speaker recognition and the detection of stress in the voice. However, the main feature that should be considered is jitter (Wong et al., 1991, Jiang et al., 2009, Dejonckerea et al., 2012, Mongia and Sharma, 2014, Silva et al., 2016), and this paper focuses on its generation.
Some models of jitter have been proposed but, in general, they are not based on mechanical models: they are built directly on the voice signal by introducing perturbations such as a controlled noise (Schoengten and De Guchteneere, 1997).
Some mechanical models of jitter have been proposed by the authors of this paper (Cataldo et al., 2012, Cataldo and Soize, 2016, Cataldo and Soize, 2017). Here, a new stochastic mechanical model is proposed that uses three control parameters, which gives more possibilities for generating jitter, including a way to change the quality of the generated voice. A new parameter, related to the bandwidth of the power spectral density function, is introduced to discuss this quality. Most importantly, a stochastic inverse problem is solved to identify the parameters and, consequently, to validate the proposed model. With these new possibilities, specific pathologies of the vocal folds, such as paralysis of the vocal folds, can be simulated and identified.
The stochastic model proposed here is based on the deterministic model created by Flanagan and Landgraf (1968), known as the first model used to generate voice using a nonlinear one-mass mechanical model. More complete deterministic models have since been created (Ishizaka and Flanagan, 1972, Avanzini, 2008; Zhang and Jiang, 2008, Pickup and Thomson, 2009; Cveticanin, 2012, Erath et al., 2013, Pinheiro and Kerschen, 2013), some of which consider pathological cases in the vocal folds (Gunter, 2004) or stress situations (Luzan et al., 2015). The idea here, however, is to show that jitter and a good-quality voice signal can be generated from the primary model by treating the stiffness as a stochastic process and, most importantly, to validate the proposed model by identifying its parameters through a statistical inverse problem that uses experimental voices, both normal and with pathological characteristics.
Primary deterministic model
Fig. 1 illustrates a sketch of the model.
Each vocal fold is represented by a nonlinear mass-stiffness-damper system, and the complete model is composed of the subsystem of the vocal folds (source) coupled through the glottal flow to the subsystem of the vocal tract (filter). To generate jitter, the stiffness is considered as a stochastic process, for which a model is proposed.
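To illustrate how a random stiffness can translate into cycle-to-cycle fluctuations of the oscillation period, the following minimal Python sketch may help. It is not the model of this paper: it omits damping, the nonlinearity, the glottal flow and the vocal tract, and keeps only an undamped mass-spring oscillator. The choice of an exponentiated Ornstein-Uhlenbeck process for the stiffness, the function name `simulate_cycle_durations`, and all parameter values are assumptions made for illustration only.

```python
import math
import numpy as np


def simulate_cycle_durations(k_mean=80.0, mass=1e-4, rel_std=0.05,
                             corr_time=2e-3, duration=0.3, dt=5e-6, seed=0):
    """Return the durations (s) between successive upward zero crossings.

    Toy model (assumption, not the paper's Eq. (1)):
        mass * x'' + k(t) * x = 0,   k(t) = k_mean * exp(z(t)),
    where z(t) is a stationary Ornstein-Uhlenbeck process with standard
    deviation rel_std and correlation time corr_time, so k(t) stays positive.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(duration / dt))

    # Exact one-step update coefficients of the OU process.
    a = math.exp(-dt / corr_time)
    b = rel_std * math.sqrt(1.0 - a * a)
    noise = rng.standard_normal(n_steps)
    z = rel_std * rng.standard_normal()  # start in the stationary regime

    x, v = -1e-3, 0.0  # initial displacement (m) and velocity (m/s)
    crossings = []
    for i in range(n_steps):
        k = k_mean * math.exp(z)
        # Semi-implicit (symplectic) Euler step.
        v -= (k / mass) * x * dt
        x_new = x + v * dt
        if x < 0.0 <= x_new:
            # Linear interpolation of the upward zero-crossing time.
            crossings.append((i + x / (x - x_new)) * dt)
        x = x_new
        z = a * z + b * noise[i]
    return np.diff(crossings)


if __name__ == "__main__":
    periods = simulate_cycle_durations()
    print("mean period (ms):", 1e3 * periods.mean())
    print("relative std of the period:", periods.std() / periods.mean())
```

With `rel_std=0` the stiffness is constant and every cycle has the same duration, close to 2π·sqrt(mass/k_mean); with a nonzero `rel_std` the cycle durations fluctuate, which is the qualitative effect a stochastic stiffness is meant to produce.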
Stochastic modeling of jitter
The stiffness k is modeled by a stochastic process with values in . Consequently, the dynamical position of each vocal fold is given by a stochastic process, named , coupled with the stochastic process associated with the glottal flow (volume flow velocity), denoted . The stochastic dynamics of the vocal folds is described by Eq. (1), where and , with the length of each vocal fold and d
General ideas
The objective of this section is to generate voice signals with jitter using the proposed stochastic model and to analyze the sensitivity of the model with respect to its three control parameters. As the main idea is to generate jitter, a way to measure it is also discussed. There are different ways to analyze jitter effects (Mongia, 2012). First, it is important to define the random variable associated with the duration of the glottal cycle, which is defined as the duration between
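As an illustration of how jitter can be quantified once a sequence of glottal cycle durations is available, the sketch below computes two commonly used measures: the local jitter (mean absolute difference between consecutive cycle durations, relative to the mean duration) and the relative average perturbation (RAP, based on a three-cycle moving average). These are standard definitions from the voice-analysis literature and are only assumed to be among the measures used here; the function names are hypothetical.

```python
import numpy as np


def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods over the mean period, in %."""
    periods = np.asarray(periods, dtype=float)
    if periods.size < 2:
        raise ValueError("at least two cycle durations are required")
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def rap_percent(periods):
    """Relative average perturbation with a three-cycle moving average, in %."""
    periods = np.asarray(periods, dtype=float)
    if periods.size < 3:
        raise ValueError("at least three cycle durations are required")
    # Average of each interior period with its two neighbours.
    smoothed = (periods[:-2] + periods[1:-1] + periods[2:]) / 3.0
    return 100.0 * np.mean(np.abs(periods[1:-1] - smoothed)) / np.mean(periods)


if __name__ == "__main__":
    # Hypothetical cycle durations in milliseconds.
    durations = [10.0, 11.0, 10.0, 11.0]
    print(local_jitter_percent(durations))
    print(rap_percent(durations))
```

Both measures are dimensionless, so the durations can be given in any consistent time unit; a perfectly periodic signal gives zero for both.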
Statistical inverse problem
In order to validate the proposed model, its three control parameters are identified using experimental voice signals. This identification is carried out by introducing a cost function constructed so that the probability density function associated with the simulated voice is close to that of the experimental voice and, in addition, the jitter obtained for the simulated voice is close to the jitter of the experimental voice. The four measures of jitter are used. The
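A cost function of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the cost function of the paper: the probability density function is assumed to be that of the glottal cycle durations, it is estimated with a Gaussian kernel and a rule-of-thumb bandwidth, a single jitter measure (local jitter) stands in for the four measures, and the weight between the two terms is arbitrary. All function names are hypothetical.

```python
import numpy as np


def kde_pdf(samples, grid):
    """Gaussian kernel density estimate on grid (rule-of-thumb bandwidth).

    Requires non-constant samples, otherwise the bandwidth is zero.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    h = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    u = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))


def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods over the mean period, in %."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)


def identification_cost(sim_periods, exp_periods, weight=1.0, n_grid=200):
    """Normalized squared PDF mismatch plus weighted squared relative jitter mismatch."""
    sim = np.asarray(sim_periods, dtype=float)
    exp = np.asarray(exp_periods, dtype=float)

    # Common grid covering both samples with some margin.
    both = np.concatenate([sim, exp])
    pad = 3.0 * both.std()
    grid = np.linspace(both.min() - pad, both.max() + pad, n_grid)
    dx = grid[1] - grid[0]

    p_sim = kde_pdf(sim, grid)
    p_exp = kde_pdf(exp, grid)
    pdf_term = np.sum((p_sim - p_exp) ** 2) * dx / (np.sum(p_exp ** 2) * dx)

    j_sim = local_jitter_percent(sim)
    j_exp = local_jitter_percent(exp)
    jitter_term = ((j_sim - j_exp) / j_exp) ** 2
    return pdf_term + weight * jitter_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    experimental = rng.normal(7e-3, 1e-4, 200)   # synthetic stand-in data
    candidate = rng.normal(7e-3, 3e-4, 200)      # synthetic stand-in data
    print(identification_cost(candidate, experimental))
```

In an identification loop, the simulated cycle durations would be regenerated for each candidate value of the control parameters, and the parameters giving the smallest cost would be retained; the data in the usage example are synthetic placeholders, not experimental voices.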
Conclusions
A stochastic mechanical model for producing voiced sounds has been proposed, using three control parameters to generate jitter. Some pathological cases have been generated, and the model has been validated by solving a stochastic inverse problem to identify the parameters. With three control parameters, a wider range of sounds can be obtained, including different levels of jitter, and, most importantly, it is possible to control the quality of the synthesized voice. The
Conflict of interest statement
The authors disclose any financial and personal relationships with other people or organisations that could inappropriately influence their work.
Acknowledgments
This work was supported by CNPq (Conselho Nacional de Pesquisa e Desenvolvimento) – Brazil.
References (25)
Simulation of vocal fold oscillation with a pseudo-one-mass physical model. Speech Commun. (2008)
Jitter generation in voice signals produced by a two-mass stochastic mechanical model. Biomed. Signal Process. Control (2016)
A review of lumped-element models of voiced speech. Speech Commun. (2013)
Modeling mechanical stresses as a factor in the etiology of benign vocal fold lesions. J. Biomech. (2004)
Influence of asymmetric stiffness on the structural and aerodynamic response of synthetic vocal fold models. J. Biomech. (2009)
Vibrational dynamics of vocal folds using nonlinear normal modes. Med. Eng. Phys. (2013)
Nonlinear dynamic mechanism of vocal tremor from voice analysis and model simulations. J. Sound Vib. (2008)
Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations. (1997)
Voice signals produced with jitter through a stochastic one-mass mechanical model. J. Voice (2017)
Cataldo, E., Soize, C., Sampaio, R., 2012. Using Bayesian method for updating the probability density function related...
To what degree of voice perturbation are jitter measurements valid? A novel approach with synthesized vowels and visuo-perceptual pattern recognition. Biomed. Signal Process. Control