Phasegram Analysis of Vocal Fold Vibration Documented With Laryngeal High-speed Video Endoscopy

doi:10.1016/j.jvoice.2015.11.006

Journal of Voice

Volume 30, Issue 6, November 2016, Pages 771.e1-771.e15

https://doi.org/10.1016/j.jvoice.2015.11.006

Summary

Introduction

In a recent publication, the phasegram, a bifurcation diagram over time, has been introduced as an intuitive visualization tool for assessing the vibratory states of oscillating systems. Here, this nonlinear dynamics approach is augmented with quantitative analysis parameters, and it is applied to clinical laryngeal high-speed video (HSV) endoscopic recordings of healthy and pathological phonations.

Methods

HSV data from a total of 73 females diagnosed as healthy (n = 42), or with functional dysphonia (n = 15) or with unilateral vocal fold paralysis (n = 16), were quantitatively analyzed. Glottal area waveforms (GAW) and left and right hemi-GAWs (hGAW) were extracted from the HSV recordings. Based on Poincaré sections through phase space-embedded signals, two novel quantitative parameters were computed: the phasegram entropy (PE) and the phasegram complexity estimate (PCE), inspired by signal entropy and correlation dimension computation, respectively.
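The idea of an entropy computed over a Poincaré section can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' PE algorithm: it assumes the section crossing values have already been extracted, and the function name `section_entropy`, the histogram approach, and the bin count of 32 are all hypothetical choices made for this example.

```python
import math

def section_entropy(crossings, lo, hi, n_bins=32):
    """Shannon entropy of a histogram of Poincare-section crossing values.

    The result is scaled to [0, 1] by dividing by log(n_bins).
    `lo` and `hi` give the value range covered by the histogram.
    (Illustrative helper; binning and scaling are assumptions.)
    """
    if not crossings or hi <= lo:
        return 0.0
    counts = [0] * n_bins
    for value in crossings:
        idx = int((value - lo) / (hi - lo) * n_bins)
        idx = max(0, min(idx, n_bins - 1))  # clamp values on or beyond the edges
        counts[idx] += 1
    total = len(crossings)
    entropy = -sum(c / total * math.log(c / total) for c in counts if c)
    return entropy / math.log(n_bins)

# Crossings that fall on just two values (as for a regular oscillation)
# occupy two bins; crossings spread over the whole range occupy all bins.
two_valued = [-1.0, 1.0] * 50
spread_out = [(k + 0.5) / 32 * 2 - 1 for k in range(32)]
low = section_entropy(two_valued, -1.0, 1.0)
high = section_entropy(spread_out, -1.0, 1.0)
```

In this sketch, crossings concentrated in a few bins give a low value and crossings scattered across the section give a value close to 1, which matches the direction of the interpretation given for PE (higher values suggesting more irregular vibration).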

Results

Both PE and PCE assumed higher average values (suggesting more irregular vibrations) for the pathological as compared with the healthy participants, thus significantly discriminating the healthy group from the paralysis group (P = 0.02 for both PE and PCE). Comparisons of individual PE or PCE data for the left and the right hGAW within each subject resulted in asymmetry measures for the regularity of vocal fold vibration. The PCE-based asymmetry measure revealed significant differences between the healthy group and the paralysis group (P = 0.03).

Conclusions

Quantitative phasegram analysis of GAW and hGAW data is a promising tool for the automated processing of HSV data in research and in clinical practice.

Introduction

The behavior of a vibratory system is periodic if the observed oscillatory pattern continuously repeats itself after a constant time interval. Periodicity abiding by this strict definition is hardly ever observed in empirical data from biomechanical systems such as the voice. Rather, voice production is at best a nearly periodic¹ phenomenon under nonpathological conditions. In the presence of a voice disorder, vocal fold vibration, and thus the generated acoustical output, is likely to be more or less perturbed,² often because of highly irregular vibratory regimes of the vocal folds.³,⁴,⁵,⁶
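Stated formally, and using symbols introduced here only for illustration (x for the observed signal, T for the constant repetition interval), the strict definition of periodicity reads:

```latex
x(t + T) = x(t) \quad \text{for all } t, \qquad T > 0 .
```

A nearly periodic signal satisfies this relation only approximately, with small variations from one cycle to the next.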

Deviations from periodicity can be quantified in a variety of ways. Apart from time domain-based and frequency domain-based approaches such as calculation of jitter⁷ or the harmonics-to-noise ratio,⁸ methods from nonlinear systems analysis have received growing interest in the past decades.⁹,¹⁰,¹¹,¹² In nonlinear dynamics methods, the voice is considered to be a dynamical system¹³ that is able to exhibit a wide variety of oscillatory behavior “on the way to chaos.”¹⁴,¹⁵
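As a point of comparison for the time domain-based measures mentioned above, one commonly used form of jitter can be sketched as follows. The sketch assumes that the durations of consecutive glottal cycles have already been extracted, and it uses the "local jitter" definition (mean absolute difference between consecutive cycle durations, relative to the mean cycle duration). This may differ from the exact definition used in the cited work, and the function name is hypothetical.

```python
def local_jitter_percent(periods):
    """Local jitter in percent from a list of consecutive cycle durations.

    Computed as the mean absolute difference between neighboring cycle
    durations divided by the mean cycle duration, times 100.
    """
    if len(periods) < 2:
        raise ValueError("need at least two cycle durations")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_abs_diff / mean_period

# Perfectly regular cycles (durations in ms) versus strongly alternating cycles.
steady = local_jitter_percent([5.0, 5.0, 5.0])
alternating = local_jitter_percent([4.0, 6.0, 4.0, 6.0])
```

The example makes the dependency explicit: a measure of this kind cannot be computed until every cycle duration is known, which is difficult for highly irregular signals.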

Several quantitative methods for assessing the complexity of the temporal behavior of nonlinear systems have been introduced in the domain of mathematics and physics, for instance the correlation dimension,¹⁶ Lyapunov exponents,¹⁷ or Tokuda et al's low-dimensional nonlinearity measure.¹⁸ These methods have been successfully applied to the analysis of biosignals from both healthy and pathological voices, such as the acoustical waveform,¹²,¹⁹,²⁰,²¹,²² electroglottography,²³,²⁴,²⁵ or data derived from high-speed video (HSV) recordings of vocal fold vibration.²⁶,²⁷,²⁸,²⁹
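To make the correlation dimension more concrete, the following is a minimal sketch of the correlation sum in the style of Grassberger and Procaccia, with the dimension estimated from the slope of log C(r) against log r between two radii. The function names and the two-radius slope estimate are simplifications chosen for this example. Practical implementations fit the slope over a scaling region and take further precautions that are omitted here.

```python
import math

def correlation_sum(points, radius):
    """Fraction of distinct point pairs that lie closer together than `radius`."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))

def correlation_dimension_estimate(points, r_small, r_large):
    """Slope of log C(r) versus log r between two radii.

    Both radii must be large enough that at least one pair falls
    within them, otherwise the logarithm is undefined.
    """
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    return math.log(c_large / c_small) / math.log(r_large / r_small)

# Points spread evenly along a straight line in the plane form a
# one-dimensional set, so the estimate should come out close to 1.
line = [(i / 400, i / 400) for i in range(400)]
dimension = correlation_dimension_estimate(line, 0.05, 0.2)
```

The nested loop is quadratic in the number of points, which is acceptable for a short illustration but slow for long recordings.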

The detailed interpretation of available quantitative methods for analyzing the dynamics of irregular voice often requires expert background knowledge in mathematics and physics. In contrast, visualization methods are often easier to understand for nonexperts. Such visualization methods, applied to nonperiodic voice production, include for example spectrograms¹²,³⁰,³¹ or local maxima displays.³²

Recently, a novel visualization method of system dynamics has been introduced: the phasegram.³³ In a phasegram, time is mapped onto the x-axis, and various vibratory regimes, such as periodic oscillation, subharmonics, or chaos, are identified within the generated graph by the number and the stability of horizontal lines. Phasegrams can be interpreted as bifurcation diagrams in time. They are particularly suited for nonstationary signals, because they combine the benefits of sliding window analysis with the visualization potential of phase space embedding.³⁴,³⁵ In contrast to other nonlinear analysis techniques (eg, bifurcation maps), phasegrams can be automatically constructed from a time domain signal alone; no additional system parameter needs to be known. In contrast to conventional voice perturbation measures (eg, jitter), no information about glottal cycle duration or fundamental frequency is required.
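The combination of sliding window analysis and phase space embedding described above can be sketched roughly as follows. This is an illustrative simplification, not the published phasegram algorithm: it assumes a two-dimensional delay embedding (x[i], x[i + delay]), a Poincaré section placed at the mean of each window, and linear interpolation at the crossings. The function names and all parameter values are hypothetical.

```python
import math

def poincare_crossings(window, delay):
    """Delayed coordinate x[i + delay] wherever x[i] crosses the window mean."""
    level = sum(window) / len(window)
    values = []
    for i in range(len(window) - delay - 1):
        x0, x1 = window[i] - level, window[i + 1] - level
        if x0 * x1 < 0:  # sign change: the trajectory crosses the section
            frac = x0 / (x0 - x1)  # linear interpolation of the crossing point
            y0, y1 = window[i + delay], window[i + 1 + delay]
            values.append(y0 + frac * (y1 - y0))
    return values

def phasegram_columns(signal, window_len, hop, delay):
    """One (center sample index, crossing values) pair per analysis window."""
    columns = []
    for start in range(0, len(signal) - window_len + 1, hop):
        window = signal[start:start + window_len]
        center = start + window_len // 2
        columns.append((center, poincare_crossings(window, delay)))
    return columns

# A strictly periodic test signal: 40 samples per cycle, delay of a quarter cycle.
signal = [math.sin(2 * math.pi * i / 40 + 0.3) for i in range(800)]
columns = phasegram_columns(signal, window_len=200, hop=100, delay=10)
distinct_per_column = [len({round(v, 3) for v in vals}) for _, vals in columns]
```

Plotting the crossing values of each column against its center index would give a phasegram-like picture. For the periodic test signal, every column collapses onto the same two values, which would appear as two stable horizontal lines.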

Phasegrams have thus far been utilized for the visualization³³ and the manual classification³⁶ of electroglottographic voice signals. Here, their application to the analysis of time series data derived from HSV recordings is introduced by way of example, using simulated vocal fold vibrations from a lumped element biomechanical model. The concept is further extended to healthy and pathological phonations, considering both stationary and nonstationary signals. The analysis is complemented by spatiotemporal visualization³⁷,³⁸ and by Fourier analysis of vocal fold vibration and of simultaneously acquired acoustical signals. It will be shown that sequences of aberrant vocal fold vibratory behavior can be easily located in phasegrams, thus earmarking the method as a promising candidate for detection of clinically relevant passages within HSV recordings. To facilitate automated objective analysis of vocal fold vibratory behavior (as seen in HSV recordings), two novel quantitative analysis parameters derived from the phasegram visualization are introduced in this paper. The performance of these quantitative parameters is assessed through analysis of a database containing HSV recordings of healthy and pathological phonations.

Section snippets

Participants and phonatory tasks

A total of 73 female participants were included in the study. Before data acquisition, all participants underwent a standard clinical evaluation. Forty-two of these were considered to be normophonic (ie, healthy) speakers. Another 15 participants were diagnosed with functional dysphonia, and the remaining 16 were diagnosed with unilateral vocal fold paralysis. The average (±standard deviation) age of these clinical groups was 40.2 ± 15.8 years (healthy), 46.2 ± 16.1 years (functional

Qualitative analysis—typical examples

Three stereotypical vocal fold vibratory regimes (periodic, subharmonic, and irregular), generated through attempts at stable phonation by a normophonic female and two females with vocal fold paralysis, respectively, are illustrated in Figure 3.

Discussion

In a recent publication, the phasegram has been introduced as an intuitive visualization tool for various oscillatory phenomena in physics and in biology, demonstrated with analysis of the human voice.³³ Here, a more specialized investigation is performed, showing that phasegrams are useful in analyzing signals derived from HSV recordings documenting healthy and pathological phonations. The feasibility of the approach was demonstrated by creating a phasegram of the GAW from a synthesized vocal

Conclusion

In this work, the phasegram visualization method has been extended to the analysis of GAW data derived from HSV recordings of both normophonic and pathological voice production. Qualitative analysis showed that the phasegram is a valuable complement to existing analysis methods, as it provides direct insights into the time-dependent complexity of vocal fold vibration. Because of the phasegram's potential to condense information about the vocal fold dynamics of an entire phonation into a single

Acknowledgments

This research was supported by the institutional fund of Palacký University Olomouc, Czech Republic (to C.T.H.), by the Technology Agency of the Czech Republic project no. TA04010877 (to C.T.H. and J.G.Š.), by the state budget of the Czech Republic OPVK CZ.1.07/2.3.00/20.0057 (to J.G.Š.), and by grant no. LO1413/2-2 from Deutsche Forschungsgemeinschaft (to J.U. and J.L.).

References (56)

H. Hollien et al. A method for analyzing vocal jitter in sustained phonation. J Phon (1973)
J. Jiang et al. Chaos in voice, from modeling to measurement. J Voice (2006)
W.T. Fitch et al. Calls out of chaos: the adaptive significance of nonlinear phenomena in mammalian vocal production. Anim Behav (2002)
P. Grassberger et al. Measuring the strangeness of strange attractors. Physica D (1983)
Y. Zhang et al. Acoustic analyses of sustained and running voices from patients with laryngeal pathologies. J Voice (2008)
Y. Zhang et al. Perturbation and nonlinear dynamic analyses of voices from patients with unilateral laryngeal paralysis. J Voice (2005)
R.J. Baken. Irregularity of vocal period and amplitude: a first approach to the fractal analysis of voice. J Voice (1990)
J.-C. Roux et al. Observation of a strange attractor. Physica D (1983)
J. Lohscheller et al. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal (2007)
S.-Z. Karakozoglou et al. Automatic glottal segmentation using local-based active contours and application to glottovibrography. Speech Commun (2012)
I.R. Titze. Workshop on acoustic voice analysis (1995)
R.J. Baken et al. Clinical Measurement of Speech and Voice (2000)
C. Bohr et al. Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males. J Speech Lang Hear Res (2014)
D.D. Mehta et al. Voice production mechanisms following phonosurgical treatment of early glottic cancer. Ann Otol Rhinol Laryngol (2010)
J.G. Svec et al. Videokymography in voice disorders: what to look for? Ann Otol Rhinol Laryngol (2007)
D.A. Berry et al. Interpretation of biomechanical simulations of normal and chaotic vocal fold oscillations with empirical eigenfunctions. J Acoust Soc Am (1994)
E. Yumoto et al. Harmonics-to-noise ratio as an index of the degree of hoarseness. J Acoust Soc Am (1982)
W. Lauterborn et al. Methods of chaos physics and their application to acoustics. J Acoust Soc Am (1988)
H. Herzel. Bifurcations and chaos in voice signals. Appl Mech Rev (1993)
I.R. Titze et al. Evidence of chaos in vocal fold vibration
S.H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (2007)
W. Lauterborn et al. Subharmonic routes to chaos observed in acoustics. Phys Rev Lett (1981)
J. Eckmann et al. Liapunov exponents from time series. Phys Rev A (1986)
I. Tokuda et al. Nonlinear analysis of irregular animal vocalizations. J Acoust Soc Am (2002)
I. Tokuda et al. Surrogate analysis for detecting nonlinear dynamics in normal vowels. J Acoust Soc Am (2001)
H. Herzel et al. Detecting bifurcations in voice signals
A. Behrman et al. Correlation dimension of electroglottographic data from healthy and pathologic subjects. J Acoust Soc Am (1997)
A. Behrman. Global and local dimensions of vocal dynamics. J Acoust Soc Am (1999)
