Elsevier

Journal of Voice

Volume 30, Issue 6, November 2016, Pages 771.e1-771.e15

Phasegram Analysis of Vocal Fold Vibration Documented With Laryngeal High-speed Video Endoscopy

https://doi.org/10.1016/j.jvoice.2015.11.006

Summary

Introduction

In a recent publication, the phasegram, a bifurcation diagram over time, has been introduced as an intuitive visualization tool for assessing the vibratory states of oscillating systems. Here, this nonlinear dynamics approach is augmented with quantitative analysis parameters, and it is applied to clinical laryngeal high-speed video (HSV) endoscopic recordings of healthy and pathological phonations.

Methods

HSV recordings of a total of 73 female participants, diagnosed as healthy (n = 42), with functional dysphonia (n = 15), or with unilateral vocal fold paralysis (n = 16), were quantitatively analyzed. Glottal area waveforms (GAW) and left and right hemi-GAWs (hGAW) were extracted from the HSV recordings. Based on Poincaré sections through phase space-embedded signals, two novel quantitative parameters were computed: the phasegram entropy (PE) and the phasegram complexity estimate (PCE), inspired by signal entropy and by correlation dimension computation, respectively.
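A GAW is commonly obtained by counting the glottal (air) pixels in each segmented video frame, and the hGAWs by splitting that count at the glottal midline. The Python sketch below illustrates the idea under simplifying assumptions that are not taken from the paper: the frames are already segmented into binary masks, and the midline is a fixed vertical image column (in practice the glottal axis is estimated per recording and need not be vertical). Which image side corresponds to the anatomical left fold depends on the imaging orientation. The function name `glottal_area_waveforms` is hypothetical.

```python
from typing import List, Sequence, Tuple

# Binary mask per frame: 1 = glottal (air) pixel, 0 = tissue/background.
Frame = Sequence[Sequence[int]]


def glottal_area_waveforms(
    frames: Sequence[Frame], midline_col: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return (GAW, image-left hGAW, image-right hGAW) as pixel counts per frame.

    Hypothetical simplification: the glottal midline is the fixed image column
    `midline_col`; columns left of it form one hemi-glottis, the rest the other.
    """
    gaw, left, right = [], [], []
    for frame in frames:
        l_area = sum(px for row in frame for px in row[:midline_col])
        r_area = sum(px for row in frame for px in row[midline_col:])
        left.append(l_area)
        right.append(r_area)
        gaw.append(l_area + r_area)  # total glottal area of this frame
    return gaw, left, right
```

Each list has one value per video frame, so at a given frame rate the three lists are time series that can be fed into the phase-space analysis.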

Results

Both PE and PCE assumed higher average values (suggesting more irregular vibrations) for the pathological than for the healthy participants, significantly discriminating the healthy group from the paralysis group (P = 0.02 for both PE and PCE). Comparing the PE or PCE values of the left and the right hGAW within each subject yielded asymmetry measures for the regularity of vocal fold vibration. The PCE-based asymmetry measure revealed significant differences between the healthy group and the paralysis group (P = 0.03).
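The summary does not give the formulas for PE or for the left/right asymmetry measures. Purely to illustrate the kind of computation involved, the sketch below scores each analysis window by the Shannon entropy of its binned Poincaré-section intersection positions, averages these scores over all windows, and compares left and right values with a normalized difference. The binning scheme, the 32-bin default, and the names `window_entropy`, `phasegram_entropy`, and `asymmetry_index` are assumptions; this is not the published PE definition.

```python
import math
from collections import Counter
from typing import Sequence


def window_entropy(hits: Sequence[float], lo: float, hi: float, n_bins: int = 32) -> float:
    """Shannon entropy (bits) of intersection positions binned on [lo, hi]."""
    if not hits or hi <= lo:
        return 0.0
    width = (hi - lo) / n_bins
    # Clamp so that a value equal to `hi` falls into the last bin.
    counts = Counter(min(max(int((h - lo) / width), 0), n_bins - 1) for h in hits)
    total = len(hits)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def phasegram_entropy(sections: Sequence[Sequence[float]], n_bins: int = 32) -> float:
    """Hypothetical PE-like score: mean per-window entropy, using one bin range
    for all windows so that windows are comparable over time."""
    all_hits = [h for s in sections for h in s]
    if not all_hits:
        return 0.0
    lo, hi = min(all_hits), max(all_hits)
    return sum(window_entropy(s, lo, hi, n_bins) for s in sections) / len(sections)


def asymmetry_index(left: float, right: float) -> float:
    """Hypothetical normalized left/right difference: 0 = symmetric, 1 = maximal."""
    total = left + right
    return 0.0 if total == 0 else abs(left - right) / total
```

For windows whose intersections lie only at two fixed positions, as a perfectly periodic signal would produce, this score is 1 bit; intersections spread evenly over the section push it toward log2(32) = 5 bits. Higher values therefore indicate less regular vibration, in line with the direction reported above.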

Conclusions

Quantitative phasegram analysis of GAW and hGAW data is a promising tool for the automated processing of HSV data in research and in clinical practice.

Introduction

The behavior of a vibratory system is periodic if the observed oscillatory pattern continuously repeats itself after a constant time interval. Periodicity in this strict sense is hardly ever observed in empirical data from biomechanical systems such as the voice. Rather, even under nonpathological conditions, voice production is at best a nearly periodic1 phenomenon. In the presence of a voice disorder, vocal fold vibration, and thus the generated acoustical output, is likely to be perturbed to a greater or lesser degree,2 often because of highly irregular vibratory regimes of the vocal folds.3, 4, 5, 6
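Written out, the strict definition of periodicity reads as follows, with x(t) denoting the observed signal (eg, a glottal area waveform) and T the period; this notation is introduced here for illustration only:

```latex
x(t + T) = x(t) \quad \text{for all } t,
\qquad
T = \min\{\, T' > 0 : x(t + T') = x(t) \ \text{for all } t \,\}
```

Empirical voice signals satisfy this condition only approximately, and the perturbation and nonlinear measures discussed next quantify how far they depart from it.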

Deviations from periodicity can be quantified in a variety of ways. Apart from time domain-based and frequency domain-based approaches, such as the calculation of jitter7 or of the harmonics-to-noise ratio,8 methods from nonlinear systems analysis have received growing interest in the past decades.9, 10, 11, 12 In nonlinear dynamics methods, the voice is considered to be a dynamical system13 that is able to exhibit a wide variety of oscillatory behavior “on the way to chaos.”14, 15
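Time-domain perturbation measures presuppose that individual glottal cycles have already been detected. As an example of what that involves, the sketch below computes one common jitter variant, local jitter: the mean absolute difference between consecutive cycle durations, normalized by the mean cycle duration. This section does not state whether this is the exact variant used in reference 7, and the function name `local_jitter` is illustrative.

```python
from typing import Sequence


def local_jitter(periods: Sequence[float]) -> float:
    """Relative local jitter (dimensionless): mean |T_i - T_{i+1}| / mean T_i.

    `periods` are consecutive glottal cycle durations (any consistent unit),
    which must be obtained beforehand by a cycle detection step.
    """
    if len(periods) < 2:
        raise ValueError("need at least two cycle durations")
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))
```

For cycle durations of 4, 6, 4, and 6 ms the consecutive differences average 2 ms and the mean period is 5 ms, giving a local jitter of 0.4 (40%). The dependence on cycle boundaries is exactly what breaks down for strongly irregular phonation.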

Several quantitative methods for assessing the complexity of the temporal behavior of nonlinear systems have been introduced in mathematics and physics, for instance the correlation dimension,16 Lyapunov exponents,17 or the low-dimensional nonlinearity measure of Tokuda et al.18 These methods have been successfully applied to the analysis of biosignals from both healthy and pathological voices, such as the acoustical waveform,12, 19, 20, 21, 22 electroglottographic signals,23, 24, 25 or data derived from high-speed video (HSV) recordings of vocal fold vibration.26, 27, 28, 29
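For orientation, the following sketch estimates a correlation dimension in the style of the Grassberger-Procaccia algorithm: the signal is delay-embedded, the correlation sum C(r) (the fraction of point pairs closer than r) is computed, and the dimension is read off as the slope of log C(r) versus log r. The embedding dimension, the delay, and the two radii are assumptions that would normally be chosen by inspecting the scaling region over many radii and embedding dimensions; the brute-force pair loop costs O(N²) and is only suitable for short signals. This is not necessarily the exact procedure of reference 16.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[Point]:
    """Time-delay embedding: vectors (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(x) - (dim - 1) * tau
    return [tuple(x[i + k * tau] for k in range(dim)) for i in range(n)]


def correlation_sum(points: Sequence[Point], r: float) -> float:
    """C(r): fraction of distinct point pairs closer than r (Euclidean distance)."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < r
    )
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(points: Sequence[Point], r1: float, r2: float) -> float:
    """Slope of log C(r) between two radii assumed to lie in the scaling region."""
    c1, c2 = correlation_sum(points, r1), correlation_sum(points, r2)
    return math.log(c2 / c1) / math.log(r2 / r1)
```

A ramp signal embedded in two dimensions lies on a straight line, so its estimated correlation dimension is close to 1; a limit cycle (periodic vibration) likewise gives about 1, whereas more complex dynamics give larger, possibly non-integer values. `math.dist` requires Python 3.8 or later.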

The detailed interpretation of available quantitative methods for analyzing the dynamics of irregular voice often requires expert background knowledge in mathematics and physics. In contrast, visualization methods are often easier to understand for nonexperts. Such visualization methods, applied to nonperiodic voice production, include for example spectrograms12, 30, 31 or local maxima displays.32

Recently, a novel method for visualizing system dynamics has been introduced: the phasegram.33 In a phasegram, time is mapped onto the x-axis, and vibratory regimes such as periodic oscillation, subharmonics, or chaos are identified within the generated graph by the number and the stability of horizontal lines. Phasegrams can be interpreted as bifurcation diagrams in time and are particularly suited for nonstationary signals, as they combine the benefits of sliding window analysis with the visualization potential of phase space embedding.34, 35 In contrast to other nonlinear analysis techniques (eg, bifurcation maps), phasegrams can be constructed automatically from a time domain signal alone; no additional system parameter needs to be known. In contrast to conventional voice perturbation measures (eg, jitter), no information about glottal cycle duration or fundamental frequency is required.
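The construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation of the phasegram: each sliding window is mean-centered and embedded in two dimensions with a delay `tau` (in samples), the Poincaré section is assumed to be the diagonal line through the window's centroid, and every crossing of that line is recorded as a signed position along it. Window length `win`, step `hop`, and `tau` are free parameters, and the function names are hypothetical. Plotting the positions (y-axis) against the window centers (x-axis) yields the phasegram.

```python
import math
from typing import List, Sequence, Tuple


def poincare_hits(window: Sequence[float], tau: int) -> List[float]:
    """Crossings of the 2-D delay embedding (x[n], x[n + tau]) with the diagonal
    through the window centroid (assumed Poincare section), returned as signed
    positions along that diagonal."""
    mean = sum(window) / len(window)
    x = [v - mean for v in window]  # centre the window on its mean
    pts = [(x[n], x[n + tau]) for n in range(len(x) - tau)]
    hits = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        d0, d1 = y0 - x0, y1 - x1  # sign tells which side of the diagonal
        if d0 == 0:
            hits.append((x0 + y0) / math.sqrt(2))  # sample lies on the section
        elif d0 * d1 < 0:
            f = d0 / (d0 - d1)  # linear interpolation to the crossing point
            px, py = x0 + f * (x1 - x0), y0 + f * (y1 - y0)
            hits.append((px + py) / math.sqrt(2))
    return hits


def phasegram(
    signal: Sequence[float], win: int, hop: int, tau: int
) -> List[Tuple[int, List[float]]]:
    """Slide a window over the signal; return (window centre, hits) pairs."""
    return [
        (start + win // 2, poincare_hits(signal[start:start + win], tau))
        for start in range(0, len(signal) - win + 1, hop)
    ]
```

For a unit-amplitude sinusoid with a quarter-period delay, the embedding is a circle, so every window yields intersections clustered near +1 and -1: two stable horizontal lines, the periodic case. A constant signal collapses onto a single line at 0, and subharmonic or irregular vibration produces more lines or a scattered pattern.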

Phasegrams have thus far been utilized for the visualization33 and the manual classification36 of electroglottographic voice signals. Here, their application to the analysis of time series data derived from HSV recordings is introduced, first by way of simulated vocal fold vibrations generated with a lumped element biomechanical model. The concept is then extended to healthy and pathological phonations, considering both stationary and nonstationary signals. The analysis is complemented by spatiotemporal visualization37, 38 and by Fourier analysis of vocal fold vibration and of simultaneously acquired acoustical signals. It will be shown that sequences of aberrant vocal fold vibratory behavior can be easily located in phasegrams, which makes the method a promising candidate for detecting clinically relevant passages within HSV recordings. To facilitate automated objective analysis of vocal fold vibratory behavior as seen in HSV recordings, two novel quantitative analysis parameters derived from the phasegram visualization are introduced in this paper. Their performance is assessed through analysis of a database containing HSV recordings of healthy and pathological phonations.


Participants and phonatory tasks

A total of 73 female participants were included in the study. Before data acquisition, all participants underwent a standard clinical evaluation. Forty-two of these were considered to be normophonic (ie, healthy) speakers. Another 15 participants were diagnosed with functional dysphonia, and the remaining 16 were diagnosed with unilateral vocal fold paralysis. The average (±standard deviation) age of these clinical groups was 40.2 ± 15.8 years (healthy), 46.2 ± 16.1 years (functional

Qualitative analysis—typical examples

Three stereotypical vocal fold vibratory regimes (periodic, subharmonic, and irregular), generated through attempts at stable phonation by a normophonic female and two females with vocal fold paralysis, respectively, are illustrated in Figure 3.

Discussion

In a recent publication, the phasegram has been introduced as an intuitive visualization tool for various oscillatory phenomena in physics and in biology, demonstrated with analysis of the human voice.33 Here, a more specialized investigation is performed, showing that phasegrams are useful in analyzing signals derived from HSV recordings documenting healthy and pathological phonations. The feasibility of the approach was demonstrated by creating a phasegram of the GAW from a synthesized vocal

Conclusion

In this work, the phasegram visualization method has been extended to the analysis of GAW data derived from HSV recordings of both normophonic and pathological voice production. Qualitative analysis showed that the phasegram is a valuable complement to existing analysis methods, as it provides direct insights into the time-dependent complexity of vocal fold vibration. Because of the phasegram's potential to condense information about the vocal fold dynamics of an entire phonation into a single

Acknowledgments

This research was supported by the institutional fund of Palacký University Olomouc, Czech Republic (to C.T.H.), by the Technology Agency of the Czech Republic project no. TA04010877 (to C.T.H. and J.G.Š.), by the state budget of the Czech Republic OPVK CZ.1.07/2.3.00/20.0057 (to J.G.Š.), and by grant no. LO1413/2-2 from Deutsche Forschungsgemeinschaft (to J.U. and J.L.).

References (56)

  • I.R. Titze

    Workshop on acoustic voice analysis

    (1995)
  • R.J. Baken et al.

    Clinical Measurement of Speech and Voice

    (2000)
  • C. Bohr et al.

    Spatiotemporal analysis of high-speed videolaryngoscopic imaging of organic pathologies in males

    J Speech Lang Hear Res

    (2014)
  • D.D. Mehta et al.

    Voice production mechanisms following phonosurgical treatment of early glottic cancer

    Ann Otol Rhinol Laryngol

    (2010)
  • J.G. Svec et al.

    Videokymography in voice disorders: what to look for?

    Ann Otol Rhinol Laryngol

    (2007)
  • D.A. Berry et al.

    Interpretation of biomechanical simulations of normal and chaotic vocal fold oscillations with empirical eigenfunctions

    J Acoust Soc Am

    (1994)
  • E. Yumoto et al.

    Harmonics-to-noise ratio as an index of the degree of hoarseness

    J Acoust Soc Am

    (1982)
  • W. Lauterborn et al.

    Methods of chaos physics and their application to acoustics

    J Acoust Soc Am

    (1988)
  • H. Herzel

    Bifurcations and chaos in voice signals

    Appl Mech Rev

    (1993)
  • I.R. Titze et al.

    Evidence of chaos in vocal fold vibration

  • S.H. Strogatz

    Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering

    (2007)
  • W. Lauterborn et al.

    Subharmonic routes to chaos observed in acoustics

    Phys Rev Lett

    (1981)
  • J. Eckmann et al.

    Liapunov exponents from time series

    Phys Rev A

    (1986)
  • I. Tokuda et al.

    Nonlinear analysis of irregular animal vocalizations

    J Acoust Soc Am

    (2002)
  • I. Tokuda et al.

    Surrogate analysis for detecting nonlinear dynamics in normal vowels

    J Acoust Soc Am

    (2001)
  • H. Herzel et al.

    Detecting bifurcations in voice signals

  • A. Behrman et al.

    Correlation dimension of electroglottographic data from healthy and pathologic subjects

    J Acoust Soc Am

    (1997)
  • A. Behrman

    Global and local dimensions of vocal dynamics

    J Acoust Soc Am

    (1999)