Coupling relations underlying the production of speech articulator movements and their invariance to speech rate

Lancia, Leonardo; Rosenbaum, Benjamin

doi:10.1007/s00422-018-0749-y

Original Article
Published: 09 February 2018

Volume 112, pages 253–276, (2018)

Biological Cybernetics

Leonardo Lancia¹ & Benjamin Rosenbaum²,³


Abstract

Since the seminal works of Bernstein (The coordination and regulation of movements. Pergamon Press, Oxford, 1967), several authors have supported the idea that, to produce a goal-oriented movement in general, and a movement of the organs responsible for the production of speech sounds in particular, individuals activate a set of coupling relations that coordinate the behavior of the elements of the motor system involved in the production of the target movement or sound. To characterize the configurations of the coupling relations underlying the movements of the speech articulators, we introduce an original method based on recurrence analysis. The method is validated through the analysis of simulated dynamical systems adapted to reproduce the features of speech gesture kinematics, and it is applied to speech articulator movements recorded from five German speakers during the production of labial and coronal plosive and fricative consonants at variable speech rates. We show that the underlying coupling relations change systematically between labial and coronal consonants but are not affected by speech rate, despite qualitative changes in the trajectory of the jaw at fast speech rate.

Notes

This step was not present in the original version of the algorithm as described in Lancia et al. (2016).

References

Abbs JH, Gracco VL (1984) Control of complex motor gestures: orofacial muscle responses to load perturbations of lip during speech. J Neurophysiol 51(4):705–723
Balasubramaniam R (2013) On the control of unstable objects: the dynamics of human stick balancing. In: Kevin S, Richardson MJ, Riley MA (eds) Progress in motor control. Springer, New York, pp 149–168
Bernstein N (1967) The coordination and regulation of movements. Pergamon Press, Oxford
Browman CP, Goldstein L (1989) Articulatory gestures as phonological units. Phonology 6(02):201–251
Fitch H, Tuller B, Turvey MT (1982) The Bernstein perspective III. Tuning of coordinative structures with special reference to perception. In: Kelso JAS (ed) Understanding human motor control. Human Kinetics, Champaign, pp 271–287
Folkins JW, Abbs JH (1975) Lip and jaw motor control during speech: responses to resistive loading of the jaw. J Speech Lang Hear Res 18(1):207–220
Frankel J, King S (2001) ASR–Articulatory speech recognition. Proc Eurospeech 1:599–602
Fuchs S, Perrier P, Hartinger M (2011) A critical evaluation of gestural stiffness estimations in speech production based on a linear second-order model. J Speech Lang Hear Res 54(4):1067–1076
Geumann A, Kroos C, Tillmann HG (1999) Are there compensatory effects in natural speech? In: Proceedings of the 14th international congress of phonetic sciences, San Francisco, pp 399–402
Gracco VL, Abbs JH (1985) Dynamic control of the perioral system during speech: kinematic analyses of autogenic and nonautogenic sensorimotor processes. J Neurophysiol 54(2):418–432
Grimme B, Fuchs S, Perrier P, Schöner G (2011) Limb versus speech motor control: a conceptual review. Mot Control 15(1):5–33
Guenther FH (1994) A neural network model of speech acquisition and motor equivalent speech production. Biol Cybern 72(1):43–53
Hoole P (1996) Issues in the acquisition, processing, reduction and parameterization of articulographic data. Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München 34:158–173
Ishwaran H, Rao JS (2005) Spike and slab variable selection: frequentist and Bayesian strategies. Ann Stat 33:730–773
Iskarous K, Mooshammer C, Hoole P, Recasens D, Shadle CH, Saltzman E, Whalen DH (2013) The coarticulation/invariance scale: mutual information as a measure of coarticulation resistance, motor synergy, and articulatory invariance. J Acoust Soc Am 134(2):1271–1282
Ito T, Gomi H, Honda M (2004) Dynamical simulation of speech cooperative articulation by muscle linkages. Biol Cybern 91(5):275–282
Iwanski JS, Bradley E (1998) Recurrence plots of experimental data: to embed or not to embed? Chaos 8(4):861–871
Jackson PJ, Singampalli VD (2009) Statistical identification of articulation constraints in the production of speech. Speech Commun 51(8):695–710
Keating PA, Lindblom B, Lubker J, Kreiman J (1994) Variability in jaw height for segments in English and Swedish VCVs. J Phon 22(4):407–422
Kelso JS, Tuller B, Vatikiotis-Bateson E, Fowler CA (1984) Functionally specific articulatory cooperation following jaw perturbations during speech: evidence for coordinative structures. J Exp Psychol Hum Percept Perform 10(6):812
Kennel MB, Brown R, Abarbanel HD (1992) Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys Rev A 45(6):3403
Kinsella-Shaw JM, Harrison SJ, Turvey MT (2011) Interleg coordination in quiet standing: influence of age and visual environment on noise and stability. J Mot Behav 43(4):285–294
Koenig L, Lucero J, Löfqvist A, Palethorpe S, Tabain M (2003) Studying articulatory variability using functional data analysis. In: Proceedings of the 15th international congress of phonetic sciences, pp 269–272
Krivobokova T, Kneib T, Claeskens G (2012) Simultaneous confidence bands for penalized spline estimators. J Am Stat Assoc 105(490):852–863
Lancia L, Fuchs S (2011) The labial coronal effect revisited. In: Laprie Y (ed) Proceedings of the 8th international seminar on speech production, Montreal, Canada, pp 187–194
Lancia L, Fuchs S, Tiede M (2014) Cross-recurrence analysis in speech production: an overview and a comparison to other nonlinear methods. J Speech Lang Hear Res 57(3):718–33
Lancia L, Voigt D, Krasovitskiy G (2016) Characterization of laryngealization as irregular vocal fold vibration and interaction with prosodic prominence. J Phon 54:80–97
Latash ML, Scholz JP, Schöner G (2007) Toward a new theory of motor synergies. Mot Control 11(3):276–308
Lucero JC (2005) Comparison of measures of variability of speech movement trajectories using synthetic records. J Speech Lang Hear Res 48(2):336–344
Marwan N, Romano MC, Thiel M, Kurths J (2007) Recurrence plots for the analysis of complex systems. Phys Rep 438(5):237–329
Marwan N, Kurths J (2002) Nonlinear analysis of bivariate data with cross recurrence plots. Phys Lett A 302(5):299–307
McFarland DH, Baum SR (1995) Incomplete compensation to articulatory perturbation. J Acoust Soc Am 97(3):1865–1873
McFarland DH, Baum SR, Chabot C (1996) Speech compensation to structural modifications of the oral cavity. J Acoust Soc Am 100(2):1093–1104
Mooshammer C, Hoole P, Geumann A (2007) Jaw and order. Lang Speech 50(2):145–176
Morris JS, Carroll RJ (2006) Wavelet-based functional mixed models. J R Stat Soc Ser B 68(2):179–199
Olsen MA, Hartung D, Busch C, Larsen R (2011) Convolution approach for feature detection in topological skeletons obtained from vascular patterns. In: 2011 IEEE workshop on computational intelligence in biometrics and identity management (CIBIM), pp 163–167
Papcun G, Hochberg J, Thomas T, Laroche F, Zacks J, Levy S (1992) Inferring articulation and recognizing gestures from acoustics with a neural network trained on x-ray microbeam data. J Acoust Soc Am 92(2):688–700
Perkell JS (2012) Movement goals and feedback and feedforward control mechanisms in speech production. J Neurolinguistics 25(5):382–407
Ramsay JO (2006) Functional data analysis. Wiley, New York
Rochet-Capellan A, Schwartz JL (2007) An articulatory basis for the labial-to-coronal effect: /pata/ seems a more stable articulatory pattern than /tapa/. J Acoust Soc Am 121(6):3740–3754
Romano MC, Thiel M, Kurths J, Grebogi C (2007) Estimation of the direction of the coupling by conditional probabilities of recurrence. Phys Rev E 76(3):036211
Romano MC, Thiel M, Kurths J, Mergenthaler K, Engbert R (2009) Hypothesis test for synchronization: twin surrogates revisited. Chaos 19(1):015108
Rulkov NF, Sushchik MM, Tsimring LS, Abarbanel HD (1995) Generalized synchronization of chaos in directionally coupled chaotic systems. Phys Rev E 51(2):980–994
Saltzman E, Kelso JA (1987) Skilled actions: a task-dynamic approach. Psychol Rev 94(1):84
Saltzman EL, Munhall KG (1989) A dynamical approach to gestural patterning in speech production. Ecol Psychol 1(4):333–382
Schöner G, Martin V, Reimann H, Scholz JP (2008) Motor equivalence and the uncontrolled manifold. In: Proceedings of the international seminar on speech production (ISSP 2008), Strasbourg, France, pp 23–28
Sugihara G, May R, Ye H, Hsieh CH, Deyle E, Fogarty M, Munch S (2012) Detecting causality in complex ecosystems. Science 338(6106):496–500
Thiel M, Romano MC, Read PL, Kurths J (2004) Estimation of dynamical invariants without embedding by recurrence plots. Chaos 14(2):234–243
Tourville JA, Guenther FH (2011) The DIVA model: a neural theory of speech acquisition and production. Lang Cogn Process 26(7):952–981
Turvey MT (1977) Preliminaries to a theory of action with reference to vision. In: Shaw RE, Bransford J (eds) Perceiving, acting and knowing. Lawrence Erlbaum Associates, pp 211–265
Weirich M, Lancia L, Brunner J (2013) Inter-speaker articulatory variability during vowel-consonant-vowel sequences in twins and unrelated speakers. J Acoust Soc Am 134(5):3766–3780
Zou Y, Romano MC, Thiel M, Marwan N, Kurths J (2011) Inferring indirect coupling by means of recurrences. Int J Bifurcat Chaos 21(04):1099–1111


Acknowledgements

Leonardo Lancia’s work, carried out within the Labex BLRI (ANR-11-LABX-0036) and EFL (ANR-10-LABX-0083), has benefited from support from the French government, managed by the French National Agency for Research (ANR), under the program “Investissements d’Avenir”.

Author information

Authors and Affiliations

Laboratoire de Phonétique et Phonologie (CNRS, Sorbonne Nouvelle), 19 rue des Bernardins, 75005, Paris, France
Leonardo Lancia
German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, 04103, Leipzig, Germany
Benjamin Rosenbaum
Institute of Ecology, Friedrich Schiller University Jena, Dornburger Str. 159, 07743, Jena, Germany
Benjamin Rosenbaum

Corresponding author

Correspondence to Leonardo Lancia.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 746 KB)

Appendix: Simulation details

We validate the methods for detecting coupling in simulation experiments in which we can control the direction and the level of coupling, as well as the amplitude and temporal variability. Throughout this study, we consider only unidirectional coupling. We are interested in the generalized synchronization of two dynamical systems X and Y; that is, we do not use coupling scenarios such as $\dot{x}=F(x)+\mu (y-x)$, which would easily lead to complete synchronization in our setups.

1.1 Uninterrupted simulations of structurally different systems

To obtain results comparable to those of previous works based on joint recurrence analysis (e.g. Romano et al., 2007), we investigate a Van der Pol oscillator X and a Rössler system Y. In a first scenario, we let X drive Y. The systems are described by the equations:

$$\begin{aligned} \dot{x}_1= & {} x_2 , \nonumber \\ {{\dot{x}}_{2}}= & {} 0.1\left( 1-x_{1}^{2} \right) {{x}_{2}}-{{\omega }^{2}}{{x}_{1}}+\nu {{\sigma }_{{{x}_{2}}}}\xi , \end{aligned}$$

(8)

and

$$\begin{aligned} {{\dot{y}}_{1}}= & {} -{{y}_{2}}-{{y}_{3}},\nonumber \\ {{\dot{y}}_{2}}= & {} {{y}_{1}}+0.15{{y}_{2}}+\mu \frac{{{\sigma }_{{{y}_{2}}}}}{{{\sigma }_{{{x}_{1}}}}}{{x}_{1}},\nonumber \\ {{\dot{y}}_{3}}= & {} ({{y}_{1}}-10){{y}_{3}}+0.2. \end{aligned}$$

(9)

Here, $\xi $ is Gaussian white noise with a standard deviation of 1, which is added to the driving system. The driven system features a coupling term. The amount of noise is given by $\nu $ and the coupling strength by $\mu $. $\sigma $ denotes the standard deviation of the respective component, evaluated for a typical realization of the trajectories without noise or coupling ($\nu =0$, $\mu =0$) and used for normalization. $\omega $ is a frequency parameter; for $\omega =1$ both systems have approximately the same frequency, depending on the levels of noise and coupling. Since values around 1.0 (e.g., $\omega \in \left[ {0.7,1.3} \right] $) quickly produce phase synchronization even for low levels of coupling, we set $\omega =0.5$ in the setup with long trajectories, so that X has half of Y’s frequency. X then completes half a cycle on average in the temporal interval $\left[ {0,2\pi } \right] $, while Y completes one. In this way, the influence of coupling is rather local and leads to generalized synchronization. The systems are numerically integrated using the forward Euler method with a constant step size of $dt=0.01$ in the interval $\left[ {0,200} \right] $. In this interval, X completes around 16 cycles and Y around twice as many. The length of the interval is chosen so that the number of oscillations of the slower X trajectory roughly matches the number of gestures found in the real data investigated in this study (Fig. 11).

A “warm-up” interval of the same length is simulated first and then discarded, which guarantees that the systems are no longer in a transient state. The remaining 20,000 time steps are downsampled to $N=2000$ points for further analysis.
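The integration scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name, the initial conditions and the default values of 1.0 for the normalization constants `sigma_x1`, `sigma_x2` and `sigma_y2` are placeholders, and the noise sample is multiplied by `dt` like every other term of the forward Euler update, which the text does not specify.

```python
import numpy as np

def simulate_x_drives_y(mu=0.1, nu=0.0, omega=0.5, dt=0.01, t_end=200.0,
                        sigma_x1=1.0, sigma_x2=1.0, sigma_y2=1.0,
                        n_out=2000, seed=0):
    """Forward-Euler sketch of Eqs. (8)-(9): Van der Pol X drives Roessler Y.

    The sigma_* arguments stand in for the normalization constants; the
    defaults of 1.0 are placeholders, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n_keep = int(round(t_end / dt))   # steps kept after the warm-up (20,000)
    n_total = 2 * n_keep              # warm-up interval of the same length
    stride = n_keep // n_out          # downsampling factor (10 by default)
    x1, x2 = 2.0, 0.0                 # hypothetical initial conditions
    y1, y2, y3 = 1.0, 1.0, 0.0
    out = np.empty((n_out, 5))
    for i in range(n_total):
        k = i - n_keep                # index within the kept interval
        if k >= 0 and k % stride == 0 and k // stride < n_out:
            out[k // stride] = (x1, x2, y1, y2, y3)
        xi = rng.standard_normal()    # white-noise sample with std 1
        dx1 = x2
        dx2 = 0.1 * (1.0 - x1 ** 2) * x2 - omega ** 2 * x1 + nu * sigma_x2 * xi
        dy1 = -y2 - y3
        dy2 = y1 + 0.15 * y2 + mu * (sigma_y2 / sigma_x1) * x1
        dy3 = (y1 - 10.0) * y3 + 0.2
        # Assumption: the noise term is scaled by dt like the other terms.
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        y1, y2, y3 = y1 + dt * dy1, y2 + dt * dy2, y3 + dt * dy3
    return out

# Columns: x1, x2, y1, y2, y3; rows: the 2000 downsampled time points.
traj = simulate_x_drives_y(mu=0.1, nu=0.0)
```

In practice the normalization constants would first be estimated from an uncoupled, noise-free run and then passed in place of the placeholder defaults.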

To investigate whether the methods are independent of the specific structure of the driver and driven systems, we also consider the case of X driven by Y. The equations read

$$\begin{aligned} \dot{x}_1= & {} x_2 , \nonumber \\ {{\dot{x}}_{2}}= & {} 0.1\left( 1-x_{1}^{2} \right) {{x}_{2}}-{{\omega }^{2}}{{x}_{1}}+\mu \frac{{{\sigma }_{{{x}_{2}}}}}{{{\sigma }_{{{y}_{1}}}}}{{y}_{1}}, \end{aligned}$$

(10)

and

$$\begin{aligned} {{\dot{y}}_{1}}= & {} -{{y}_{2}}-{{y}_{3}},\nonumber \\ {{\dot{y}}_{2}}= & {} {{y}_{1}}+0.15{{y}_{2}}+\nu {{\sigma }_{{{y}_{2}}}}\xi ,\nonumber \\ {{\dot{y}}_{3}}= & {} ({{y}_{1}}-10){{y}_{3}}+0.2. \end{aligned}$$

(11)

As above, a stochastic component $\xi $ is added to the driver and a coupling term is added to the driven system.

1.2 Multiple-runs simulations

To simulate the conditions found in speech data, we also evaluate the equations several times over short time intervals. This means that, for a certain number of repetitions (here: 30), we generate short trajectories with an approximate length of $2\pi $ under different initial conditions, cf. Fig. 12. For this setup, we can choose $\omega =1$ (same frequency), because the systems are not run long enough to become phase locked and we still obtain generalized synchronization. The initial conditions are drawn randomly from a set of typical realizations of the long trajectories, subject to the constraint that the difference in phase across repetitions and between the two systems is smaller than $\pi /2$, so that a certain temporal alignment is guaranteed. The systems are numerically integrated in the interval $\left[ {0,6\pi } \right] $ (using $dt=0.01$ as above), where they complete approximately three cycles. The portion of time corresponding to the second cycle is then extracted (where the beginning of the cycle is set at the time point where $x_{1}$ is closest to 0) and each of the 30 trajectories obtained is downsampled to 80 points (Fig. 12).
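The extraction and downsampling step can be illustrated with a short sketch. Several choices here are assumptions rather than details given in the text: a cycle is taken to start at an upward zero crossing of $x_{1}$, the cycle kept is the one between the first two such crossings, resampling uses linear interpolation, and the function name `extract_one_cycle` is hypothetical. A pair of sinusoids stands in for a simulated run.

```python
import numpy as np

def extract_one_cycle(x1, signals, n_out=80):
    """Cut one cycle out of a short run and resample it to n_out points.

    Assumption: a cycle starts where x1 crosses zero upward, and the start is
    placed on whichever neighbouring sample is closest to 0.
    """
    x1 = np.asarray(x1, dtype=float)
    # Indices i such that x1 goes from negative to non-negative between i and i+1.
    up = np.flatnonzero((x1[:-1] < 0.0) & (x1[1:] >= 0.0))
    if up.size < 2:
        raise ValueError("need at least two upward zero crossings of x1")
    starts = [i if abs(x1[i]) <= abs(x1[i + 1]) else i + 1 for i in up]
    a, b = starts[0], starts[1]
    grid = np.linspace(a, b, n_out)      # n_out equally spaced positions
    idx = np.arange(x1.size)
    return [np.interp(grid, idx, np.asarray(s, dtype=float)) for s in signals]

# Toy stand-in for one run over [0, 6*pi] with dt = 0.01.
t = np.arange(0.0, 6.0 * np.pi, 0.01)
x1 = np.sin(t - 1.0)
y1 = np.cos(t - 1.0)
x1_cycle, y1_cycle = extract_one_cycle(x1, [x1, y1])
```

Applying the same index range to every component keeps the driver and the driven signal aligned after the cut.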

1.3 Sources of variability in the simulations

We add synthetic dynamical noise and temporal variability to the systems (Lancia et al. 2014). We distinguish between internal and observational variability. Internal noise is introduced into the driver before coupling, so that it also affects the dynamics of the driven system. Observational variability is applied to both systems afterwards and therefore does not interfere with the coupling relation, but it makes the relation even harder to identify.

1.4 Internal dynamical noise

Internal dynamical noise is introduced into the driver through the additive white noise term $\xi $, whose effect is scaled by the parameter $\nu $. Owing to the normalization, for $\nu =0.3$, for example, the additive noise has a standard deviation equal to 30% of the standard deviation $\sigma $ of the respective original signal. Through the coupling term, this intrinsic noise is propagated to the driven system.

1.5 Internal temporal variability

First, we describe the general case of a continuous time-deformed signal, following Lucero (2005); the methods for discretized signals are introduced subsequently. The signal $\tilde{x} \left( t \right) $, $t\in \left[ {0,T} \right] $, is obtained from the original signal $x\left( t \right) $ via a mapping function $H\left( t \right) $ that introduces the required amount of temporal variability:

$$\begin{aligned} \tilde{x}\left( t \right) =x\left( {H\left( t \right) } \right) \end{aligned}$$

(12)

First, consider a mapping function which is the identity $H\left( t \right) =t$ or, equivalently, a signal $x\left( {H\left( t \right) } \right) $ moving at constant speed $dH\left( t \right) /dt=1$. Now, let the signal speed be perturbed by temporal noise

$$\begin{aligned} \frac{dH\left( t \right) }{dt}=1+\gamma \phi \left( t \right) , \end{aligned}$$

(13)

where $\phi \left( t \right) $ is a Gaussian process with zero mean, a standard deviation of 1 and the autocorrelation function $\exp \left( {-\left( {\theta t} \right) ^{2}} \right) $. The correlation length parameter $\theta $ is chosen such that $\phi $ features approximately five extrema per cycle, and $\gamma $ controls the amount of temporal variation. The condition $\gamma \phi \left( t \right) >-1$ is required for a monotonic mapping $H\left( t \right) $; otherwise, locally negative slopes occur, which means that the new trajectory contains time-reversed portions of the original trajectory.
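As a rough worked check of the monotonicity condition, consider the value $\gamma =0.3$ that appears in the simulation parameters (assuming $\gamma >0$):

$$\begin{aligned} \frac{dH\left( t \right) }{dt}>0 \;\Longleftrightarrow \; \phi \left( t \right) >-\frac{1}{\gamma }, \qquad \gamma =0.3 \;\Rightarrow \; \phi \left( t \right) >-\frac{1}{0.3}\approx -3.33. \end{aligned}$$

Since $\phi \left( t \right) $ has unit standard deviation, a violation at a given time point would require a value more than about 3.3 standard deviations below the mean, so time-reversed portions should be rare at this level of temporal variability, although they are not impossible.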

The mapping function $H\left( t \right) $ is obtained by integration:

$$\begin{aligned} H\left( t\right) =t+\gamma \mathop {\int }\nolimits _{0}^{t} \phi \left( s\right) ds. \end{aligned}$$

(14)

For a discrete time-series $x\left( n \right) $ with dimensionless time-scaling $\left( {n=0,1,\ldots ,N} \right) $, temporal variability is introduced as follows. A discrete Gaussian process $\phi \left( n \right) $, satisfying $\phi \left( 0 \right) =0$, is generated and the mapping function

$$\begin{aligned} H\left( n \right) =n+\gamma \mathop \sum \limits _{k=0}^n \phi \left( k \right) \end{aligned}$$

(15)

is computed, satisfying $H\left( 0 \right) =0$. In a second step, the mapping is normalized by

$$\begin{aligned} \tilde{H} \left( n \right) =H\left( n \right) \frac{N}{H\left( N \right) } \end{aligned}$$

(16)

which leads to $\tilde{H} \left( N \right) =N$. Since the mapping is generally not integer-valued ($\tilde{H} \left( n \right) \notin \mathbb {N}$), an interpolation of $x\left( 0 \right) ,x\left( 1 \right) ,\ldots ,x\left( N \right) $ has to be used to compute the values of the time-deformed signal

$$\begin{aligned} \tilde{x} \left( n \right) =x\left( {\tilde{H} \left( n \right) } \right) \end{aligned}$$

(17)

between the original trajectory’s discretization points $0,1,\ldots ,N$. For the multiple-runs simulations, the temporal deformation is applied to each run separately. The effect of temporal deformation is illustrated in Fig. 13.
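The discrete procedure of Eqs. (15)–(17) can be sketched as below. The sketch rests on several assumptions that are not taken from the text: the Gaussian process is approximated by smoothing white noise with a Gaussian kernel, draws that would yield a non-monotonic mapping are simply rejected and redrawn, linear interpolation is used, and the default `theta=0.08` as well as the function names are hypothetical.

```python
import numpy as np

def smooth_gaussian_process(n, theta, rng):
    """Unit-variance Gaussian sequence with autocorrelation ~ exp(-(theta*k)**2).

    Assumption: white noise is smoothed with the kernel exp(-2*(theta*k)**2),
    whose self-convolution has (approximately) the target shape.
    """
    half = int(np.ceil(4.0 / theta))              # kernel half-width in samples
    k = np.arange(-half, half + 1)
    kernel = np.exp(-2.0 * (theta * k) ** 2)
    kernel /= np.sqrt(np.sum(kernel ** 2))        # gives unit output variance
    white = rng.standard_normal(n + 2 * half)
    return np.convolve(white, kernel, mode="valid")   # length n

def time_warp(x, gamma, theta=0.08, seed=0, max_tries=1000):
    """Apply Eqs. (15)-(17) to a sampled signal x(0), ..., x(N)."""
    x = np.asarray(x, dtype=float)
    n_max = x.size - 1                            # N
    n = np.arange(n_max + 1)
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        phi = smooth_gaussian_process(n_max + 1, theta, rng)
        phi[0] = 0.0                              # enforce phi(0) = 0
        if np.all(1.0 + gamma * phi[1:] > 0.0):   # increments of H stay positive
            break
    else:
        raise RuntimeError("no monotonic mapping found; gamma may be too large")
    h = n + gamma * np.cumsum(phi)                # Eq. (15)
    h_tilde = h * n_max / h[n_max]                # Eq. (16)
    x_tilde = np.interp(h_tilde, n, x)            # Eq. (17), linear interpolation
    return x_tilde, h_tilde

# One cycle of a sinusoid sampled at N + 1 = 81 points, deformed with gamma = 0.3.
N = 80
x = np.sin(2.0 * np.pi * np.arange(N + 1) / N)
x_tilde, h_tilde = time_warp(x, gamma=0.3, seed=1)
```

For the multiple-runs setup, the function would be called once per run with a different seed, so that each run receives its own deformation.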

To introduce internal temporal variability, we do not let $Y\left( \cdot \right) $ be driven by $X\left( \cdot \right) $, but by a temporally deformed signal $X\left( {H\left( \cdot \right) } \right) $, where the amount of variability is given by $\gamma $. The analysis of coupling relations is then performed on $X\left( {H\left( \cdot \right) } \right) $ and $Y\left( \cdot \right) $.

1.6 Observational temporal variability

We let X drive Y without any temporal deformation. Afterwards, both signals are deformed by different mappings, $H\left( \cdot \right) $ and $G\left( \cdot \right) $. The amount of temporal variability is given by $\gamma _x $ and $\gamma _y $, respectively. In this way we can also allow different amounts of noise in $X\left( {H\left( \cdot \right) } \right) $ and $Y\left( {G\left( \cdot \right) } \right) $. Then we compare the time-deformed signals $X\left( {H\left( \cdot \right) } \right) $ to $Y\left( {G\left( \cdot \right) } \right) $ for detecting the coupling relation.

1.7 Simulation parameters

Purely deterministic simulations: ($\nu =0$, $\gamma =0$, $\gamma _x =0$, $\gamma _y =0$); simulations with intrinsic noise and temporal variability: ($\nu =0.3$, $\gamma =0.3$, $\gamma _x =0$, $\gamma _y =0$); uninterrupted simulations with intrinsic noise and variability and observational temporal variability: ($\nu =0.3$, $\gamma =0.3$, $\gamma _x =0.3$, $\gamma _y =0.3$).
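For bookkeeping, the three parameter sets listed above could be collected in a small configuration object. The class name, field names and preset labels below are hypothetical; only the numerical values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SimParams:
    nu: float        # internal dynamical noise level
    gamma: float     # internal temporal variability
    gamma_x: float   # observational temporal variability applied to X
    gamma_y: float   # observational temporal variability applied to Y

PRESETS = {
    "deterministic": SimParams(nu=0.0, gamma=0.0, gamma_x=0.0, gamma_y=0.0),
    "internal": SimParams(nu=0.3, gamma=0.3, gamma_x=0.0, gamma_y=0.0),
    "internal_and_observational": SimParams(nu=0.3, gamma=0.3,
                                            gamma_x=0.3, gamma_y=0.3),
}
```

Freezing the dataclass keeps a preset from being modified by accident between simulation runs.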

About this article

Lancia, L., Rosenbaum, B. Coupling relations underlying the production of speech articulator movements and their invariance to speech rate. Biol Cybern 112, 253–276 (2018). https://doi.org/10.1007/s00422-018-0749-y

Received: 20 November 2016
Accepted: 13 January 2018
Published: 09 February 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s00422-018-0749-y
