Coupling relations underlying the production of speech articulator movements and their invariance to speech rate

  • Original Article
Biological Cybernetics

Abstract

Since the seminal works of Bernstein (The coordination and regulation of movements. Pergamon Press, Oxford, 1967) several authors have supported the idea that, to produce a goal-oriented movement in general, and a movement of the organs responsible for the production of speech sounds in particular, individuals activate a set of coupling relations that coordinate the behavior of the elements of the motor system involved in the production of the target movement or sound. In order to characterize the configurations of the coupling relations underlying speech production articulator movements, we introduce an original method based on recurrence analysis. The method is validated through the analysis of simulated dynamical systems adapted to reproduce the features of speech gesture kinematics and it is applied to the analysis of speech articulator movements recorded in five German speakers during the production of labial and coronal plosive and fricative consonants at variable speech rates. We were able to show that the underlying coupling relations change systematically between labial and coronal consonants, but are not affected by speech rate, despite the presence of qualitative changes observed in the trajectory of the jaw at fast speech rate.

Notes

  1. This step was not present in the original version of the algorithm as described in Lancia et al. (2016).

References

  • Abbs JH, Gracco VL (1984) Control of complex motor gestures: orofacial muscle responses to load perturbations of lip during speech. J Neurophysiol 51(4):705–723

  • Balasubramaniam R (2013) On the control of unstable objects: the dynamics of human stick balancing. In: Kevin S, Richardson MJ, Riley MA (eds) Progress in motor control. Springer, New York, pp 149–168

  • Bernstein N (1967) The coordination and regulation of movements. Pergamon Press, Oxford

  • Browman CP, Goldstein L (1989) Articulatory gestures as phonological units. Phonology 6(02):201–251

  • Fitch H, Tuller B, Turvey MT (1982) The Bernstein perspective III. Tuning of coordinative structures with special reference to perception. In: Kelso JAS (ed) Understanding human motor control. Human Kinetics, Champaign, pp 271–287

  • Folkins JW, Abbs JH (1975) Lip and jaw motor control during speech: responses to resistive loading of the jaw. J Speech Lang Hear Res 18(1):207–220

  • Frankel J, King S (2001) ASR–Articulatory speech recognition. Proc Eurospeech 1:599–602

  • Fuchs S, Perrier P, Hartinger M (2011) A critical evaluation of gestural stiffness estimations in speech production based on a linear second-order model. J Speech Lang Hear Res 54(4):1067–1076

  • Geumann A, Kroos C, Tillmann HG (1999) Are there compensatory effects in natural speech? In: Proceedings of the 14th international congress of phonetic sciences, San Francisco, pp 399–402

  • Gracco VL, Abbs JH (1985) Dynamic control of the perioral system during speech: kinematic analyses of autogenic and nonautogenic sensorimotor processes. J Neurophysiol 54(2):418–432

  • Grimme B, Fuchs S, Perrier P, Schöner G (2011) Limb versus speech motor control: a conceptual review. Mot Control 15(1):5–33

  • Guenther FH (1994) A neural network model of speech acquisition and motor equivalent speech production. Biol Cybern 72(1):43–53

  • Hoole P (1996) Issues in the acquisition, processing, reduction and parameterization of articulographic data. Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München 34:158–173

  • Ishwaran H, Rao JS (2005) Spike and slab variable selection: frequentist and Bayesian strategies. Ann Stat 33:730–773

  • Iskarous K, Mooshammer C, Hoole P, Recasens D, Shadle CH, Saltzman E, Whalen DH (2013) The coarticulation/invariance scale: mutual information as a measure of coarticulation resistance, motor synergy, and articulatory invariance. J Acoust Soc Am 134(2):1271–1282

  • Ito T, Gomi H, Honda M (2004) Dynamical simulation of speech cooperative articulation by muscle linkages. Biol Cybern 91(5):275–282

  • Iwanski JS, Bradley E (1998) Recurrence plots of experimental data: to embed or not to embed? Chaos: An Interdisciplinary Journal of Nonlinear Science 8(4):861–871

  • Jackson PJ, Singampalli VD (2009) Statistical identification of articulation constraints in the production of speech. Speech Commun 51(8):695–710

  • Keating PA, Lindblom B, Lubker J, Kreiman J (1994) Variability in jaw height for segments in English and Swedish VCVs. J Phon 22(4):407–422

  • Kelso JS, Tuller B, Vatikiotis-Bateson E, Fowler CA (1984) Functionally specific articulatory cooperation following jaw perturbations during speech: evidence for coordinative structures. J Exp Psychol Hum Percept Perform 10(6):812

  • Kennel MB, Brown R, Abarbanel HD (1992) Determining embedding dimension for phase-space reconstruction using a geometrical construction. Phys Rev A 45(6):3403

  • Kinsella-Shaw JM, Harrison SJ, Turvey MT (2011) Interleg coordination in quiet standing: influence of age and visual environment on noise and stability. J Mot Behav 43(4):285–294

  • Koenig L, Lucero J, Löfqvist A, Palethorpe S, Tabain M (2003) Studying articulatory variability using functional data analysis. In: Proceedings of the 15th international congress of phonetic sciences, pp 269–272

  • Krivobokova T, Kneib T, Claeskens G (2012) Simultaneous confidence bands for penalized spline estimators. J Am Stat Assoc 105(490):852–863

  • Lancia L, Fuchs S (2011) The labial coronal effect revisited. In: Laprie Y (ed) Proceedings of the 8th international seminar on speech production, Montreal, Canada, pp 187–194

  • Lancia L, Fuchs S, Tiede M (2014) Cross-recurrence analysis in speech production: an overview and a comparison to other nonlinear methods. J Speech Lang Hear Res 57(3):718–733

  • Lancia L, Voigt D, Krasovitskiy G (2016) Characterization of laryngealization as irregular vocal fold vibration and interaction with prosodic prominence. J Phon 54:80–97

  • Latash ML, Scholz JP, Schöner G (2007) Toward a new theory of motor synergies. Mot Control 11(3):276–308

  • Lucero JC (2005) Comparison of measures of variability of speech movement trajectories using synthetic records. J Speech Lang Hear Res 48(2):336–344

  • Marwan N, Romano MC, Thiel M, Kurths J (2007) Recurrence plots for the analysis of complex systems. Phys Rep 438(5):237–329

  • Marwan N, Kurths J (2002) Nonlinear analysis of bivariate data with cross recurrence plots. Phys Lett A 302(5):299–307

  • McFarland DH, Baum SR (1995) Incomplete compensation to articulatory perturbation. J Acoust Soc Am 97(3):1865–1873

  • McFarland DH, Baum SR, Chabot C (1996) Speech compensation to structural modifications of the oral cavity. J Acoust Soc Am 100(2):1093–1104

  • Mooshammer C, Hoole P, Geumann A (2007) Jaw and order. Lang Speech 50(2):145–176

  • Morris JS, Carroll RJ (2006) Wavelet-based functional mixed models. J R Stat Soc Ser B 68(2):179–199

  • Olsen MA, Hartung D, Busch C, Larsen R (2011) Convolution approach for feature detection in topological skeletons obtained from vascular patterns. In: 2011 IEEE workshop on computational intelligence in biometrics and identity management (CIBIM), pp 163–167

  • Papcun G, Hochberg J, Thomas T, Laroche F, Zacks J, Levy S (1992) Inferring articulation and recognizing gestures from acoustics with a neural network trained on x-ray microbeam data. J Acoust Soc Am 92(2):688–700

  • Perkell JS (2012) Movement goals and feedback and feedforward control mechanisms in speech production. J Neurolinguistics 25(5):382–407

  • Ramsay JO (2006) Functional data analysis. Wiley, New York

  • Rochet-Capellan A, Schwartz JL (2007) An articulatory basis for the labial-to-coronal effect: /pata/ seems a more stable articulatory pattern than /tapa/. J Acoust Soc Am 121(6):3740–3754

  • Romano MC, Thiel M, Kurths J, Grebogi C (2007) Estimation of the direction of the coupling by conditional probabilities of recurrence. Phys Rev E 76(3):036211

  • Romano MC, Thiel M, Kurths J, Mergenthaler K, Engbert R (2009) Hypothesis test for synchronization: twin surrogates revisited. Chaos: An Interdisciplinary Journal of Nonlinear Science 19(1):015108

  • Rulkov NF, Sushchik MM, Tsimring LS, Abarbanel HD (1995) Generalized synchronization of chaos in directionally coupled chaotic systems. Phys Rev E 51(2):980–994

  • Saltzman E, Kelso JA (1987) Skilled actions: a task-dynamic approach. Psychol Rev 94(1):84

  • Saltzman EL, Munhall KG (1989) A dynamical approach to gestural patterning in speech production. Ecol Psychol 1(4):333–382

  • Schöner G, Martin V, Reimann H, Scholz JP (2008) Motor equivalence and the uncontrolled manifold. In: Proceedings of the international seminar on speech production (ISSP 2008), Strasbourg, France, pp 23–28

  • Sugihara G, May R, Ye H, Hsieh CH, Deyle E, Fogarty M, Munch S (2012) Detecting causality in complex ecosystems. Science 338(6106):496–500

  • Thiel M, Romano MC, Read PL, Kurths J (2004) Estimation of dynamical invariants without embedding by recurrence plots. Chaos: An Interdisciplinary Journal of Nonlinear Science 14(2):234–243

  • Tourville JA, Guenther FH (2011) The DIVA model: a neural theory of speech acquisition and production. Lang Cogn Process 26(7):952–981

  • Turvey MT (1977) Preliminaries to a theory of action with reference to vision. In: Shaw RE, Bransford J (eds) Perceiving, acting and knowing. Lawrence Erlbaum Associates, pp 211–265

  • Weirich M, Lancia L, Brunner J (2013) Inter-speaker articulatory variability during vowel-consonant-vowel sequences in twins and unrelated speakers. J Acoust Soc Am 134(5):3766–3780

  • Zou Y, Romano MC, Thiel M, Marwan N, Kurths J (2011) Inferring indirect coupling by means of recurrences. Int J Bifurcat Chaos 21(04):1099–1111

Acknowledgements

Leonardo Lancia’s work, carried out within the Labex BLRI (ANR-11-LABX-0036) and EFL (ANR-10-LABX-0083), has benefited from support from the French government, managed by the French National Agency for Research (ANR), under the program “Investissements d’Avenir”.

Author information

Corresponding author

Correspondence to Leonardo Lancia.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (docx 746 KB)

Appendix: Simulations details

We validate methods for detecting coupling in simulation experiments where we can control the direction and the level of coupling, as well as amplitude or temporal variability. Throughout this study, we consider only unidirectional coupling. We are interested in the generalized synchronization of two dynamical systems X and Y; that is, we do not use coupling schemes such as \(\dot{x}=F(x)+\mu (y-x)\), which would easily lead to complete synchronization in our setups.

1.1 Uninterrupted simulations of structurally different systems

In order to obtain results comparable to those obtained in previous works based on joint recurrence analysis (e.g. Romano et al. 2007), we investigate a Van der Pol oscillator X and a Rössler system Y. In a first scenario, we let X drive Y. The systems are described by the equations:

$$\begin{aligned} {{\dot{x}}_{1}}= & {} {{x}_{2}},\nonumber \\ {{\dot{x}}_{2}}= & {} 0.1\left( 1-x_{1}^{2} \right) {{x}_{2}}-{{\omega }^{2}}{{x}_{1}}+\nu {{\sigma }_{{{x}_{2}}}}\xi , \end{aligned}$$
(8)

and

$$\begin{aligned} {{\dot{y}}_{1}}= & {} -{{y}_{2}}-{{y}_{3}},\nonumber \\ {{\dot{y}}_{2}}= & {} {{y}_{1}}+0.15{{y}_{2}}+\mu \frac{{{\sigma }_{{{y}_{2}}}}}{{{\sigma }_{{{x}_{1}}}}}{{x}_{1}},\nonumber \\ {{\dot{y}}_{3}}= & {} ({{y}_{1}}-10){{y}_{3}}+0.2. \end{aligned}$$
(9)

Here, \(\xi \) is Gaussian white noise with a standard deviation of 1, which is added to the driving system, while the driven system features a coupling term. The amount of noise is given by \(\nu \) and the coupling strength by \(\mu \). Each \(\sigma \) is the standard deviation of the respective component, evaluated for a typical realization of the trajectories without noise or coupling (\(\nu =0\), \(\mu =0\)), and is used for normalization. \(\omega \) is a frequency parameter: for \(\omega =1\) both systems have approximately the same frequency, depending on the levels of noise and coupling. Since values around 1.0 (e.g., \(\omega \in \left[ {0.7,1.3} \right] \)) quickly produce phase synchronization even for low levels of coupling, we set \(\omega =0.5\) in the setup of long trajectories, so that X has half of Y's frequency. X then completes half a cycle on average in the temporal interval \(\left[ {0,2\pi } \right] \) while Y completes one. In this way, the influence of coupling is rather local and leads to generalized synchronization. The systems are numerically integrated using the forward Euler method with a constant step size of \(dt=0.01\) in the interval \(\left[ {0,200} \right] \). In this interval, X completes around 16 cycles and Y around twice as many. The length of the interval is chosen so that the number of oscillations of the slower X trajectory roughly matches the number of gestures found in the real data investigated in this study (Fig. 11).

Fig. 11 A noisy Van der Pol system X (Eq. 8) driving a Rössler system Y (Eq. 9), \(t\in \left[ {0,200} \right] \), \(\nu =0.3\), \(\mu =0.3\)

A "warm-up" interval of the same length is integrated first and then discarded, which ensures that the systems are no longer in a transient state. The remaining 20,000 time steps are downsampled to \(N=2000\) points for further analysis.
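The integration scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used in the study: the function name `simulate`, the initial states, the random seed, and the treatment of the noise term as an ordinary right-hand-side term inside the Euler step are all assumptions, since the appendix does not specify them.

```python
import numpy as np

def simulate(nu=0.3, mu=0.3, omega=0.5, t_end=200.0, dt=0.01,
             sigma=(1.0, 1.0, 1.0), seed=0):
    """Forward-Euler integration of Eqs. 8-9 (Van der Pol X drives Roessler Y).

    sigma = (sigma_x1, sigma_x2, sigma_y2) are the normalisation constants.
    A warm-up interval of length t_end is integrated first and discarded.
    Returns x with shape (n, 2) and y with shape (n, 3), n = t_end / dt.
    """
    rng = np.random.default_rng(seed)
    s_x1, s_x2, s_y2 = sigma
    n = int(round(t_end / dt))
    x = np.array([1.0, 0.0])        # assumed initial state of X
    y = np.array([1.0, 1.0, 0.0])   # assumed initial state of Y
    xs = np.empty((2 * n, 2))
    ys = np.empty((2 * n, 3))
    for i in range(2 * n):
        xi = rng.standard_normal()  # white noise sample, unit standard deviation
        dx = np.array([
            x[1],
            0.1 * (1.0 - x[0] ** 2) * x[1] - omega ** 2 * x[0] + nu * s_x2 * xi,
        ])
        dy = np.array([
            -y[1] - y[2],
            y[0] + 0.15 * y[1] + mu * (s_y2 / s_x1) * x[0],  # coupling term
            (y[0] - 10.0) * y[2] + 0.2,
        ])
        x = x + dt * dx             # forward Euler step (assumed noise scaling)
        y = y + dt * dy
        xs[i], ys[i] = x, y
    return xs[n:], ys[n:]           # drop the warm-up half
```

To follow the normalization described in the text, one could first call `simulate(nu=0.0, mu=0.0)`, take the standard deviations of the returned components, and pass them as `sigma` to the noisy, coupled run. Keeping every tenth sample (`x[::10]`) is one simple way to go from 20,000 to 2,000 points; the downsampling method actually used is not stated in the appendix.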

To check that the methods do not depend on the particular structure of the driver and driven systems, we also consider the case of X driven by Y. The equations read

$$\begin{aligned} \dot{x}_1= & {} x_2 , \nonumber \\ {{\dot{x}}_{2}}= & {} 0.1\left( 1-x_{1}^{2} \right) {{x}_{2}}-{{\omega }^{2}}{{x}_{1}}+\mu \frac{{{\sigma }_{{{x}_{2}}}}}{{{\sigma }_{{{y}_{1}}}}}{{y}_{1}}, \end{aligned}$$
(10)

and

$$\begin{aligned} {{\dot{y}}_{1}}= & {} -{{y}_{2}}-{{y}_{3}},\nonumber \\ {{\dot{y}}_{2}}= & {} {{y}_{1}}+0.15{{y}_{2}}+\nu {{\sigma }_{{{y}_{2}}}}\xi ,\nonumber \\ {{\dot{y}}_{3}}= & {} ({{y}_{1}}-10){{y}_{3}}+0.2. \end{aligned}$$
(11)

As above, a stochastic component \(\xi \) is added to the driver and a coupling term is added to the driven system.

1.2 Multiple-runs simulations

In order to simulate the conditions found in speech data, we also integrate the equations repeatedly over short time intervals. That is, for a certain number of repetitions (here: 30), we generate short trajectories with an approximate length of \(2\pi \) under different initial conditions, cf. Fig. 12. For this setup, we can choose \(\omega =1\) (same frequency), because the systems are not run long enough to become phase locked and we still obtain generalized synchronization. The initial conditions are drawn randomly from a set of typical realizations of the long trajectories, subject to the constraint that the difference in phase across repetitions and between the two systems is smaller than \(\pi /2\), so that a certain temporal alignment is guaranteed. The systems are numerically integrated in the interval \(\left[ {0,6\pi } \right] \) (using \(dt=0.01\) as above), where they complete approximately three cycles. The portion of time corresponding to the second cycle is then extracted (where the beginning of the cycle is set at the time point where \(x_{1}\) is closest to 0) and each of the 30 trajectories obtained is downsampled to 80 points.

Fig. 12 Fifteen cycles of a Van der Pol system X (Eq. 8) driving a Rössler system Y (Eq. 9), \(\nu =0.3\), \(\gamma =0.3\), \(\mu =0.1\)
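The extraction of one cycle from each short run can be illustrated as follows. This is a hedged sketch rather than the procedure used in the study: the function name `middle_cycle`, the use of upward zero crossings of \(x_{1}\) as cycle boundaries, the rule for choosing which complete cycle counts as the middle one, and the linear interpolation used for downsampling are all assumptions.

```python
import numpy as np

def middle_cycle(t, x1, n_out=80):
    """Cut one full cycle of x1 near the middle of a run and resample it.

    Cycle boundaries are upward zero crossings of x1, each placed at the
    sample closest to zero. Among consecutive boundary pairs, the pair whose
    midpoint lies nearest the midpoint of the run is kept (assumed rule).
    Returns (t_new, x1_new), both of length n_out.
    """
    t = np.asarray(t, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    # Index i such that x1 changes sign from negative to non-negative at i -> i+1.
    below = np.flatnonzero((x1[:-1] < 0) & (x1[1:] >= 0))
    if below.size < 2:
        raise ValueError("need at least two upward zero crossings")
    # At each crossing keep the neighbouring sample that is closest to zero.
    bounds = np.where(np.abs(x1[below]) <= np.abs(x1[below + 1]),
                      below, below + 1)
    mids = 0.5 * (t[bounds[:-1]] + t[bounds[1:]])
    k = int(np.argmin(np.abs(mids - 0.5 * (t[0] + t[-1]))))
    i0, i1 = bounds[k], bounds[k + 1]
    # Downsample the selected cycle to n_out equally spaced time points.
    t_new = np.linspace(t[i0], t[i1], n_out)
    return t_new, np.interp(t_new, t, x1)
```

The remaining components of X and Y can be resampled on the same `t_new` grid with `np.interp`, so that all signals of a run stay aligned. With noisy trajectories, several sign changes may occur close to a single crossing, in which case some smoothing before boundary detection would probably be needed.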

1.3 Sources of variability in the simulations

We add synthetic dynamical noise and temporal variability to the systems (Lancia et al. 2014), and we distinguish between internal and observational variability. Internal noise is introduced into the driver before coupling, so that it also affects the dynamics of the driven system. Observational variability is applied to both systems afterwards; it therefore does not interfere with the coupling relation, but makes that relation harder to identify.

1.4 Internal dynamical noise

Internal dynamical noise is introduced into the driver by the additive white noise term \(\xi \), whose effect is scaled by the parameter \(\nu \). Due to the normalization, for \(\nu =0.3\), for example, the additive noise has a standard deviation equal to 30% of the standard deviation \(\sigma \) of the respective original signal. Through the coupling term, this intrinsic noise is propagated to the driven system.

1.5 Internal temporal variability

First we describe the general case of a continuous time-deformed signal, following Lucero (2005); the method for discretized signals is introduced subsequently. The signal \(\tilde{x} \left( t \right) \), \(t\in \left[ {0,T} \right] \), is obtained from the original signal \(x\left( t \right) \) via a mapping function \(H\left( t \right) \) that introduces the required amount of temporal variability:

$$\begin{aligned} \tilde{x}\left( t \right) =x\left( {H\left( t \right) } \right) \end{aligned}$$
(12)

First, consider a mapping function which is the identity \(H\left( t \right) =t\) or, equivalently, a signal \(x\left( {H\left( t \right) } \right) \) moving at constant speed \(dH\left( t \right) /dt=1\). Now, let the signal speed be perturbed by temporal noise

$$\begin{aligned} \frac{dH\left( t \right) }{dt}=1+\gamma \phi \left( t \right) , \end{aligned}$$
(13)

where \(\phi \left( t \right) \) is a Gaussian process with zero mean, a standard deviation of 1 and an autocorrelation function \(\exp \left( {-\left( {\theta t} \right) ^{2}} \right) \). The correlation length parameter \(\theta \) is chosen such that \(\phi \left( t \right) \) features approximately five extrema per cycle, and \(\gamma \) controls the amount of temporal variation. The condition \(\gamma \phi \left( t \right) >-1\) is required for a monotonic mapping \(H\left( t \right) \); otherwise, locally negative slopes occur, which means that the new trajectory contains time-reversed portions of the original trajectory.

The mapping function \(H\left( t \right) \) is obtained by integration:

$$\begin{aligned} H\left( t\right) =t+\gamma \mathop {\int }\nolimits _{0}^{t} \phi \left( s\right) ds. \end{aligned}$$
(14)
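One possible way to realize Eqs. (13) and (14) numerically is sketched below. The appendix does not say how the Gaussian process was generated; filtering white noise with a Gaussian kernel is an assumption made here, as are the function names `smooth_gaussian_noise` and `mapping_from_noise` and the rectangle-rule integration. The kernel \(\exp (-2(\theta t)^{2})\) is chosen because convolving it with itself gives a function proportional to \(\exp (-(\theta t)^{2})\), the required autocorrelation.

```python
import numpy as np

def smooth_gaussian_noise(n, dt, theta, rng):
    """Sample phi on n grid points with spacing dt (zero mean, unit variance).

    White noise is convolved with h(t) = exp(-2 (theta t)^2), so that the
    autocorrelation of the result is approximately exp(-(theta t)^2).
    """
    half = int(np.ceil(3.0 / (theta * dt)))      # kernel support |t| <= 3 / theta
    tk = np.arange(-half, half + 1) * dt
    h = np.exp(-2.0 * (theta * tk) ** 2)
    white = rng.standard_normal(n + 2 * half)
    phi = np.convolve(white, h, mode="valid")    # exactly n samples
    return phi / np.sqrt(np.sum(h ** 2))         # scale to unit variance

def mapping_from_noise(phi, gamma, dt):
    """Integrate dH/dt = 1 + gamma * phi (Eq. 13) to obtain H (Eq. 14).

    Returns H (with H[0] = 0) and a flag telling whether gamma * phi > -1
    holds everywhere, i.e. whether the mapping is monotonic.
    """
    rate = 1.0 + gamma * np.asarray(phi, dtype=float)
    H = np.concatenate(([0.0], np.cumsum(rate[:-1]) * dt))
    return H, bool(np.all(rate > 0.0))
```

For example, `phi = smooth_gaussian_noise(20000, 0.01, theta, np.random.default_rng(0))` followed by `mapping_from_noise(phi, 0.3, 0.01)` yields a mapping on the grid used for the long trajectories; the value of `theta` is left open here because the appendix only states that it is tuned to give about five extrema per cycle.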

For a discrete time-series \(x\left( n \right) \) with dimensionless time-scaling \(\left( {n=0,1,\ldots ,N} \right) \), temporal variability is introduced as follows. A discrete Gaussian process \(\phi \left( n \right) \), satisfying \(\phi \left( 0 \right) =0\), is generated and the mapping function

$$\begin{aligned} H\left( n \right) =n+\gamma \mathop \sum \limits _{k=0}^n \phi \left( k \right) \end{aligned}$$
(15)

is computed, satisfying \(H\left( 0 \right) =0\). In a second step, the mapping is normalized by

$$\begin{aligned} \tilde{H} \left( n \right) =H\left( n \right) \frac{N}{H\left( N \right) } \end{aligned}$$
(16)

which leads to \(\tilde{H} \left( N \right) =N\). Since the mapping is generally not integer-valued (\(\tilde{H} \left( n \right) \notin \mathbb {N}\)), an interpolation of \(x\left( 0 \right) ,x\left( 1 \right) ,\ldots ,x\left( N \right) \) has to be used to compute the values of the time-deformed signal

$$\begin{aligned} \tilde{x} \left( n \right) =x\left( {\tilde{H} \left( n \right) } \right) \end{aligned}$$
(17)

between the original trajectory’s discretization points \(0,1,\ldots ,N\). For the multiple-runs simulations, the temporal deformation is applied to each run separately. The effect of temporal deformation is illustrated in Fig. 13.

Fig. 13 a Temporal deformation of the Van der Pol system (Eqs. 8, 10); dashed: original \(x\left( n \right) \), solid: deformed \(\tilde{x} \left( n \right) \), \(\gamma =0.3\). b Corresponding time shift \(\tilde{H} \left( n \right) -n\). c Quantification of temporal deformation: amplitude root-mean-square error \(RMSE/\sigma \) for different values of \(\gamma \) in Eq. (14), with 100 Van der Pol simulations each. For example, \(\gamma =0.3\) introduces a mean amplitude error of \(0.87\sigma \)
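The discrete procedure of Eqs. (15) to (17) can be written compactly as below. This is a minimal sketch: the function name `time_deform` is hypothetical, linear interpolation is assumed because the appendix does not name the interpolation scheme, and raising an error for a non-monotonic mapping is a design choice made here rather than something the text prescribes.

```python
import numpy as np

def time_deform(x, phi, gamma):
    """Time-deform a series x(0..N) with noise phi(0..N), following Eqs. 15-17.

    Returns the deformed series and the normalised mapping H_tilde.
    """
    x = np.asarray(x, dtype=float)
    phi = np.array(phi, dtype=float)           # copy, so the input is not modified
    phi[0] = 0.0                               # enforce phi(0) = 0, hence H(0) = 0
    n = np.arange(x.size)
    N = x.size - 1
    H = n + gamma * np.cumsum(phi)             # Eq. 15
    if np.any(np.diff(H) <= 0):
        raise ValueError("mapping is not monotonic; gamma * phi must exceed -1")
    H_tilde = H * N / H[-1]                    # Eq. 16, so that H_tilde(N) = N
    x_def = np.interp(H_tilde, n, x)           # Eq. 17 (linear interpolation)
    return x_def, H_tilde
```

Given the output, the time shift shown in panel b corresponds to `H_tilde - np.arange(x.size)`, and a normalized amplitude error in the spirit of panel c can be computed as `np.sqrt(np.mean((x_def - x) ** 2)) / x.std()`.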

To introduce internal temporal variability, we do not let \(Y\left( \cdot \right) \) be driven by \(X\left( \cdot \right) \), but by a temporally deformed signal \(X\left( {H\left( \cdot \right) } \right) \), where the amount of variability is given by \(\gamma \). The analysis of coupling relations is then performed on \(X\left( {H\left( \cdot \right) } \right) \) and \(Y\left( \cdot \right) \).

1.6 Observational temporal variability

We let X drive Y without any temporal deformation. Afterwards, both signals are deformed by different mappings, \(H\left( \cdot \right) \) and \(G\left( \cdot \right) \). The amount of temporal variability is given by \(\gamma _X \) and \(\gamma _Y \), respectively. In this way we can also allow different amounts of temporal variability in \(X\left( {H\left( \cdot \right) } \right) \) and \(Y\left( {G\left( \cdot \right) } \right) \). We then compare the time-deformed signals \(X\left( {H\left( \cdot \right) } \right) \) and \(Y\left( {G\left( \cdot \right) } \right) \) in order to detect the coupling relation.

1.7 Simulations parameters

Purely deterministic simulations: (\(\nu =0\), \(\gamma =0\), \(\gamma _X =0\), \(\gamma _Y =0\)); simulations with intrinsic noise and temporal variability: (\(\nu =0.3\), \(\gamma =0.3\), \(\gamma _X =0\), \(\gamma _Y =0\)); uninterrupted simulations with intrinsic noise and variability and observational temporal variability: (\(\nu =0.3\), \(\gamma =0.3\), \(\gamma _X =0.3\), \(\gamma _Y =0.3\)).

Cite this article

Lancia, L., Rosenbaum, B. Coupling relations underlying the production of speech articulator movements and their invariance to speech rate. Biol Cybern 112, 253–276 (2018). https://doi.org/10.1007/s00422-018-0749-y

  • DOI: https://doi.org/10.1007/s00422-018-0749-y
