Evaluation of a Segmental Durations Model for TTS

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2721)

Abstract

In this paper we present a condensed description of a European Portuguese segmental duration model for TTS purposes and concentrate on its evaluation. The model is based on artificial neural networks. Model quality was evaluated by comparison with read speech. The standard deviation reached on the test set is 19.5 ms and the linear correlation coefficient is 0.84. In a perceptual evaluation, the model scored 4.12 on a 5-point scale, against 4.30 for natural human read speech.
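
As an illustration of the two objective measures quoted in the abstract, the sketch below computes a standard deviation and a linear (Pearson) correlation coefficient for a handful of segment durations. It rests on assumptions not confirmed by the abstract: that the standard deviation is taken over the prediction error (predicted minus measured duration) and that the population form (dividing by n) is used. The function names and the sample values are hypothetical and are not data from the paper.

```python
import math


def error_std(measured_ms, predicted_ms):
    """Population standard deviation of the error (predicted - measured), in ms.

    Assumption: the abstract's "standard deviation" refers to the spread of
    the prediction error; the paper may define it differently.
    """
    errors = [p - m for p, m in zip(predicted_ms, measured_ms)]
    mean_err = sum(errors) / len(errors)
    return math.sqrt(sum((e - mean_err) ** 2 for e in errors) / len(errors))


def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient between two sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical segment durations in milliseconds (illustration only).
measured = [62.0, 95.0, 48.0, 120.0, 71.0]
predicted = [70.0, 88.0, 55.0, 101.0, 80.0]

print(f"error std: {error_std(measured, predicted):.1f} ms")
print(f"correlation: {pearson_r(measured, predicted):.2f}")
```

Reporting both measures is complementary: the correlation shows how well predicted durations track the measured ones, while the error spread expresses the typical size of the deviation in milliseconds.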

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Teixeira, J.P., Freitas, D. (2003). Evaluation of a Segmental Durations Model for TTS. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science, vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_6

  • DOI: https://doi.org/10.1007/3-540-45011-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40436-1

  • Online ISBN: 978-3-540-45011-5

  • eBook Packages: Springer Book Archive
