A Phonetization Approach for the Forced-Alignment Task in SPPAS

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Abstract

The phonetization of text corpora requires a sequence of processing steps and resources to convert a normalized text into its constituent phones, so that a given application can exploit the result directly. This paper presents a generic approach to text phonetization and concentrates on the phonetization of unknown words. The work serves to develop a phonetizer in the context of a forced-alignment application. The proposed approach is dictionary-based and as language-independent as possible. It is used for French, English, Spanish, Italian, Catalan, Polish, Mandarin Chinese, Taiwanese, Cantonese and Japanese in the SPPAS software, a tool distributed under the terms of the GPL license.
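As an illustration of what a dictionary-based phonetizer with a fallback for unknown words might look like, here is a minimal Python sketch. It is not taken from the paper or from SPPAS: the toy lexicon, the SAMPA-like phone strings, the `<UNK>` marker, and the function names are all hypothetical, and the fallback shown (a greedy left-to-right longest match over dictionary entries) is only one plausible strategy, assumed here because the abstract does not specify the algorithm.

```python
from typing import Dict, List

# Hypothetical toy lexicon: token -> space-separated phones (illustrative only).
TOY_DICT: Dict[str, str] = {
    "the": "D @",
    "cat": "k { t",
    "s": "z",
    "un": "V n",
    "known": "n @U n",
}

UNK = "<UNK>"  # assumed marker for tokens that cannot be phonetized


def phonetize_unknown(token: str, lexicon: Dict[str, str]) -> str:
    """Assumed fallback: cover the token with the longest dictionary
    entries found greedily from left to right."""
    phones: List[str] = []
    i = 0
    while i < len(token):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(token), i, -1):
            piece = token[i:j]
            if piece in lexicon:
                phones.append(lexicon[piece])
                i = j
                break
        else:
            # No dictionary entry starts at position i: give up on this token.
            return UNK
    return " ".join(phones)


def phonetize(text: str, lexicon: Dict[str, str]) -> List[str]:
    """Phonetize a normalized, whitespace-tokenized text token by token."""
    result: List[str] = []
    for token in text.lower().split():
        if token in lexicon:
            result.append(lexicon[token])  # direct dictionary lookup
        else:
            result.append(phonetize_unknown(token, lexicon))
    return result


if __name__ == "__main__":
    print(phonetize("the cats", TOY_DICT))
```

In this sketch, "cats" is absent from the toy lexicon, so it is segmented into the known entries "cat" and "s". A language-independent system would keep this logic fixed and swap only the lexicon per language.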

Acknowledgement

This work has been partly carried out thanks to the support of the French state program ORTOLANG (Ref. Nr. ANR-11-EQPX-0032) funded by the “Investissements d’Avenir” French Government program, managed by the French National Research Agency (ANR). The support is gratefully acknowledged (http://www.ortolang.fr).

Author information

Corresponding author

Correspondence to Brigitte Bigi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bigi, B. (2016). A Phonetization Approach for the Forced-Alignment Task in SPPAS. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_30

  • DOI: https://doi.org/10.1007/978-3-319-43808-5_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43807-8

  • Online ISBN: 978-3-319-43808-5

  • eBook Packages: Computer Science, Computer Science (R0)
