Abstract
The phonetization of text corpora requires a sequence of processing steps and resources to convert a normalized text into its constituent phones, so that a given application can exploit it directly. This paper presents a generic approach to text phonetization and concentrates on the phonetization of unknown words. The aim is to develop a phonetizer for use in forced-alignment applications. The proposed approach is dictionary-based and is designed to be as language-independent as possible. It is used for French, English, Spanish, Italian, Catalan, Polish, Mandarin Chinese, Taiwanese, Cantonese and Japanese in the SPPAS software, a tool distributed under the terms of the GPL license.
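As an illustration of what a dictionary-based phonetizer with an unknown-word fallback can look like, here is a minimal sketch. The abstract does not specify how unknown words are handled, so the greedy longest-match segmentation below is an assumption chosen for illustration, not necessarily the method used in SPPAS. The function names, the toy lexicon and its SAMPA-like pronunciations are all hypothetical.

```python
def phonetize_unknown(word, lexicon):
    """Phonetize a word absent from the lexicon (illustrative assumption).

    Scans left to right and, at each position, takes the longest
    substring that has a lexicon entry. Characters that no entry
    covers are skipped.
    """
    phones = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for j in range(len(word), i, -1):
            chunk = word[i:j]
            if chunk in lexicon:
                phones.append(lexicon[chunk])
                i = j
                break
        else:
            # No entry starts at this character: skip it.
            i += 1
    return " ".join(phones)


def phonetize(tokens, lexicon):
    """Look each normalized token up in the lexicon, falling back on segmentation."""
    result = []
    for token in tokens:
        if token in lexicon:
            result.append(lexicon[token])
        else:
            result.append(phonetize_unknown(token, lexicon))
    return result


# Toy lexicon with hypothetical entries (not taken from any real resource).
TOY_LEXICON = {"the": "D @", "cat": "k { t", "s": "s"}

phonetized = phonetize(["the", "cats"], TOY_LEXICON)
```

In this sketch "cats" is not in the toy lexicon, so it is split into "cat" and "s" and their pronunciations are concatenated. Because the only language-specific resource is the lexicon itself, the same code can be reused for another language by swapping the dictionary.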
Acknowledgement
This work has been partly carried out thanks to the support of the French state program ORTOLANG (Ref. Nr. ANR-11-EQPX-0032) funded by the “Investissements d’Avenir” French Government program, managed by the French National Research Agency (ANR). The support is gratefully acknowledged (http://www.ortolang.fr).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bigi, B. (2016). A Phonetization Approach for the Forced-Alignment Task in SPPAS. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_30
DOI: https://doi.org/10.1007/978-3-319-43808-5_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43807-8
Online ISBN: 978-3-319-43808-5
eBook Packages: Computer Science, Computer Science (R0)