A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8082)

Abstract

This paper introduces a neural-network-based approach for selecting the vocabulary of a speech transcription system. Large sets of text data can now be collected from web sources and used alongside more traditional text sources to build language models for speech transcription systems. However, web sources yield large amounts of heterogeneous data, and as a consequence standard vocabulary selection procedures based on unigram approaches tend to select undesirable items as new words. As an alternative to unigram-based and empirical manual selection approaches, this paper proposes a selection procedure that relies on a machine learning technique, namely neural networks. The paper presents and discusses the results obtained with the various selection procedures. The results of the neural-network-based selection experiments are promising, and the approach can automatically take into account various kinds of detailed information in the selection process.
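
To make the unigram baseline mentioned in the abstract concrete, the following is a minimal sketch of frequency-based vocabulary selection over several text sources. It is not the authors' procedure: the function name `select_vocabulary`, the toy corpora, and the fixed per-source weights are all assumptions chosen for illustration. It only shows how a heavily weighted noisy web source can push unwanted items into the selected vocabulary.

```python
from collections import Counter

def select_vocabulary(corpora, weights, size):
    """Rank words by weighted relative unigram frequency; keep the top `size`.

    `corpora` maps a source name to a list of tokens and `weights` maps the
    same names to a mixing weight (both hypothetical, for illustration only).
    """
    scores = Counter()
    for name, tokens in corpora.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        for word, n in counts.items():
            # Relative frequency in this source, scaled by the source weight.
            scores[word] += weights[name] * n / total
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:size]

# Toy data: a clean source and a noisy web-like source (assumed examples).
corpora = {
    "newspaper": "the minister said the vote was close".split(),
    "web": "the the click here click here lol".split(),
}

# Trusting the clean source keeps web residue out of a 3-word vocabulary.
clean = select_vocabulary(corpora, {"newspaper": 0.7, "web": 0.3}, size=3)

# Trusting the web source lets navigation residue displace real words.
noisy = select_vocabulary(corpora, {"newspaper": 0.1, "web": 0.9}, size=3)
```

In this sketch the only lever is the per-source weight, which is one way to see why a purely unigram-based criterion struggles with heterogeneous web data.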

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jouvet, D., Langlois, D. (2013). A Machine Learning Based Approach for Vocabulary Selection for Speech Transcription. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science, vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_9

  • DOI: https://doi.org/10.1007/978-3-642-40585-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40584-6

  • Online ISBN: 978-3-642-40585-3

  • eBook Packages: Computer Science, Computer Science (R0)
