An Open Source Speech Synthesis Frontend for HTS

Toman, Markus; Pucher, Michael

doi:10.1007/978-3-319-24033-6_33


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue


Abstract

This paper describes a software framework for HMM-based speech synthesis that we have developed and released to the public. The framework achieves compatibility with the well-known HTS toolkit by incorporating hts_engine and Flite. It enables HTS voices to be used as Microsoft Windows system voices and to be integrated into Android and iOS apps. Non-English languages are supported through the capability to load Festival-format pronunciation dictionaries and letter-to-sound rules. The release also includes an Austrian German voice model of a male professional speaker, recorded in studio quality, as well as a pronunciation dictionary, letter-to-sound rules, and basic text preprocessing procedures for Austrian German. The framework is available under an MIT-style license.
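To illustrate what a Festival-format pronunciation dictionary entry contains, here is a minimal Python sketch that parses a single entry. It assumes the common Festival entry layout `("word" pos (((phones) stress) ...))`, i.e. a quoted word, a part-of-speech symbol (often `nil`), and a list of syllables, each a list of phones plus a stress digit. The functions `parse_sexpr` and `parse_lexicon_entry` are hypothetical names for illustration only. They are not the framework's loader, which the abstract does not describe, and they do not handle escaped quotes or comments.

```python
import re
from typing import List, Tuple, Union

SExpr = Union[str, list]

# Tokens: an opening or closing parenthesis, a double-quoted string, or a bare atom.
# Assumption: quoted strings contain no escaped quotes.
_TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_sexpr(text: str) -> SExpr:
    """Parse one S-expression into nested Python lists of strings."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the matching ")"
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        # Strip the surrounding quotes from string tokens; keep atoms as-is.
        return tok[1:-1] if tok.startswith('"') else tok

    expr = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return expr


def parse_lexicon_entry(line: str) -> Tuple[str, str, List[Tuple[List[str], int]]]:
    """Split an entry into (word, part of speech, [(phones, stress), ...])."""
    expr = parse_sexpr(line)
    if not isinstance(expr, list) or len(expr) != 3:
        raise ValueError("expected (WORD POS SYLLABLES)")
    word, pos_tag, syllables = expr
    if not isinstance(syllables, list):
        raise ValueError("syllables must be a list")
    result = []
    for syl in syllables:
        # Each syllable is ((phone phone ...) stress).
        if not (isinstance(syl, list) and len(syl) == 2 and isinstance(syl[0], list)):
            raise ValueError("malformed syllable: %r" % (syl,))
        phones, stress = syl
        result.append((list(phones), int(stress)))
    return word, pos_tag, result
```

For example, `parse_lexicon_entry('("hello" nil (((hh ax) 0) ((l ow) 1)))')` yields the word `hello`, the part of speech `nil`, and two syllables, the second of which carries primary stress. Words not found in such a dictionary would fall back to letter-to-sound rules, which this sketch does not cover.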



Author information

Authors and Affiliations

FTW Telecommunications Research Center Vienna, Donau-City-Straße 1, 1220, Vienna, Austria
Markus Toman & Michael Pucher


Corresponding author

Correspondence to Markus Toman.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek



About this paper

Cite this paper

Toman, M., Pucher, M. (2015). An Open Source Speech Synthesis Frontend for HTS. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_33


DOI: https://doi.org/10.1007/978-3-319-24033-6_33
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
