Automatic syllabification of speech signal using short time energy and vowel onset points

Mary, Leena; Antony, Anil P.; Babu, Ben P.; Prasanna, S. R. Mahadeva

doi:10.1007/s10772-018-9517-6

Published: 09 May 2018

International Journal of Speech Technology, Volume 21, pages 571–579 (2018)

Leena Mary ORCID: orcid.org/0000-0002-0080-0177¹,
Anil P. Antony²,
Ben P. Babu² &
…
S. R. Mahadeva Prasanna³

351 Accesses
8 Citations

Abstract

This paper describes a language-independent method for automatic syllabification of speech signals. The method uses the valleys in the short-time energy (STE) contour and the locations of vowel onset points (VOPs) to mark syllable boundaries. Syllabification is performed in three steps. First, long silence/pause regions are marked using speech/non-speech detection. Second, VOPs are located from the Hilbert envelope of the linear prediction (LP) residual. The presence of more than one VOP in a continuous speech region (as identified by the speech/non-speech detection in the first step) indicates syllable boundaries within that region, and the location of minimum energy in the STE contour between two consecutive VOPs is marked as the syllable boundary. Because the automatic VOP detection algorithm misses some VOPs, certain syllable boundaries are missed at this stage. Therefore, in the third step, additional syllable boundaries are detected from the STE contour using a valley threshold equal to the mean STE of each speech region between two consecutive syllable boundaries. The method is evaluated on 50 sentences each of read, extempore, and conversational speech in Malayalam and Bengali. An overall accuracy of 80% is obtained within a ±50 ms tolerance of manually marked syllable boundaries for this database. The method also shows good accuracy on TIMIT and NTIMIT data without tuning of thresholds or other parameters. It is useful for applications that need a meaningful separation of syllables rather than exact syllable boundaries. Application of the technique to prosody-based emotion recognition is illustrated using the Emo-DB German emotional database.
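The second step and the tolerance-based scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rectangular framing, the frame and hop sizes, and the hit-rate definition are all assumptions made for illustration. VOP locations are taken as given (as frame indices within one continuous speech region) rather than derived from the Hilbert envelope of the LP residual, and the third step (mean-STE valley threshold) is not shown.

```python
from typing import List, Sequence


def short_time_energy(signal: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Sum of squared samples per frame (rectangular window; an assumed framing)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def boundaries_between_vops(ste: Sequence[float], vop_frames: Sequence[int]) -> List[int]:
    """For each pair of consecutive VOPs, pick the frame of minimum STE strictly between them.

    Assumes all VOPs belong to one continuous speech region.
    """
    boundaries = []
    vops = sorted(vop_frames)
    for left, right in zip(vops, vops[1:]):
        if right - left < 2:
            continue  # no frame lies strictly between these two VOPs
        segment = ste[left + 1:right]
        offset = min(range(len(segment)), key=segment.__getitem__)
        boundaries.append(left + 1 + offset)
    return boundaries


def hit_rate(detected_s: Sequence[float], reference_s: Sequence[float],
             tol_s: float = 0.050) -> float:
    """Fraction of reference boundaries that have a detected boundary within tol_s seconds.

    Hypothetical scoring rule: one detection may match several references.
    """
    if not reference_s:
        return 0.0
    hits = sum(1 for r in reference_s
               if any(abs(d - r) <= tol_s for d in detected_s))
    return hits / len(reference_s)


# Toy STE contour with hypothetical VOPs at frames 1, 5 and 9.
ste = [5.0, 9.0, 4.0, 1.0, 3.0, 8.0, 6.0, 2.0, 7.0, 9.0]
frames = boundaries_between_vops(ste, [1, 5, 9])
print(frames)  # [3, 7]

hop_s = 0.010  # assumed 10 ms hop between frames
times = [f * hop_s for f in frames]
print(hit_rate(times, [0.035, 0.20]))  # 0.5
```

In the toy contour, the energy valleys at frames 3 and 7 fall between the three assumed VOPs and are returned as boundaries. Only the first of the two hypothetical reference boundaries lies within 50 ms of a detection, so the hit rate is 0.5.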

Acknowledgements

The authors thank the Kerala State Council for Science, Technology and Environment (KSCSTE), Government of Kerala, India, for its support.

Author information

Authors and Affiliations

Department of Electronics and Communication Engineering, Government Engineering College, Idukki, Kerala, India
Leena Mary
Department of Electronics and Communication Engineering, Rajiv Gandhi Institute of Technology, Kottayam, Kerala, India
Anil P. Antony & Ben P. Babu
Department of Electronics and Electrical Engineering, Indian Institute of Technology, Guwahati, India
S. R. Mahadeva Prasanna

Corresponding author

Correspondence to Leena Mary.

About this article

Cite this article

Mary, L., Antony, A.P., Babu, B.P. et al. Automatic syllabification of speech signal using short time energy and vowel onset points. Int J Speech Technol 21, 571–579 (2018). https://doi.org/10.1007/s10772-018-9517-6

Received: 08 May 2017
Accepted: 19 April 2018
Published: 09 May 2018
Issue Date: September 2018
DOI: https://doi.org/10.1007/s10772-018-9517-6
