
Improving Phoneme Classification Performance Using Observation Context–Dependent Segment Models

International Journal of Speech Technology

Abstract

This article describes a novel method for modeling the correlation among acoustic observations in contiguous speech segments. The basic idea is that acoustic observations are conditioned not only on the phonetic context but also on the acoustic observation of the preceding segment. The correlation between consecutive acoustic observations is modeled by mean trajectory polynomial segment models (PSMs). The method extends conventional segment modeling approaches in that it describes the correlation of acoustic observations not only within segments but also between contiguous segments. It also generalizes phonetic context (e.g., triphone) modeling approaches because it can model acoustic context and phonetic context at the same time. In a speaker-independent phoneme classification test, the proposed method yielded a 7 to 9% relative reduction in error rate compared with the traditional triphone segmental model system and a 31% reduction compared with a similar triphone hidden Markov model (HMM) system.
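
As a rough illustration of what a mean trajectory polynomial segment model involves, the sketch below fits a low-order polynomial to one scalar feature over a segment whose frame times are normalized to [0, 1]. It is a generic least-squares fit and not the authors' implementation: the function names, the single-feature simplification, and the use of normal equations are all assumptions made for illustration, and the conditioning on the preceding segment described in the abstract is not modeled here.

```python
def normalized_times(n_frames):
    """Map frame indices 0..n-1 onto [0, 1] so segments of any length share one time axis."""
    if n_frames == 1:
        return [0.0]
    return [i / (n_frames - 1) for i in range(n_frames)]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_mean_trajectory(frames, order):
    """Return least-squares coefficients b_0..b_order for one scalar feature (hypothetical helper)."""
    if len(frames) < order + 1:
        raise ValueError("need at least order + 1 frames to fit the polynomial")
    t = normalized_times(len(frames))
    k = order + 1
    # Normal equations (Z^T Z) b = Z^T y, where Z[i][j] = t_i ** j.
    ztz = [[sum(ti ** (r + c) for ti in t) for c in range(k)] for r in range(k)]
    zty = [sum((ti ** r) * y for ti, y in zip(t, frames)) for r in range(k)]
    return solve_linear(ztz, zty)


def mean_trajectory(coeffs, n_frames):
    """Evaluate the fitted polynomial at the normalized frame times."""
    return [sum(b * ti ** j for j, b in enumerate(coeffs))
            for ti in normalized_times(n_frames)]


# Synthetic segment (made-up values, not data from the article): y = 1 + 2t - 3t^2.
segment = [1 + 2 * t - 3 * t * t for t in normalized_times(5)]
coeffs = fit_mean_trajectory(segment, order=2)
```

Because the time axis is normalized, segments of different durations are described by the same small set of coefficients, which is what lets a segment model capture the trajectory of a whole segment instead of treating frames independently.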



Cite this article

Szarvas, M., Matsunaga, S. Improving Phoneme Classification Performance Using Observation Context–Dependent Segment Models. International Journal of Speech Technology 3, 253–262 (2000). https://doi.org/10.1023/A:1026502830036
