Chinese Unknown Word Identification Using Class-Based LM

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

This paper presents a modified class-based language model (LM) approach to Chinese unknown word identification. Unknown word identification is treated as a classification problem in which the part of speech of each unknown word serves as its class. Three types of features, namely a contextual class feature, a word juncture model, and word formation patterns, are combined within the class-based LM framework to identify unknown words in a sequence of known words. Beyond identification, the class-based LM approach also provides a solution for unknown word tagging. Experimental results show that most unknown words in Chinese texts can be resolved effectively by the proposed approach.
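To make the idea of a class-based LM concrete, the sketch below scores candidate analyses with the generic class-based bigram decomposition, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), and keeps the higher-scoring one. It is only an illustration under stated assumptions, not the authors' model: the abstract does not give the formulation, so the three feature types are not reproduced here, and the tag labels (`nr`, `n`, `a`, `v`), the probability tables, the smoothing floor, and the function names are all hypothetical.

```python
import math

FLOOR = 1e-8  # assumed floor probability for unseen events (hypothetical)


def log_score(tagged, class_bigram, word_given_class):
    """Log-probability of a (word, class) sequence under a class-based bigram LM.

    Each step contributes log P(class | previous class) + log P(word | class).
    """
    total = 0.0
    prev = "<s>"  # sentence-start pseudo-class
    for word, cls in tagged:
        total += math.log(class_bigram.get((prev, cls), FLOOR))
        total += math.log(word_given_class.get((word, cls), FLOOR))
        prev = cls
    return total


def best_candidate(candidates, class_bigram, word_given_class):
    """Return the candidate analysis with the highest log-probability."""
    return max(
        candidates,
        key=lambda cand: log_score(cand, class_bigram, word_given_class),
    )


# Toy parameters invented for illustration; they are not from the paper.
class_bigram = {
    ("<s>", "nr"): 0.2, ("nr", "v"): 0.5,
    ("<s>", "n"): 0.4, ("n", "a"): 0.05, ("a", "v"): 0.1,
}
word_given_class = {
    ("张伟", "nr"): 0.001, ("说", "v"): 0.01,
    ("张", "n"): 0.001, ("伟", "a"): 0.002,
}

# Two competing analyses of the same characters: merge the first two
# known single-character words into one unknown word, or keep them apart.
merged = [("张伟", "nr"), ("说", "v")]
split = [("张", "n"), ("伟", "a"), ("说", "v")]

best = best_candidate([merged, split], class_bigram, word_given_class)
print(best)
```

With these toy numbers the merged analysis wins because the class sequence `nr v` is far more probable than `n a v`. In a real system P(word | class) for an unseen word would have to come from some model of word formation rather than a lookup table.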

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fu, G., Luke, KK. (2005). Chinese Unknown Word Identification Using Class-Based LM. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_74

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_74

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
