Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation

  • Conference paper
Natural Language Understanding and Intelligent Applications (ICCPOL 2016, NLPCC 2016)

Abstract

Large corpus-based embedding methods have received increasing attention for their flexibility and effectiveness in many NLP tasks, including Word Similarity (WS). However, these approaches rely on high-quality corpora and neglect the human knowledge encoded in semantic resources such as Tongyici Cilin and HowNet. This paper proposes a novel framework for measuring Chinese word similarity by combining word embedding and Tongyici Cilin. We also use retrieval techniques to extend the contexts of word pairs and compute similarity scores from these contexts, which serve as weak supervision for selecting the better result. In the Chinese Lexical Similarity Computation (CLSC) shared task, we rank second, with a Spearman rank correlation coefficient of 0.457 and a Pearson correlation coefficient of 0.455. After the submission, we boost the embedding model by merging an English model into the Chinese one and by learning co-occurrence sequences with LSTM networks. Our final results are 0.541/0.514 (Spearman/Pearson), which, to the best of our knowledge, outperform the state of the art.
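The abstract does not state how the embedding and lexicon scores are combined, so the following is only a minimal sketch under explicit assumptions, not the authors' method. It assumes (1) embedding similarity is cosine similarity rescaled to [0, 1]; (2) each word carries one or more codes in the five-level format of the extended Tongyici Cilin (e.g. `Aa01A01=`), and lexicon similarity is the fraction of levels two codes share; and (3) the two scores are linearly interpolated with a hypothetical weight `alpha`. The names `cilin_similarity`, `combined_similarity`, and `alpha` are illustrative. The sketch also shows how the Spearman and Pearson coefficients used in the CLSC evaluation are computed from predicted and gold scores.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Illustrative sketch only: the scoring and weighting scheme below is an
# assumption, not the method described in the paper.

# Cumulative prefix lengths of the five levels in an extended Tongyici Cilin
# code such as "Aa01A01=": A | a | 01 | A | 01 (the trailing flag char is ignored).
CILIN_LEVELS = (1, 2, 4, 5, 7)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cilin_similarity(code_a: str, code_b: str) -> float:
    """Hypothetical lexicon score: fraction of Cilin levels the two codes share."""
    shared = 0
    for end in CILIN_LEVELS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(CILIN_LEVELS)


def combined_similarity(emb_a: Vector, emb_b: Vector,
                        codes_a: List[str], codes_b: List[str],
                        alpha: float = 0.5) -> float:
    """Hypothetical linear interpolation of embedding and lexicon scores."""
    emb_score = (cosine(emb_a, emb_b) + 1.0) / 2.0  # rescale [-1, 1] to [0, 1]
    if not codes_a or not codes_b:
        return emb_score  # word missing from the lexicon: fall back to embeddings
    # A word may carry several Cilin codes (senses); keep the closest pair.
    lex_score = max(cilin_similarity(a, b) for a in codes_a for b in codes_b)
    return alpha * emb_score + (1.0 - alpha) * lex_score


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only (not CLSC data).
    gold = [4.5, 3.0, 1.0]
    pairs = [
        ([0.9, 0.1], [0.8, 0.2], ["Aa01A01="], ["Aa01A02="]),
        ([0.5, 0.5], [0.1, 0.9], ["Ba01A01="], ["Ba02B01="]),
        ([1.0, 0.0], [0.0, 1.0], ["Ca01A01="], ["Dk03C01="]),
    ]
    preds = [combined_similarity(*p) for p in pairs]
    print(f"Spearman={spearman(gold, preds):.3f}  Pearson={pearson(gold, preds):.3f}")
```

Whether the two sources should be interpolated at all, and with what weight, is an empirical question; a parameter like `alpha` would normally be tuned on held-out word pairs rather than fixed at 0.5.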

Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61672127, 61173100) and the National Social Science Foundation of China (No. 15BYY175). We also wish to thank NVIDIA Corporation for their donation of a Tesla K40c GPU.

Author information

Correspondence to Degen Huang.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Pei, J., Zhang, C., Huang, D., Ma, J. (2016). Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_69

  • DOI: https://doi.org/10.1007/978-3-319-50496-4_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50495-7

  • Online ISBN: 978-3-319-50496-4

  • eBook Packages: Computer Science, Computer Science (R0)
