Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation

Pei, Jiahuan; Zhang, Cong; Huang, Degen; Ma, Jianjun

doi:10.1007/978-3-319-50496-4_69


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)


Abstract

Embedding methods trained on large corpora have received increasing attention for their flexibility and effectiveness in many NLP tasks, including Word Similarity (WS). However, these approaches depend on high-quality corpora and ignore the human knowledge encoded in semantic resources such as Tongyici Cilin and HowNet. This paper proposes a novel framework for measuring Chinese word similarity by combining word embeddings with Tongyici Cilin. We also use retrieval techniques to extend the contexts of word pairs and compute similarity scores that weakly supervise the selection of the better result. In the Chinese Lexical Similarity Computation (CLSC) shared task, we ranked second, with a Spearman rank correlation coefficient of 0.457 and a Pearson correlation coefficient of 0.455. After the submission, we improved the embedding model by merging an English model into the Chinese one and by learning co-occurrence sequences with LSTM networks. Our final results are 0.541 (Spearman) and 0.514 (Pearson), which, to the best of our knowledge, outperform the previous state of the art.
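The abstract does not specify how the embedding score and the Tongyici Cilin score are combined, so the Python sketch below shows one plausible setup under explicit assumptions. The interpolation weight `alpha`, the five-level segmentation of extended-edition Cilin codes such as `Aa01A01=`, and the prefix-matching `cilin_similarity` function are all hypothetical simplifications, not the authors' method. The evaluation helpers compute the two metrics reported above: Pearson correlation on raw scores, and Spearman rank correlation as the Pearson correlation of average ranks (tied values share their mean rank).

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two word-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

# Assumption: extended Cilin codes like "Aa01A01=" encode five hierarchical
# levels with widths 1, 1, 2, 1, 2 characters; these are the prefix end offsets.
CILIN_LEVEL_ENDS = (1, 2, 4, 5, 7)

def cilin_similarity(code_a: str, code_b: str) -> float:
    """Hypothetical lexicon score: fraction of levels shared from the top down."""
    shared = 0
    for end in CILIN_LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(CILIN_LEVEL_ENDS)

def combined_similarity(u: Sequence[float], v: Sequence[float],
                        code_a: str, code_b: str, alpha: float = 0.5) -> float:
    """Hypothetical linear interpolation of embedding and lexicon scores."""
    return alpha * cosine(u, v) + (1.0 - alpha) * cilin_similarity(code_a, code_b)

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values all receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

if __name__ == "__main__":
    # Made-up system scores and gold ratings for four word pairs, for illustration only.
    system = [0.9, 0.4, 0.7, 0.1]
    gold = [8.5, 3.0, 6.0, 2.5]
    print("Spearman:", spearman(system, gold), "Pearson:", pearson(system, gold))
```

Linear interpolation is chosen here only because it is the simplest way to merge two bounded scores; the weight would normally be tuned on held-out pairs. Note that `pearson` raises `ZeroDivisionError` if either list is constant.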



Acknowledgements

This research is supported by the National Natural Science Foundation of China (Nos. 61672127, 61173100) and the National Social Science Foundation of China (No. 15BYY175). We also thank NVIDIA Corporation for the donation of a Tesla K40c GPU.

Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
Jiahuan Pei, Cong Zhang & Degen Huang
School of Foreign Languages, Dalian University of Technology, Dalian, 116024, Liaoning, China
Jianjun Ma


Corresponding author

Correspondence to Degen Huang.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng


About this paper

Cite this paper

Pei, J., Zhang, C., Huang, D., Ma, J. (2016). Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_69


DOI: https://doi.org/10.1007/978-3-319-50496-4_69
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
