Abstract
Large corpus-based embedding methods have received increasing attention for their flexibility and effectiveness in many NLP tasks, including Word Similarity (WS). However, these approaches rely on high-quality corpora and neglect the human knowledge encoded in semantic resources such as Tongyici Cilin and HowNet. This paper proposes a novel framework for measuring Chinese word similarity by combining word embeddings with Tongyici Cilin. We also use retrieval techniques to extend the contexts of word pairs and calculate similarity scores that weakly supervise the selection of the better result. In the Chinese Lexical Similarity Computation (CLSC) shared task, we ranked second, with Spearman/Pearson correlation coefficients of 0.457/0.455. After the submission, we improved the embedding model by merging an English model into the Chinese one and learning the co-occurrence sequence via LSTM networks. Our final results are 0.541/0.514, which, to the best of our knowledge, outperform the state of the art.
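The reported scores are correlations between system similarity scores and human ratings. As a rough illustration only, the sketch below computes Pearson and Spearman coefficients in plain Python. The function names and the sample numbers are hypothetical and are not taken from the paper or from the shared task's official scorer; ties are handled with average ranks, which is a common convention but an assumption here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up human ratings and system scores for five word pairs.
gold = [9.1, 7.4, 5.0, 2.2, 1.3]
predicted = [0.83, 0.71, 0.52, 0.30, 0.35]

print("Spearman:", round(spearman(gold, predicted), 3))
print("Pearson: ", round(pearson(gold, predicted), 3))
```

Spearman depends only on the ordering of the scores, so it is unaffected by the scale of the system output, whereas Pearson also rewards a linear relationship with the human ratings.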
Acknowledgements
This research is supported by the National Natural Science Foundation of China (Nos. 61672127, 61173100) and the National Social Science Foundation of China (No. 15BYY175). We also wish to thank NVIDIA Corporation for the donation of a Tesla K40c GPU.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Pei, J., Zhang, C., Huang, D., Ma, J. (2016). Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_69
DOI: https://doi.org/10.1007/978-3-319-50496-4_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)