Abstract
Previous research on computing semantic similarity has focused mainly on documents, sentences, or concepts. In this paper, we study the semantic similarity of words and compositional phrases. The task is to judge whether a word and a short sequence of words are semantically similar. Drawing on a structured resource (WordNet), a semi-structured resource (Wikipedia), and an unstructured resource (the Web), we extract a rich set of effective features to represent each word-phrase pair. We treat the task as a binary classification problem and employ a Support Vector Machine to decide, given a word-phrase pair, whether the word and the phrase are similar. Experiments are conducted on SemEval-2013 Task 5a. Our method achieves an accuracy of 82.9%, outperforming the best system that participated in the task (80.3%). These results demonstrate the effectiveness of the proposed approach.
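The classification setup described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the three feature names (`wordnet_sim`, `wikipedia_sim`, `web_sim`), the toy feature values, and the training procedure (a linear SVM fitted by batch subgradient descent on the regularised hinge loss). It is not the authors' feature set, data, or SVM implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature order: [wordnet_sim, wikipedia_sim, web_sim].
# All values are made up for illustration; they are not data from the paper.
TRAIN_X: List[List[float]] = [
    [0.9, 0.8, 0.7],  # similar word-phrase pairs (label +1)
    [0.8, 0.9, 0.6],
    [0.7, 0.7, 0.9],
    [0.1, 0.2, 0.1],  # dissimilar word-phrase pairs (label -1)
    [0.2, 0.1, 0.3],
    [0.3, 0.2, 0.2],
]
TRAIN_Y: List[int] = [1, 1, 1, -1, -1, -1]


def score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear decision value w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimise (lam/2)*||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only examples inside the margin contribute a hinge subgradient.
            if yi * score(w, b, xi) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (similar) or -1 (not similar) for one feature vector."""
    return 1 if score(w, b, x) >= 0.0 else -1
```

Calling `train_linear_svm(TRAIN_X, TRAIN_Y)` returns a weight vector and bias that `predict` can apply to the feature vector of a new word-phrase pair. In practice an off-the-shelf SVM library with a tuned kernel and regularisation constant would replace this hand-written trainer.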
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jin, X., Sun, C., Lin, L., Wang, X. (2014). Exploiting Multiple Resources for Word-Phrase Semantic Similarity Evaluation. In: Sun, M., Liu, Y., Zhao, J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2014, NLP-NABD 2014. Lecture Notes in Computer Science, vol 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_5
DOI: https://doi.org/10.1007/978-3-319-12277-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12276-2
Online ISBN: 978-3-319-12277-9
eBook Packages: Computer Science, Computer Science (R0)