Abstract
The growing availability of data in digital format has facilitated the development of new models that automatically infer the semantic similarity between word pairs. However, many natural languages still lack sufficient resources to evaluate measures of semantic relatedness. In this paper, we translated into Portuguese the word pairs from a well-known baseline for evaluating semantic relatedness measures and manually evaluated each pair. We compared the correlation with similar datasets in other languages and generated Latent Semantic Analysis (LSA) models from Wikipedia articles in order to verify the pertinence of each dataset and how semantic similarity carries over across languages.
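The evaluation idea described above, scoring word pairs with a vector model and correlating those scores with human judgments, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional vectors stand in for LSA vectors, the word pairs and ratings are invented rather than taken from the dataset, and Spearman rank correlation is one common choice (the abstract does not say which correlation coefficient was used).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy vectors standing in for LSA word vectors.
vectors = {
    "carro": [0.9, 0.1, 0.0],
    "automóvel": [0.8, 0.2, 0.1],
    "jornada": [0.1, 0.9, 0.2],
    "viagem": [0.2, 0.8, 0.1],
    "vidro": [0.0, 0.1, 0.9],
}

# Hypothetical (word1, word2, human rating) triples, invented for this sketch.
pairs = [
    ("carro", "automóvel", 3.9),
    ("jornada", "viagem", 3.7),
    ("carro", "jornada", 1.2),
    ("vidro", "viagem", 0.3),
]

human_scores = [rating for _, _, rating in pairs]
model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
rho = spearman(human_scores, model_scores)
print(f"Spearman correlation: {rho:.2f}")
```

A higher coefficient means the model orders the pairs more like the human annotators do. In practice the toy vectors would be replaced by vectors taken from a trained model.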
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Granada, R., Trojahn, C., Vieira, R. (2014). Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_17
DOI: https://doi.org/10.1007/978-3-319-09761-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science (R0)