Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia

Granada, Roger; Trojahn, Cassia; Vieira, Renata

doi:10.1007/978-3-319-09761-9_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Abstract

The growth of data available in digital format has facilitated the development of new models that automatically infer the semantic similarity between word pairs. However, many natural languages still lack sufficient resources to evaluate measures of semantic relatedness. In this paper, we translated into Portuguese the word pairs of a well-known baseline for evaluating semantic relatedness measures and performed a manual evaluation of each pair. We compared the correlation with similar datasets in other languages and generated latent semantic analysis (LSA) models from Wikipedia articles in order to verify the pertinence of each dataset and how semantic similarity carries over across languages.
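As an illustration of the kind of evaluation the abstract describes, the following minimal sketch scores word pairs by cosine similarity between word vectors and then correlates those scores with human ratings. It is not the authors' implementation: the abstract does not name the correlation coefficient, so Spearman's rank correlation is an assumption here, and the vectors, word pairs, and ratings are hypothetical toy values standing in for an LSA model built from Wikipedia and for the manually evaluated dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 3-dimensional vectors; a real LSA space would have
# hundreds of dimensions learned from a corpus.
toy_vectors = {
    "carro": [0.9, 0.1, 0.0],
    "automóvel": [0.8, 0.2, 0.1],
    "jornada": [0.2, 0.8, 0.1],
    "viagem": [0.1, 0.9, 0.2],
    "galo": [0.0, 0.1, 0.9],
}

# Hypothetical (word1, word2, human rating) triples, invented for this sketch.
toy_pairs = [
    ("carro", "automóvel", 3.9),
    ("jornada", "viagem", 3.6),
    ("galo", "viagem", 0.1),
]

model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_pairs]
human_scores = [rating for _, _, rating in toy_pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman correlation on toy data: {rho:.3f}")
```

The same `spearman` function could also be applied to two lists of human ratings for the same pairs in different languages, which is one way to read the cross-language comparison mentioned above.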

Author information

Authors and Affiliations

PUCRS & IRIT - Toulouse, France
Roger Granada
UTM & IRIT - Toulouse, France
Cassia Trojahn
PUCRS - Porto Alegre, Brazil
Renata Vieira

Editor information

Editors and Affiliations

FCHS, Universidade do Algarve, Campus de Gambelas, 8005-139, Faro, Portugal
Jorge Baptista
INESC-ID Lisboa, Lisbon, Portugal
Nuno Mamede
IT-University of Coimbra, Coimbra, Portugal
Sara Candeias
USP-EACH, São Paulo-SP, Brazil
Ivandré Paraboni
USP-ICMC, Universidade de São Paulo, São Carlos, SP, Brazil
Thiago A. S. Pardo
SCC-ICMC, University of São Paulo, São Carlos, SP, Brazil
Maria das Graças Volpe Nunes

Cite this paper

Granada, R., Trojahn, C., Vieira, R. (2014). Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_17

DOI: https://doi.org/10.1007/978-3-319-09761-9_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09760-2
Online ISBN: 978-3-319-09761-9
eBook Packages: Computer Science, Computer Science (R0)