Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Abstract

The growth of data available in digital format has facilitated the development of new models that automatically infer the semantic similarity between word pairs. However, many natural languages still lack sufficient resources for evaluating measures of semantic relatedness. In this paper we translate into Portuguese the word pairs of a well-known baseline for evaluating semantic relatedness measures, and perform a manual evaluation of each pair. We compare the correlation with similar datasets in other languages, and generate LSA models from Wikipedia articles in order to verify the pertinence of each dataset and how well semantic similarity carries over across languages.
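As a rough illustration of the evaluation idea summarised in the abstract, the sketch below scores word pairs by the cosine similarity of their vectors and then correlates those scores with human ratings. It is not the authors' implementation. The abstract does not say which correlation coefficient was used, so Spearman rank correlation is an assumption here, and the three-dimensional vectors, the word pairs, and the ratings are made-up toy values standing in for LSA vectors and annotator judgments.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy stand-ins for LSA word vectors (hypothetical values).
toy_vectors = {
    "carro": [0.9, 0.1, 0.0],
    "automóvel": [0.8, 0.2, 0.1],
    "floresta": [0.1, 0.9, 0.2],
    "meio-dia": [0.0, 0.1, 0.9],
}

# Toy human ratings for three word pairs (hypothetical values).
toy_pairs = [
    ("carro", "automóvel", 3.9),
    ("carro", "floresta", 0.8),
    ("floresta", "meio-dia", 0.4),
]

model_scores = [cosine(toy_vectors[a], toy_vectors[b]) for a, b, _ in toy_pairs]
human_scores = [rating for _, _, rating in toy_pairs]
print(spearman(model_scores, human_scores))
```

A rank-based coefficient is used in this sketch because it only asks the model to order the pairs as the annotators did, without requiring the two score scales to match.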

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Granada, R., Trojahn, C., Vieira, R. (2014). Comparing Semantic Relatedness between Word Pairs in Portuguese Using Wikipedia. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_17

  • DOI: https://doi.org/10.1007/978-3-319-09761-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09760-2

  • Online ISBN: 978-3-319-09761-9

  • eBook Packages: Computer Science, Computer Science (R0)
