DOI: 10.1145/2600428.2609543
poster

Learning to bridge colloquial and formal language applied to linking and search of E-Commerce data

Published: 03 July 2014

ABSTRACT

We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles follow the intuition that certain words are shared between two idioms of the same language, while other words are non-shared, that is, idiom-specific. We demonstrate the ability of our model to learn relations between cross-idiomatic topics in a dataset containing product descriptions and reviews. We evaluate the model intrinsically using perplexity. As an extrinsic evaluation, we then demonstrate the utility of the new MiLDA topic model in a recently proposed IR task of linking Pinterest pins (given in colloquial English on the users' side) to online webshops (given in formal English on the retailers' side). We show that our multi-idiomatic model outperforms the standard monolingual LDA model and the pure bilingual LDA model in terms of both perplexity and MAP scores in the IR task.
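
The two evaluation measures named in the abstract, perplexity and MAP, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the input layout (per-token log-probabilities, and per-query ranked relevance flags), and the choice to normalise average precision by the number of relevant items that appear in the ranking are all assumptions made for illustration.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity of held-out text: exp of the negative mean per-token
    natural-log likelihood. Lower is better."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True if the item at rank i + 1 is relevant.
    Assumption: we normalise by the number of relevant items found in
    the ranking, not by the total number of relevant items in the corpus.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: the mean of per-query average precision. Higher is better."""
    if not rankings:
        raise ValueError("need at least one query")
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # A model assigning probability 0.25 to every held-out token has
    # perplexity of about 4.
    print(perplexity([math.log(0.25)] * 4))
    # Two hypothetical queries (e.g. two pins), each with a ranked list
    # of candidate items marked relevant or not.
    print(mean_average_precision([[True, False, True], [False, True]]))
```

In the setting the abstract describes, each query would correspond to a pin and each ranked list to candidate webshop items; the token log-probabilities would come from whichever topic model is being compared.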

Published in

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN: 9781450322577
DOI: 10.1145/2600428

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGIR '14 paper acceptance rate: 82 of 387 submissions, 21%
Overall acceptance rate: 792 of 3,983 submissions, 20%
