DOI: 10.1145/2247596.2247662
research-article

Tailoring entity resolution for matching product offers

Published: 27 March 2012

ABSTRACT

Product matching is a challenging variation of entity resolution that identifies representations and offers referring to the same product. It is difficult because of the broad spectrum of products, the many similar but distinct products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We propose tailored approaches for product matching that preprocess product offers to extract and clean new attributes usable for matching. In particular, we propose a new approach to extract and use so-called product codes to identify products and distinguish them from similar product variations. We evaluate the effectiveness of the proposed approaches on challenging real-life datasets of product offers from online shops. We also show that the UPC information in product offers is often error-prone and can lead to insufficient match decisions.
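To make the idea of product codes concrete, the following is a minimal sketch of how code-like tokens might be pulled out of offer titles and compared. It is not the authors' method, which the abstract does not specify. The heuristic (a token of at least four characters mixing letters and digits), the function names `candidate_codes` and `share_code`, and the example titles are all assumptions made for illustration.

```python
import re

# Hypothetical heuristic, not taken from the paper: a candidate product code
# is an alphanumeric token (hyphens allowed inside) that contains both
# letters and digits and has at least `min_len` characters without hyphens.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def candidate_codes(title: str, min_len: int = 4) -> set[str]:
    """Return normalized code-like tokens found in an offer title."""
    codes = set()
    for token in TOKEN.findall(title):
        core = token.replace("-", "")
        has_digit = any(c.isdigit() for c in core)
        has_alpha = any(c.isalpha() for c in core)
        if len(core) >= min_len and has_digit and has_alpha:
            # Normalize case and drop hyphens so "SX-230HS" equals "sx230hs".
            codes.add(core.upper())
    return codes


def share_code(title_a: str, title_b: str) -> bool:
    """True if the two titles have at least one candidate code in common."""
    return bool(candidate_codes(title_a) & candidate_codes(title_b))


# Made-up example titles for illustration only.
offer_1 = "Canon PowerShot SX-230HS digital camera black"
offer_2 = "canon sx230hs 12.1MP"
offer_3 = "Canon PowerShot SX220HS"

print(candidate_codes(offer_1))
print(share_code(offer_1, offer_2))
print(share_code(offer_1, offer_3))
```

In this sketch, two offers for similar but different product variations differ in their extracted code even when the rest of the title is nearly identical, which is the kind of distinction the abstract attributes to product codes. A real system would need to filter out tokens that merely look like codes, such as sizes or capacities.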

    • Published in

      ACM Other conferences
      EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
      March 2012
      643 pages
      ISBN: 9781450307901
      DOI: 10.1145/2247596

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      Overall Acceptance Rate: 7 of 10 submissions, 70%
