ABSTRACT
Product matching is a challenging variant of entity resolution that aims to identify product representations and offers referring to the same product. It is difficult because of the broad spectrum of products, the many similar but distinct products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We propose tailored approaches for product matching that preprocess product offers to extract and clean new attributes usable for matching. In particular, we propose a new approach to extract and use so-called product codes to identify products and distinguish them from similar product variations. We evaluate the effectiveness of the proposed approaches on challenging real-life datasets of product offers from online shops. We also show that the UPC information in product offers is often error-prone and can lead to insufficient match decisions.
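To make the idea of product codes more concrete, the following Python sketch shows one simple way such codes might be pulled out of offer titles and compared. It is not the extraction method of the paper: the heuristic (a token counts as a code candidate if it mixes letters and digits and has at least four characters), the function names `extract_product_codes` and `share_product_code`, and the example titles are all assumptions made for illustration.

```python
import re
from typing import List

# Alphanumeric tokens, optionally joined by single hyphens (e.g. "WD-5000AAKS").
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def extract_product_codes(title: str, min_length: int = 4) -> List[str]:
    """Return code candidates found in an offer title.

    Assumed heuristic (not taken from the paper): a candidate contains
    at least one letter and one digit and has min_length or more
    characters once hyphens are removed. Candidates are normalized to
    upper case without hyphens.
    """
    codes = []
    for token in TOKEN.findall(title):
        compact = token.replace("-", "").upper()
        has_letter = any(ch.isalpha() for ch in compact)
        has_digit = any(ch.isdigit() for ch in compact)
        if len(compact) >= min_length and has_letter and has_digit:
            codes.append(compact)
    return codes


def share_product_code(title_a: str, title_b: str) -> bool:
    """True if both titles contain at least one common code candidate."""
    codes_a = set(extract_product_codes(title_a))
    codes_b = set(extract_product_codes(title_b))
    return bool(codes_a & codes_b)
```

With this heuristic, two titles that differ only in the model token, such as "Canon EOS 600D" and "Canon EOS 650D", do not share a code, even though plain string similarity would rate them as nearly identical. The heuristic also produces false candidates, for example capacity tokens such as "500GB", so a realistic pipeline would need additional filtering and verification.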
- Tailoring entity resolution for matching product offers