ABSTRACT
Hierarchical classification is a challenging problem with broad application in real-world tasks. Item categorization in the e-commerce domain is one example. In a large-scale industrial setting such as eBay, a vast number of items must be categorized into a large number of leaf categories, on top of which a complex topic hierarchy is defined. Beyond the challenges of scale, item data is extremely sparse, its distribution over categories is highly skewed, and its characteristics vary from category to category. A common strategy for hierarchical classification is the "gates-and-experts" method, in which a high-level classification is made first (the gates), followed by a low-level distinction (the experts). In this paper, we propose to leverage domain-specific feature generation and modeling techniques to greatly improve the classification accuracy of the experts. In particular, we derive features that encode rich domain knowledge and linguistic hints, and then adapt an SVM-based model to distinguish several highly confusable category groups that are the performance bottleneck of a live system currently deployed at eBay. We use illustrative examples and empirical results to demonstrate the effectiveness of our approach, particularly the merit of carefully designed domain-specific features.
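The gates-and-experts strategy described above can be illustrated with a minimal sketch. This is not the system from the paper: the token-overlap scorer below is a toy stand-in for the SVM-based experts, and the class names (`TokenCountClassifier`, `GatesAndExperts`), the item titles, and the category labels are all hypothetical, chosen only to show how a gate routes an item to a per-group expert.

```python
from collections import Counter, defaultdict


class TokenCountClassifier:
    """Toy stand-in for an SVM: scores each label by normalized token overlap."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, titles, labels):
        # Accumulate per-label token counts from whitespace-tokenized titles.
        for title, label in zip(titles, labels):
            self.counts[label].update(title.lower().split())
        return self

    def predict(self, title):
        tokens = title.lower().split()

        def score(label):
            total = sum(self.counts[label].values()) or 1
            return sum(self.counts[label][t] for t in tokens) / total

        # sorted() makes tie-breaking deterministic (alphabetical by label).
        return max(sorted(self.counts), key=score)


class GatesAndExperts:
    """Two-stage classifier: a gate picks the group, an expert picks the leaf."""

    def fit(self, titles, groups, leaves):
        # The gate is trained on all items with high-level group labels.
        self.gate = TokenCountClassifier().fit(titles, groups)
        # Each expert is trained only on the items of its own group.
        by_group = defaultdict(list)
        for title, group, leaf in zip(titles, groups, leaves):
            by_group[group].append((title, leaf))
        self.experts = {
            group: TokenCountClassifier().fit(*zip(*rows))
            for group, rows in by_group.items()
        }
        return self

    def predict(self, title):
        group = self.gate.predict(title)              # high-level decision
        leaf = self.experts[group].predict(title)     # low-level distinction
        return group, leaf


# Hypothetical training data: (title, high-level group, leaf category).
DATA = [
    ("apple iphone 4 16gb smartphone", "Electronics", "Cell Phones"),
    ("leather case for iphone 4", "Electronics", "Phone Accessories"),
    ("samsung galaxy smartphone unlocked", "Electronics", "Cell Phones"),
    ("usb charger cable for samsung phone", "Electronics", "Phone Accessories"),
    ("mens cotton t-shirt blue", "Clothing", "Shirts"),
    ("womens running shoes size 8", "Clothing", "Shoes"),
]

titles, groups, leaves = zip(*DATA)
model = GatesAndExperts().fit(titles, groups, leaves)
prediction = model.predict("nokia smartphone unlocked")
```

Because each expert sees only the items of one group, it can use features tailored to that group's confusable leaves (here, phones versus phone accessories), which is the part of the pipeline the abstract says the paper targets.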
Index Terms
- Item categorization in the e-commerce domain