ABSTRACT
E-commerce search engines are the primary means by which customers shop for products online. Each customer query contains multiple facets such as product type, color, and brand. A successful search engine retrieves products that are relevant to the query along each of these facets. However, due to lexical irregularities (erroneous titles, descriptions, etc.) and behavioral irregularities (clicks or purchases of products that do not belong to the same facet as the query), some mismatched products are often included in search results. These irregularities can be detected using simple binary classifiers such as gradient boosted decision trees or logistic regression. Typically, these binary classifiers make strong independence assumptions between the results and ignore structural relationships available in the data, such as the connections between products and queries. In this paper, we use the connections that exist between products and queries to identify a special kind of structure we refer to as a micrograph. Further, we make use of Statistical Relational Learning (SRL) to incorporate these micrographs in the data and pose the problem as a structured prediction problem. We refer to this approach as structured mismatch classification (SMC). In addition, we show that naive addition of structure does not improve the performance of the model and hence introduce a variation of SMC, strong SMC (SSMC), which improves over the baseline by passing information from high-confidence predictions to lower-confidence predictions. In our empirical evaluation, we show that our proposed approach outperforms the baseline classification methods by up to 12% in precision. Furthermore, we use quasi-Newton methods to make our method viable for real-time inference in a search engine and show that our approach is up to 150 times faster than existing ADMM-based solvers.
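As a rough illustration of passing information from high-confidence predictions to lower-confidence ones within a single micrograph, consider the minimal sketch below. It is not the paper's model or solver: it replaces the SRL formulation with a simple quadratic smoothing objective and uses plain coordinate updates instead of a quasi-Newton method. The function name `propagate`, the confidence thresholds, the weight `lam`, and the toy scores and edges are all assumptions made for illustration.

```python
def propagate(scores, edges, lam=1.0, hi=0.9, lo=0.1, iters=100):
    """Refine low-confidence mismatch scores using confident neighbors.

    scores: dict mapping product id -> baseline mismatch probability in [0, 1]
    edges:  dict mapping (product_a, product_b) -> non-negative similarity weight
    Scores >= hi or <= lo are treated as confident anchors and kept fixed
    (hypothetical thresholds). Each other score is repeatedly set to the
    minimizer of (y - s)^2 + lam * sum_j w_j * (y - y_j)^2.
    """
    anchors = {p for p, s in scores.items() if s >= hi or s <= lo}

    # Build an undirected neighbor list from the edge dictionary.
    nbrs = {p: [] for p in scores}
    for (a, b), w in edges.items():
        nbrs[a].append((b, w))
        nbrs[b].append((a, w))

    y = dict(scores)
    for _ in range(iters):
        for p in scores:
            if p in anchors:
                continue  # confident predictions are never overwritten
            total_w = sum(w for _, w in nbrs[p])
            pulled = sum(w * y[q] for q, w in nbrs[p])
            y[p] = (scores[p] + lam * pulled) / (1.0 + lam * total_w)
    return y


# Toy micrograph for one query: p1 and p3 are confident, p2 and p4 are not.
scores = {"p1": 0.95, "p2": 0.55, "p3": 0.05, "p4": 0.45}
edges = {("p1", "p2"): 1.0, ("p3", "p4"): 1.0}
refined = propagate(scores, edges, lam=2.0)
print(refined)
```

In this toy setup, the uncertain product linked to a confident mismatch is pulled toward a higher mismatch score, and the one linked to a confident match is pulled lower, while the confident scores stay unchanged.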
Index Terms
- Identifying Facet Mismatches In Search Via Micrographs