Identifying Facet Mismatches In Search Via Micrographs

Authors:

Sriram Srinivasan
University of California, Santa Cruz, Santa Cruz, CA, USA

Nikhil S. Rao
Amazon Inc., Palo Alto, CA, USA

Karthik Subbian
Amazon Inc., Palo Alto, CA, USA

Lise Getoor
University of California, Santa Cruz, Santa Cruz, CA, USA

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, November 2019, Pages 1663–1672, https://doi.org/10.1145/3357384.3357911

Published: 03 November 2019

ABSTRACT

E-commerce search engines are the primary means by which customers shop for products online. Each customer query contains multiple facets, such as product type, color, and brand. A successful search engine retrieves products that are relevant to the query along each of these facets. However, due to lexical irregularities (an erroneous title, description, etc.) and behavioral irregularities (clicks or purchases of products that do not belong to the same facet as the query), mismatched products are often included in search results. These irregularities can be detected using simple binary classifiers such as gradient boosted decision trees or logistic regression. Typically, these binary classifiers make strong independence assumptions between results and ignore structural relationships available in the data, such as the connections between products and queries. In this paper, we use the connections that exist between products and queries to identify a special kind of structure we refer to as a micrograph. Further, we use Statistical Relational Learning (SRL) to incorporate these micrographs into the model and pose the problem as a structured prediction problem. We refer to this approach as structured mismatch classification (SMC). In addition, we show that naively adding structure does not improve the performance of the model, and hence we introduce a variation of SMC, strong SMC (SSMC), which improves over the baseline by passing information from high-confidence predictions to lower-confidence predictions. In our empirical evaluation, we show that our proposed approach outperforms the baseline classification methods by up to 12% in precision. Furthermore, we use quasi-Newton methods to make our method viable for real-time inference in a search engine and show that our approach is up to 150 times faster than existing ADMM-based solvers.
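The abstract poses mismatch detection as structured prediction, where linked query-product results can inform one another instead of being scored independently. What follows is a toy Python illustration of that general idea in the style of a hinge-loss Markov random field. It is not the authors' SMC or SSMC model. Everything in it is a hypothetical assumption: the `map_inference` function, the `(query, product)` keys, the weights, the single rule that a mismatch on one result suggests a mismatch on a linked result, and the use of squared hinges. The abstract says the authors use quasi-Newton methods for fast inference. To stay dependency-free, this sketch swaps in plain projected gradient descent over the box [0, 1]; a bound-constrained quasi-Newton solver such as L-BFGS-B could replace that loop.

```python
from typing import Dict, List, Tuple

# Hypothetical key type: (query, product). Not taken from the paper.
Pair = Tuple[str, str]


def map_inference(
    prior: Dict[Pair, float],
    links: List[Tuple[Pair, Pair]],
    w_prior: float = 1.0,
    w_link: float = 0.5,
    step: float = 0.1,
    iters: int = 500,
) -> Dict[Pair, float]:
    """Minimize a squared-hinge energy over soft mismatch values y in [0, 1].

    Terms of this hypothetical model (not the paper's exact rule set):
      w_prior * (y_i - s_i)^2        keeps y_i near the local classifier score s_i
                                     (the sum of two squared hinges around s_i)
      w_link * max(0, y_a - y_b)^2   Lukasiewicz relaxation of the directed rule
                                     Mismatch(a) & Linked(a, b) -> Mismatch(b),
                                     with Linked(a, b) observed as 1
    Solver: projected gradient descent, a stand-in for a quasi-Newton method.
    """
    y = dict(prior)  # warm-start from the local classifier scores
    for _ in range(iters):
        # Gradient of the prior terms.
        grad = {k: 2.0 * w_prior * (y[k] - prior[k]) for k in y}
        # Gradient of each rule's squared distance to satisfaction.
        for a, b in links:
            h = max(0.0, y[a] - y[b])
            grad[a] += 2.0 * w_link * h
            grad[b] -= 2.0 * w_link * h
        # Gradient step, then projection back onto the box [0, 1].
        y = {k: min(1.0, max(0.0, y[k] - step * grad[k])) for k in y}
    return y


if __name__ == "__main__":
    scores = {("q1", "p1"): 0.9, ("q1", "p2"): 0.2, ("q1", "p3"): 0.1}
    linked = [(("q1", "p1"), ("q1", "p2"))]
    for pair, value in map_inference(scores, linked).items():
        print(pair, round(value, 3))
```

With these toy inputs, the single link pulls the confident score on p1 down toward 0.725 and raises the linked p2 toward 0.375, while the unlinked p3 keeps its local score of 0.1. Squared hinges are used here because they keep the energy continuously differentiable, which both gradient-based and quasi-Newton solvers rely on.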

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
      2. Document filtering
  2. World Wide Web
    1. Web applications
      1. Electronic commerce
        1. Online shopping
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Machine learning theory
      1. Structured prediction

Recommendations

Identifying popular search goals behind search queries to improve web search ranking
AIRS'11: Proceedings of the 7th Asia conference on Information Retrieval Technology

Web users usually have a certain search goal before they submit a search query. However, many laypersons can't transform their search goals into suitable queries. Thus, understanding original search goals behind a query is very important for search ...

Dynamic Facet Ordering for Faceted Product Search Engines

Faceted browsing is widely used in Web shops and product comparison sites. In these cases, a fixed ordered list of facets is often employed. This approach suffers from two main issues. First, one needs to invest a significant amount of time to devise an ...

Identifying meaningful return information for XML keyword search
SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data

Keyword search enables web users to easily access XML data without the need to learn a structured query language and to study possibly complex data schemas. Existing work has addressed the problem of selecting qualified data nodes that match keywords ...

Published in
CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN: 9781450369763
DOI: 10.1145/3357384
General Chairs:
Wenwu Zhu, Tsinghua University, China
Dacheng Tao, University of Massachusetts, USA
Xueqi Cheng, Institute of Computing Technology, CAS, China
Program Chairs:
Peng Cui, Tsinghua University, China
Elke Rundensteiner, Worcester Polytechnic Institute, USA
David Carmel, Amazon Research, USA
Qi He, LinkedIn, USA
Jeffrey Xu Yu, Chinese University of Hong Kong, China
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 November 2019
Author Tags
collective classification
search defect
statistical relational learning
structured prediction
Qualifiers
- research-article
Acceptance Rates
CIKM '19 Paper Acceptance Rate: 202 of 1,031 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

Article Metrics
- Total Citations: 2
- Total Downloads: 308
- Downloads (Last 12 months): 40
- Downloads (Last 6 weeks): 7