Discovering and Understanding Word Level User Intent in Web Search Queries

19 Pages. Posted: 25 Jun 2018. Publication Status: Accepted

Rishiraj Saha Roy

Indian Institute of Technology (IIT), Kharagpur - Department of Computer Science and Engineering

Rahul Katare

Indian Institute of Technology (IIT), Kharagpur - Department of Computer Science and Engineering

Niloy Ganguly

Indian Institute of Technology (IIT), Kharagpur

Srivatsan Laxman

Scibler Technologies Private Limited

Monojit Choudhury

Indian Institute of Technology

Abstract

Identifying and interpreting user intent are fundamental to semantic search. In this paper, we investigate the association of intent with individual words of a search query. We propose that words in queries can be classified as either content or intent, where content words represent the central topic of the query, while users add intent words to make their requirements more explicit. We argue that intelligent processing of intent words can be vital to improving result quality, and in this work we focus on intent word discovery and understanding. Our approach to intent word detection is motivated by the hypothesis that query intent words satisfy certain distributional properties in large query logs, similar to those of function words in natural language corpora. Following this idea, we first prove the effectiveness of our corpus distributional features, namely, word co-occurrence counts and entropies, for function word detection in five natural languages. Next, we show that reliable detection of intent words in queries is possible using these same features computed from query logs. To make the distinction between content and intent words more tangible, we additionally provide operational definitions of content and intent words as those words that should match, and those that need not match, respectively, in the text of relevant documents. In addition to a standard evaluation against human annotations, we also provide an alternative validation of our ideas using click-through data. Concordance of the two orthogonal evaluation approaches provides further support for our original hypothesis of the existence of two distinct word classes in search queries. Finally, we provide a taxonomy of intent words derived through rigorous manual analysis of large query logs.
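
To make the distributional features named in the abstract concrete, the following is a minimal Python sketch of how a co-occurrence count and a co-occurrence entropy could be computed per word from a query log. The abstract does not give the exact feature definitions, so everything here is an assumption for illustration: the function name `cooccurrence_features`, the choice to treat each query as a set of words, the use of base-2 Shannon entropy over co-occurring words, and the toy query log itself.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_features(queries):
    """Map each word to (distinct co-occurring words, co-occurrence entropy in bits).

    Hypothetical helper: the paper's exact feature definitions may differ.
    """
    neighbours = defaultdict(Counter)
    for query in queries:
        # Assumption: a query is a set of lowercase whitespace-separated words.
        words = set(query.lower().split())
        for word in words:
            for other in words:
                if other != word:
                    neighbours[word][other] += 1

    features = {}
    for word, counts in neighbours.items():
        total = sum(counts.values())
        # Shannon entropy of the distribution over co-occurring words.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        features[word] = (len(counts), entropy)
    return features


# Toy query log, invented purely for illustration.
toy_log = [
    "nokia n95 price",
    "canon eos price",
    "harry potter review",
    "canon eos review",
    "nokia n95 review",
    "harry potter price",
]

features = cooccurrence_features(toy_log)
for word in ("price", "nokia"):
    count, entropy = features[word]
    print(f"{word}: {count} distinct co-occurring words, entropy {entropy:.3f} bits")
```

In this toy log, "price" appears alongside many different topics, so it has more distinct co-occurring words and a higher entropy than "nokia", which mostly appears with "n95". This is the kind of distributional contrast the abstract suggests separates intent words from content words, although a real detector would need a large log and the paper's actual features.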

Keywords: Query Understanding, Query Intent, Intent Words, Co-occurrence Entropy

Suggested Citation

Roy, Rishiraj Saha and Katare, Rahul and Ganguly, Niloy and Laxman, Srivatsan and Choudhury, Monojit, Discovering and Understanding Word Level User Intent in Web Search Queries (January 2015). Available at SSRN: https://ssrn.com/abstract=3199173 or http://dx.doi.org/10.2139/ssrn.3199173

Rishiraj Saha Roy (Contact Author)

Indian Institute of Technology (IIT), Kharagpur - Department of Computer Science and Engineering

Kharagpur
India

Rahul Katare

Indian Institute of Technology (IIT), Kharagpur - Department of Computer Science and Engineering

Kharagpur
India

Niloy Ganguly

Indian Institute of Technology (IIT), Kharagpur

Kharagpur
IIT Kharagpur
Kharagpur, IN West Bengal 721302
India

Srivatsan Laxman

Scibler Technologies Private Limited

Flat No.11, 4th Floor
Y S Palazzo Apts No. 49, 6th Main, 18th Cross
Bangalore
India

Monojit Choudhury

Indian Institute of Technology

Kharagpur
India
