Abstract
When searching for research papers, a user needs results that comprehensively cover the papers relevant to the query. Typically, the user enters a Boolean query that reflects the information need, and the search engine ranks research papers against that query. However, it is difficult to anticipate every term that the authors of relevant papers might have used. Moreover, general query-based ranking methods focus on placing relevant documents at the top of the results but provide no means of guaranteeing that the results are comprehensive. We therefore propose two ranking methods that take the comprehensiveness of the retrieved relevant papers into account. The first is a topic-based Boolean query search. It uses topic analysis with Latent Dirichlet Allocation (LDA) to convert every word in the set of abstracts and in the query into a topic, and then performs the search at the topic level. Because synonyms of a search term are expected to be assigned the same topic as the term itself, papers that use different wording can still be matched. Each paper is ranked by the number of times it is matched across topic-based Boolean query searches executed with various LDA parameter settings. The second is a hybrid method that combines our topic-based ranking with a general query-based ranking, favoring the better results of each. This method is motivated by the observation that the sets of papers retrieved by our method and by a general ranking method differ. Experiments on the NTCIR-1 and NTCIR-2 datasets demonstrate the effectiveness of both the topic-based and the hybrid methods.
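The topic-level matching and match-count ranking described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each LDA run (one parameter setting) has already been reduced to a word-to-topic dictionary (for example, each word's most probable topic), that the Boolean query is written in conjunctive normal form, and that ties are broken by paper ID. These simplifications, the toy data, and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical query format (conjunctive normal form):
# [["retrieval", "search"], ["paper"]] means (retrieval OR search) AND paper.
Query = List[List[str]]
# Assumption: one LDA run is summarized as word -> topic id (precomputed).
WordTopics = Dict[str, int]


def to_topics(words: List[str], word_topics: WordTopics) -> Set[int]:
    """Map words to topic ids, skipping words this LDA run never saw."""
    return {word_topics[w] for w in words if w in word_topics}


def matches(doc_words: List[str], query: Query, word_topics: WordTopics) -> bool:
    """Topic-level Boolean match: every AND-clause must share a topic with the paper."""
    doc_topics = to_topics(doc_words, word_topics)
    for clause in query:
        if not to_topics(clause, word_topics) & doc_topics:
            return False
    return True


def rank_by_match_count(docs: Dict[str, List[str]], query: Query,
                        runs: List[WordTopics]) -> List[Tuple[str, int]]:
    """Rank papers by the number of LDA runs in which they match the query."""
    counts: Counter = Counter()
    for word_topics in runs:
        for doc_id, words in docs.items():
            if matches(words, query, word_topics):
                counts[doc_id] += 1
    # Descending match count; ties broken by paper id for a deterministic order.
    return sorted(((d, counts[d]) for d in docs), key=lambda x: (-x[1], x[0]))


# Toy data (hypothetical): two LDA runs with different parameter settings.
docs = {
    "p1": ["document", "retrieval", "ranking"],
    "p2": ["paper", "search", "engine"],
    "p3": ["protein", "folding"],
}
query = [["retrieval"], ["paper"]]  # retrieval AND paper
run1 = {"retrieval": 0, "search": 0, "ranking": 0, "engine": 0,
        "document": 1, "paper": 1, "protein": 2, "folding": 2}
run2 = {"retrieval": 0, "ranking": 0, "search": 1, "engine": 1,
        "document": 2, "paper": 2, "protein": 3, "folding": 3}
runs = [run1, run2]

if __name__ == "__main__":
    print(rank_by_match_count(docs, query, runs))
    # -> [('p1', 2), ('p2', 1), ('p3', 0)]
```

In this toy example, p2 never contains the word "retrieval", yet it matches in the first run because "search" shares the topic of "retrieval" there. Counting matches over several runs then favors papers that match consistently across parameter settings, in the spirit of the ranking the abstract describes.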
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP15H01721. We thank Stuart Jenkinson, PhD, from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Fukuda, S., Tomiura, Y., Ishita, E. (2019). Research Paper Search Using a Topic-Based Boolean Query Search and a General Query-Based Ranking Model. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_5
DOI: https://doi.org/10.1007/978-3-030-27618-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27617-1
Online ISBN: 978-3-030-27618-8
eBook Packages: Computer Science, Computer Science (R0)