Enriching Word Embeddings for Patent Retrieval with Global Context

Hofstätter, Sebastian; Rekabsaz, Navid; Lupu, Mihai; Eickhoff, Carsten; Hanbury, Allan

doi:10.1007/978-3-030-15712-8_57

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11437)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

The training and use of word embeddings for information retrieval have recently gained considerable attention, showing competitive performance across various domains. In this study, we explore the use of word embeddings for patent retrieval, a challenging domain, especially for methods based on distributional semantics. We hypothesize that the previously reported limited effectiveness of semantic approaches in this domain, and in particular of word embeddings (word2vec Skip-gram), stems from the inherently short window context, which is too narrow for the model to capture the full complexity of the patent domain. To address this limitation, we jointly draw from local and global contexts for embedding learning. We do this in two ways: (1) adapting the Skip-gram model’s vectors using global retrofitting, and (2) filtering word similarities using global context. We measure patent retrieval performance using BM25 and LM Extended Translation models and observe significant improvements over three baselines.
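
To make the two strategies in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used retrofitting update, in which each vector is pulled toward the average of its related words while staying close to its original value, and it assumes that "global context" is available as a second set of document-level vectors. The function names (retrofit, filter_similar), the neighbour dictionary, and the threshold values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def retrofit(vectors, neighbours, iterations=10, alpha=1.0):
    """Pull each local (window-based) vector toward its global neighbours.

    vectors:    dict word -> local embedding (e.g. from Skip-gram)
    neighbours: dict word -> list of words related in the global context
                (assumption: derived from document-level co-occurrence)
    alpha:      weight that keeps a vector close to its original value
    """
    original = {w: np.asarray(v, dtype=float) for w, v in vectors.items()}
    fitted = {w: v.copy() for w, v in original.items()}
    for _ in range(iterations):
        for word, related in neighbours.items():
            related = [r for r in related if r in fitted]
            if word not in fitted or not related:
                continue
            # Each neighbour gets weight 1/len(related), so the weights sum to 1.
            beta = 1.0 / len(related)
            pull = beta * np.sum([fitted[r] for r in related], axis=0)
            fitted[word] = (pull + alpha * original[word]) / (1.0 + alpha)
    return fitted


def filter_similar(word, candidates, local_vecs, global_vecs,
                   local_min=0.7, global_min=0.5):
    """Keep candidates that are similar to `word` in BOTH vector spaces.

    The thresholds are illustrative placeholders, not values from the paper.
    """
    return [
        c for c in candidates
        if cosine(local_vecs[word], local_vecs[c]) >= local_min
        and cosine(global_vecs[word], global_vecs[c]) >= global_min
    ]
```

In this sketch, the first function changes the embedding space itself, while the second leaves the vectors untouched and only prunes the list of similar terms that would later be passed to a translation-model ranker.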

Notes

1. Our code and the Lucene extensions are available at: github.com/sebastian-hofstaetter/ir-generalized-translation-models.

Author information

Authors and Affiliations

TU Wien, Vienna, Austria
Sebastian Hofstätter & Allan Hanbury
Idiap Research Institute, Martigny, Switzerland
Navid Rekabsaz
Research Studios Austria, Vienna, Austria
Mihai Lupu
Brown University, Providence, USA
Carsten Eickhoff
Complexity Science Hub, Vienna, Austria
Allan Hanbury

Corresponding author

Correspondence to Sebastian Hofstätter.

Editor information

Editors and Affiliations

University of Strathclyde, Glasgow, UK
Leif Azzopardi
Bauhaus Universität Weimar, Weimar, Germany
Benno Stein
Universität Duisburg-Essen, Duisburg, Germany
Norbert Fuhr
GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany
Philipp Mayr
Delft University of Technology, Delft, The Netherlands
Claudia Hauff
University of Twente, Enschede, The Netherlands
Djoerd Hiemstra

About this paper

Cite this paper

Hofstätter, S., Rekabsaz, N., Lupu, M., Eickhoff, C., Hanbury, A. (2019). Enriching Word Embeddings for Patent Retrieval with Global Context. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11437. Springer, Cham. https://doi.org/10.1007/978-3-030-15712-8_57

DOI: https://doi.org/10.1007/978-3-030-15712-8_57
Published: 07 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15711-1
Online ISBN: 978-3-030-15712-8
eBook Packages: Computer Science, Computer Science (R0)
