An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Alaidine Ben Ayed, Ismaïl Biskri, Jean-Guy Meunier
Copyright: © 2022 | Volume: 12 | Issue: 1 | Pages: 14
ISSN: 2155-6377 | EISSN: 2155-6385 | EISBN13: 9781683182085 | DOI: 10.4018/IJIRR.289950
Cite Article

MLA

Ben Ayed, Alaidine, et al. "An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval." IJIRR vol.12, no.1 2022: pp.1-14. http://doi.org/10.4018/IJIRR.289950

APA

Ben Ayed, A., Biskri, I., & Meunier, J. (2022). An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval. International Journal of Information Retrieval Research (IJIRR), 12(1), 1-14. http://doi.org/10.4018/IJIRR.289950

Chicago

Ben Ayed, Alaidine, Ismaïl Biskri, and Jean-Guy Meunier. "An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval," International Journal of Information Retrieval Research (IJIRR) 12, no.1: 1-14. http://doi.org/10.4018/IJIRR.289950

Abstract

In the era of big data and the fourth industrial revolution (Industry 4.0), improving the efficiency of document/information retrieval frameworks is essential to handle the ever-growing volume of text data in an increasingly digital world. This article describes a two-stage document/information retrieval system. First, a Lucene-based document retrieval tool is implemented, and query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is applied to the retrieved documents to create a short, accurate, and fluent extract of a single longer retrieved document (or of a set of top retrieved documents). The results show that using word embeddings is an effective way to achieve higher precision and retrieve documents more accurately. The resulting summaries also satisfy the retention and fidelity criteria for relevant summaries.
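
To make the idea of embedding-based query expansion concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy in-memory embedding table (`TOY_EMBEDDINGS`, with invented vectors) and a hypothetical helper `expand_query` that appends to each query term its nearest neighbors by cosine similarity; the parameters `k` and `min_sim` are illustrative choices, and a real system would presumably load pretrained vectors instead.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# A real system would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.05, 0.2, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(terms, embeddings, k=2, min_sim=0.8):
    """Return the original terms plus up to k similar words per term.

    Hypothetical helper: k and min_sim are assumed tuning knobs,
    not values taken from the article.
    """
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # out-of-vocabulary terms are kept but not expanded
        scored = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word != term
        ]
        scored.sort(reverse=True)  # most similar candidates first
        for sim, word in scored[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded


if __name__ == "__main__":
    print(expand_query(["car"], TOY_EMBEDDINGS))
```

Under these assumptions, the expanded term list could then be joined into a disjunctive query (for example with `OR` in Lucene's classic query syntax) before being passed to the retrieval stage.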