ABSTRACT
Formulating queries for medical systematic reviews is a highly complex task performed by trained information specialists. The complexity stems from the reliance on lengthy Boolean queries, which express a detailed research question. Before formulating a query, information specialists gather a set of exemplar documents, called 'seed studies', which help verify the effectiveness of a candidate query before the retrieved studies are fully assessed. Beyond this manual use, IR methods can exploit seed studies to guide automatic query formulation and to drive new retrieval models. A major limitation of work to date is that these methods rely on 'pseudo seed studies': included studies (i.e., relevance assessments) used retrospectively in place of real seeds. We show that pseudo seed studies are not representative of the real seed studies used by information specialists. We therefore provide a test collection containing the real-world seed studies used to assist with query formulation. To support our collection, we present an analysis, previously not possible, of how seed studies impact retrieval, and we run several experiments comparing the effectiveness of seed-study-based methods when given real versus pseudo seed studies. Our test collection and the results of all experiments and analyses are available at http://github.com/ielab/sysrev-seed-collection.
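To make the idea of a "seed-study-based method" concrete, the sketch below ranks candidate abstracts by their term overlap with a pooled profile built from seed studies. This is a minimal, hypothetical illustration of seed-driven document ranking in the spirit of the methods the abstract refers to (e.g., Lee and Sun's seed-driven ranking), not the exact technique evaluated in the paper; the tokenizer, the raw term-frequency profile, and the example documents are all assumptions made for brevity.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive tokenizer: lowercase, whitespace split, alphabetic tokens only.
    return [t for t in text.lower().split() if t.isalpha()]


def tf_profile(texts):
    # Pool term frequencies across one or more documents into a single profile.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    num = sum(freq * b.get(term, 0) for term, freq in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return num / norm if norm else 0.0


def rank_by_seeds(seed_texts, candidates):
    # Score every candidate against the pooled seed profile, best first.
    profile = tf_profile(seed_texts)
    scored = [(doc_id, cosine(tf_profile([text]), profile))
              for doc_id, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical seed studies and candidate abstracts.
seeds = [
    "randomised trial of statin therapy for cholesterol",
    "statin treatment lowers cholesterol in adults",
]
candidates = {
    "d1": "statin therapy and cholesterol outcomes in a randomised trial",
    "d2": "qualitative study of nursing handover practices",
}
ranking = rank_by_seeds(seeds, candidates)
```

Under this toy setup, the candidate sharing vocabulary with the seeds ("d1") ranks above the unrelated one. Real methods replace the raw-frequency profile with stronger representations (BM25-weighted terms, relevance models, or neural encoders), but the experimental question stays the same: how does the ranking change when the seeds are real versus pseudo?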
- From Little Things Big Things Grow: A Collection with Seed Studies for Medical Systematic Review Literature Search