Overview of Long-form Document Matching: Survey of Existing Models and Their Challenges

Yaokai Cheng; Ruoyu Chen; Xiaoguang Yuan; Yuting Yang; Shan Jiang; Bo Yang

doi:10.1088/1742-6596/2171/1/012059

Journal of Physics: Conference Series

Paper • The following article is Open access

Yaokai Cheng¹, Ruoyu Chen², Xiaoguang Yuan¹, Yuting Yang¹, Shan Jiang¹ and Bo Yang¹

Published under licence by IOP Publishing Ltd
Journal of Physics: Conference Series, Volume 2171, International Conference on Computer, Big Data and Artificial Intelligence (ICCBDAI 2021), 12/11/2021 - 14/11/2021, Beihai

Citation: Yaokai Cheng et al 2022 J. Phys.: Conf. Ser. 2171 012059, DOI 10.1088/1742-6596/2171/1/012059

Author e-mails

chenruoyu@bistu.edu.cn

Author affiliations

¹ Beijing Institute of Computer Technology and Application, Yongding Road, Haidian District, Beijing 100039, China

² Beijing Information Science and Technology University, No.35 Beisihuan Middle Road, Chaoyang District, Beijing, 100101, China

Abstract

Long-form document matching is an important research direction in natural language processing, with applications in tasks such as news recommendation and text clustering. However, it is hampered by the noise and sparsity of semantic information in long texts, and applying short-form document matching methods to long-form matching problems gives unsatisfactory results. The problem has attracted the attention of researchers, who have proposed many effective methods. These methods can be divided into three categories: traditional bag-of-words-based models, traditional deep learning-based models, and pre-training-based models. This study reviews typical methods of long-form document matching, analyzes their advantages and disadvantages, and discusses possible future developments.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
