Paper (Open access)

Overview of Long-form Document Matching: Survey of Existing Models and Their Challenges

Published under licence by IOP Publishing Ltd
Citation: Yaokai Cheng et al 2022 J. Phys.: Conf. Ser. 2171 012059, DOI 10.1088/1742-6596/2171/1/012059

Abstract

Long-form document matching is an important direction in natural language processing, with applications in tasks such as news recommendation and text clustering. However, semantic information in long text is both noisy and sparse, so applying short-form document matching methods to long-form matching problems gives unsatisfactory results. The problem has therefore attracted the attention of researchers, who have proposed many effective methods. These methods can be divided into three categories: traditional bag-of-words-based models, traditional deep learning-based models, and pre-training-based models. This study reviews typical methods of long-form document matching, analyzes their advantages and disadvantages, and discusses possible future developments.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
