Abstract
Long-form document matching is an important direction in the field of natural language processing and can be applied to tasks such as news recommendation and text clustering. However, long-form document matching suffers from noisiness and sparsity of semantic information in long text. Using short-form document matching methods on a long-form matching problem is not satisfactory. Long-form document matching has attracted the attention of researchers, who have proposed many effective methods. Methods for matching long texts can be divided into three categories: traditional bag-of-words-based models, traditional deep learning-based models, and pre-training-based models. This study reviews typical methods of long-form document matching, analyzes their advantages and disadvantages, and discusses possible future developments.
Export citation and abstract BibTeX RIS
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.