Pre-trained language models have been widely adopted as backbones for various natural language processing tasks. However, existing pre-trained language models ignore descriptive meta-information in the text, such as the distinction between the title and the main body, leading to over-weighted attention to insignificant text. In this paper, we propose a hypernetwork-based architecture that models this descriptive meta-information and integrates it into pre-trained language models. Evaluations on three natural language processing tasks show that our method notably improves the performance of pre-trained language models and achieves state-of-the-art results on keyphrase extraction.
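To make the core idea concrete: a hypernetwork is a network that generates the parameters of another network from some conditioning input, here an embedding of the descriptive meta-information (e.g. whether a token belongs to the title or the main body). The sketch below is a minimal illustration of that mechanism, not the paper's actual architecture; all names, dimensions, and the choice of a simple linear adapter are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
meta_dim, hidden = 4, 8  # illustrative sizes, not from the paper

# Hypernetwork parameters: linear maps from a meta-information embedding
# to the weights and bias of a small per-token adapter layer.
W_gen = rng.standard_normal((meta_dim, hidden * hidden)) * 0.05
b_gen = rng.standard_normal((meta_dim, hidden)) * 0.05

def adapter_from_meta(meta):
    """Generate adapter weights (W, b) from a meta-information embedding."""
    W = (meta @ W_gen).reshape(hidden, hidden)
    b = meta @ b_gen
    return W, b

# Two kinds of descriptive meta-information, e.g. "title" vs. "main body",
# each represented by its own (here random, hypothetical) embedding.
meta_title = rng.standard_normal(meta_dim)
meta_body = rng.standard_normal(meta_dim)

token_states = rng.standard_normal((5, hidden))  # 5 token hidden states

W_t, b_t = adapter_from_meta(meta_title)
W_b, b_b = adapter_from_meta(meta_body)

# The same token representations are transformed differently depending on
# which meta-information conditions the generated adapter.
out_title = token_states @ W_t + b_t
out_body = token_states @ W_b + b_b
print(out_title.shape)  # (5, 8)
```

In this way the backbone can weight text differently by its role (title vs. main body) without maintaining separate copies of the adapter for every meta-information type: the hypernetwork produces the appropriate parameters on the fly.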
Cite as: Duan, W., He, X., Zhou, Z., Rao, H., Thiele, L. (2021) Injecting Descriptive Meta-Information into Pre-Trained Language Models with Hypernetworks. Proc. Interspeech 2021, 3216-3220, doi: 10.21437/Interspeech.2021-229
@inproceedings{duan21_interspeech,
  author={Wenying Duan and Xiaoxi He and Zimu Zhou and Hong Rao and Lothar Thiele},
  title={{Injecting Descriptive Meta-Information into Pre-Trained Language Models with Hypernetworks}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={3216--3220},
  doi={10.21437/Interspeech.2021-229}
}