Improving Simultaneous Machine Translation with Monolingual Data

Deng, Hexuan; Ding, Liang; Liu, Xuebo; Zhang, Meishan; Tao, Dacheng; Zhang, Min

arXiv:2212.01188 (cs)

[Submitted on 2 Dec 2022]

Abstract: Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. However, a significant performance gap remains between NMT and SiMT. In this work, we propose to leverage monolingual data to improve SiMT: a SiMT student is trained on the combination of bilingual data and external monolingual data distilled by Seq-KD. Preliminary experiments on En-Zh and En-Ja news-domain corpora demonstrate that monolingual data can significantly improve translation quality (e.g., +3.15 BLEU on En-Zh). Inspired by the behavior of human simultaneous interpreters, we propose a novel monolingual sampling strategy for SiMT that considers both chunk length and monotonicity. Experimental results show that our sampling strategy consistently outperforms random sampling (and other conventional NMT monolingual sampling strategies) by avoiding hallucination, the key problem of SiMT, and that it scales better. We achieve an average improvement of +0.72 BLEU over random sampling on En-Zh and En-Ja. Data and code can be found at this https URL.
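
To make the notion of monotonicity-aware sampling more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each monolingual source sentence has already been translated by a full-sentence teacher and word-aligned to that translation, and it uses an anticipation rate under a wait-k read/write schedule as one possible proxy for monotonicity. The names `anticipation_rate` and `select_most_monotonic`, the alignment format, and the choice of wait-k are all illustrative assumptions; chunk length, the other criterion named in the abstract, is not modeled here.

```python
from typing import List, Tuple

# Hypothetical alignment format: (source_index, target_index), both 0-based.
Alignment = List[Tuple[int, int]]


def anticipation_rate(alignment: Alignment, k: int) -> float:
    """Fraction of alignment links that a wait-k reader could not yet see.

    Under wait-k, target position t is written after source positions
    0 .. t + k - 1 have been read, so a link (s, t) with s >= t + k
    points at a source word that has not been read yet.
    """
    if not alignment:
        return 0.0
    anticipated = sum(1 for s, t in alignment if s >= t + k)
    return anticipated / len(alignment)


def select_most_monotonic(
    candidates: List[Tuple[str, Alignment]], k: int, n: int
) -> List[str]:
    """Return the n sentences with the lowest anticipation rate.

    `candidates` pairs each source sentence with its (assumed) alignment
    to the teacher translation. Ties keep their input order.
    """
    ranked = sorted(candidates, key=lambda c: anticipation_rate(c[1], k))
    return [sentence for sentence, _ in ranked[:n]]


if __name__ == "__main__":
    monotone = [(0, 0), (1, 1), (2, 2)]   # word order preserved
    reordered = [(2, 0), (1, 1), (0, 2)]  # word order reversed
    pool = [("reordered sentence", reordered), ("monotone sentence", monotone)]
    print(select_most_monotonic(pool, k=1, n=1))
```

A sentence pair with a low anticipation rate can be translated mostly left to right, so a student trained on it is less often asked to produce target words before their source evidence has arrived. That is one plausible reading of why monotonic samples would reduce hallucination, stated here as an assumption rather than a result from the paper.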

Comments:	Accepted by AAAI 2023. Extended version includes supplementary material. 10 pages, 4 figures, 8 tables
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2212.01188 [cs.CL]
	(or arXiv:2212.01188v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.01188

Submission history

From: Hexuan Deng
[v1] Fri, 2 Dec 2022 14:13:53 UTC (1,373 KB)
