Abstract
Document-grounded dialogue (DGD) uses documents as external knowledge for dialogue generation. Correctly understanding the dialogue context is crucial for selecting knowledge from the document and generating proper responses. In this article, we propose using a dialogue policy to aid dialogue understanding in DGD. Our dialogue policy consists of two kinds of guiding signals: utterance function and topic transfer intent. The utterance function reflects the purpose and style of an utterance, and the topic transfer intent reflects its topic and content. We propose a novel framework that exploits this dialogue policy for two core tasks in DGD, namely, knowledge selection (KS) and response generation (RG). The framework consists of two modules: the policy planner leverages a policy-aware dialogue representation to select knowledge and predict the policy of the response, and the generator uses a policy- and knowledge-aware dialogue representation for response generation. Our policy-driven model achieves state-of-the-art performance on three public benchmarks, and we provide a detailed analysis of the experimental results. Our code and data will be released on GitHub.
Index Terms
- Policy-driven Knowledge Selection and Response Generation for Document-grounded Dialogue