research-article

Aspect-Aware Response Generation for Multimodal Dialogue System

Authors:
Mauajama Firdaus (ORCID: 0000-0001-7485-5974), Nidhi Thakur, Asif Ekbal

Department of Computer Science and Engineering, Indian Institute of Technology Patna, India

ACM Transactions on Intelligent Systems and Technology, Volume 12, Issue 2, Article No. 15, pp. 1–33. https://doi.org/10.1145/3430752

Published: 04 February 2021

Abstract

Multimodality in dialogue systems has opened up new frontiers for the creation of robust conversational agents. A multimodal system aims to bridge the gap between language and vision by leveraging diverse and often complementary information from image, audio, and video, as well as text. In a task-oriented dialogue system, different aspects of a product or service are crucial for satisfying the user’s demands, and the user chooses a product or service based on these aspects. The ability to generate responses with the specified aspects in a goal-oriented dialogue setup therefore improves user satisfaction by fulfilling the user’s goals. In this work, we propose the task of aspect-controlled response generation in a multimodal task-oriented dialogue system. We employ a multimodal hierarchical memory network to generate responses that use information from both text and images. As no data was readily available for building such multimodal systems, we create a Multi-Domain Multi-Modal Dialog (MDMMD++) dataset. The dataset comprises conversations containing both text and images from four domains: hotels, restaurants, electronics, and furniture. Quantitative and qualitative analysis on the newly created MDMMD++ dataset shows that the proposed methodology outperforms the baseline models on the proposed task of aspect-controlled response generation.

Index Terms

Aspect-Aware Response Generation for Multimodal Dialogue System
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Published in
ACM Transactions on Intelligent Systems and Technology Volume 12, Issue 2
Survey Paper and Regular Paper
April 2021
319 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3447400
Editor:
Yu Zheng
JD Digits, China
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 4 February 2021
- Accepted: 1 October 2020
- Revised: 1 September 2020
- Received: 1 March 2020
Author Tags
Multimodal dialogue system
memory network
response generation
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 8
- Total Downloads: 788
- Downloads (Last 12 months): 76
- Downloads (Last 6 weeks): 16