End-to-end optimization of goal-driven and visually grounded dialogue systems

Florian Strub, Harm de Vries, Jérémie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 2765-2771. https://doi.org/10.24963/ijcai.2017/385

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet most current approaches cast human-machine dialogue management as a supervised learning problem, aiming to predict the next utterance of a participant given the full dialogue history. This view may fail to capture the planning problem inherent to dialogue, as well as its contextual and grounded nature. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. The approach is tested on the question generation task of the GuessWhat?! dataset, which contains 120k dialogues, and yields encouraging results at both generating natural dialogues and discovering a specific object in a complex picture.
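The abstract describes casting question generation as a sequential decision process trained with policy gradients. The sketch below is a minimal, hypothetical PyTorch illustration of that training signal, not the authors' implementation (which conditions on the image and dialogue history): a toy LSTM policy samples a question token by token, and the episode reward (in GuessWhat?!, a 0/1 task-success signal from whether the Guesser finds the target object) scales the summed log-probabilities in a REINFORCE-style surrogate loss. All names, sizes, and the stubbed reward are illustrative assumptions.

    import torch
    import torch.nn as nn

    VOCAB, HID, MAX_LEN = 100, 64, 10  # toy vocabulary and sizes, assumed for illustration

    class QuestionPolicy(nn.Module):
        """Hypothetical token-level question generator (the real model also
        encodes the image and dialogue history)."""
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, HID)
            self.cell = nn.LSTMCell(HID, HID)
            self.out = nn.Linear(HID, VOCAB)

        def sample(self):
            # Sample one question token by token; collect per-token log-probs.
            h = torch.zeros(1, HID)
            c = torch.zeros(1, HID)
            tok = torch.zeros(1, dtype=torch.long)  # <start> token id, assumed 0
            log_probs = []
            for _ in range(MAX_LEN):
                h, c = self.cell(self.embed(tok), (h, c))
                dist = torch.distributions.Categorical(logits=self.out(h))
                tok = dist.sample()
                log_probs.append(dist.log_prob(tok))
            return torch.cat(log_probs)

    policy = QuestionPolicy()
    opt = torch.optim.SGD(policy.parameters(), lr=1e-3)

    # One REINFORCE update. In the paper's setting the reward comes from the
    # Guesser at the end of the dialogue; here it is stubbed as a constant.
    log_probs = policy.sample()
    reward = 1.0  # placeholder: 1 if the Guesser found the object, else 0
    loss = -(reward * log_probs.sum())  # policy gradient surrogate loss
    opt.zero_grad()
    loss.backward()
    opt.step()

In practice such updates are averaged over batches of sampled dialogues, and a baseline is typically subtracted from the reward to reduce the variance of the gradient estimate.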
Keywords:
Machine Learning: Reinforcement Learning
Natural Language Processing: Dialogue
Machine Learning: Deep Learning