Conversational interfaces are permeating all aspects of our digital experiences. Digital assistants work alongside human agents to provide customer support. Chatbots are used both to market products and to enable their purchase. IoT and other smart devices like Google Home or Amazon Echo enable hands-free operation through voice commands. Businesses are also starting to replace clunky enterprise UIs with streamlined natural language interfaces to improve productivity and output.

We’ve searched through the major conversational AI research papers published in 2018 to select 10 which give an overview of the current state-of-the-art in dialog systems and intelligent agents.

We’ve done our best to summarize these papers correctly, but if we’ve made any mistakes, please contact us to request a fix.

If these summaries of scientific AI research papers are useful for you, you can subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries. We’re planning to release summaries of important papers in computer vision, reinforcement learning, and conversational AI in the next few weeks.

If you’d like to skip around, here are the papers we featured:

  1. Personalizing Dialogue Agents: I have a dog, do you have pets too?
  2. Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems
  3. Sounding Board: A User-Centric and Content-Driven Social Chatbot
  4. Training Millions of Personalized Dialogue Agents
  5. MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling
  6. Semantic Parsing for Task Oriented Dialog using Hierarchical Representations
  7. Towards Universal Dialogue State Tracking
  8. Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents
  9. Few-Shot Generalization Across Dialogue Tasks
  10. Learning from Dialogue after Deployment: Feed Yourself, Chatbot!

 

Important Conversational AI Research Papers

 

1. Personalizing Dialogue Agents: I have a dog, do you have pets too?, by Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, Jason Weston

 

Original Abstract

Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.

 

Our Summary

The Facebook AI Research team suggests that assigning a personality to the agent makes chit-chat dialogues much more consistent and engaging. To this end, they introduce the PERSONA-CHAT dataset, in which each of roughly 10K dialogues is conditioned on a particular persona. Testing various baseline models on this dataset shows that models with access to their own personas are perceived as more consistent by annotators, though not more engaging. However, PERSONA-CHAT proved to be a very strong source of training data for the beginning of conversations, when the speakers do not yet know each other and focus on asking and answering questions.

 


 

What’s the core idea of this paper?

  • Making chit-chat models more engaging and consistent via conditioning on persistent and recognizable profile information.
  • Suggesting a dataset, collected via Amazon Mechanical Turk, where each of the pairs of speakers conditions their dialogue on a given profile.

 

What’s the key achievement?

  • Introducing the PERSONA-CHAT dataset with:
    • 1,155 personality profiles, each consisting of at least 5 short sentences;
    • 162,064 utterances over 10,907 dialogues, each conditioned on a personality profile.
  • Taking an important step towards modeling dialogue agents that can ask personality-related questions, remember the answers, and use them naturally in conversations.

 

What does the AI community think?

  • “If you want to have an interesting conversation with someone – even a virtual someone – then it helps if they have a personality, including likes and interests,” – The Verge on the PERSONA-CHAT research paper by Facebook AI Research.

 

What are future research areas?

  • Exploring ways to increase the scale of the dataset so that it:
    • contains a wider variety of personas;
    • includes more complex conversations;
    • is generated more naturally.

 

What are possible business applications?

  • Dialogue agents with an assigned personality are likely to generate more coherent responses and maintain more interesting and engaging conversations.

 


2. Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems, by Bing Liu, Gokhan Tur, Dilek Hakkani-Tur, Pararth Shah, Larry Heck

 

Original Abstract

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on supervised pre-training models. Efficiency of such learning method may suffer from the mismatch of dialogue state distribution between offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural network based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistake it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent’s capability in successfully completing a task.

 

Our Summary

The team from Carnegie Mellon University and Google introduces a new approach to training a task-oriented dialogue system. In particular, they suggest a hybrid imitation and reinforcement learning method in which the agent is first trained in a supervised manner on dialogue corpora and then continuously improves its performance by learning from users, who demonstrate the right action to take when the agent makes a mistake and give positive or negative feedback at the end of a dialogue. The experiments show that imitation learning combined with reinforcement learning based on user feedback significantly improves the agent’s performance.

 

What’s the core idea of this paper?

  • Introducing a neural network based task-oriented dialogue system that:
    • is pre-trained in a supervised manner from dialogue corpora;
    • collects new dialogue samples through interaction with users, i.e. when the agent makes mistakes, the system asks users to correct these mistakes and demonstrate the expected actions for the agent to make;
    • gets a positive reward for successful tasks and a zero reward for failed tasks based on the user feedback at the end of a dialogue.
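The three training stages described above can be sketched in toy form. Everything below (the states, actions, and tabular score updates) is an illustrative stand-in for the paper's neural dialogue agent, not its actual method:

```python
import random

random.seed(0)

# Toy task: map each dialogue state to the correct system action.
GOLD = {"greet": "hello", "ask_food": "request_cuisine", "confirm": "book_table"}
STATES = list(GOLD)
ACTIONS = sorted(set(GOLD.values()))

def act(policy, state):
    """Greedy action under the current scores."""
    return max(ACTIONS, key=lambda a: policy[(state, a)])

def train_hybrid(episodes=50, lr=0.5):
    policy = {(s, a): random.random() * 0.01 for s in STATES for a in ACTIONS}

    # Stage 1: supervised pre-training on a (deliberately incomplete) corpus.
    for state, action in [("greet", "hello")]:
        policy[(state, action)] += 1.0

    # Stage 2: imitation learning -- whenever the agent picks a wrong action,
    # the "user" demonstrates the expected one and the agent learns from it.
    for _ in range(episodes):
        for state in STATES:
            if act(policy, state) != GOLD[state]:
                policy[(state, GOLD[state])] += lr

    # Stage 3: RL from end-of-dialogue feedback -- reward 1 for a successful
    # dialogue, 0 otherwise; reinforce the actions taken on success.
    for _ in range(episodes):
        trajectory = [(s, act(policy, s)) for s in STATES]
        reward = 1.0 if all(a == GOLD[s] for s, a in trajectory) else 0.0
        for s, a in trajectory:
            policy[(s, a)] += lr * reward

    return policy

policy = train_hybrid()
```

After stage 2 the toy policy already matches the teacher on every state; stage 3 then reinforces those actions using only the scalar end-of-dialogue signal, mirroring the paper's ordering of imitation before reinforcement.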

 

What’s the key achievement?

  • The dialogue system pre-trained in a supervised manner, followed by 1000 episodes of imitation learning, followed by reinforcement learning gets:
    • roughly 65% of tasks completed successfully;
    • a score of 4.63 from human evaluators on a scale of 1 (frustrating) to 5 (optimal way to help the user).
  • Dialogue state tracking accuracy goes up from 50.5% to 67.5% after only 500 imitation dialogue sessions.

 

What does the AI community think?

  • The paper was presented at NAACL-HLT 2018, one of the most important NLP conferences.

 

What are future research areas?

  • Exploring other, more natural ways to integrate human teaching and feedback into the agent’s training process.

 

What are possible business applications?

  • Improving the performance of task-oriented chatbots by incorporating human teaching and feedback into the model.

 

3. Sounding Board: A User-Centric and Content-Driven Social Chatbot, by Hao Fang, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, Mari Ostendorf

 

Original Abstract

We present Sounding Board, a social chatbot that won the 2017 Amazon Alexa Prize. The system architecture consists of several components including spoken language processing, dialogue management, language generation, and content management, with emphasis on user-centric and content-driven design. We also share insights gained from large-scale online logs based on 160,000 conversations with real-world users.

 

Our Summary

The team from the University of Washington has developed a socialbot named Sounding Board. This is a chatbot with which you can have a coherent and engaging conversation on sports, politics, entertainment, technology, and other popular topics and events. Sounding Board won the inaugural Amazon Alexa Prize in 2017 with an average score of 3.17 out of 5 and an average conversation duration of over 10 minutes. Its success comes from its ability to say something interesting to the user, thanks to a rich content collection, as well as its ability to show interest in the conversation partner by acknowledging the user’s reactions and requests.

 


 

What’s the core idea of this paper?

  • Sounding Board strives to be:
    • user-centric by allowing users to control the topic of conversation,
    • content-driven by continually supplying interesting and relevant information to continue the conversation.
  • The system produces responses using three modules:
    • Natural language understanding (NLU) module analyzes the user’s speech to produce a representation of the current event.
    • Dialogue manager (DM) module executes the dialogue’s policy while considering user engagement, maintaining dialogue coherence, and enhancing the user experience. DM also has access to the rich content collection that is updated daily.
    • Natural language generation (NLG) module builds the response using the content selected by the DM.
  • In addition, the researchers have found that modeling prosody is important for the chatbot to sound more engaging.

 

What’s the key achievement?

  • Creating a social bot that can have long and engaging conversations with users on a variety of topics.
  • Getting relatively good feedback from users, with 40% of rated conversations scored at 5.
  • Analyzing how different personality types interact with the system and showing that users who are more extraverted, agreeable, or open to experience tend to rate the socialbot higher.

 

What does the AI community think?

  • Sounding Board won the 2017 Amazon Alexa Prize.
  • The lessons learned from building a successful social chatbot were shared in the keynote at NAACL-HLT 2018, one of the most important NLP conferences.

 

What are future research areas?

  • Increasing the success rate of the topic suggestion.
  • Improving engagement via better analysis of user personality and topic-engagement patterns across users.

 

What are possible business applications?

  • A good social chatbot can help the company increase customer loyalty and engagement.

 

4. Training Millions of Personalized Dialogue Agents, by Pierre-Emmanuel Mazaré, Samuel Humeau, Martin Raison, Antoine Bordes

 

Original Abstract

Current dialogue systems are not very engaging for users, especially when trained end-to-end without relying on proactive reengaging scripted strategies. Zhang et al. (2018) showed that the engagement level of end-to-end dialogue models increases when conditioning them on text personas providing some personalized back-story to the model. However, the dataset used in Zhang et al. (2018) is synthetic and of limited size as it contains around 1k different personas. In this paper we introduce a new dataset providing 5 million personas and 700 million persona-based dialogues. Our experiments show that, at this scale, training using personas still improves the performance of end-to-end systems. In addition, we show that other tasks benefit from the wide coverage of our dataset by fine-tuning our model on the data from Zhang et al. (2018) and achieving state-of-the-art results.

 

Our Summary

The Facebook AI Research team introduces a very large-scale persona-based dialogue dataset created from REDDIT conversations. The corpus includes 5M personas spanning more than 700M conversations. The researchers argue that adding a personalized back-story to the model improves the performance of chit-chat dialogue systems. The experiments confirm that models trained to align answers with both the persona of their author and the context achieve state-of-the-art results on retrieving the right response among 1M candidates.

 


 

What’s the core idea of this paper?

  • Building a dataset using conversations previously extracted from REDDIT:
    • Creating personas of users by gathering all the comments they wrote, splitting them into sentences and selecting the sentences that contain (i) 4-20 words, (ii) either the word I or my, (iii) at least one verb, (iv) at least one noun, pronoun, or adjective.
    • Taking each pair of successive comments in a thread to form the context and response, adding the corresponding persona.
  • Modeling dialogue by next utterance retrieval, where a response is not generated but picked among a set of candidates.
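The four persona-sentence selection rules above can be sketched as a simple filter. The function name is made up, and the coarse POS tags (VERB, NOUN, PRON, ADJ, in the Universal Dependencies style) are assumed to come from any off-the-shelf tagger; passing them in pre-computed keeps the sketch dependency-free:

```python
def keep_as_persona_sentence(sentence, pos_tags):
    """Return True if `sentence` passes all four selection rules.

    `pos_tags` is a pre-computed list of coarse POS tags (one per word),
    e.g. produced by any off-the-shelf tagger.
    """
    words = sentence.split()
    if not 4 <= len(words) <= 20:                    # (i) between 4 and 20 words
        return False
    lowered = {w.lower().strip(".,!?") for w in words}
    if not {"i", "my"} & lowered:                    # (ii) contains "I" or "my"
        return False
    if "VERB" not in pos_tags:                       # (iii) at least one verb
        return False
    if not {"NOUN", "PRON", "ADJ"} & set(pos_tags):  # (iv) noun/pronoun/adjective
        return False
    return True
```

For example, "I love my two dogs" passes all four rules, while "The weather is nice today" fails rule (ii) and would not become part of a persona.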

 

What’s the key achievement?

  • Demonstrating that training using personas improves the performance of end-to-end dialogue systems:
    • conditioning on personas improves the prediction performance regardless of the encoder architecture (bag-of-words, LSTM, Transformer);
  • Showing that pre-training the model on the suggested dataset and then fine-tuning it for the specific dialogue system significantly improves the results:
    • for example, the Transformer model trained only on the PERSONA-CHAT dataset gets a retrieval precision (hits@1) of 42.1%, while a model with the same architecture pre-trained on the REDDIT dataset and then fine-tuned on the PERSONA-CHAT dataset achieves a retrieval precision of 60.7%.
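Retrieval precision (hits@1), the metric quoted above, is simply the fraction of test examples for which the model scores the gold response highest among all candidates. A minimal sketch (the function name is illustrative):

```python
def hits_at_1(candidate_scores, gold_indices):
    """Fraction of examples where the gold response receives the top score.

    `candidate_scores[i]` holds the model's score for every candidate of
    example i; `gold_indices[i]` is the gold response's position there.
    """
    hits = sum(
        max(range(len(scores)), key=scores.__getitem__) == gold
        for scores, gold in zip(candidate_scores, gold_indices)
    )
    return hits / len(candidate_scores)
```

With 1M candidates per example, as in this paper, ranking the single gold response first 60.7% of the time is a demanding bar.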

 

What does the AI community think?

  • The paper was presented at EMNLP 2018, a leading conference in natural language processing.

 

What are future research areas?

  • Building more advanced strategies to select personas.
  • Fine-tuning the model for various dialogue systems.

 

What are possible business applications?

  • Personalized dialogue agents are likely to generate more coherent responses and get higher user engagement.

 

5. MultiWOZ – A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling, by Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan, Milica Gašić

 

Original Abstract

Even though machine learning has become the major scene in dialogue research community, the real breakthrough has been blocked by the scale of data available. To address this fundamental obstacle, we introduce the Multi-Domain Wizard-of-Oz dataset (MultiWOZ), a fully-labeled collection of human-human written conversations spanning over multiple domains and topics. At a size of 10k dialogues, it is at least one order of magnitude larger than all previous annotated task-oriented corpora. The contribution of this work apart from the open-sourced dataset labelled with dialogue belief states and dialogue actions is two-fold: firstly, a detailed description of the data collection procedure along with a summary of data structure and analysis is provided. The proposed data-collection pipeline is entirely based on crowd-sourcing without the need of hiring professional annotators; secondly, a set of benchmark results of belief tracking, dialogue act and response generation is reported, which shows the usability of the data and sets a baseline for future studies.

 

Our Summary

The University of Cambridge research team introduces a new benchmark for dialogue state tracking, dialogue-act-to-text generation, and dialogue-context-to-text generation. The presented dataset is significantly larger than all previous annotated task-oriented benchmarks and includes conversations spanning multiple domains and topics. The researchers also provide benchmark results for a range of dialogue tasks, confirming that the new dataset is much more challenging than previously available benchmarks.

 

What’s the core idea of this paper?

  • Building a large-scale multi-turn human-to-human conversational corpus that:
    • contains 10,438 dialogues, including 3,406 single-domain dialogues and 7,032 multi-domain dialogues spanning from 2 to 5 domains;
    • covers 7 domains: Attraction, Hospital, Police, Hotel, Restaurant, Taxi, and Train.
  • The data collection pipeline is entirely based on crowdsourcing, without the need to involve professional annotators:
    • turkers acting as users get easy-to-follow goals, and turkers acting on the system side get an easy-to-operate system interface;
    • multiple workers contribute to one dialogue, but this rarely results in incoherent dialogues;
    • annotation of the collected dialogues is also carried out via a crowdsourcing scheme.

 


 

What’s the key achievement?

  • Introducing a new fully-labeled collection of human-human written conversations, Multi-Domain Wizard-of-Oz dataset (MultiWOZ), which:
    • is at least one order of magnitude larger than all previous annotated task-oriented corpora;
    • spans multiple domains and topics;
    • is a great benchmark for a range of dialogue tasks, including dialogue state tracking, dialogue-act-to-text generation, and dialogue-context-to-text generation.
  • Providing a set of benchmark results that demonstrate the new challenges introduced by the MultiWOZ dataset.

 

What does the AI community think?

  • The paper received the Best Paper Award at EMNLP 2018, a leading conference in natural language processing.

 

What are future research areas?

  • Exploring the ways to further optimize a data-collection pipeline and extend the scale of the dataset.

 

What are possible business applications?

  • A dataset with linguistically rich conversations spanning multiple domains and topics is essential for training a sophisticated dialogue system.

 

Where can you get implementation code?

  • You can find the source code for end-to-end dialogue-context-to-text generation model from this research paper on GitHub.
  • The MultiWOZ dataset is also available online.

 

6. Semantic Parsing for Task Oriented Dialog using Hierarchical Representations, by Sonal Gupta, Rushin Shah, Mrinal Mohit, Anuj Kumar, Mike Lewis

 

Original Abstract

Task oriented dialog systems typically first parse user utterances to semantic frames comprised of intents and slots. Previous work on task oriented intent and slot-filling work has been restricted to one intent per query and one slot label per token, and thus cannot model complex compositional requests. Alternative semantic parsing systems have represented queries as logical forms, but these are challenging to annotate and parse. We propose a hierarchical annotation scheme for semantic parsing that allows the representation of compositional queries, and can be efficiently and accurately parsed by standard constituency parsing models. We release a dataset of 44k annotated queries (fb.me/semanticparsingdialog), and show that parsing models outperform sequence-to-sequence approaches on this dataset.

 

Our Summary

Traditional task-oriented dialogue systems focus on identifying a single user intent and then filling the relevant slots. However, quite a lot of user queries contain nested intents: for example, “Driving directions to the Beyonce concert” is composed of GET_DIRECTIONS and GET_EVENT intents. The Facebook research team suggests a hierarchical representation for this type of complex compositional request and introduces a dataset of 44K annotated queries. The suggested approach allows the use of standard constituency parsing models, which outperform strong sequence-to-sequence baselines on the introduced dataset.
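In the paper's annotation scheme, such a query is written as a bracketed tree whose nodes carry intent (IN:) and slot (SL:) labels. Below is a minimal sketch of parsing one such annotation into a nested tree; the exact labels used for this query are an approximation for illustration, not copied from the paper:

```python
def parse_top(annotation):
    """Parse a TOP-style bracketed annotation into a (label, children) tree,
    where children are sub-trees or plain token strings."""
    tokens = annotation.replace("[", " [ ").replace("]", " ] ").split()

    def build(pos):
        assert tokens[pos] == "["
        label = tokens[pos + 1]          # e.g. IN:GET_DIRECTIONS or SL:DESTINATION
        children, pos = [], pos + 2
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                child, pos = build(pos)
                children.append(child)
            else:
                children.append(tokens[pos])
                pos += 1
        return (label, children), pos + 1

    tree, _ = build(0)
    return tree

query = ("[IN:GET_DIRECTIONS Driving directions to "
         "[SL:DESTINATION [IN:GET_EVENT the Beyonce concert ] ] ]")
tree = parse_top(query)
```

Because the annotation is a constituency tree over the raw tokens, standard constituency parsers can be applied to it directly, which is the key practical advantage the paper exploits.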

 


 

What’s the core idea of this paper?

  • Previous work on task-oriented dialogue systems is either restricted to one intent per query or represents queries as logical forms that are difficult to annotate and parse.
  • The authors of this research paper suggest a hierarchical Task Oriented Parsing (TOP) representation that:
    • can express complex hierarchical queries, improving coverage of queries by 30%;
    • keeps the annotation process straightforward;
    • allows the use of existing constituency parsing algorithms;
    • can be easily executed with minimal adaptation of the existing infrastructure.

 

What’s the key achievement?

  • Introducing a hierarchical semantic representation for task-oriented dialog systems that can model compositional and nested queries.
  • Releasing a dataset of 44K annotated utterances:
    • the utterances are focused on navigation, events, and navigation to events;
    • 35% of queries include nested intents.
  • Showing that inductive bias provided by the constituency parsing algorithms, and particularly, Recurrent Neural Network Grammars (RNNG), significantly improves the model’s accuracy compared to the strong sequence-to-sequence baselines based on CNNs, LSTMs, and Transformers.

 

What does the AI community think?

  • The paper was presented at EMNLP 2018, a leading conference in natural language processing.

 

What are future research areas?

  • Trying other parsing approaches, including sequence-to-tree models.

 

What are possible business applications?

  • A hierarchical representation introduced in this research paper provides a way for intelligent personal assistants to handle complex nested queries, significantly extending their capabilities.

 

Where can you get implementation code?

  • A large dataset of annotated utterances is available for download.
  • PyText framework includes a reference implementation and a pretrained model for this research paper.

 

7. Towards Universal Dialogue State Tracking, by Liliang Ren, Kaige Xie, Lu Chen, Kai Yu

 

Original Abstract

Dialogue state tracking is the core part of a spoken dialogue system. It estimates the beliefs of possible user’s goals at every dialogue turn. However, for most current approaches, it’s difficult to scale to large dialogue domains. They have one or more of following limitations: (a) Some models don’t work in the situation where slot values in ontology changes dynamically; (b) The number of model parameters is proportional to the number of slots; (c) Some models extract features based on hand-crafted lexicons. To tackle these challenges, we propose StateNet, a universal dialogue state tracker. It is independent of the number of values, shares parameters across all slots, and uses pre-trained word vectors instead of explicit semantic dictionaries. Our experiments on two datasets show that our approach not only overcomes the limitations, but also significantly outperforms the performance of state-of-the-art approaches.

 

Our Summary

When we ask a chatbot to recommend a restaurant, it will usually ask for additional information: which cuisine we prefer, in which area the restaurant should be, and what the price range should be. These pieces of information, or slots, each have a number of possible values; for example, the food can be Italian, Chinese, Portuguese, etc. The dialogue system gathers all this information and thus keeps track of the user’s goal at each dialogue turn. However, current dialogue state trackers usually have a number of limitations, such as not being able to handle slot values that change dynamically, or requiring more model parameters for every new slot that is added. This limits their ability to scale to large dialogue domains. Thus, the authors suggest StateNet, a universal dialogue state tracker that not only overcomes these limitations but also significantly outperforms state-of-the-art models.

 

What’s the core idea of this paper?

  • Universal dialogue state tracker StateNet:
    • represents each slot and value by a group of word vectors;
    • takes three parts of information as input: 1) user utterance representation; 2) machine act representation; 3) slot and slot values representation;
    • generates a fixed-length representation of the dialogue history, and then compares the distances between this representation and the value vectors in the candidate set for making a prediction.
  • The best results were achieved when StateNet was strengthened with two additional features:
    • parameters being shared among the slots;
    • parameters being initialized with a pre-trained model based on one slot that is the most challenging for the state tracker.
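The prediction step described above reduces to a nearest-neighbor lookup in vector space. The sketch below uses Euclidean distance and made-up 3-d “word vectors” purely for illustration; the paper's actual representation network and distance function differ:

```python
import numpy as np

def predict_slot_value(dialogue_repr, value_vectors):
    """Return the candidate value whose vector lies closest (Euclidean
    distance) to the fixed-length dialogue-history representation."""
    names = list(value_vectors)
    distances = [np.linalg.norm(dialogue_repr - value_vectors[v]) for v in names]
    return names[int(np.argmin(distances))]

# Made-up 3-d vectors for a food slot's candidate value set.
values = {
    "italian":    np.array([1.0, 0.0, 0.0]),
    "chinese":    np.array([0.0, 1.0, 0.0]),
    "portuguese": np.array([0.0, 0.0, 1.0]),
}
history = np.array([0.9, 0.1, 0.2])  # fixed-length summary of the dialogue so far
```

Because prediction is just a distance comparison, a brand-new value can be added to the candidate set at inference time as long as a vector for it is available, which is exactly the scalability property described above.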

 

What’s the key achievement?

  • The suggested approach to dialogue state tracking:
    • overcomes a number of limitations that prevented previous approaches from scaling to large dialogue domains; in particular, the model is scalable in the number of slots that need tracking and can make predictions for new values as long as the corresponding word vectors are available;
    • achieves state-of-the-art performance in dialogue state tracking, with a joint goal accuracy of 75.5% on DSTC2 (previous best: 74.5%) and 88.9% on WOZ 2.0 (previous best: 88.1%).

 

What does the AI community think?

  • The paper was presented at EMNLP 2018, a leading conference in natural language processing.

 

What are future research areas?

  • Evaluating the performance of the StateNet dialogue tracker in the scenarios where there are new slots and more unobserved slot values.
  • Evaluating how well the model can be transferred across different domains.

 

What are possible business applications?

  • StateNet dialogue tracker can benefit the performance of task-oriented chatbots as it:
    • ensures a better understanding of the user’s goals during the dialogue;
    • provides an opportunity to easily add new slots and new slot values.

 

Where can you get implementation code?

  • The authors provide the implementation code for this research paper on GitHub.

 

8. Unsupervised Transfer Learning for Spoken Language Understanding in Intelligent Agents, by Aditya Siddhant, Anuj Goyal, Angeliki Metallinou

 

Original Abstract

User interaction with voice-powered agents generates large amounts of unlabeled utterances. In this paper, we explore techniques to efficiently transfer the knowledge from these unlabeled utterances to improve model performance on Spoken Language Understanding (SLU) tasks. We use Embeddings from Language Model (ELMo) to take advantage of unlabeled data by learning contextualized word representations. Additionally, we propose ELMo-Light (ELMoL), a faster and simpler unsupervised pre-training method for SLU. Our findings suggest unsupervised pre-training on a large corpora of unlabeled utterances leads to significantly better SLU performance compared to training from scratch and it can even outperform conventional supervised transfer. Additionally, we show that the gains from unsupervised transfer techniques can be further improved by supervised transfer. The improvements are more pronounced in low resource settings and when using only 1000 labeled in-domain samples, our techniques match the performance of training from scratch on 10-15x more labeled in-domain data.

 

Our Summary

The Amazon research team suggests a way to leverage millions of unannotated interactions with Alexa. They show that these utterances can be used to pre-train a spoken language understanding model that, after pre-training, needs only about 1000 labeled in-domain samples to demonstrate high accuracy on a specific task. The researchers also introduce ELMo-Light (ELMoL), a pre-training method that is very close to ELMo in accuracy but much faster, making it a good candidate for real-world applications.

 

What’s the core idea of this paper?

  • Combining unsupervised and supervised transfer learning:
    • training the embedding layers on 250 million unannotated requests to Alexa;
    • using another 4 million annotated requests to existing Alexa services to train the network on two standard NLU tasks: intent classification and slot tagging;
    • re-training the network on limited data to perform new tasks.
  • Leveraging ELMo embeddings but simplifying the network so that it is 60% faster and thus, efficient enough for deployment in real-life systems.

 

What’s the key achievement?

  • Introducing ELMo-Light, a faster and simpler unsupervised pre-training method for spoken language understanding.
  • Demonstrating that unsupervised transfer using unlabeled requests outperforms both training from scratch and supervised pre-training.
  • Showing that gains from unsupervised transfer can be further improved by supervised transfer, especially in low-resource settings:
    • with just 1K labeled in-domain samples, the proposed techniques match the performance of training from scratch on 10K-15K labeled samples.

 

What does the AI community think?

  • The research paper will be presented at AAAI 2019, one of the key conferences on Artificial Intelligence.

 

What are future research areas?

  • Applying the transfer techniques across different languages.
  • Experimenting with alternative architectures such as transformer and adversarial networks.

 

What are possible business applications?

  • The suggested approach to spoken language understanding can be directly applied in the commercial setting as it outperforms the alternatives in terms of both accuracy and efficiency.

 

9. Few-Shot Generalization Across Dialogue Tasks, by Vladimir Vlasov, Akela Drissner-Schmid, Alan Nichol

 

Original Abstract

Machine-learning based dialogue managers are able to learn complex behaviors in order to complete a task, but it is not straightforward to extend their capabilities to new domains. We investigate different policies’ ability to handle uncooperative user behavior, and how well expertise in completing one task (such as restaurant reservations) can be reapplied when learning a new one (e.g. booking a hotel). We introduce the Recurrent Embedding Dialogue Policy (REDP), which embeds system actions and dialogue states in the same vector space. REDP contains a memory component and attention mechanism based on a modified Neural Turing Machine, and significantly outperforms a baseline LSTM classifier on this task. We also show that both our architecture and baseline solve the bAbI dialogue task, achieving 100% test accuracy.

 

Our Summary

A chatbot’s life would be much easier if human users were always cooperative and happy to provide the information the chatbot asks for. But people are just people, and they often stray from the happy path. The Rasa team suggests a new dialogue policy, the Recurrent Embedding Dialogue Policy (REDP), which is much better at learning how to deal with uncooperative behavior from human users and, moreover, can re-use this knowledge when learning a new task. The experiments confirm that the suggested dialogue model significantly outperforms standard LSTM policies at dealing with uncooperative users.

 

TOP Conversational AI research papers

 

What’s the core idea of this paper?

  • Successful handling of uncooperative behavior implies that the assistant:
    • responds correctly to the user’s uncooperative message;
    • returns to the original task and continues as if the deviation never happened.
  • The assistant can ignore the irrelevant parts of the dialogue history thanks to an attention mechanism added to the neural network. This attention mechanism is based on a modified version of the Neural Turing Machine.
  • A REDP model trained to handle uncooperative behavior in one domain (e.g., restaurant recommendation) can reuse this knowledge to handle deviations from the happy path in a different but similar domain (e.g., hotel recommendation).
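As a rough illustration of the embedding idea above (not the authors’ implementation: the embeddings below are random placeholders rather than learned parameters, and the attention weights are supplied by hand instead of being computed by the Neural-Turing-Machine-style mechanism), a policy that embeds dialogue states and system actions in the same vector space can rank candidate actions by similarity to the current state:

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8
# Hypothetical system actions for a restaurant-booking assistant.
ACTIONS = ["utter_ask_cuisine", "utter_ask_price",
           "action_search_restaurants", "utter_handle_chitchat"]

# In REDP these embeddings are learned; here they are random placeholders.
action_embeddings = {a: rng.normal(size=EMBED_DIM) for a in ACTIONS}

def embed_state(history_embeddings, attention_weights):
    """Embed the dialogue state as an attention-weighted sum of turn
    embeddings, so irrelevant (e.g. uncooperative) turns get low weight."""
    w = np.asarray(attention_weights, dtype=float)
    w = w / w.sum()  # normalize the attention weights
    return (w[:, None] * np.asarray(history_embeddings)).sum(axis=0)

def rank_actions(state_vec):
    """Score every action by cosine similarity to the state embedding
    and return the actions sorted from best to worst."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scores = {a: cos(state_vec, v) for a, v in action_embeddings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Three-turn history; the second turn (a chitchat deviation) is down-weighted.
history = [rng.normal(size=EMBED_DIM) for _ in range(3)]
state = embed_state(history, attention_weights=[0.45, 0.10, 0.45])
best_action, best_score = rank_actions(state)[0]
print("top-ranked action:", best_action)
```

Because states and actions live in one space, adding a new domain mostly means adding new action embeddings, which is one intuition behind the transfer results reported in the paper.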

 

What’s the key achievement?

  • Designing a dialogue policy REDP that:
    • is able to recover from a sequence of uncooperative user utterances and return to the task to be completed;
    • uses shared information in tasks to benefit from transfer learning across domains;
    • achieves 100% test accuracy on the bAbI dialogue task.

 

What does the AI community think?

  • The paper was presented at the NeurIPS 2018 Conversational AI Workshop.

 

What are future research areas?

  • Applying the REDP framework to more domains, and testing it with real users.
  • Studying the properties of the learned embeddings.

 

What are possible business applications?

  • Improving the performance of task-oriented conversational assistants.

 

Where can you get implementation code?

  • The code and data for all the experiments are open-source and available on GitHub.

 

10. Learning from Dialogue after Deployment: Feed Yourself, Chatbot!, by Braden Hancock, Antoine Bordes, Pierre-Emmanuel Mazare, Jason Weston

 

Original Abstract

The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user’s responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot’s dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.

 

Our Summary

The Facebook AI Research team introduces another chatbot that learns from humans while chatting with them. The idea of asking for user feedback when the model has low confidence in its response is not new and dates back to the 1990s, but this research introduces several interesting ideas. First of all, the authors show that assessing the speaking partner’s satisfaction works much better than using model confidence. Next, the user’s feedback responses in the suggested setting don’t need to be specially formatted. Finally, the experiments show that the models benefit from using both dialogue and feedback examples collected during deployment, even though they come from the same conversations.

 


 

What’s the core idea of this paper?

  • Designing the self-feeding chatbot that can extract new training examples from the conversations it participates in during deployment:
    • a dialogue agent imitates human responses when the human is satisfied,
    • and it asks for feedback when the speaking partner is not satisfied (i.e., “Oops! Sorry. What should I have said instead?”).
  • The dialogue agent in the suggested setting performs three tasks:
    • the primary DIALOGUE task – carrying on a coherent and engaging conversation;
    • the auxiliary FEEDBACK task – predicting the feedback that will be given by the speaking partner when the agent believes it has made a mistake and asks for help;
    • the auxiliary SATISFACTION task – predicting whether or not a speaking partner is satisfied with the quality of the current conversation.
  • During deployment, the dialogue agent collects DIALOGUE and FEEDBACK examples, and is then periodically re-trained using all available data.
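The collection logic described above could be sketched roughly as follows (a minimal illustration under stated assumptions: the satisfaction threshold, retraining interval, and all names are hypothetical, not taken from the authors’ implementation):

```python
from dataclasses import dataclass, field

# The feedback prompt is quoted from the paper's example; the numbers below
# are illustrative placeholders, not the authors' settings.
FEEDBACK_REQUEST = "Oops! Sorry. What should I have said instead?"
SATISFACTION_THRESHOLD = 0.5
RETRAIN_EVERY = 20_000  # retrain once this many new examples accumulate

@dataclass
class SelfFeedingStore:
    """New training examples harvested during deployment."""
    dialogue_examples: list = field(default_factory=list)  # (context, response)
    feedback_examples: list = field(default_factory=list)  # (context, feedback)

def handle_turn(store, context, user_message, satisfaction_score):
    """Route one deployment turn: harvest a DIALOGUE example if the user
    seems satisfied, otherwise ask what the bot should have said."""
    if satisfaction_score >= SATISFACTION_THRESHOLD:
        # User seems happy: their message becomes a new response to imitate.
        store.dialogue_examples.append((context, user_message))
        return None  # carry on with the conversation as usual
    # User seems unhappy: request feedback to create a FEEDBACK example.
    return FEEDBACK_REQUEST

def record_feedback(store, context, feedback_text):
    """Store the user's answer to the feedback request."""
    store.feedback_examples.append((context, feedback_text))

def needs_retraining(store):
    """Periodic retraining uses all collected data once enough accumulates."""
    total = len(store.dialogue_examples) + len(store.feedback_examples)
    return total >= RETRAIN_EVERY
```

For example, a turn with a high satisfaction score silently adds a DIALOGUE example, while a low-scoring turn makes the bot reply with the feedback request instead, and the user’s next message is stored as a FEEDBACK example.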

 

What’s the key achievement?

  • Introducing a new approach that enables a dialogue agent to learn from its interactions with users.
  • Demonstrating that assessing user satisfaction works better than using model confidence.
  • Releasing three new datasets:
    • deployment chat logs (512K messages);
    • ratings of user satisfaction (42K);
    • textual feedback on what a bot could have said in a given context (62K).

 

What does the AI community think?

  • “Being able to design AI systems that can automatically gather their own data once deployed feels like a middle ground between the systems we have today, and systems which do fully autonomous continuous learning,” reflects Jack Clark, Strategy and Communications Director at OpenAI, on the self-feeding chatbot introduced by FAIR in his Import AI newsletter.

 

What are future research areas?

  • Examining how to ask different kinds of questions, depending on the context, when asking for human feedback.
  • Exploring the gains from more frequent retraining of the model (e.g., every 5K examples instead of every 20K) or even updating it in an online manner.

 

What are possible business applications?

  • Improving the performance of chatbots by allowing them to learn from their conversations with users during deployment.

 

Where can you get implementation code?

  • The authors are going to make the datasets and models described in this paper available through the ParlAI platform.

 

Want Deeper Dives Into Specific AI Research Topics?

Due to popular demand, we’ve released several of these easy-to-read summaries and syntheses of major research papers for different subtopics within AI and machine learning.

  1. Top 10 machine learning & AI research papers of 2018
  2. Top 10 AI fairness, accountability, transparency, and ethics (FATE) papers of 2018
  3. Top 14 natural language processing (NLP) research papers of 2018
  4. Top 10 computer vision and image generation research papers of 2018
  5. Top 10 conversational AI and dialog systems research papers of 2018
  6. Top 10 deep reinforcement learning research papers of 2018

 

Enjoy this article? Sign up for more AI research updates.

We’ll let you know when we release more summary articles like this one.