This year’s annual meeting of the Association for Computational Linguistics (ACL 2019) was bigger than ever. The conference received 75% more submissions than last year, yet the quality of the research papers remained high and the acceptance rate stayed almost the same.
It is becoming more and more challenging to keep track of the latest research advances in your area with such an overwhelming number of good research papers coming out. So, for your convenience, we’ve picked out and summarized several interesting research papers that might have particularly useful applications in a business setting.
These are also the papers that got lots of attention from the AI community, and most of these studies have been nominated for or awarded ACL Best Paper Awards. We’re featuring papers from different research areas including (visual) question answering, semantics, dialogue systems, and sentiment analysis.
If these accessible AI research analyses & summaries are useful for you, you can subscribe to receive our regular industry updates below.
If you’d like to skip around, here are the papers we featured:
- Explain Yourself! Leveraging Language Models for Commonsense Reasoning
- Detecting Concealed Information in Text and Speech
- Zero-Shot Entity Linking by Reading Entity Descriptions
- Improving Visual Question Answering by Referring to Generated Paragraph Captions
- Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts
- Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
ACL 2019 Research Papers
1. Explain Yourself! Leveraging Language Models for Commonsense Reasoning, by Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher
Original Abstract
Deep learning models perform poorly on tasks that require commonsense reasoning, which often necessitates some form of world-knowledge or reasoning over information not immediately present in the input. We collect human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations in a new dataset called Common Sense Explanations (CoS-E). We use CoS-E to train language models to automatically generate explanations that can be used during training and inference in a novel Commonsense Auto-Generated Explanation (CAGE) framework. CAGE improves the state-of-the-art by 10% on the challenging CommonsenseQA task. We further study commonsense reasoning in DNNs using both human and auto-generated explanations including transfer to out-of-domain tasks. Empirical results indicate that we can effectively leverage language models for commonsense reasoning.
Our Summary
Natural language processing models are limited to the information contained in their input texts and often lack the commonsense reasoning that allows humans to make inferences beyond what is explicitly stated. The Salesforce research team suggests addressing this problem by training a language model to automatically generate commonsense explanations, which is accomplished by providing the model with human explanations alongside question-answering samples. These auto-generated explanations are then used by a neural network to solve the CommonsenseQA (CQA) task. This two-step approach improved accuracy on the CommonsenseQA multiple-choice test by 10% compared to existing models.
What’s the core idea of this paper?
- Natural language processing struggles with inference based on common sense and real-world knowledge.
- The paper suggests addressing this issue in two phases:
- First, the researchers train a language model to generate explanations using the Common Sense Explanations (CoS-E) dataset, which provides human-generated explanations in the form of both open-ended sentences and highlighted span annotations alongside Commonsense Question Answering (CQA) examples.
- In the second phase, the authors use this trained language model to generate explanations for each sample in the training and validation sets. These Commonsense Auto-Generated Explanations (CAGE) are then leveraged to solve the CQA task (see the sketch below).
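To make the two-phase pipeline concrete, here is a minimal sketch using the Hugging Face transformers library. The checkpoints, prompt format, and scoring head are illustrative assumptions rather than the authors’ exact setup; in the paper, the language model is fine-tuned on CoS-E explanations and the classifier is trained on CommonsenseQA.

```python
# Rough sketch of a CAGE-style two-phase pipeline (illustrative only).
# Phase 1: a language model generates a free-text explanation for a question.
# Phase 2: a classifier scores each answer choice given question + explanation.
import torch
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer)

# Placeholder checkpoints -- in practice both models would be fine-tuned.
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
clf_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
clf = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

question = "Where would you store a jar of pickles after opening it?"
choices = ["refrigerator", "bookshelf", "garage"]

# Phase 1: generate an explanation conditioned on the question and choices.
prompt = f"{question} The choices are {', '.join(choices)}. My commonsense tells me that"
inputs = lm_tok(prompt, return_tensors="pt")
out = lm.generate(**inputs, max_new_tokens=25, do_sample=False,
                  pad_token_id=lm_tok.eos_token_id)
explanation = lm_tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Phase 2: score each choice with the explanation appended to the question.
scores = []
for choice in choices:
    enc = clf_tok(f"{question} {explanation}", choice, return_tensors="pt", truncation=True)
    with torch.no_grad():
        scores.append(clf(**enc).logits.squeeze().item())
print(choices[int(torch.tensor(scores).argmax())])
```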
What’s the key achievement?
- The explanation-generating model improves performance in a natural language reasoning test by 10% over the previous best model and improves understanding of how neural networks apply knowledge.
- Moreover, the experiments demonstrate that the introduced approach can be successfully transferred to out-of-domain datasets.
What does the AI community think?
- The paper is accepted for oral presentation at ACL 2019, one of the key conferences in natural language processing.
What are future research areas?
- Integrating the explanation-generating model into the answer prediction model.
- Extending the dataset of explanations to other tasks to create a more general explanatory language model.
- Removing bias from training datasets to eliminate bias in generated explanations.
What are possible business applications?
- The model with improved common-sense reasoning capabilities can be leveraged:
- to provide better customer service via chatbots;
- to improve the performance of information retrieval systems.
Where can you get implementation code?
- The human-annotated commonsense explanations dataset is available on GitHub.
2. Detecting Concealed Information in Text and Speech, by Shengli Hu
Original Abstract
Motivated by infamous cheating scandals in various industries and political events, we address the problem of detecting concealed information in technical settings. In this work, we explore acoustic-prosodic and linguistic indicators of information concealment by collecting a unique corpus of professionals practicing for oral exams while concealing information. We reveal subtle signs of concealed information in speech and text, compare, and contrast them with those in deception detection literature, thus uncovering the link between concealing information and deception. We then present a series of experiments that automatically detect concealed information from text and speech. We compare the use of acoustic-prosodic, linguistic, and individual feature sets, using different machine learning models. Finally, we present a multi-task learning framework with acoustic, linguistic, and individual features, that outperforms human performance by over 15%.
Our Summary
When confidential information is leaked, it is often difficult to tell who originally obtained the leaked information and who it has been leaked to. Previous work has demonstrated that changes in voice tone, lexicon, and speech patterns can indicate when someone is concealing information, yet research in this area remains scarce, partly because of the lack of datasets with ground-truth labels indicating information concealment. To address this issue, the present study introduces a new dataset collected from a unique audio corpus of professional wine tasters practicing for oral exams while concealing information. Leveraging this dataset, the researcher developed a new multi-task learning model for detecting concealed information that performs 11% better than baseline models and 15% better than humans.
Multi-task Learning Framework Combining Acoustic, Linguistic, and Individual Features
What’s the core idea of this paper?
- While there are machine learning-based methods for detecting when someone pretends to have information they do not actually possess, there are few comparable models for detecting when someone is concealing information they do have.
- In this study, Hu from Cornell University captured linguistic and acoustic-prosodic features from a controlled human experiment to create a dataset of speech patterns when people were speaking honestly and when they were concealing some information.
- The author leverages this dataset to develop a multi-task learning framework in which, in addition to identifying concealed information, the system also predicts whether the speaker’s answer is correct and the identity of the wine (see the sketch below).
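A minimal PyTorch sketch of how such a multi-task setup can be wired, with a shared encoder over acoustic-prosodic, linguistic, and individual features and one head per prediction task. The feature dimensions, layer sizes, and equal loss weighting are illustrative assumptions, not the paper’s configuration.

```python
import torch
import torch.nn as nn

class ConcealmentMultiTaskModel(nn.Module):
    """Shared encoder + three task-specific heads (illustrative sketch)."""
    def __init__(self, acoustic_dim=40, linguistic_dim=300, individual_dim=10,
                 hidden_dim=128, num_wines=20):
        super().__init__()
        # Shared encoder over concatenated acoustic, linguistic, and individual features.
        self.encoder = nn.Sequential(
            nn.Linear(acoustic_dim + linguistic_dim + individual_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.conceal_head = nn.Linear(hidden_dim, 1)        # is the speaker concealing information?
        self.correct_head = nn.Linear(hidden_dim, 1)        # is the speaker's answer correct?
        self.wine_head = nn.Linear(hidden_dim, num_wines)   # which wine is being described?

    def forward(self, acoustic, linguistic, individual):
        h = self.encoder(torch.cat([acoustic, linguistic, individual], dim=-1))
        return self.conceal_head(h), self.correct_head(h), self.wine_head(h)

model = ConcealmentMultiTaskModel()
a, l, i = torch.randn(8, 40), torch.randn(8, 300), torch.randn(8, 10)  # toy batch of 8 speakers
conceal_logit, correct_logit, wine_logits = model(a, l, i)

# The joint loss simply sums the three task losses (equal weighting is an assumption).
loss = (nn.BCEWithLogitsLoss()(conceal_logit, torch.randint(0, 2, (8, 1)).float())
        + nn.BCEWithLogitsLoss()(correct_logit, torch.randint(0, 2, (8, 1)).float())
        + nn.CrossEntropyLoss()(wine_logits, torch.randint(0, 20, (8,))))
```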
What’s the key achievement?
- A multi-task learning model outperformed baseline models by 11% and humans by 15% at detecting when someone is concealing information.
- Moreover, the introduced framework outperforms humans even in the case where some of the humans in the experiment knew one another and could read social cues (e.g. gestures) that are not available to the model.
What does the AI community think?
- The paper has been nominated as a candidate for the ACL 2019 Best Paper Awards.
What are future research areas?
- Studying individual differences in both detecting concealed information and concealing information.
- Exploring the predictive power of phonotactic variation features.
- Conducting domain adaptation for detecting concealed information.
- Improving the scalability of the multi-task learning model.
What are possible business applications?
- Detecting insider trading in financial markets.
- Controlling data leaks within different testing procedures.
- Tracing and limiting the extent of information leaks around political campaigns.
3. Zero-Shot Entity Linking by Reading Entity Descriptions, by Lajanugen Logeswaran, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, Jacob Devlin, Honglak Lee
Original Abstract
We present the zero-shot entity linking task, where mentions must be linked to unseen entities without in-domain labeled data. The goal is to enable robust transfer to highly specialized domains, and so no metadata or alias tables are assumed. In this setting, entities are only identified by text descriptions, and models must rely strictly on language understanding to resolve the new entities. First, we show that strong reading comprehension models pre-trained on large unlabeled data can be used to generalize to unseen entities. Second, we propose a simple and effective adaptive pre-training strategy, which we term domain-adaptive pre-training (DAP), to address the domain shift problem associated with linking unseen entities in a new domain. We present experiments on a new dataset that we construct for this task and show that DAP improves over strong pre-training baselines, including BERT. The data and code are available at https://github.com/lajanugen/zeshel.
Our Summary
The paper addresses the problem of entity linking in highly specialized domains, where there is no access to such powerful resources as high-coverage alias tables, structured data, and linking frequency statistics. The goal is therefore to identify entities using only text descriptions and the model’s ability to comprehend text. The Google Research team constructs a dataset for this task from multiple Wikia subdomains, extracting labeled mentions from hyperlinks. Next, they develop a strong baseline model that relies on state-of-the-art reading comprehension frameworks and propose a novel adaptation strategy called domain-adaptive pre-training (DAP) that further improves entity-linking performance. The suggested approach achieves 77% accuracy when linking entities in unseen target domains.
What’s the core idea of this paper?
- Linking to information in specialized databases is challenging for the existing approaches that mainly rely on the availability of powerful resources like alias tables or frequency statistics.
- To perform entity linking tasks for the domains where such resources are not available, the researchers suggest relying on strong reading comprehension models.
- Furthermore, they show that incorporating attention mechanisms between a mention in context and entity descriptions is crucial for the task.
- Finally, the paper introduces a novel adaptation strategy called domain-adaptive pre-training (DAP) to further improve the model’s performance: the model is pre-trained on the target-domain data and fine-tuned on the source-domain labeled data (see the sketch below).
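A minimal sketch of the zero-shot scoring step under these ideas: a BERT cross-encoder jointly reads the mention in context and each candidate entity description, so attention can flow between the two, and the highest-scoring entity is chosen. The checkpoint, the [M] mention markers, and the example entities are illustrative assumptions; the paper’s DAP strategy would additionally pre-train such an encoder on target-domain text before fine-tuning it on source-domain linking data.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Illustrative checkpoint; in practice the encoder would be adapted to the
# target domain (DAP) and fine-tuned on source-domain linking data first.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
scorer = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=1)

mention_in_context = "The [M] Burrow [M] is hidden from Muggles by powerful enchantments."
candidate_descriptions = {
    "The Burrow": "The Burrow is the home of the Weasley family, located outside Ottery St Catchpole.",
    "Burrowing Charm": "A spell used to dig tunnels quickly.",
}

# Score each (mention context, entity description) pair with the cross-encoder
# and link the mention to the highest-scoring candidate.
scores = {}
for name, desc in candidate_descriptions.items():
    enc = tok(mention_in_context, desc, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():
        scores[name] = scorer(**enc).logits.item()
print(max(scores, key=scores.get))
```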
What’s the key achievement?
- Introducing a new task for zero-shot entity linking.
- Developing a new multi-domain dataset for entity linking research in specialized domains.
- Improving entity linking performance by 2% over the previous best model.
What does the AI community think?
- The paper has been nominated as a candidate for the ACL 2019 Best Paper Awards.
What are future research areas?
- Automatically recognizing queries that do not exist in the database.
- Jointly resolving mentions within a document rather than resolving them one at a time.
What are possible business applications?
- Domain-specific search functionality for retrieving information about specific subfields.
- Searchable subdomains of information within large corporations.
4. Improving Visual Question Answering by Referring to Generated Paragraph Captions, by Hyounghun Kim, Mohit Bansal
Original Abstract
Paragraph-style image captions describe diverse aspects of an image as opposed to the more common single-sentence captions that only provide an abstract description of the image. These paragraph captions can hence contain substantial information of the image for tasks such as visual question answering. Moreover, this textual information is complementary with visual information present in the image because it can discuss both more abstract concepts and more explicit, intermediate symbolic information about objects, events, and scenes that can directly be matched with the textual question and copied into the textual answer (i.e., via easier modality match). Hence, we propose a combined Visual and Textual Question Answering (VTQA) model which takes as input a paragraph caption as well as the corresponding image, and answers the given question based on both inputs. In our model, the inputs are fused to extract related information by cross-attention (early fusion), then fused again in the form of consensus (late fusion), and finally expected answers are given an extra score to enhance the chance of selection (later fusion). Empirical results show that paragraph captions, even when automatically generated (via an RL-based encoder-decoder model), help correctly answer more visual questions. Overall, our joint model, when trained on the Visual Genome dataset, significantly improves the VQA performance over a strong baseline model.
Our Summary
Computer models struggle with answering questions about images, a task known as visual question answering (VQA). In this study, the researchers sought to improve VQA performance by providing a VQA model with a text description of an image’s content produced by a paragraph captioning model. The two models were fused in three stages to generate a consensus answer to questions posed about the image. The resulting Visual and Textual Question Answering (VTQA) model was 1.92% more accurate than the standalone VQA model.
What’s the core idea of this paper?
- VQA models struggle to identify all of the information needed to answer questions about an image, particularly more abstract concepts.
- The researchers suggest using a pre-trained paragraph captioning model to provide additional information to the VQA model.
- The text and image input are fused at three levels:
- in the early fusion stage, visual features are fused with paragraph-caption and object-property features via cross-attention;
- in the late fusion stage, the inputs are fused again in the form of consensus, i.e. the logits from each module are integrated into one vector;
- in the later fusion stage, the model accounts for the fact that some regions of the image are more likely to draw people’s attention, and thus questions and answers are more likely to relate to those regions, so it gives an extra score to the answers related to those salient regions (see the sketch below).
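A minimal PyTorch sketch of the three fusion stages, with toy tensors standing in for the image-region features, the encoded paragraph caption, and the question. The dimensions, the single cross-attention layer, and the hand-set salience bonus are illustrative assumptions, not the paper’s architecture.

```python
import torch
import torch.nn as nn

num_answers, dim = 100, 256
visual_feats = torch.randn(1, 36, dim)    # e.g., region features from an object detector
caption_feats = torch.randn(1, 50, dim)   # encoded paragraph-caption tokens
question_feat = torch.randn(1, dim)       # encoded question

# Early fusion: cross-attention lets caption tokens attend to image regions.
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
fused_caption, _ = cross_attn(query=caption_feats, key=visual_feats, value=visual_feats)

# Each branch scores all candidate answers from its own (fused) view.
visual_scorer = nn.Linear(2 * dim, num_answers)
textual_scorer = nn.Linear(2 * dim, num_answers)
visual_logits = visual_scorer(torch.cat([visual_feats.mean(1), question_feat], dim=-1))
textual_logits = textual_scorer(torch.cat([fused_caption.mean(1), question_feat], dim=-1))

# Late fusion: combine the two modules' logits into one consensus vector.
consensus_logits = visual_logits + textual_logits

# "Later" fusion: boost answers tied to salient image regions
# (a made-up bonus here; the paper derives salience from attended regions).
salience_bonus = torch.zeros(1, num_answers)
salience_bonus[0, 42] = 1.0
final_logits = consensus_logits + salience_bonus
print(final_logits.argmax(dim=-1))
```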
What’s the key achievement?
- Improving visual question answering performance by 1.92% compared to the baseline VQA model.
What does the AI community think?
- The paper has been nominated as a candidate for the ACL 2019 Best Paper Awards.
What are future research areas?
- Improving VTQA models to extract more information from textual captions, and enhancing paragraph captioning models to generate better captions.
- Training the VTQA model jointly with the paragraph captioning model.
What are possible business applications?
- Improving image search and retrieval.
- Image annotation and interactivity for blind people.
- Creating “interactive” images for online education.
5. Emotion-Cause Pair Extraction: A New Task to Emotion Analysis in Texts, by Rui Xia and Zixiang Ding
Original Abstract
Emotion cause extraction (ECE), the task aimed at extracting the potential causes behind certain emotions in text, has gained much attention in recent years due to its wide applications. However, it suffers from two shortcomings: 1) the emotion must be annotated before cause extraction in ECE, which greatly limits its applications in real-world scenarios; 2) the way to first annotate emotion and then extract the cause ignores the fact that they are mutually indicative. In this work, we propose a new task: emotion-cause pair extraction (ECPE), which aims to extract the potential pairs of emotions and corresponding causes in a document. We propose a 2-step approach to address this new ECPE task, which first performs individual emotion extraction and cause extraction via multi-task learning, and then conduct emotion-cause pairing and filtering. The experimental results on a benchmark emotion cause corpus prove the feasibility of the ECPE task as well as the effectiveness of our approach.
Our Summary
Emotion cause extraction (ECE) is a natural language processing task that identifies the clauses containing the causes behind emotions expressed in text. However, ECE requires emotions to be annotated first, and it ignores the mutual relationship between causes and emotional effects. The researchers sought to solve this problem by simultaneously identifying pairs of emotions and causes in a task they call emotion-cause pair extraction (ECPE). ECPE uses a two-step approach: the first step uses two multi-task learning networks to identify emotion and cause clauses, while the second step pairs all causes and emotions and uses a trained filter to eliminate pairings that do not contain a causal relationship. The resulting approach identifies emotion-cause pairs with an accuracy on par with existing ECE methods, but without requiring emotion annotation.
What’s the core idea of this paper?
- The paper introduces a new emotion-cause pair extraction (ECPE) task to overcome the limitations of the traditional ECE task, where emotion annotation is required prior to cause extraction and mutual indicativeness of emotion and cause is not taken into account.
- The introduced approach consists of two steps:
- In the first step, the two individual tasks of emotion extraction and cause extraction are performed via two kinds of multi-task learning networks:
- Inter-EC that uses emotion extraction to improve cause extraction;
- Inter-CE that leverages cause extraction to enhance emotion extraction.
- In the second step, the model combines all elements of the two sets into pairs by applying a Cartesian product. Then, a logistic regression model is trained to eliminate pairs that do not contain a causal relationship (see the sketch below).
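A minimal sketch of the second step: every candidate emotion clause is paired with every candidate cause clause via a Cartesian product, and a logistic regression filter keeps only the pairs predicted to hold a causal relationship. The handcrafted pair features and toy labels below are placeholders; the paper pairs the clauses’ learned representations.

```python
from itertools import product

import numpy as np
from sklearn.linear_model import LogisticRegression

# Suppose step 1 has already extracted candidate clauses (indices into the document).
emotion_clauses = [2, 5]   # clauses predicted to express an emotion
cause_clauses = [1, 4, 5]  # clauses predicted to describe a cause

# Toy featurizer for an (emotion, cause) pair; the paper uses the clauses'
# learned representations plus their relative distance.
def pair_features(e, c):
    return np.array([e, c, e - c], dtype=float)

# Cartesian product of all candidate emotions and causes.
candidate_pairs = list(product(emotion_clauses, cause_clauses))
X = np.stack([pair_features(e, c) for e, c in candidate_pairs])

# A logistic regression filter trained on labeled pairs (placeholder supervision here)
# keeps only pairs predicted to hold a genuine causal relationship.
y_train = np.array([1, 0, 0, 1, 0, 1])  # made-up labels for the 6 candidate pairs
clf = LogisticRegression().fit(X, y_train)
kept_pairs = [p for p, keep in zip(candidate_pairs, clf.predict(X)) if keep == 1]
print(kept_pairs)
```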
What’s the key achievement?
- ECPE is able to achieve F1 scores of 0.83 for emotion extraction, 0.65 for cause extraction, and 0.61 for emotion-cause pairing.
- On the ECE benchmark dataset, ECPE performs on par with existing ECE methods that require emotion annotation before causal clauses can be identified.
What does the AI community think?
- The paper received an Outstanding Paper award at ACL 2019.
What are future research areas?
- Altering the ECPE approach from a two-step to a one-step process that directly extracts emotion-cause pairs in an end-to-end fashion.
What are possible business applications?
- Sentiment analysis for marketing campaigns.
- Opinion monitoring from social media.
Where can you get implementation code?
- The code used in this study is available on GitHub.
6. Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems, by Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, Pascale Fung
Original Abstract
Over-dependence on domain ontology and lack of knowledge sharing across domains are two practical and yet less studied problems of dialogue state tracking. Existing approaches generally fall short in tracking unknown slot values during inference and often have difficulties in adapting to new domains. In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating knowledge transfer when predicting (domain, slot, value) triplets not encountered during training. Our model is composed of an utterance encoder, a slot gate, and a state generator, which are shared across domains. Empirical results demonstrate that TRADE achieves state-of-the-art joint goal accuracy of 48.62% for the five domains of MultiWOZ, a human-human dialogue dataset. In addition, we show its transferring ability by simulating zero-shot and few-shot dialogue state tracking for unseen domains. TRADE achieves 60.58% joint goal accuracy in one of the zero-shot domains, and is able to adapt to few-shot cases without forgetting already trained domains.
Our Summary
The research team from the Hong Kong University of Science and Technology and Salesforce Research addresses the problem of over-dependence on domain ontology and lack of knowledge sharing across domains. In a practical scenario, many slots share all or some of their values among different domains (e.g., the area slot can exist in many domains like restaurant, hotel, or taxi), and thus transferring knowledge across multiple domains is imperative for dialogue state tracking (DST) models. The researchers introduce a TRAnsferable Dialogue statE generator (TRADE) that leverages its context-enhanced slot gate and copy mechanism to track slot values mentioned anywhere in a dialogue history. TRADE shares its parameters across domains and doesn’t require a predefined ontology, which enables tracking of previously unseen slot values. The experiments demonstrate the effectiveness of this approach with TRADE achieving state-of-the-art joint goal accuracy of 48.62% on a challenging MultiWOZ dataset.
What’s the core idea of this paper?
- To overcome over-dependence on domain ontology and lack of knowledge sharing across domains, the researchers suggest:
- generating slot values directly instead of predicting the probability of every predefined ontology term;
- sharing all the model parameters across domains.
- The TRADE model consists of three components (see the sketch after this list):
- an utterance encoder to encode dialogue utterances into a sequence of fixed-length vectors;
- a slot gate to predict whether a certain (domain, slot) pair is triggered by the dialogue;
- a state generator to decode multiple output tokens for all (domain, slot) pairs independently to predict their corresponding values.
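A minimal PyTorch sketch of the three components, with a GRU utterance encoder, a slot gate, and a GRU state generator that decodes a value for each (domain, slot) pair. Vocabulary handling is simplified and the copy mechanism is omitted; in the actual model, the generator mixes a vocabulary distribution with a copy distribution over the dialogue history.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 1000, 64, 128
domain_slots = ["hotel-area", "restaurant-area", "taxi-departure"]  # toy (domain, slot) pairs

embedding = nn.Embedding(vocab_size, emb_dim)
utterance_encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)   # encodes the dialogue history
slot_gate = nn.Linear(hid_dim, 3)                                # {ptr, dontcare, none} per slot
state_generator = nn.GRU(emb_dim, hid_dim, batch_first=True)     # decodes slot values
vocab_proj = nn.Linear(hid_dim, vocab_size)                      # generation distribution
slot_embedding = nn.Embedding(len(domain_slots), emb_dim)        # (domain, slot) embeddings

dialogue_ids = torch.randint(0, vocab_size, (1, 30))             # toy dialogue-history token ids
enc_out, enc_hidden = utterance_encoder(embedding(dialogue_ids))

predicted_state = {}
for slot_idx, slot_name in enumerate(domain_slots):
    # The slot gate decides whether this (domain, slot) pair is triggered at all.
    gate = slot_gate(enc_hidden[-1]).argmax(dim=-1).item()
    if gate != 0:  # 0 = "ptr": decode a value; otherwise skip (simplified handling)
        continue
    # Decode a short value for this slot, starting from the slot embedding.
    dec_input = slot_embedding(torch.tensor([[slot_idx]]))
    hidden = enc_hidden
    value_ids = []
    for _ in range(3):  # decode up to 3 tokens (illustrative limit)
        dec_out, hidden = state_generator(dec_input, hidden)
        next_id = vocab_proj(dec_out[:, -1]).argmax(dim=-1)
        value_ids.append(next_id.item())
        dec_input = embedding(next_id).unsqueeze(1)
    predicted_state[slot_name] = value_ids
print(predicted_state)
```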
What’s the key achievement?
- On a challenging MultiWOZ dataset of human-human dialogues, TRADE achieves joint goal accuracy of 48.62%, setting a new state of the art.
- Moreover, TRADE achieves 60.58% joint goal accuracy in one of the zero-shot domains, demonstrating its ability to transfer knowledge to previously unseen domains.
- The experiments also demonstrate the model’s ability to adapt to new few-shot domains without forgetting already trained domains.
What does the AI community think?
- The paper received an Outstanding Paper award at the main ACL 2019 conference and the Best Paper Award at NLP for Conversational AI Workshop at the same conference.
What are future research areas?
- Transferring knowledge from other resources to further improve zero-shot performance.
- Collecting a dataset with a large number of domains to facilitate the study of techniques within multi-domain dialogue state tracking.
What are possible business applications?
- The current research can significantly improve the performance of task-oriented dialogue systems in multi-domain settings.
Where can you get implementation code?
- The PyTorch implementation of this study is available on GitHub.
Enjoy this article? Sign up for more AI research updates.
We’ll let you know when we release more summary articles like this one.