This year, I had the chance to attend the ACL 2019 conference in Florence. It was my first NLP academic conference, and I was eager to attend as many sessions as possible. In contrast to most of the attendees, I was not interested in any particular research area. Instead, I wanted to pick up on the general NLP trends – so I was happy to hear about the latest advances in dialogue systems, machine translation, question answering, psycholinguistics, and other research topics.
Here are my highlights from this conference.
1. The NLP research community is growing.
As the organizers pointed out, “ACL size went XXL” this year. The number of registrations rose to 3160, compared to 1322 last year, and there was a 75% increase in the number of submissions relative to ACL 2018. In total, 2694 papers from 74 countries underwent review. Despite this huge surge in submissions, the quality of the research remained high: the acceptance rate for ACL 2019 (22.7%) was almost the same as the previous year’s (24.9%). In total, 660 papers were accepted to the conference, with 245 of them selected for oral presentations.
2. There is no clearly dominating area.
As in previous years, the submissions did not reveal any dominating area of NLP research. The leading areas, by number of papers submitted, include Information Extraction and Text Mining (9%), Machine Translation (8%), Machine Learning (7%), Dialogue and Interactive Systems (7%), and Generation (6%).
However, some areas demonstrated significant growth over ACL 2018. For example, Generation as well as Linguistic Theories, Cognitive Modeling, and Psycholinguistics received 3-4 times more submissions than in the previous year.
3. The new paradigm is “pretraining + finetuning.”
Ming Zhou, ACL President, addressed the popularity of pretrained models in his talk. A lot of effort has gone into developing pretrained models, from Word2Vec all the way to BERT and XLNet. This new paradigm allows multiple downstream tasks to be solved by finetuning a pretrained model on a quite small task-specific dataset. Now, everybody who can program can do NLP.
But are we satisfied with this new paradigm? That is the question Dr. Zhou asked in his talk. And the answer is “No.”
With this “pretraining + finetuning” approach, we still incur huge computational costs and rely heavily on annotated data. Students and academics don’t have access to the computing power required to advance these huge language models. The framework also faces significant challenges, including reasoning and interpretability, and the issues are especially evident in low-resource and multi-turn tasks.
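In toy form, the paradigm looks like this: a frozen “pretrained” feature extractor feeds a small task head, and finetuning touches only the head. A minimal sketch with a made-up two-dimensional embedding table standing in for BERT-style features; all words, vectors, and hyperparameters here are illustrative, not a real model:

```python
import math

# Hypothetical frozen "pretrained" embeddings (2-d for brevity).
PRETRAINED = {
    "good": [1.0, 0.2], "great": [0.9, 0.1],
    "bad": [-1.0, 0.3], "awful": [-0.8, 0.4],
}

def encode(sentence):
    """Frozen feature extractor: average the pretrained word vectors."""
    vecs = [PRETRAINED[w] for w in sentence.split() if w in PRETRAINED]
    return [sum(xs) / len(vecs) for xs in zip(*vecs)]

def finetune(data, epochs=200, lr=0.5):
    """Train only the task head (logistic regression) on a tiny
    task-specific dataset; the encoder above is never updated."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for sentence, label in data:
            x = encode(sentence)
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            g = p - label  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Four labeled examples are enough, because the heavy lifting
# already happened at pretraining time.
data = [("good", 1), ("great", 1), ("bad", 0), ("awful", 0)]
w, b = finetune(data)

def predict(sentence):
    x = encode(sentence)
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

The point of the sketch is the division of labor: the expensive, data-hungry step produces the frozen features once, and each downstream task only fits a lightweight head.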
4. Linguistics, knowledge, and common sense can be part of the solution.
Dr. Zhou believes that with linguistics, knowledge, common sense, and symbolic reasoning, we can solve these challenges.
Even for rich-resource tasks, there is massive room for improvement. For example, neural models often omit words in their translations, and because the technology works as a black box, we don’t know why those words go missing. Neural models also still struggle with abbreviations and named entities, even with lots of training data available. So the solution can involve more context modeling and data de-biasing, as well as leveraging multi-task learning and human knowledge to strengthen the models further.
For low-resource tasks, promising research avenues include cross-language learning, transfer learning, unsupervised learning, and keeping humans in the loop to provide feedback to the system.
When it comes to multi-turn tasks, common sense and reasoning are things that the current models cannot handle easily. To introduce context, knowledge, and inference to the model, Dr. Zhou suggests using a concept model of reasoning with a memory-augmented network.
One of the key ideas of the presidential talk is that deep learning and linguistics boost each other, and linguistics can improve the interpretability of the data-driven approach. We need to move “towards interpretable, knowledgeable, ethical, economical and non-stop-learnable NLP.”
5. Neural machine translation is progressing.
During this year’s ACL, quite a lot of attention was devoted to the machine translation (MT) research area.
One of the two invited talks covered the advances and challenges of simultaneous machine translation. Liang Huang from Baidu presented the breakthrough his team had made in this area. Their fairly simple idea was to replace the traditional full-sentence seq2seq approach with a wait-k policy, where the system reads the first k source words and then starts translating, anticipating words that have not been spoken yet. Word order differs across languages, so when translating from German to English, for example, you might have to wait quite a long time for the verb, which comes at the very end of many German sentences. Human interpreters are usually quite good at anticipating unspoken words from context; now machines are learning to do the same.
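The read/write schedule behind wait-k fits in one line: read k source words up front, then alternate reading one and writing one. A sketch of the schedule itself (not Baidu’s implementation, which wraps this policy around a neural translation model):

```python
def wait_k_schedule(src_len, tgt_len, k):
    """For each target position t, return how many source tokens the
    decoder has read before emitting target token t: k words up front,
    then read-one/write-one until the source sentence runs out."""
    return [min(k + t, src_len) for t in range(tgt_len)]

# With k=2 on a 5-word source, the decoder sees 2, 3, 4, 5, 5 source
# words at the five target steps -- for the early steps it must
# anticipate source words it has not read yet.
```

Setting k larger than the source length recovers the traditional full-sentence setup, which is why wait-k can be seen as a latency/quality knob rather than a separate architecture.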
Interestingly, the research group was able to create a prototype of this MT system in just one week, and then their colleagues in China were able to deploy it in production after another week.
Of course, simultaneous machine translation still has some challenges to solve. For example, even though the system can often recognize humor, it has difficulties translating jokes so that they remain funny. Similarly, the machine will not tone down offensive speech the way human interpreters often do. Dr. Huang, though, doesn’t see this as a problem: machines are more direct, and that’s fine.
The Best Long Paper Award was also granted in the MT research field this year. The paper by Zhang et al. (2019) addresses the problem of exposure bias in sequence-to-sequence translation. This refers to the problem where the predicted words are generated from different distributions at training and inference time. Namely, the model makes predictions using the ground-truth words as context at training time but needs to generate the entire sequence from scratch at inference time. The suggested solution is to sample context words not only from the ground-truth sequence but also from the decoder’s output received during model training.
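The core of that fix can be sketched as a mixing step: each previous word in the decoder’s training context is drawn either from the ground truth or from the model’s own earlier prediction. A minimal illustration with hypothetical tokens, not the paper’s actual training code:

```python
import random

def mixed_context(gold, predicted, p_model, seed=0):
    """Build the decoder's conditioning context by picking each previous
    word from the ground truth or from the model's own earlier output,
    with probability `p_model` of trusting the model. p_model=0 recovers
    pure teacher forcing; p_model=1 conditions entirely on model output,
    matching the inference-time setting."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [pred if rng.random() < p_model else gold_tok
            for gold_tok, pred in zip(gold, predicted)]

gold = ["the", "cat", "sat", "down"]
pred = ["the", "dog", "sat", "there"]  # hypothetical model predictions
```

Annealing `p_model` upward over training gradually exposes the model to its own mistakes, shrinking the train/inference mismatch that defines exposure bias.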
6. Dialogue systems remain a popular research area.
Conversational AI was another major research field at ACL 2019. For example, the second invited talk covered “technical approaches, applications and ethical issues of conversational systems.” Pascale Fung from the Hong Kong University of Science and Technology spoke about incorporating empathy into dialogue systems and even gave a short demo of CAiRE, an end-to-end empathetic chatbot developed in her lab. Despite the considerable progress this chatbot has achieved in dialogue emotion detection and empathetic response generation, the demo showed that there is still significant room for improvement.
Of course, there were a lot of exciting papers covering conversational AI. For example, the Tencent research team showed how multi-turn dialogue modeling can be improved with utterance rewriting. Researchers from Montreal, supervised by Yoshua Bengio, demonstrated that neural dialogue systems do not use the conversation history effectively. In particular, the systems are insensitive to most types of perturbations and produce the same responses even when the words or utterances in the input are shuffled. Interestingly, transformers are less sensitive to perturbations than LSTMs, especially LSTMs that include attention components.
The research team from the Hong Kong University of Science and Technology and Salesforce Research introduced a TRAnsferable Dialogue statE generator (TRADE) that leverages its context-enhanced slot gate and copy mechanism to track slot values mentioned anywhere in a dialogue history. The paper received an Outstanding Paper award at the main ACL 2019 conference and the Best Paper Award at the NLP for Conversational AI Workshop.
Several research papers suggested compelling applications of dialogue systems. For example, Wang et al. (2019) investigated the use of persuasive chatbots to change people’s opinions and actions for social good, namely donating to charity organizations. Cao et al. (2019) addressed the possibility of guiding psychotherapists using a dialogue observer that follows the dialogue between a therapist and a client to provide real-time feedback to therapists.
7. The dream of imitating the human brain with neural networks is still alive.
One of the papers that drew hundreds of attendees to its presentation examined the relationship between sentence representations learned by deep recurrent networks and those encoded by the human brain. The research by Jat et al. (2019) considered multiple neural network architectures, including ELMo and BERT, for encoding simple sentences like “The bone was eaten by the dog.” The researchers then compared these representations with magnetoencephalography (MEG) recordings of human subjects reading the same sentences. The findings demonstrate that BERT activations correlate best with the MEG data. Moreover, the researchers show that BERT can be used to augment data collected from brain activity: the generated synthetic brain data improves accuracy on subsequent stimulus-decoding tasks.
8. Ethics is becoming an essential aspect of NLP research.
One of the oral sessions at ACL 2019 was devoted entirely to discussing bias in language processing. The focus of the current research in this area is gender bias. The presented papers investigated:
- gender-preserving debiasing approaches (Kaneko and Bollegala, 2019);
- data augmentation for mitigating gender bias (Zmigrod et al., 2019);
- evaluating gender bias in machine translation (Stanovsky et al., 2019).
Sweeney and Najafian (2019) went beyond gender bias in their research and studied the unintended bias in word embeddings for other demographic groups. The authors argued that existing approaches to evaluating biases fail to explain how the embeddings could cause discrimination in downstream NLP tasks. In contrast, the presented method “measures fairness in word embeddings through the relative negative sentiment associated with demographic identity terms from various protected groups,” and enables a more in-depth analysis of the bias.
Sap et al. (2019) paid more attention to racial bias and investigated how annotators’ insensitivity to differences in dialect can lead to racial bias in hate speech detection models.
Key takeaways from ACL 2019
The current paradigm of using huge pretrained language models to solve most of the downstream NLP tasks has obviously leveled up natural language processing research. Even such often-criticized research advances as using more computing power and more data to set a new SOTA are important, at least to demonstrate the limitations of the current paradigm.
However, it’s time to move forward and explore “smarter” ways of improving neural networks’ performance. There are lots of promising research avenues here, including combining deep learning with linguistics and psycholinguistics, doing more context-aware modeling, adding knowledge and feedback from humans, and paying more attention to interpretability and reasoning. Finally, it’s critical to consider the ethical implications of NLP systems and further investigate practical solutions for removing bias from NLP models.