To help you stay well prepared for 2021, we have summarized the latest trends across different research areas, including natural language processing, conversational AI, computer vision, and reinforcement learning.
We also suggest key research papers in different areas that we think are representative of the latest advances.
Subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new research articles.
Natural Language Processing
In 2020, NLP research advances were still dominated by large pre-trained language models, particularly transformers. This year we're likely to see more interesting research ideas on improving the transformer architecture and the efficiency of its training. At the same time, we can expect top tech companies to continue exploiting model size as the main lever for improving the performance of language models, with GPT-4 or something similar likely to be introduced in 2021.
- Accelerating the training of large language models. While transformers demonstrate remarkable performance on downstream NLP tasks, pre-training or fine-tuning the latest transformer-based language models requires enormous amounts of time and compute. Several interesting ideas on accelerating the training of transformers were introduced last year, and this topic will likely remain hot in 2021.
- Detecting and removing toxicity and biases. GPT-3 has demonstrated impressive, often human-like, results, especially in language generation tasks. However, its output quite often contains toxic and biased remarks. Detecting and removing toxicity and bias from the output of language models is likely to be one of the key challenges for the NLP research community in the coming years.
- Applying language models to a multilingual setting. Recent research demonstrates that pre-trained multilingual models can generalize across languages without any explicit cross-lingual supervision. NLP researchers don't yet have a clear understanding of how this works, making language models in the multilingual setting an interesting topic for future research.
- Exploring successful data augmentation strategies. Compared to images, generating diverse yet semantically invariant perturbations of text is much more challenging. Still, several data augmentation techniques work in NLP but are not widely used. As recently demonstrated, these techniques fail to consistently improve the performance of pre-trained transformers, suggesting that data augmentation conveys few additional benefits beyond those of pre-training. However, it might still be useful in settings where pre-training shows limitations (e.g., negation, malformed input).
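For a concrete sense of what token-level text augmentation looks like, here is a minimal sketch of two EDA-style perturbations (random deletion and random swap). This is an illustration of the general idea, not any specific paper's implementation, and the function names are ours:

```python
import random

def random_deletion(tokens, p=0.1, seed=0):
    """Drop each token independently with probability p,
    always keeping at least one token."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

def random_swap(tokens, n_swaps=1, seed=0):
    """Swap n_swaps random pairs of positions; the token
    multiset is preserved, only word order changes."""
    rng = random.Random(seed)
    tokens = list(tokens)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(tokens)), rng.randrange(len(tokens))
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

sentence = "the model fails to handle negation".split()
print(random_deletion(sentence, p=0.3))
print(random_swap(sentence, n_swaps=2))
```

Both transformations are cheap to apply, which is exactly why their limited benefit on top of large-scale pre-training is notable.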
Key research papers:
- Language Models are Few-Shot Learners
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
- Investigating Gender Bias in Language Models Using Causal Mediation Analysis
- On the Cross-lingual Transferability of Monolingual Representations
- How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers?
Conversational AI

2020 was a breakthrough year for open-domain chatbots. Google's chatbot Meena and Facebook's chatbot Blender, both introduced that year, achieved close to human-level performance. The developers of these state-of-the-art conversational agents suggested novel approaches to improving conversation quality in terms of the sensibleness and specificity of responses, the empathy of the agent, and the consistency of its personality.
Also, knowing that Meena is based on a model with 2.6 billion parameters and Blender is trained using a Transformer-based model with up to 9.4 billion parameters, we can conclude that the model’s size is one of the key factors for the success of these models. However, most companies cannot afford to train and deploy chatbots of that size, and thus look for ‘smarter’ approaches to improving the performance of dialog agents.
- Building transformer-based conversational agents. The recent research demonstrates that the transformer architecture can be effectively applied to open-domain dialog systems (e.g. TransferTransfo by HuggingFace, GPT-3 by OpenAI). Therefore, it’s likely that transformer-based models will continue to boost the performance of conversational agents.
- Addressing data scarcity for task-oriented dialog systems. Since it's usually very expensive to obtain data for task-oriented dialog agents, developing new approaches to address data scarcity is an appealing direction for future research. Other related research directions include increasing sample efficiency in policy learning and introducing approaches for zero-shot domain adaptation.
- Developing evaluation metrics for open-domain chatbots. Evaluating open-domain dialog agents remains a very challenging problem for the NLP research community. In contrast to task-oriented dialogs, chit-chat conversations don’t have an explicit goal and there are many possible correct responses in each dialog turn.
Key research papers:
- Towards a Human-like Open-Domain Chatbot
- Recipes for Building an Open-Domain Chatbot
- A Simple Language Model for Task-Oriented Dialogue
Computer Vision

Will transformers revolutionize computer vision as they did natural language processing? That was one of the major research questions investigated by computer vision researchers in 2020. Early results indicate that transformers perform very well on image recognition tasks, making this a promising direction for further research.
- Applying transformers to computer vision tasks. In 2020, we saw the impressive performance of transformer-based architectures in image generation (Image GPT by OpenAI), object detection (RelationNet++ by Microsoft), and image recognition at scale (Vision Transformer by Google). We are likely to see more research on applying transformers to computer vision tasks.
- Further improving object detection. While object detection is one of the most well-researched topics, with many real-world applications, there is still room for improvement. Some of the appealing research directions include increasing the efficiency and scalability of the object detection models, combining the benefits of different representations, and applying transformers to object detection.
- Introducing more accurate and efficient approaches to semantic and instance segmentation. Similar to object detection, segmentation is also a very popular research topic with numerous applications. However, to be applied in the real world, segmentation models need to be very accurate, efficient, fast, scalable, and robust. Thus, we expect more novel approaches, further improving the performance and efficiency of semantic and instance segmentation.
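To make the Vision Transformer idea concrete: an image is split into fixed-size patches, and each flattened patch is treated as one input token for the transformer. A minimal NumPy sketch (assuming height and width are divisible by the patch size; the function name is illustrative):

```python
import numpy as np

def image_to_patches(img, patch_size=16):
    """Split an (H, W, C) image into non-overlapping flattened patches,
    as in the Vision Transformer: each patch becomes one input 'token'.
    Assumes H and W are divisible by patch_size."""
    h, w, c = img.shape
    gh, gw = h // patch_size, w // patch_size
    # Carve the height and width axes into (grid, patch) pairs, then
    # regroup so each patch's pixels are contiguous before flattening.
    patches = (img.reshape(gh, patch_size, gw, patch_size, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(gh * gw, patch_size * patch_size * c))
    return patches

img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
tokens = image_to_patches(img)  # shape (196, 768): 14x14 patches of 16x16x3
```

A 224x224 RGB image thus becomes a sequence of 196 tokens of dimension 768, which a standard transformer encoder can process like a sentence of words.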
Key research papers:
- Generative Pretraining from Pixels
- An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
- EfficientDet: Scalable and Efficient Object Detection
- RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder
- RandAugment: Practical Automated Data Augmentation with a Reduced Search Space
Reinforcement Learning

Nowadays, reinforcement learning (RL) is applied successfully mainly in areas where huge amounts of simulated data can be generated, such as robotics and games. However, judging from the large number of RL research papers accepted to NeurIPS 2020, the research community is working hard on improving the sample efficiency and generalization abilities of RL agents.
- Increasing the sample efficiency of RL algorithms. With reinforcement learning requiring huge amounts of data for training, it continues to be less valuable for business applications than supervised and unsupervised learning. Thus, data efficiency is one of the key topics in current reinforcement learning research, and at NeurIPS 2020 we saw papers introducing new and interesting approaches to sample-efficient exploration and RL with augmented data.
- Improving the generalization abilities of RL agents. Deep reinforcement learning agents often struggle to generalize to new environments, even when they are semantically similar. While some promising techniques for addressing this issue have already been introduced (e.g., network randomization), lack of generalization ability still remains one of the weaknesses of the current RL methods.
- Offline reinforcement learning. Allowing the agent to learn only from its own interactions with the environment hinders its application in the real world, since such data collection is usually prohibitively expensive. The alternative approach, which is referred to as offline or batch reinforcement learning, allows learning from big datasets collected during past interactions. This is a promising path to the effective real-world applications of reinforcement learning.
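To illustrate the offline setting: the agent learns entirely from a fixed log of past transitions and never queries the environment during training. A minimal tabular sketch (a generic batch Q-learning loop, not any specific published algorithm; the toy MDP and names are illustrative):

```python
def offline_q_learning(dataset, n_states, n_actions,
                       gamma=0.9, lr=0.1, epochs=200):
    """Batch (offline) Q-learning: repeatedly replay a fixed dataset of
    (state, action, reward, next_state, done) transitions. The agent
    never interacts with the environment while learning."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(epochs):
        for s, a, r, s2, done in dataset:
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += lr * (target - Q[s][a])
    return Q

# Toy 2-state chain: action 1 moves toward the goal (reward 1 on reaching
# it), action 0 stays in place with no reward.
dataset = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 1, True),
    (0, 0, 0.0, 0, False),
    (1, 0, 0.0, 1, False),
]
Q = offline_q_learning(dataset, n_states=2, n_actions=2)
```

Real offline RL methods add machinery on top of this loop (e.g., conservatism penalties, as in Conservative Q-Learning) to avoid overestimating the value of actions the dataset never covers.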
Key research papers:
- Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning
- Reinforcement Learning with Augmented Data
- An Optimistic Perspective on Offline Deep Reinforcement Learning
- Conservative Q-Learning for Offline Reinforcement Learning
Top AI & Machine Learning Research Papers 2020
To get a more in-depth understanding of the latest trends in AI, check out our curated lists of top research papers:
- 2020’s Top AI & Machine Learning Research Papers
- GPT-3 & Beyond: 10 NLP Research Papers You Should Read
- Novel Computer Vision Research Papers From 2020
Enjoy this article? Sign up for more AI research updates.
We’ll let you know when we release more summary articles like this one.