TOPBOTS

No time to read AI research? We summarized top 2018 papers for you

Most important MUST READ scientific research papers in AI for 2018, summarized with bullet points

Trying to keep up with AI research papers can feel like an exercise in futility given how quickly the industry moves. If you’re buried in papers to read that you haven’t quite gotten around to, you’re in luck.

To help you catch up, we’ve summarized 10 important AI research papers from 2018 to give you a broad overview of machine learning advancements this year. There are many more breakthrough papers worth reading as well, but we think this is a good list for you to start with.

We’ve done our best to summarize these papers correctly, but if we’ve made any mistakes, please contact us to request a fix.

If these summaries of scientific AI research papers are useful to you, you can subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries.

If you’d like to skip around, here are the papers we featured:

  1. Universal Language Model Fine-tuning for Text Classification
  2. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
  3. Deep Contextualized Word Representations
  4. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
  5. Delayed Impact of Fair Machine Learning
  6. World Models
  7. Taskonomy: Disentangling Task Transfer Learning
  8. Know What You Don’t Know: Unanswerable Questions for SQuAD
  9. Large Scale GAN Training for High Fidelity Natural Image Synthesis
  10. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

 

10 Important AI Research Papers Of 2018

 

1. Universal Language Model Fine-tuning for Text Classification, by Jeremy Howard and Sebastian Ruder (2018)

 

Original Abstract

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100x more data. We open source our pretrained models and code.

Our Summary

Howard and Ruder suggest using pre-trained models for solving a wide range of NLP problems. With this approach, you don’t need to train your model from scratch; you only fine-tune the original model. Their method, called Universal Language Model Fine-Tuning (ULMFiT), outperforms state-of-the-art results on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Moreover, with only 100 labeled examples, ULMFiT matches the performance of models trained from scratch on 10K labeled examples.
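The abstract mentions "techniques that are key for fine-tuning a language model" without naming them. One technique described in the paper is a slanted triangular learning rate: a short linear warm-up followed by a long linear decay. Below is a minimal sketch of that schedule, assuming the formula and default values (`cut_frac=0.1`, `ratio=32`, `lr_max=0.01`) as we recall them from the paper. The function name is ours, and this is not the authors' released code.

```python
import math


def slanted_triangular_lr(t, total_steps, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at step t: short linear warm-up, then long linear decay.

    Assumes total_steps * cut_frac >= 1 so that the warm-up has at least one step.
    """
    cut = math.floor(total_steps * cut_frac)  # step at which the rate peaks
    if t < cut:
        p = t / cut  # fraction of the warm-up completed
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # fraction of the decay remaining
    # The rate moves between lr_max / ratio (p = 0) and lr_max (p = 1).
    return lr_max * (1 + p * (ratio - 1)) / ratio


# Hypothetical run of 1000 update steps.
schedule = [slanted_triangular_lr(t, 1000) for t in range(1001)]
```

With these assumed defaults the rate starts at `lr_max / ratio`, peaks at `lr_max` after the first 10% of steps, and decays back toward `lr_max / ratio` by the end of training.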

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code?

 

2. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples, by Anish Athalye, Nicholas Carlini, David Wagner (2018)

 

Original Abstract

We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.

Our Summary

The researchers found that defenses against adversarial examples commonly use obfuscated gradients, which create a false sense of security because they can be easily circumvented. The study describes three ways in which defenses obfuscate gradients and shows which techniques can circumvent the defenses. The findings can help organizations that use defenses relying on obfuscated gradients to fortify their current methods.
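To make the idea of an obfuscated gradient concrete, here is a toy scalar sketch of our own (not the authors' code). A hypothetical "defense" quantizes the input before the model sees it, so the true gradient is zero almost everywhere and a gradient-based attack stalls. The workaround shown keeps the quantizer in the forward pass but treats it as the identity function in the backward pass, which is a scalar simplification of what the paper calls Backward Pass Differentiable Approximation. The loss function and all names below are illustrative assumptions.

```python
def quantize(x, levels=4):
    """Hypothetical non-differentiable preprocessing defense."""
    return round(x * levels) / levels


def loss(z):
    """Toy loss the attacker wants to change; stands in for a real model."""
    return (z - 2.0) ** 2


def loss_grad(z):
    """Analytic derivative of the toy loss with respect to its input."""
    return 2.0 * (z - 2.0)


def naive_gradient(x, eps=1e-4):
    """Finite-difference gradient through the defense: flat almost everywhere."""
    return (loss(quantize(x + eps)) - loss(quantize(x - eps))) / (2 * eps)


def approximated_gradient(x):
    """Forward pass uses the real defense; backward pass treats it as identity."""
    return loss_grad(quantize(x))


x = 0.3
print(naive_gradient(x))         # the attacker sees no useful signal
print(approximated_gradient(x))  # a usable descent direction is recovered
```

The point of the sketch is only that a vanished gradient does not mean the model is robust: an attacker who swaps in a smooth stand-in for the defense still gets a direction to follow.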

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

3. Deep Contextualized Word Representations, by Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer (2018)

 

Original Abstract

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals.

Our Summary

A team from the Allen Institute for Artificial Intelligence introduces a new type of deep contextualized word representation: Embeddings from Language Models (ELMo). In ELMo-enhanced models, each word is vectorized on the basis of the entire context in which it is used. Adding ELMo to existing NLP systems results in 1) relative error reductions ranging from 6% to 20%, 2) a significantly lower number of epochs required to train the models, and 3) a significantly reduced amount of training data needed to reach baseline performance.
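The abstract describes the word vectors as "learned functions of the internal states" of the biLM. In the paper, as we understand it, that function is a task-specific weighted sum of the layer states for each token, with softmax-normalized layer weights and a scalar scale factor. The sketch below illustrates that combination for a single token in plain Python. The function names, the toy vectors, and the weights are our assumptions, not values from the paper.

```python
import math


def softmax(raw_weights):
    """Normalize raw layer weights so they are positive and sum to 1."""
    peak = max(raw_weights)
    exps = [math.exp(w - peak) for w in raw_weights]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_states, raw_weights, gamma=1.0):
    """Collapse the per-layer vectors for one token into a single vector.

    layer_states: one vector per biLM layer, all of the same length.
    raw_weights:  one learnable scalar per layer (fixed here for illustration).
    gamma:        learnable scale applied to the whole vector.
    """
    s = softmax(raw_weights)
    dim = len(layer_states[0])
    return [
        gamma * sum(s[j] * layer_states[j][d] for j in range(len(layer_states)))
        for d in range(dim)
    ]


# Toy states for one token across three hypothetical layers.
states = [[1.0, 0.0], [3.0, 6.0], [5.0, 0.0]]
vector = combine_layers(states, raw_weights=[0.0, 0.0, 0.0])
```

With equal raw weights the result is simply the average of the layers. A downstream model would instead learn the weights, letting it emphasize whichever layers carry the signal it needs.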

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code?

 

4. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling, by Shaojie Bai, J. Zico Kolter, Vladlen Koltun (2018)

 

Original Abstract

For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at http://github.com/locuslab/TCN.

Our Summary

The authors of this paper question the common assumption that recurrent architectures should be a default starting point for sequence modeling tasks. Their results suggest that generic temporal convolutional networks (TCNs) convincingly outperform canonical recurrent architectures such as long short-term memory networks (LSTMs) and gated recurrent unit networks (GRUs) across a broad range of sequence modeling tasks.
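The building block usually associated with a TCN is a causal, dilated 1D convolution: each output depends only on the current and earlier inputs, and the dilation spaces out the taps so that stacked layers see far into the past. Here is a minimal plain-Python sketch of that operation. It is our illustration rather than code from the linked repository, and the receptive-field helper assumes one convolution per layer, which is a simplification.

```python
def causal_dilated_conv(x, kernel, dilation=1):
    """1D convolution where output[t] only uses x[t], x[t - d], x[t - 2d], ...

    Positions before the start of the sequence are treated as zero padding.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of past time steps visible after stacking one conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


y = causal_dilated_conv([1, 2, 3, 4], kernel=[1, 1], dilation=2)
span = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is one way to read the abstract's remark about longer effective memory.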

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code?

 

5. Delayed Impact of Fair Machine Learning, by Lydia T. Liu, Sarah Dean, Esther Rolf, Max Simchowitz, Moritz Hardt (2018)

 

Original Abstract

Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.

We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.

Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.

Our Summary

The goal is to ensure fair treatment across different demographic groups when using a score-based machine learning algorithm to decide who gets an opportunity (e.g., loan, scholarship, job) and who does not. Researchers from Berkeley’s Artificial Intelligence Research lab show that using common fairness criteria may in fact harm underrepresented or disadvantaged groups due to certain delayed outcomes. Thus, they encourage looking at the long-term outcomes when designing a “fair” machine-learning system.
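A toy calculation can show how a delayed outcome might hurt a group. The sketch below assumes a one-step feedback model in the spirit of the abstract: an approved applicant's score rises if they repay and falls if they default, and we look at the expected change in the group's mean score under a given approval threshold. The score gain and loss, the repayment curve, and the scores themselves are all hypothetical numbers chosen for illustration, not results from the paper.

```python
def repay_probability(score):
    """Hypothetical repayment curve: linear in score over a 300-850 range."""
    return (score - 300) / 550


def expected_mean_score_change(scores, threshold, gain=75.0, loss=-150.0):
    """Expected change in the group's mean score after one round of decisions.

    Applicants at or above the threshold are approved. An approved applicant's
    score moves by `gain` on repayment and by `loss` on default; rejected
    applicants keep their score.
    """
    total = 0.0
    for s in scores:
        if s >= threshold:
            p = repay_probability(s)
            total += p * gain + (1.0 - p) * loss
    return total / len(scores)


group_scores = [500, 600, 700, 800]  # hypothetical group
strict = expected_mean_score_change(group_scores, threshold=700)
loose = expected_mean_score_change(group_scores, threshold=500)
```

In this toy setup the stricter threshold raises the group's expected mean score, while the looser one lowers it because more approvals end in default. The example is meant only to show why the direction of the long-term effect depends on where the threshold falls.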

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code?

 

6. World Models, by David Ha and Jürgen Schmidhuber (2018)

 

Original Abstract

We explore building generative neural network models of popular reinforcement learning environments. Our world model can be trained quickly in an unsupervised manner to learn a compressed spatial and temporal representation of the environment. By using features extracted from the world model as inputs to an agent, we can train a very compact and simple policy that can solve the required task. We can even train our agent entirely inside of its own hallucinated dream generated by its world model, and transfer this policy back into the actual environment.

An interactive version of this paper is available at https://worldmodels.github.io.

Our Summary

Ha and Schmidhuber develop a world model that can be quickly trained in an unsupervised manner to learn spatial and temporal representations of the environment. An agent that used features from this world model succeeded in navigating the race track in the Car Racing task and in avoiding the fireballs shot by monsters in the VizDoom experiment. These tasks were too challenging for previous methods.

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

7. Taskonomy: Disentangling Task Transfer Learning, by Amir R. Zamir, Alexander Sax, William Shen, Leonidas J. Guibas, Jitendra Malik, and Silvio Savarese (2018)

 

Original Abstract

Do visual tasks have a relationship, or are they unrelated? For instance, could having surface normals simplify estimating the depth of an image? Intuition answers these questions positively, implying existence of a structure among visual tasks. Knowing this structure has notable values; it is the concept underlying transfer learning and provides a principled way for identifying redundancies across tasks, e.g., to seamlessly reuse supervision among related tasks or solve many tasks in one system without piling up the complexity.

We propose a fully computational approach for modeling the structure of space of visual tasks. This is done via finding (first and higher-order) transfer learning dependencies across a dictionary of twenty six 2D, 2.5D, 3D, and semantic tasks in a latent space. The product is a computational taxonomic map for task transfer learning. We study the consequences of this structure, e.g. nontrivial emerged relationships, and exploit them to reduce the demand for labeled data. For example, we show that the total number of labeled datapoints needed for solving a set of 10 tasks can be reduced by roughly 2/3 (compared to training independently) while keeping the performance nearly the same. We provide a set of tools for computing and probing this taxonomical structure including a solver that users can employ to devise efficient supervision policies for their use cases.

Our Summary

Researchers have asserted that visual tasks share an underlying structure since the early years of modern computer science. Amir Zamir and his team now attempt to actually find this structure. They model it using a fully computational approach and discover many useful relationships between different visual tasks, including nontrivial ones. They also show that, by taking advantage of these interdependencies, it is possible to achieve nearly the same model performance while reducing labeled data requirements by roughly ⅔.

 

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

8. Know What You Don’t Know: Unanswerable Questions for SQuAD, by Pranav Rajpurkar, Robin Jia, and Percy Liang (2018)

 

Original Abstract

Extractive reading comprehension systems can often locate the correct answer to a question in a context document, but they also tend to make unreliable guesses on questions for which the correct answer is not stated in the context. Existing datasets either focus exclusively on answerable questions, or use automatically generated unanswerable questions that are easy to identify. To address these weaknesses, we present SQuAD 2.0, the latest version of the Stanford Question Answering Dataset (SQuAD). SQuAD 2.0 combines existing SQuAD data with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD 2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering. SQuAD 2.0 is a challenging natural language understanding task for existing models: a strong neural system that gets 86% F1 on SQuAD 1.1 achieves only 66% F1 on SQuAD 2.0.

Our Summary

A Stanford University research group extends the famous Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions. The answers to these questions cannot be found in the supporting paragraphs, yet the questions look very similar to the answerable questions. Moreover, the supporting paragraphs contain plausible (but incorrect) answers to these questions. This makes the new SQuAD 2.0 extremely challenging for existing state-of-the-art models: a strong neural system that achieves an F1 score of 86% on SQuAD 1.1 gets only 66% F1 on SQuAD 2.0.
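To see what "abstain from answering" means in practice, here is a simplified sketch of two pieces: a token-level F1 score that treats an empty string as "no answer", and a threshold rule for deciding when to abstain. Scoring an empty prediction as fully correct only when the gold answer is also empty mirrors our understanding of the official evaluation, but this version skips the punctuation and article normalization a real scorer would apply. The abstention function, its scores, and its threshold are hypothetical.

```python
from collections import Counter


def token_f1(prediction, gold):
    """Token-overlap F1; an empty string stands for 'no answer'."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # If either side abstains, credit is given only when both abstain.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def answer_or_abstain(best_span, span_score, no_answer_score, threshold=0.0):
    """Return the best span unless the no-answer score beats it by the threshold."""
    if no_answer_score - span_score > threshold:
        return ""
    return best_span


prediction = answer_or_abstain("in Paris", span_score=2.0, no_answer_score=5.0)
```

A system that always guesses a span gets zero credit on every unanswerable question under this scoring, which is why adding such questions lowers F1 for models tuned on the earlier dataset.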

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code?

 

9. Large Scale GAN Training for High Fidelity Natural Image Synthesis, by Andrew Brock, Jeff Donahue, and Karen Simonyan (2018)

 

Original Abstract

Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple “truncation trick”, allowing fine control over the trade-off between sample fidelity and variety by truncating the latent space. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128×128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.3 and Frechet Inception Distance (FID) of 9.6, improving over the previous best IS of 52.52 and FID of 18.65.

Our Summary

A DeepMind team finds that current techniques are sufficient for synthesizing high-resolution, diverse images from available datasets such as ImageNet and JFT-300M. In particular, they show that Generative Adversarial Networks (GANs) can generate images that look very realistic if they are trained at a very large scale, i.e. using two to four times as many parameters and eight times the batch size compared to prior experiments. These large-scale GANs, or BigGANs, are the new state-of-the-art in class-conditional image synthesis.
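The "truncation trick" mentioned in the abstract can be sketched in a few lines. As we understand it, the latent vector is drawn from a normal distribution and any component whose magnitude exceeds a chosen threshold is resampled, so a smaller threshold trades variety for fidelity. The sketch below covers only this sampling step with Python's standard library. The function name and threshold value are our assumptions, and no generator network is included.

```python
import random


def truncated_latent(dim, threshold, rng):
    """Sample a latent vector whose components all lie within +/- threshold.

    Each component is drawn from a standard normal and redrawn until it falls
    inside the allowed range.
    """
    z = []
    for _ in range(dim):
        value = rng.gauss(0.0, 1.0)
        while abs(value) > threshold:
            value = rng.gauss(0.0, 1.0)
        z.append(value)
    return z


rng = random.Random(0)  # fixed seed so the sketch is repeatable
z = truncated_latent(dim=8, threshold=0.5, rng=rng)
```

Lowering `threshold` keeps samples near the mode of the latent distribution, while raising it approaches ordinary normal sampling.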

 

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code? 

 

10. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018)

 

Original Abstract

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT representations can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE benchmark to 80.4% (7.6% absolute improvement), MultiNLI accuracy to 86.7 (5.6% absolute improvement) and the SQuAD v1.1 question answering Test F1 to 93.2 (1.5% absolute improvement), outperforming human performance by 2.0%.

Our Summary

A Google AI team presents a new cutting-edge model for Natural Language Processing (NLP): BERT, or Bidirectional Encoder Representations from Transformers. Its design allows the model to consider the context from both the left and right sides of each word. While being conceptually simple, BERT obtains new state-of-the-art results on eleven NLP tasks, including question answering, named entity recognition and other tasks related to general language understanding.
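One way to picture "context from both left and right" is through attention masks, which record the positions each token is allowed to look at. The sketch below contrasts a left-to-right mask with a fully bidirectional one. It is our illustration of the visibility pattern only, and it shows neither BERT's pre-training objective nor its actual implementation.

```python
def left_to_right_mask(n):
    """Row i marks which positions token i may attend to: itself and earlier ones."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every token may attend to every position, left and right."""
    return [[1] * n for _ in range(n)]


for row in left_to_right_mask(4):
    print(row)
for row in bidirectional_mask(4):
    print(row)
```

In the left-to-right case the first token sees only itself, whereas in the bidirectional case every token sees the whole sequence in every layer.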

 

What’s the core idea of this paper?

 

What’s the key achievement?

 

What does the AI community think?

 

What are future research areas?

 

What are possible business applications?

 

Where can you get implementation code?

 

Want Deeper Dives Into Specific AI Research Topics?

Due to popular demand, we’ve released several of these easy-to-read summaries and syntheses of major research papers for different subtopics within AI and machine learning.

  1. Top 10 machine learning & AI research papers of 2018
  2. Top 10 AI fairness, accountability, transparency, and ethics (FATE) papers of 2018
  3. Top 14 natural language processing (NLP) research papers of 2018
  4. Top 10 computer vision and image generation research papers of 2018
  5. Top 10 conversational AI and dialog systems research papers of 2018
  6. Top 10 deep reinforcement learning research papers of 2018

 

Enjoy this article? Sign up for more AI research updates.

We’ll let you know when we release more summary articles like this one.