Estimates suggest that 70%–85% of the world’s data is text (unstructured data) [1]. New deep-learning language models (transformers) have driven explosive growth in industry applications [5,6,11]. This blog is not an introductory article on Natural Language Processing. Instead, it assumes you are familiar with noise reduction and normalization of text. It covers … [Read more...] about Natural Language Processing in Production: 27 Fast Text Pre-Processing Methods
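As a taste of the kind of fast pre-processing the post covers, here is a minimal normalization sketch. The function name and the specific steps are illustrative assumptions, not the post's actual 27 methods:

```python
import re
import string

def normalize(text: str) -> str:
    """Basic noise reduction: lowercase, strip HTML tags,
    remove punctuation, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML tags
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text

print(normalize("<p>Hello,  World!</p>"))  # → "hello world"
```

In production pipelines, steps like these are typically compiled once and applied in bulk, which is where the "fast" in the title comes in.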
The Relationship Between Perplexity And Entropy In NLP
Perplexity is a common metric to use when evaluating language models. For example, scikit-learn’s implementation of Latent Dirichlet Allocation (a topic-modeling algorithm) includes perplexity as a built-in metric. In this post, I will define perplexity and then discuss entropy, the relation between the two, and how it arises naturally in natural language … [Read more...] about The Relationship Between Perplexity And Entropy In NLP
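The relationship the post explores can be stated compactly: perplexity is the exponentiated cross-entropy of the model on the data. A minimal sketch, assuming a toy model that assigns a probability to each observed token (names are illustrative):

```python
import math

def perplexity(probs):
    """Perplexity = exp(average negative log-likelihood), where
    probs are the probabilities a model assigned to each observed token."""
    n = len(probs)
    cross_entropy = -sum(math.log(p) for p in probs) / n  # in nats
    return math.exp(cross_entropy)

# A model assigning probability 0.25 to each of 4 tokens has
# cross-entropy ln(4), so its perplexity is 4 — the "effective
# branching factor" interpretation of the metric.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # → 4.0
```

A perfectly confident model (every probability 1.0) has zero entropy and perplexity 1; more uncertainty means higher perplexity.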
3 NLP Interpretability Tools For Debugging Language Models
With constant advances and unprecedented performance on many NLP tasks, language models have become very complex and hard to debug. Researchers and engineers often can’t easily answer questions like: Why did your model make that prediction? Does your model have any algorithmic biases? What kinds of data samples does your model perform poorly … [Read more...] about 3 NLP Interpretability Tools For Debugging Language Models
Highlights of ACL 2020
ACL Trends Visualization by Wanxiang Che. With ACL becoming virtual this year, I unfortunately spent less time networking and catching up with colleagues, but as a silver lining I watched many more talks than I usually do. I decided to share the notes I took and discuss some overall trends. The list is not exhaustive and is based on my research interests. I recommend also … [Read more...] about Highlights of ACL 2020
Best Research Papers From ACL 2020
ACL is the leading conference in the field of natural language processing (NLP), covering a broad spectrum of research areas in computational linguistics. Due to the COVID-19 risks, ACL 2020 took place 100% virtually, like other major academic conferences this year. However, as always, it was the best place to learn about the latest NLP research trends and … [Read more...] about Best Research Papers From ACL 2020
Reformer, Longformer, and ELECTRA: Key Updates To Transformer Architecture In 2020
The leading pre-trained language models demonstrate remarkable performance on different NLP tasks, making them a much-welcomed tool for a number of applications, including sentiment analysis, chatbots, text summarization, and so on. However, good performance usually comes at the cost of enormous computational resources that are not accessible to most researchers and business … [Read more...] about Reformer, Longformer, and ELECTRA: Key Updates To Transformer Architecture In 2020
The Best NLP Papers From ICLR 2020
I went through 687 papers that were accepted to the ICLR 2020 virtual conference (out of 2594 submitted - up 63% since 2019!) and identified 9 papers with the potential to advance the use of deep learning NLP models in everyday use cases. Here are the papers I found and why they matter. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators Kevin … [Read more...] about The Best NLP Papers From ICLR 2020
The Dark Secrets Of BERT
This blog post summarizes EMNLP 2019 paper Revealing the Dark Secrets of BERT by researchers from the Text Machine Lab at UMass Lowell: Olga Kovaleva (LinkedIn), Alexey Romanov (LinkedIn), Anna Rogers (Twitter: @annargrs), and Anna Rumshisky (Twitter: @arumshisky). Here are the topics covered: A brief intro to … [Read more...] about The Dark Secrets Of BERT
Data Labeling For Natural Language Processing
Why Does Training Data Matter? Machine Learning has made significant strides in the last decade. This can be attributed to parallel improvements in processing power and new breakthroughs in Deep Learning research. Another key reason is the abundance of data that has been accumulated. Analysts estimate humankind sits atop 44 zettabytes of information today. The … [Read more...] about Data Labeling For Natural Language Processing
Why Choosing a Heavier NLP Model Might Be a Good Choice?
From Google’s 43 rules of ML. Rule #4: Keep the first model simple and get the infrastructure right. With various opinions floating around the market, I feel it’s a good time to spark a discussion about this topic. Otherwise, popular opinions will simply drown out other ideas. Note: I work in NLP, and these opinions are focused more on NLP applications. Cannot … [Read more...] about Why Choosing a Heavier NLP Model Might Be a Good Choice?