Perplexity is a common metric to use when evaluating language models. For example, scikit-learn’s implementation of Latent Dirichlet Allocation (a topic-modeling algorithm) includes perplexity as a built-in metric. In this post, I will define perplexity and then discuss entropy, the relation between the two, and how it arises naturally in natural language … [Read more...] about The Relationship Between Perplexity And Entropy In NLP
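The core relationship the post builds toward is that perplexity is two raised to the entropy (in bits) of the model's predictive distribution. A minimal sketch for a single discrete distribution — the function names are my own, not scikit-learn's API:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def perplexity(probs):
    """Perplexity is 2 raised to the entropy (when entropy is in bits)."""
    return 2 ** entropy(probs)

# A uniform distribution over 4 tokens has entropy 2 bits, so its
# perplexity is exactly 4: the model is as "confused" as if it were
# choosing uniformly among 4 equally likely options.
print(perplexity([0.25, 0.25, 0.25, 0.25]))  # → 4.0
```

This is why perplexity is often read as an "effective branching factor": a language model with perplexity 40 is, on average, as uncertain as one choosing uniformly among 40 words.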
NLP Tutorials
Data Labeling For Natural Language Processing
Why Does Training Data Matter? Machine Learning has made significant strides in the last decade. This can be attributed to parallel improvements in processing power and new breakthroughs in Deep Learning research. Another key reason is the abundance of data that has been accumulated. Analysts estimate humankind sits atop 44 zettabytes of information today. The … [Read more...] about Data Labeling For Natural Language Processing
Document Embedding Techniques
Word embedding — the mapping of words into numerical vector spaces — has proved to be an incredibly important method for natural language processing (NLP) tasks in recent years, enabling various machine learning models that rely on vector representation as input to enjoy richer representations of text input. These representations preserve more semantic and syntactic information … [Read more...] about Document Embedding Techniques
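To make "mapping words into numerical vector spaces" concrete, here is a toy illustration with hand-made 3-dimensional vectors and cosine similarity — real embedding models (word2vec, GloVe) learn vectors with hundreds of dimensions from large corpora; these numbers are invented for illustration only:

```python
import math

# Toy hand-crafted "embeddings" -- purely illustrative values.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.75, 0.20],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(u, v):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Semantically related words sit closer together in the vector space.
print(cosine(vectors["king"], vectors["queen"]))  # high (~0.99)
print(cosine(vectors["king"], vectors["apple"]))  # low  (~0.30)
```

Document embedding techniques extend this same idea from single words to whole sentences and documents.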
NLP Interview Questions
Are you hiring technical AI talent for your company? Post your openings on the TOPBOTS jobs board to reach thousands of engineers, data scientists, and researchers currently looking for work. It's one thing to practice NLP and another to crack interviews. Interviewing for an NLP role is very different from interviewing for a generic data science profile. In just a … [Read more...] about NLP Interview Questions
Semantic Search: Theory And Implementation
It took me a long time to realise that search is the biggest problem in NLP. Just look at Google, Amazon and Bing. These are multi-billion-dollar businesses made possible only by their powerful search engines. My initial thoughts on search were centered around unsupervised ML, but I participated in the Microsoft Hackathon 2018 for Bing and came to know the various ways a … [Read more...] about Semantic Search: Theory And Implementation
Better Sentiment Analysis with BERT
Imagine you have a bot answering your clients, and you want to make it sound a little bit more natural, more human. To achieve that, you have to make the answers more personalized. One way to learn more about the customers you’re talking to is to analyze the polarity of their answers. By polarity here I mean detecting if the sentence (or group of sentences) is written with … [Read more...] about Better Sentiment Analysis with BERT
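Before reaching for BERT, it helps to see what "polarity" means in the simplest possible terms. The sketch below is a deliberately tiny lexicon-based baseline — not the article's BERT approach, and the word lists are my own invention — just to show that polarity is a signed score (positive vs. negative):

```python
# Tiny illustrative sentiment lexicons -- a real system would use a
# learned model (e.g. BERT) rather than hand-picked word lists.
POSITIVE = {"great", "love", "happy", "excellent", "good"}
NEGATIVE = {"bad", "hate", "angry", "terrible", "poor"}

def polarity(sentence):
    """Count positive words minus negative words; sign gives the polarity."""
    words = sentence.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(polarity("I love this great product"))  # → 2  (positive)
print(polarity("terrible support I hate it")) # → -2 (negative)
```

A baseline like this breaks on negation ("not good") and context — which is exactly the gap contextual models like BERT are meant to close.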
An Ultimate Guide To Transfer Learning In NLP
Natural language processing is a powerful tool, but in the real world we often come across tasks that suffer from data scarcity and poor model generalisation. Transfer learning solves this problem by allowing us to take a model pre-trained on one task and use it for others. Today, transfer learning is at the heart of language models like Embeddings from Language … [Read more...] about An Ultimate Guide To Transfer Learning In NLP
Productionizing NLP Models
Problem statement 💰 Lately, I have been consolidating my experiences of working on different ML projects. I will tell this story through the lens of my recent NLP project to classify phrases into categories — a multiclass, single-label problem. Team structure 👪 Building AI teams is quite tricky. If you don’t have the skillsets inside your company, you have to plan … [Read more...] about Productionizing NLP Models
Getting Started with Text Preprocessing for Machine Learning & NLP
Based on some recent conversations, I realized that text preprocessing is a severely overlooked topic. A few people I spoke to mentioned inconsistent results from their NLP applications, only to realize that they were not preprocessing their text or were using the wrong kind of text preprocessing for their project. With that in mind, I thought of shedding some light on … [Read more...] about Getting Started with Text Preprocessing for Machine Learning & NLP
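As a taste of what basic preprocessing involves, here is a minimal pipeline sketch — lowercasing, punctuation stripping, and stopword removal — using only the Python standard library; the stopword list is an illustrative subset, not a definitive one:

```python
import re
import string

# Illustrative subset of English stopwords -- real pipelines use a
# fuller list (e.g. from NLTK or spaCy) chosen to fit the task.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and"}

def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stopwords."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and t not in STOPWORDS]

print(preprocess("The Quick, Brown Fox is jumping!"))
# → ['quick', 'brown', 'fox', 'jumping']
```

Note that the *right* preprocessing is task-dependent — for example, stripping punctuation can hurt sentiment analysis, which is precisely the kind of mismatch the article warns about.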
Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision
Introduction There is a catch to training state-of-the-art NLP models: their reliance on massive hand-labeled training sets. That’s why data labeling is usually the bottleneck in developing NLP applications and keeping them up-to-date. For example, imagine how much it would cost to pay medical specialists to label thousands of electronic health records. In general, having … [Read more...] about Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision