Planning to Netflix & Chill this weekend? The movie you choose to watch may be heavily influenced by Netflix’s sophisticated algorithms. Similarly, decisions like where you choose to dine and what you choose to wear are increasingly facilitated by predictive technologies powered by deep learning.
Applications of Deep Learning In Digital Consumer Products
1. Netflix Dynamically Personalizes Layouts & Movie Thumbnails
Historically, watching television has been a one-way communication channel: you receive the content but give no feedback to the content producers. With digital streaming, your watch history, mouse clicks, and search terms all help Netflix learn your preferences and deliver more relevant content.
In 2009, Netflix awarded the $1 million prize in its Netflix Prize competition, an open contest challenging external programming teams to improve upon the company’s internal ratings prediction system. The winning team beat the original algorithm by over 10%.
Since then, more advanced machine learning algorithms have allowed Netflix to reach new levels of prediction and personalization in rankings, layout, catalog, new member onboarding, and more. Tony Jebara, Director at Netflix and Professor of Computer Science at Columbia, explained at REWORK’s Deep Learning Summit in San Francisco how Netflix not only recommends better movies but also predicts better thumbnail images for each individual user.
Traditionally, optimizing images on a website involves A/B testing two alternatives over a period of time. The problem with this method is that you have to wait until enough data comes in before you can pick the winner. During that period, part of your audience keeps seeing the weaker variant. The cumulative loss from showing a suboptimal variant instead of the best one is called “regret.”
To minimize regret, Netflix uses adaptive tests such as the multi-armed bandit model. Bandit algorithms shift traffic toward the best-performing creatives while the test is still running, and they come with mathematical guarantees that keep regret low.
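Netflix hasn’t published its testing code, but the basic idea of a bandit fits in a few lines. The Python below is a minimal, hypothetical sketch using Thompson sampling, which is one common bandit strategy. The source doesn’t say which strategy Netflix uses.

Each thumbnail variant keeps a Beta posterior over its click-through rate. Every impression goes to the variant whose sampled rate is highest. The click-through rates are made up and serve only to simulate users. `expected_regret` compares the bandit against a fixed, evenly split A/B test.

```python
import random

def run_bandit(true_ctrs, rounds, seed=0):
    """Thompson sampling over thumbnail variants; returns impressions per variant.

    true_ctrs are hypothetical click-through rates used only to simulate users.
    The algorithm itself only sees the clicks and non-clicks it observes.
    """
    rng = random.Random(seed)
    n = len(true_ctrs)
    clicks, misses, pulls = [0] * n, [0] * n, [0] * n
    for _ in range(rounds):
        # Draw a plausible CTR for each variant from its Beta(1 + clicks, 1 + misses) posterior
        draws = [rng.betavariate(1 + clicks[i], 1 + misses[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: draws[i])
        pulls[arm] += 1
        if rng.random() < true_ctrs[arm]:  # simulated user response
            clicks[arm] += 1
        else:
            misses[arm] += 1
    return pulls

def run_ab_split(true_ctrs, rounds):
    """Classic A/B/n test: traffic is split evenly for the whole test."""
    n = len(true_ctrs)
    return [rounds // n + (1 if i < rounds % n else 0) for i in range(n)]

def expected_regret(true_ctrs, pulls):
    """Expected clicks lost versus always showing the best variant."""
    best = max(true_ctrs)
    return sum((best - ctr) * k for ctr, k in zip(true_ctrs, pulls))

if __name__ == "__main__":
    ctrs = [0.10, 0.30, 0.60]  # hypothetical thumbnails A, B, C
    bandit, split = run_bandit(ctrs, 5000), run_ab_split(ctrs, 5000)
    print("bandit impressions:", bandit, "regret:", round(expected_regret(ctrs, bandit), 1))
    print("A/B impressions:   ", split, "regret:", round(expected_regret(ctrs, split), 1))
```

The bandit should quickly route most impressions to variant C. The even split keeps sending two thirds of its traffic to the weaker variants, and that is where its extra regret comes from.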
What counts as “better” differs between people, so Netflix also takes your consumption profile into account to personalize its explore/exploit optimization. If you often watch comedy, they’ll use Robin Williams on the thumbnail for Good Will Hunting. If you love romantic chick flicks, they’ll feature Matt Damon kissing Minnie Driver instead.
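One simple way to picture personalized explore/exploit is to run a separate bandit for each kind of viewer. The sketch below is a hypothetical simplification. It keeps one Thompson-sampling posterior per (taste segment, artwork) pair. The segments `comedy_fan` and `romance_fan` are invented stand-ins for a real consumption profile, and the click-through rates driving the simulation are also made up. A richer approach would be a contextual bandit over many user features rather than a handful of hard segments.

```python
import random

class SegmentedThompson:
    """One Beta-Bernoulli posterior per (segment, artwork) pair.

    Segments are a hypothetical stand-in for a viewer's consumption profile.
    """

    def __init__(self, artworks, seed=0):
        self.artworks = list(artworks)
        self.rng = random.Random(seed)
        self.stats = {}  # segment -> {artwork: [clicks, misses]}

    def _posterior(self, segment):
        return self.stats.setdefault(segment, {a: [0, 0] for a in self.artworks})

    def choose(self, segment):
        """Explore/exploit: sample a CTR per artwork, show the highest draw."""
        post = self._posterior(segment)
        return max(self.artworks,
                   key=lambda a: self.rng.betavariate(1 + post[a][0], 1 + post[a][1]))

    def update(self, segment, artwork, clicked):
        self._posterior(segment)[artwork][0 if clicked else 1] += 1

    def best_guess(self, segment):
        """Pure exploitation: the artwork with the highest posterior-mean CTR."""
        post = self._posterior(segment)
        return max(self.artworks,
                   key=lambda a: (1 + post[a][0]) / (2 + post[a][0] + post[a][1]))

# Hypothetical click-through rates, used only to simulate viewers.
TRUE_CTR = {
    ("comedy_fan", "robin_williams"): 0.40, ("comedy_fan", "kiss_scene"): 0.10,
    ("romance_fan", "robin_williams"): 0.10, ("romance_fan", "kiss_scene"): 0.40,
}

def simulate(rounds=4000, seed=7):
    policy = SegmentedThompson(["robin_williams", "kiss_scene"], seed=42)
    viewers = random.Random(seed)
    for _ in range(rounds):
        segment = viewers.choice(["comedy_fan", "romance_fan"])
        artwork = policy.choose(segment)
        policy.update(segment, artwork, viewers.random() < TRUE_CTR[(segment, artwork)])
    return policy

if __name__ == "__main__":
    policy = simulate()
    for seg in ("comedy_fan", "romance_fan"):
        print(seg, "->", policy.best_guess(seg))
```

Each segment learns its own winner, so the same title can end up with different artwork for different viewers.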
2. Yelp Surfaces The Most Beautiful Photos For Any Venue
A picture is worth a thousand words. When you’re trying to pick a romantic spot that will impress your OKCupid date, you’ll want to know if the food is plated nicely and the ambience sets the right mood. To help you make the right restaurant choices, Alex Miller and his team at Yelp employed deep learning algorithms to highlight the best user photographs.
While metrics such as number of likes and clickthroughs can be useful to evaluate photos, they can also be thrown off by clickbait or happenstance. A better solution would be to judge a photo based on inherent content and characteristics – such as depth of field, contrast, and alignment – but with 25 million MAU (monthly active users) uploading thousands of photos to Yelp every day, no staff of human evaluators would suffice.
At the Startup ML conference in San Francisco, Miller described how his engineering team used CNNs (convolutional neural networks) to build a photo scoring model. A good proxy for a beautiful photo is whether it was taken with a DSLR, which is easy to determine from the camera information in the photo’s EXIF metadata.
Miller’s team leveraged this fact to create scalable training data sets using DSLR images as positive examples and non-DSLR images as negative examples. Deep learning algorithms learned the qualities of good photos from the training data set and could apply these learnings to all photos, whether taken with DSLR or not.
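The labeling step can be sketched in a few lines of Python. This is a hypothetical illustration, not Yelp’s code. It assumes EXIF tags have already been extracted into a dict keyed by tag name; the standard EXIF `Model` tag holds the camera model. `DSLR_MODEL_PREFIXES` is an invented, deliberately incomplete list. Training the CNN on these labels is out of scope here.

```python
# Hypothetical, deliberately incomplete prefixes; a real pipeline would need a
# maintained catalogue of camera models (EXIF "Model" strings vary by maker,
# and some of these product lines also include mirrorless bodies).
DSLR_MODEL_PREFIXES = ("Canon EOS", "NIKON D", "PENTAX K")

def weak_label(exif):
    """1 = likely DSLR (treated as a positive example), 0 = other camera,
    None = no camera model recorded, so the photo can't be labeled."""
    model = exif.get("Model")
    if not model:
        return None
    return 1 if model.strip().startswith(DSLR_MODEL_PREFIXES) else 0

def build_training_set(photos):
    """photos: iterable of (photo_id, exif_dict). Unlabelable photos are skipped."""
    dataset = []
    for photo_id, exif in photos:
        label = weak_label(exif)
        if label is not None:
            dataset.append((photo_id, label))
    return dataset

print(build_training_set([
    ("p1", {"Model": "Canon EOS 5D Mark III"}),
    ("p2", {"Model": "iPhone 6s"}),
    ("p3", {}),
]))  # [('p1', 1), ('p2', 0)]
```

This sketch drops photos that have no camera metadata rather than guessing a label for them.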
In addition to the photo quality score, the team added strategic filters and diversification logic, so that a restaurant famous for a particular dish or feature wouldn’t have all of its top 10 photos show the same subject.
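The article doesn’t spell out Yelp’s diversification logic, so here is one simple, hypothetical way to do it. Walk the photos in quality-score order and cap how many slots any single subject may take. The subject labels are assumed to come from a separate photo classifier. The photo IDs, scores, and subjects below are invented.

```python
def diversify(photos, k=10, max_per_subject=2):
    """Greedy re-ranking: best photos first, but no subject fills more than
    max_per_subject of the top-k slots. photos: (photo_id, score, subject) tuples.
    If there aren't enough distinct subjects, capped photos backfill the list."""
    ranked = sorted(photos, key=lambda p: p[1], reverse=True)
    picked, overflow, counts = [], [], {}
    for photo_id, _score, subject in ranked:
        if len(picked) == k:
            break
        if counts.get(subject, 0) < max_per_subject:
            picked.append(photo_id)
            counts[subject] = counts.get(subject, 0) + 1
        else:
            overflow.append(photo_id)
    return (picked + overflow)[:k]

photos = [  # hypothetical quality scores and subject labels
    ("burger_1", 0.99, "burger"), ("burger_2", 0.98, "burger"),
    ("burger_3", 0.97, "burger"), ("patio", 0.90, "outdoor"),
    ("fries", 0.85, "side_dish"),
]
print(diversify(photos, k=4))  # ['burger_1', 'burger_2', 'patio', 'fries']
```

The third burger shot loses its slot to lower-scoring but different photos. It only comes back if there aren’t enough other subjects to fill the list.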
The results speak for themselves:
You can read more about Miller’s technical process on the Yelp engineering blog. Also relevant is a blog post by Miller’s colleague Wei-Hong Chuang on how Yelp classifies each photo into categories like food, drink, outdoor shot, indoor shot, etc.
3. Yahoo Ensures You Pick The Best Emoji For Any Situation
For lazy texters, emojis are the easiest way to say so much with so little. But with over 1,800 possible emoji choices, how can you be sure you’re picking the perfect one for exactly what you want to say?
This is the problem Stacey Svetlichnaya, a machine learning engineer at Yahoo, aims to solve. When a user composes or responds to a message, what emojis should show up in the autocomplete suggestions? Ideally you want to predict the top five emojis that a user is likely to select.
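Yahoo’s actual models are described a little further on. As a point of reference, the simplest possible suggester just counts which emojis people picked alongside which words. The sketch below is a hypothetical baseline, not Yahoo’s approach. The training pairs are invented, and emojis are written as shortcodes like `:cake:` to keep the source ASCII.

```python
from collections import Counter, defaultdict

def train_cooccurrence(examples):
    """examples: (message_text, chosen_emoji) pairs. Counts how often each
    word appeared in a message where the user picked each emoji."""
    word_emoji = defaultdict(Counter)
    for text, emoji in examples:
        for word in set(text.lower().split()):
            word_emoji[word][emoji] += 1
    return word_emoji

def suggest(word_emoji, draft, k=5):
    """Score every emoji by summing its counts over the draft's words; return the top k."""
    scores = Counter()
    for word in set(draft.lower().split()):
        scores.update(word_emoji.get(word, Counter()))
    return [emoji for emoji, _ in scores.most_common(k)]

examples = [  # hypothetical training pairs
    ("happy birthday", ":cake:"), ("happy birthday friend", ":cake:"),
    ("happy birthday", ":tada:"), ("so happy today", ":grinning:"),
    ("sad day", ":cry:"),
]
model = train_cooccurrence(examples)
print(suggest(model, "happy birthday", k=5))  # [':cake:', ':tada:', ':grinning:']
```

Because it only counts co-occurrences, this baseline ignores word order and context entirely. Those are exactly the signals a sequence model can pick up.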
Emoji usage is highly dynamic. Some emojis replace words with images, others express emotion, and still others have bizarre cultural uses. For example, the goat emoji is often used to mean “greatest of all time.”
An additional challenge is that emojis differ in visual style across platforms, leading to misinterpretations.
Svetlichnaya and the Yahoo Vision & Machine Learning Team tested three different approaches: 1) FastText, a fast linear classifier, 2) LSTM, a type of recurrent neural network architecture, and 3) WordCNN, a convolutional net approach that balances performance and complexity. Of the three, FastText was, unsurprisingly, the fastest, but human evaluators preferred the LSTM’s results.
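To give a feel for why FastText is so fast, here is a heavily simplified, hypothetical sketch of a FastText-style model in pure Python. It averages a learned vector for each word in the message, then applies a single linear layer and softmax over emoji labels, trained with plain SGD. Real FastText adds hashed n-gram features and speed tricks such as hierarchical softmax. The class name, hyperparameters, and toy training set here are all invented for illustration.

```python
import math
import random

class TinyFastText:
    """Simplified FastText-style classifier (hypothetical sketch):
    mean of word vectors -> linear layer -> softmax over labels."""

    def __init__(self, labels, dim=16, lr=0.3, seed=0):
        self.labels = list(labels)
        self.dim, self.lr = dim, lr
        self.rng = random.Random(seed)
        self.emb = {}                                  # word -> vector, created on first sight
        self.out = [[0.0] * dim for _ in self.labels]  # one weight row per label

    def _vec(self, word):
        if word not in self.emb:
            self.emb[word] = [self.rng.uniform(-0.5, 0.5) for _ in range(self.dim)]
        return self.emb[word]

    def _probs(self, h):
        logits = [sum(w * x for w, x in zip(row, h)) for row in self.out]
        top = max(logits)                              # subtract max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def train_step(self, text, label):
        vecs = [self._vec(w) for w in text.lower().split()]
        h = [sum(col) / len(vecs) for col in zip(*vecs)]  # averaged message vector
        p = self._probs(h)
        y = self.labels.index(label)
        g = [p_i - (1.0 if i == y else 0.0) for i, p_i in enumerate(p)]  # dLoss/dlogits
        # Gradient w.r.t. h, computed before the output weights change
        g_h = [sum(g[i] * self.out[i][d] for i in range(len(g))) for d in range(self.dim)]
        for i, g_i in enumerate(g):
            for d in range(self.dim):
                self.out[i][d] -= self.lr * g_i * h[d]
        for vec in vecs:  # each word contributed 1/len(vecs) of h
            for d in range(self.dim):
                vec[d] -= self.lr * g_h[d] / len(vecs)

    def top_k(self, text, k=5):
        known = [self.emb[w] for w in text.lower().split() if w in self.emb]
        h = [sum(col) / len(known) for col in zip(*known)] if known else [0.0] * self.dim
        p = self._probs(h)
        order = sorted(range(len(self.labels)), key=lambda i: p[i], reverse=True)
        return [self.labels[i] for i in order[:k]]

train = [  # hypothetical (message, emoji) pairs
    ("happy birthday to you", ":cake:"), ("happy birthday friend", ":cake:"),
    ("that was hilarious", ":joy:"), ("so funny lol", ":joy:"),
    ("greatest of all time", ":goat:"), ("best player ever", ":goat:"),
]

def fit(epochs=100):
    model = TinyFastText([":cake:", ":joy:", ":goat:"])
    for _ in range(epochs):
        for text, label in train:
            model.train_step(text, label)
    return model

if __name__ == "__main__":
    print(fit().top_k("happy birthday", k=2))
```

Prediction is just a vector average plus one small matrix product. That is why this family of models is so cheap compared with running an LSTM over every message.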
Yahoo isn’t the only company applying machine learning to emojis. Back in 2015, Instagram Engineering published a fascinating series by engineer Thomas Dimson called “Emojineering: Machine Learning For Emoji Trends.”
4. Stitch Fix Finds You The Perfect Fashion Faster
Being fashionable is hard, but Stitch Fix makes styling effortless. The personal styling startup lets you personalize your style profile and get hand-picked clothing and accessories sent to your door every month.
Defining style can be a nebulous affair. After all, how can you tell if a particular shirt counts as “urban boho chic” or that a dress is “sexy but not too slutty”? Christopher Moody has several ideas.
Moody is a Data Scientist at Stitch Fix with a background in statistics, astrophysics, and high-performance computing. It turns out these nerdy skills are now in demand in the fashion world.
Many deep learning models are opaque black boxes where you can’t easily understand why an algorithm came to a particular conclusion. Moody’s research focuses on improving model interpretability so that human experts can give feedback on the relative performance of the algorithms.
One method is t-SNE (t-distributed stochastic neighbor embedding), a dimensionality reduction technique for visualization that places similar objects close together. Many deep learning models work with high-dimensional data that is impossible for a human to picture directly. Dimensionality reduction methods flatten that data into two- or three-dimensional scatter plots that are much easier to understand.
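Libraries such as scikit-learn ship optimized t-SNE implementations, but the core algorithm fits in a short, dependency-free sketch. This hypothetical version is not Stitch Fix’s code. It works in three steps:

1. Compute exact pairwise affinities.
2. Binary-search each point’s Gaussian bandwidth to hit a target perplexity.
3. Run plain gradient descent on a 2-D map, using the heavy-tailed Student-t kernel that gives t-SNE its name.

It costs O(n²) per step, so it only suits toy data like the two synthetic clusters below.

```python
import math
import random

def conditional_probs(d2, i, perplexity, tol=1e-5, max_iter=50):
    """P(j|i) under a Gaussian centered on point i, with its precision (beta)
    binary-searched so the distribution's perplexity matches the target."""
    n = len(d2)
    target = math.log(perplexity)            # target entropy in nats
    d_min = min(d2[i][j] for j in range(n) if j != i)
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        # Shifting by d_min avoids underflow; it cancels out after normalizing
        w = [0.0 if j == i else math.exp(-beta * (d2[i][j] - d_min)) for j in range(n)]
        total = sum(w)
        probs = [x / total for x in w]
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:                 # too flat: narrow the Gaussian
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:                                # too peaked: widen it
            hi = beta
            beta = (beta + lo) / 2
    return probs

def tsne(points, perplexity=5.0, iters=300, lr=0.5, seed=0):
    """Exact t-SNE to 2-D with plain gradient descent (no momentum or gains)."""
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    cond = [conditional_probs(d2, i, perplexity) for i in range(n)]
    # Symmetrized joint probabilities p_ij = (p_j|i + p_i|j) / 2n
    P = [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    Y = [[rng.gauss(0, 1e-2), rng.gauss(0, 1e-2)] for _ in range(n)]
    for step in range(iters):
        exaggeration = 4.0 if step < 100 else 1.0   # "early exaggeration" phase
        # Student-t kernel (1 + ||y_i - y_j||^2)^-1 in the 2-D map
        num = [[0.0 if i == j else 1.0 / (1.0 + (Y[i][0] - Y[j][0]) ** 2
                                          + (Y[i][1] - Y[j][1]) ** 2)
                for j in range(n)] for i in range(n)]
        z = sum(map(sum, num))
        grads = []
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                # dC/dy_i = 4 * sum_j (p_ij - q_ij) * num_ij * (y_i - y_j)
                c = 4.0 * (exaggeration * P[i][j] - num[i][j] / z) * num[i][j]
                gx += c * (Y[i][0] - Y[j][0])
                gy += c * (Y[i][1] - Y[j][1])
            grads.append((gx, gy))
        for (gx, gy), y in zip(grads, Y):
            y[0] -= lr * gx
            y[1] -= lr * gy
    return Y

if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical groups of 5-D "style feature" vectors
    group_a = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(10)]
    group_b = [[rng.gauss(10, 1) for _ in range(5)] for _ in range(10)]
    for x, y in tsne(group_a + group_b):
        print(f"{x:8.2f} {y:8.2f}")
```

Plotting the printed coordinates should show two separate blobs. Each blob corresponds to one group of similar items.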
Moody is also a big fan of the k-SVD method, a generalization of the k-means clustering method. In high-level, non-technical terms, cluster analysis involves grouping together objects that have similar properties. Once distinct clusters have been identified, a human expert can examine the groups to see if they share any unifying features and add appropriate labels such as “tank tops” or “statement pieces.”
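Full k-SVD alternates sparse coding with SVD-based dictionary updates, which is too long for a short example. The sketch below instead shows k-means, the simpler special case that k-SVD generalizes. In k-means, each item is represented by exactly one “atom” (its centroid). It uses farthest-first initialization as a simple deterministic choice. The 2-D garment features are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means with farthest-first initialization.

    k-SVD generalizes this: k-means is the special case where every item is
    represented by exactly one dictionary atom (its centroid) with weight 1.
    """
    # Farthest-first init: start at the first point, then repeatedly add the
    # point farthest from all centroids chosen so far
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(iters):
        # Assignment step: each item joins its nearest centroid
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # no item changed cluster: converged
        assign = new_assign
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

items = [  # hypothetical 2-D garment features, e.g. [sleeve_length, embellishment]
    [0.1, 0.2], [0.0, 0.4], [0.3, 0.1],
    [5.0, 5.2], [5.3, 4.9], [4.8, 5.1],
]
labels, centers = kmeans(items, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The algorithm only produces anonymous cluster IDs. Naming cluster 0 “tank tops” is still the human expert’s job.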
5. Google’s Machine Translation Approaches Human Levels
In September 2016, Google announced it was replacing older machine translation methods with neural network architectures. Previously, Google Translate used a statistical method called phrase-based machine translation (PBMT), which breaks sentences into words and phrases that are translated separately.
PBMT methods often produce grammatically clunky sentences, especially if the input and output languages differ dramatically in their structure, such as Chinese to English. Compensating for these quirks requires additional engineering complexity and effort.
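The core weakness of translating phrase by phrase is easy to reproduce with a toy. The sketch below is hypothetical and drastically simplified. It uses a six-entry French-to-English phrase table (French rather than Chinese, to keep the example readable) and greedy longest-match segmentation. Real PBMT systems score many competing segmentations and add reordering and language models. That is the kind of extra machinery the paragraph above alludes to, and leaving it out shows why raw phrase-by-phrase output comes out clunky.

```python
# Hypothetical toy phrase table (French -> English). Real systems learn huge
# tables of scored phrase pairs and add reordering and language models.
PHRASE_TABLE = {
    ("le", "chat"): "the cat",
    ("le",): "the",
    ("chat",): "cat",
    ("noir",): "black",
    ("mange",): "eats",
    ("la", "souris"): "the mouse",
}

def translate_phrase_based(sentence, table=PHRASE_TABLE, max_len=3):
    """Greedy left-to-right segmentation: at each position, translate the longest
    source phrase found in the table, then move past it. No reordering."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in table:
                out.append(table[phrase])
                i += n
                break
        else:
            out.append(words[i])  # unknown word: copy it through untranslated
            i += 1
    return " ".join(out)

print(translate_phrase_based("le chat noir mange la souris"))
# the cat black eats the mouse
```

Every phrase is translated correctly, yet the sentence is wrong: French puts “noir” after the noun. Nothing in this pipeline knows English wants “the black cat.”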
The new GNMT (Google Neural Machine Translation) system uses RNNs (recurrent neural networks) to map an entire input sentence in one language to an output sentence in another language. This reduces code complexity while maintaining speed and improving translation quality. In some cases, such as French to English, GNMT’s output is closing in on the quality of human translators.
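The encoder-decoder idea can be sketched with two tiny recurrent networks. The encoder reads the whole source sentence into a hidden state, and the decoder then generates target words one at a time from that state. This is a hypothetical, untrained illustration of the data flow only. It uses random weights, plain tanh RNN cells, greedy decoding, and invented vocabularies, so its “translations” are meaningless. GNMT itself uses deep stacks of LSTM layers with an attention mechanism and is trained on large parallel corpora; none of that is modeled here.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class TinySeq2Seq:
    """Untrained encoder-decoder RNN (hypothetical sketch; data flow only)."""

    def __init__(self, src_vocab, tgt_vocab, hidden=16, seed=0):
        rng = random.Random(seed)
        self.src_index = {w: i for i, w in enumerate(src_vocab)}
        self.tgt_vocab = list(tgt_vocab) + ["<eos>"]  # <eos> doubles as the start token
        self.hidden = hidden
        self.enc_in = rand_matrix(hidden, len(src_vocab), rng)
        self.enc_rec = rand_matrix(hidden, hidden, rng)
        self.dec_in = rand_matrix(hidden, len(self.tgt_vocab), rng)
        self.dec_rec = rand_matrix(hidden, hidden, rng)
        self.dec_out = rand_matrix(len(self.tgt_vocab), hidden, rng)

    def _rnn_step(self, w_in, w_rec, token_id, state):
        # h_t = tanh(W_in x_t + W_rec h_{t-1}); x_t is one-hot, so W_in x_t is one column
        return [math.tanh(w_in[r][token_id] + sum(a * b for a, b in zip(w_rec[r], state)))
                for r in range(self.hidden)]

    def encode(self, sentence):
        state = [0.0] * self.hidden
        for word in sentence.lower().split():
            state = self._rnn_step(self.enc_in, self.enc_rec, self.src_index[word], state)
        return state  # fixed-size summary of the entire source sentence

    def decode(self, state, max_len=10):
        words, prev = [], len(self.tgt_vocab) - 1  # begin from the start token
        for _ in range(max_len):
            state = self._rnn_step(self.dec_in, self.dec_rec, prev, state)
            logits = [sum(a * b for a, b in zip(row, state)) for row in self.dec_out]
            prev = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
            if self.tgt_vocab[prev] == "<eos>":
                break
            words.append(self.tgt_vocab[prev])
        return words

    def translate(self, sentence, max_len=10):
        return self.decode(self.encode(sentence), max_len)

if __name__ == "__main__":
    model = TinySeq2Seq(["le", "chat", "noir"], ["the", "black", "cat"])
    print(model.translate("le chat noir"))  # untrained, so the output is arbitrary
```

Unlike the phrase-table approach, nothing is translated piece by piece. The decoder only ever sees the encoder’s summary of the full sentence, which is what lets a trained model reorder words freely.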
If you want to understand the technical evolution of the GNMT and how Google’s team iteratively improved upon the design and performance, read Stephen Merity’s visual guide to the process.
AI Permeates All Aspects Of Our Digital Lives
Many consumers don’t realize how much artificial intelligence improves the user experience of digital products. Traditional statistical models for predicting optimal experiences are being dramatically enhanced by recent breakthroughs in big data, computational power, and deep neural network architectures. More and more companies will have to adopt these methodologies to stay competitive.
Are there any products or technology you use every day that you would like us to explain? Let us know in the comments below.