Ever since neural networks began their renaissance in 2012, computer vision has been a ripe field of study and innovation for AI researchers and a fruitful area of applied AI for enterprises. Deep learning enables incredible feats of machine vision, such as classifying image subjects at human parity and dynamically generating completely new imagery.
Brands have quickly recognized this technology as a commercial asset. Clarifai, a leading computer vision company with a popular visual intelligence API, was recently used to analyze hundreds of photos of fans at baseball games and identify when they had caught a ball. The results were then overlaid on stadium seating maps to show the highest-probability seats for fans hoping to bring home a souvenir catch. From a business perspective, such data can drive more effective tiered pricing.
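To make the seating-map idea concrete, here is a minimal sketch of the aggregation step only. The article does not describe how the analysis was actually implemented, so everything here is an assumption: it presumes some image classifier has already labeled each photo with a seating section and whether a catch is visible, and the function names and sample data are hypothetical. No Clarifai API calls are shown.

```python
from collections import Counter


def catch_rate_by_section(detections):
    """Return the fraction of photos per section that show a catch.

    `detections` is an iterable of (section, caught) pairs, where `caught`
    is the (assumed) boolean output of an upstream image classifier.
    """
    photos = Counter()
    catches = Counter()
    for section, caught in detections:
        photos[section] += 1
        if caught:
            catches[section] += 1
    # Counter returns 0 for sections with no recorded catches.
    return {section: catches[section] / photos[section] for section in photos}


def rank_sections(rates):
    """Order sections from highest to lowest observed catch rate."""
    return sorted(rates, key=lambda section: rates[section], reverse=True)


# Hypothetical per-photo results, purely for illustration.
detections = [
    ("101", True), ("101", False), ("101", False),
    ("102", False), ("102", False),
    ("203", True), ("203", True), ("203", False),
]

rates = catch_rate_by_section(detections)
print(rank_sections(rates))
```

A real analysis would also need to account for how many photos were taken of each section and how many fans sit there, since a raw per-photo rate can be skewed by uneven coverage.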
The same technology proved equally adept at following Hearthstone, a popular card game produced by Blizzard Entertainment. By watching many rounds of play, the AI learned the cards and strategies of the game and could offer play-by-play commentary. During a hackathon, developers used Clarifai’s visual APIs for yet another purpose: Tinder swiping. The algorithms learned the preferences of one of the developers and took over the laborious act of swiping for him.
“The common theme here is customization,” explains Clarifai founder and CEO Matthew Zeiler. “You can teach the platform whatever you want to recognize, and that applies all the way up to the biggest companies.” Indeed, many large corporations in search of faster ways to search and sort photos are taking notice. Digital media juggernaut BuzzFeed adopted visual intelligence to help editors find relevant photos for articles. Another Clarifai customer, a proponent of “natural beauty,” used visual intelligence to find social media images of fashionable women not wearing makeup who embodied its brand message.
Zeiler and his team at Clarifai are not the only ones banking on the potential of visual intelligence. Major players like Google, IBM, Salesforce, and Microsoft, as well as smaller entities such as GumGum and Ditto, all offer computer vision solutions. The Google Cloud Vision API, for example, enables users to easily organize stored photos. Amazon’s version, Rekognition, is marketed toward developers who want to add facial recognition security to apps or record the demographics of their viewers. GumGum’s offering is directed at companies interested in using social media posts to monitor brand representation online or calculate earned media from videos of sporting events. A big promise of visual recognition, says Zeiler, is that of “next-level analytics that show how your products are being used out in the world.”
Challenges remain, especially with brand data. For example, many retailers have catalog photos of products against white backgrounds. While an algorithm could be trained to recognize these products in sanitized conditions, identifying the same products in real-world scenarios would require more training data. Zeiler says the key going forward is that “it’s not going to be the AI experts that collect this data – it’s going to be everybody on the planet.” To facilitate this, he and his team are making their technology as widely adoptable and user-friendly as possible to “get everybody interacting.”
Another challenge is the lack of high-resolution images, which are crucial for certain applications. Many brands are clamoring for AI solutions for automatic counterfeit detection. The difference between a counterfeit and a real product tends to be very subtle, such as a slight variation in a purse buckle or shoelaces, and will likely be undetectable in a low-resolution photo. Often the difference is hard even for the human eye to detect.
Zeiler also believes multi-modal neural networks (i.e., those that can process many different media types in parallel, such as audio, video, and text) are the next innovation to focus on. “Fusing more than one data type is difficult,” he points out, “and multi-function neural nets don’t perform as well as single ones.” However, there is extraordinary value in analyzing multiple aspects of a product at once, such as the image, description, prices, user reviews, and user-generated images and videos. Zeiler hopes to handle up to 10 modalities at once.
While tech giants have been snapping up AI startups like Orbeus, Alchemy, and MetaMind to catch up in the AI wars, Zeiler believes a dedicated commitment to customer needs will help Clarifai maintain a competitive advantage. “Google makes internet balloons, self-driving cars, word processors, email clients, and of course search and advertising products. With so many divisions, they compete with their customers.” If an advertiser sees its proprietary data as its key defensible advantage, it might be hesitant to send this data to Google’s cloud. Similarly, Amazon can learn from another retailer’s data and compete with that retailer directly.
“Twilio is the independent communications company. Stripe is the independent payments company. Spotify is the independent music company. We believe Clarifai will be the independent vision company,” Zeiler concludes.