Chihuahua OR Muffin? Searching For The Best Computer Vision API

September 22, 2017 by Mariya Yao

Chihuahua vs. Muffin Enterprise Computer Vision APIs Clarifai Google Cloud Rekognition Cloudsight

You’ve probably seen this internet meme demonstrating the alarming resemblance of chihuahuas and muffins. Everyone in the AI industry (including myself) loves putting the image in their presentations.

But, one question I haven’t seen anyone answer rigorously is: just how good IS modern AI at disambiguating between a chihuahua and a muffin? For your entertainment and education, I’ll be investigating this question today.

Chihuahua vs Muffin Enterprise Computer Vision API Benchmarking

Binary classification has been possible ever since the perceptron algorithm was invented in 1957. If you think AI is hyped now, the New York Times reported in 1958 that the invention was the beginning of a computer that would “be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” But while perceptron machines like the Mark 1 were designed for image recognition, in reality they could only discern patterns that are linearly separable, preventing them from learning the complex patterns that underlie most visual media.

No wonder the world was disillusioned and an “AI Winter” ensued. Since then, multi-layer perceptrons (popular in the 1980s) and convolutional neural networks (pioneered by Yann LeCun in 1998) have greatly outperformed single-layer perceptrons in image recognition tasks. With the advent of large labeled data sets like ImageNet and powerful GPU computing, increasingly advanced neural network architectures like AlexNet, VGG, Inception, and ResNet have achieved state-of-the-art performance in computer vision.

Computer Vision & Image Recognition APIs

If you’re a machine learning engineer, it’s easy to start experimenting with and fine-tuning these models by using pre-trained models and weights in either Keras / TensorFlow or PyTorch. If you’re not comfortable tweaking neural networks on your own, you’re in luck because virtually all the leading technology giants and promising startups claim to “democratize AI” by offering easy-to-use computer vision APIs:

Amazon Rekognition
Microsoft Computer Vision
IBM Watson Vision
Google Cloud Vision
Cloudsight
Clarifai

Which one is the “best”? To truly answer that question, you’d have to clearly define your business goals, product use cases, test data sets, and metrics of success before you benchmark the solutions against each other.

In lieu of a serious inquiry, we can at least get a high-level sense of the different behaviors of each platform by testing them with our toy problem of chihuahua vs. muffin.

Conducting The Test

To do this, I split the canonical meme above into 16 separate test images and used open source code written by engineer Gaurav Oberoi to consolidate results from the different APIs. Each image is pushed through the 6 APIs listed above, which return high-confidence labels as predictions. The only exceptions are Microsoft, which returns both labels and a caption, and Cloudsight, which uses human-AI hybrid technology to return only a single caption. This is why Cloudsight can return eerily accurate captions for complex images but takes 10-20x longer to process.
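For readers who want to build something similar, here is a minimal sketch of a consolidation harness. It is not Gaurav Oberoi’s actual code. The function `run_benchmark` and the three stand-in clients are hypothetical names invented for illustration, and the sketch assumes each API can be wrapped as a function that takes an image path and returns a list of labels (or a one-item list holding a caption).

```python
import time
from typing import Callable, Dict, List

# Hypothetical client type: takes an image path, returns labels (or one caption).
LabelFn = Callable[[str], List[str]]


def run_benchmark(image_paths: List[str],
                  clients: Dict[str, LabelFn]) -> Dict[str, Dict[str, dict]]:
    """Push every image through every client and record labels, time, errors."""
    results: Dict[str, Dict[str, dict]] = {}
    for path in image_paths:
        results[path] = {}
        for name, client in clients.items():
            start = time.perf_counter()
            try:
                labels, error = client(path), None
            except Exception as exc:  # one failing API should not stop the run
                labels, error = [], str(exc)
            results[path][name] = {
                "labels": labels,
                "seconds": time.perf_counter() - start,
                "error": error,
            }
    return results


# Stand-in clients (assumptions): replace with real SDK or HTTP calls.
def fake_label_api(path: str) -> List[str]:
    return ["dog", "mammal"]


def fake_caption_api(path: str) -> List[str]:
    return ["a small dog looking at the camera"]


def fake_broken_api(path: str) -> List[str]:
    raise RuntimeError("quota exceeded")


results = run_benchmark(
    ["tile_01.jpg", "tile_02.jpg"],
    {"label_api": fake_label_api,
     "caption_api": fake_caption_api,
     "broken_api": fake_broken_api},
)
```

Recording the elapsed time per call alongside the labels means the same run can feed both the accuracy comparison and the speed comparison.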

Below is an example of the output. You can see the full list of results on all 16 chihuahua vs. muffin images by clicking here.

TOPBOTS Muffin Best Computer Vision API Test

How well did the APIs do? Other than Microsoft, which confused this muffin for a stuffed animal, every API recognized that the image was of food, but they did not agree on whether the food was bread, cake, cookies, or muffins. Google was the only API to successfully identify “muffin” as the highest probability label.

Let’s look at a chihuahua example:

TOPBOTS Chihuahua Best Computer Vision API Test

Again, the APIs did rather well. All of them realized the image was of a dog, although a few of them missed the exact breed.

There were definite failures, though. Microsoft returned a blatantly wrong caption three separate times, describing muffins as either stuffed animals or a teddy bear.

TOPBOTS Muffin Microsoft Computer Vision API

Google was the ultimate muffin identifier, returning “muffin” as its highest confidence label for 6 out of the 7 muffin images in our test set. The other APIs never returned “muffin” as the first label for any muffin picture, returning instead related but less relevant labels like “bread”, “cookie”, or “cupcake”.
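The “first label” comparison above is a top-1 hit rate. Here is a minimal sketch of how you might compute it, assuming each API’s output has already been reduced to a list of labels ordered from highest to lowest confidence. The function name `top1_hit_rate` and the sample data are illustrative. The data reproduces only the 6-of-7 count reported above, not the actual API responses.

```python
from typing import List


def top1_hit_rate(predictions: List[List[str]], expected: str) -> float:
    """Fraction of images whose highest-confidence label equals `expected`.

    Each inner list is assumed to be ordered from most to least confident.
    Matching is exact after lowercasing and stripping whitespace.
    """
    if not predictions:
        return 0.0
    target = expected.strip().lower()
    hits = sum(
        1 for labels in predictions
        if labels and labels[0].strip().lower() == target
    )
    return hits / len(predictions)


# Illustrative stand-in for 7 muffin images: 6 hits and 1 miss.
sample_predictions = [["muffin"]] * 6 + [["snout", "dog breed group"]]
rate = top1_hit_rate(sample_predictions, "muffin")
print(f"{rate:.0%}")  # prints "86%"
```

Exact matching is deliberately strict. A label such as “blueberry muffin” would count as a miss unless you add your own synonym or substring rules.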

However, despite the string of successes, Google did fail on this specific muffin image, returning “snout” and “dog breed group” as predictions.

TOPBOTS Google Cloud Vision API Muffin Test

Even the world’s most advanced machine learning platforms are tripped up by our facetious chihuahua vs. muffin challenge! A human toddler beats deep learning when it comes to figuring out what’s food and what’s Fido.

Testing With Real-World Images

As a further test, I’d like to know how well the APIs perform on more real-world images of chihuahuas and muffins, not just ones carefully curated to resemble each other. ImageNet happens to have 1750 images of chihuahuas and 1335 images of various types of muffins.

Some of the images turned out to be pretty easy for our APIs to recognize because they exhibit very distinct class features, such as buggy eyes and pointy ears in the case of this chihuahua:

Other images, on the other hand, proved tricky. APIs often miss identifying objects in photos if there are multiple subjects within the same photo or if the subject is costumed or otherwise obstructed:

In the above case, the costume on the dog may have prevented the APIs (and likely many human classifiers) from correctly identifying the breed. IBM Watson manages to tag just the hats but not the dog or the person wearing them.

Handling Noisy Labels

With unstructured real-world data, including images, human-tagged labels are not always “ground truth” and labels can be incorrect or “noisy”. Here’s an example of an image that was included in the “muffin” category on ImageNet:

TOPBOTS Cookie Monster Mislabled As Muffin ImageNet Computer Vision API

We humans would likely identify this “muffin in disguise” more accurately as a “cupcake”. Fortunately many of our APIs did return “cake”, “cupcake”, or “cookie” as predictions that are more relevant than the ImageNet category. Cloudsight’s human labeling produced the most accurate result of “cookie monster cupcake” for what is indeed a strange human invention for machines to interpret.

Using multiple models and APIs could be one interesting way to assess the “noisiness” of labels. In the case of ImageNet’s “muffin” category, the muffin varieties (e.g. bran, corn, popover) can appear quite visually distinct, and many are actually mislabeled cupcakes or other non-muffin types of baked goods.

TOPBOTS ImageNet Muffins Corn Muffin Bran Muffin Popover Muffin

Running large numbers of images through a number of different image recognition APIs and tracking the common overlaps and divergent one-offs can help you systematically flag images which might have noisy or incorrect labels.
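As a rough illustration of that idea, the sketch below flags an image when no API returns the dataset’s label but at least two APIs agree on some other label. The function name `flag_noisy_labels`, the `min_agreement` threshold, the API names, and the sample predictions are all assumptions made for this example, not output from the real services.

```python
from collections import Counter
from typing import Dict, List


def flag_noisy_labels(dataset_labels: Dict[str, str],
                      api_predictions: Dict[str, Dict[str, List[str]]],
                      min_agreement: int = 2) -> Dict[str, List[str]]:
    """Return {image_id: consensus_labels} for images whose dataset label
    no API predicted, while at least `min_agreement` APIs share another label."""
    flagged: Dict[str, List[str]] = {}
    for image_id, dataset_label in dataset_labels.items():
        votes: Counter = Counter()
        for labels in api_predictions.get(image_id, {}).values():
            # Use a set so each API votes at most once per label.
            votes.update({label.strip().lower() for label in labels})
        if votes[dataset_label.strip().lower()] > 0:
            continue  # at least one API agrees with the dataset label
        consensus = sorted(l for l, n in votes.items() if n >= min_agreement)
        if consensus:
            flagged[image_id] = consensus
    return flagged


# Made-up predictions loosely modeled on the cupcake example above.
dataset_labels = {"img_cupcake": "muffin", "img_bran": "muffin"}
api_predictions = {
    "img_cupcake": {
        "api_a": ["cake", "cupcake"],
        "api_b": ["cupcake", "cookie"],
        "api_c": ["food", "cake"],
    },
    "img_bran": {
        "api_a": ["muffin", "bread"],
        "api_b": ["bread", "food"],
    },
}
suspects = flag_noisy_labels(dataset_labels, api_predictions)
print(suspects)  # {'img_cupcake': ['cake', 'cupcake']}
```

A flagged image is only a candidate for human review. The APIs may all be wrong in the same way, so the threshold trades recall for reviewer time.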

Weird side note: in searching for different muffin categories on ImageNet, I happened across an unexpected category called “muffin man”, which ImageNet defines as “Formerly an itinerant peddler of muffins”. If you’re ever looking for photos of dudes presenting muffins, now you know where to go.

TOPBOTS Muffin Man ImageNet Weird Category

Playing Trickster

Just for fun, I tried to fool the APIs with these types of tricky photos:

Photos of both a chihuahua AND a muffin
Photos of dog-shaped cupcakes

Here’s how the APIs did on one of the photos featuring both a chihuahua AND a muffin:

TOPBOTS Chihuahua Muffin Computer Vision API

IBM and Cloudsight were the only two APIs that acknowledged any food was present in the image, although IBM got a bit creative with its guesses of “takoyaki”, “gyoza”, and “cannoli”.

There was also confusion caused by the dog-shaped cupcakes:

Microsoft, in traditional fashion, captioned the image as “a bunch of stuffed animals.” Google predicted the photo was more likely to be of a “dog like mammal” (0.89) than “cake” (0.79). Clarifai seemed to think the image contained both “food” (0.99) and a “mammal” (0.96) with very high confidence.

In these complex or unusual cases, Cloudsight’s human captioning demonstrated superior results, with this last image tagged very specifically as “12-piece West Highland White Terrier cupcakes” and the previous image even being recognized as being a popular meme!

So, Which Computer Vision API Is The Best?

While we can’t determine conclusively that one API is “better” than another just by these joke examples, you can definitely observe qualitative differences in how they perform.

Amazon Rekognition

Amazon’s Rekognition is not just good at identifying the primary object, but also many of the objects around the scene, such as when a human, bird, or piece of furniture is also in the image. It also includes qualitative judgements, such as “cute” or “adorable”. There’s a nice balance of objective and subjective labels in their top predictions.

Google & IBM

Google’s Vision API and IBM Watson Vision are both very literal and never seem to return labels other than straightforward descriptive labels. The performance seems comparable between the two, with IBM typically returning slightly more labels on average for any given photo.

Microsoft

Microsoft’s tags were usually too high level, e.g. “dog”, “canine”, “mammal”, and they never once specified “chihuahua” or “muffin”, which is a huge surprise. They also seemed to be very trigger-happy about identifying muffins as “stuffed animals” in their automatically generated captions. You’d think that the company behind ResNet would have better performance to show for it, but this may be a quirk of this dataset, so I encourage you to run more robust tests of your own.

Cloudsight

Cloudsight is a hybrid between human tagging and machine labeling, so the API is much slower than the others, as you can see from the speed stats below. That said, for difficult or strange photos, the Cloudsight description tends to be the most accurate, e.g. “12-piece West Highland White Terrier cupcakes.”

Clarifai

Clarifai returns by far the most tags (20), yet never once correctly identified the breed in the dog images as “chihuahua”. Instead it resorted to more generic tags like “dog”, “mammal”, or “animal”. What Clarifai does do well is add a lot of qualitative and subjective labels, such as “cute”, “funny”, “adorable”, “delicious”, etc. It also sometimes returns abstracted concepts like “facial expression” or “no person”. These can be useful if you’re looking for a richer description of images for use in advertising or other consumer-facing purposes.

Other Considerations

As stated before, actual assessment of these APIs would require you to define clear business and product goals, an appropriate test data set, and metrics for success. You’d likely also need to consider factors such as cost, speed, and number of tags returned.

Here’s the summary for these additional metrics based on the 16 images from the classic chihuahua vs. muffin meme. Amazon Rekognition regularly performs a smidge faster than the other fully automated APIs. Cloudsight, as expected, is materially slower because of its human / AI hybrid structure and only returns a single caption. Clarifai returns 20 labels by default.

TOPBOTS Computer Vision Image Recognition API Speeds

Pricing for all of the APIs can be found on their pricing pages which are linked below. Most of the APIs offer a free tier and then charge based on monthly processing volume. These are approximate starting prices per image as of the date of this article, but pricing is constantly in flux so you’ll need to check for updates before you commit to any platform.

Amazon – $0.001
Microsoft – $0.001
IBM Watson – $0.002
Google Cloud – $0.0015
Cloudsight – $0.02
Clarifai – $0.0015

Most of the APIs charge between $0.001 and $0.002 per image for a few million images, but Cloudsight is notably more expensive at $0.02 an image, with pricing based on 30,000 images per month. Lower volume accounts can pay up to $0.07 an image!
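To compare these numbers at your own volume, a back-of-the-envelope calculation is enough. The sketch below uses the approximate starting prices listed above and assumes a flat per-image rate. It ignores free tiers and volume discounts, so treat the output as a rough upper-level comparison, not a quote. The names `PRICE_PER_IMAGE_USD` and `estimate_monthly_cost` are made up for this example.

```python
from typing import Dict

# Approximate starting prices per image (USD) as listed in this article.
# Assumption: flat rate, no free tier, no volume discounts.
PRICE_PER_IMAGE_USD: Dict[str, float] = {
    "Amazon": 0.001,
    "Microsoft": 0.001,
    "IBM Watson": 0.002,
    "Google Cloud": 0.0015,
    "Cloudsight": 0.02,
    "Clarifai": 0.0015,
}


def estimate_monthly_cost(images_per_month: int,
                          prices: Dict[str, float] = PRICE_PER_IMAGE_USD
                          ) -> Dict[str, float]:
    """Multiply monthly volume by each per-image price, rounded to cents."""
    return {name: round(images_per_month * price, 2)
            for name, price in prices.items()}


costs = estimate_monthly_cost(30_000)
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name:<13} ${cost:,.2f}")
```

At 30,000 images per month, this flat-rate estimate puts the fully automated APIs between $30 and $60, and Cloudsight at $600.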

Further Research

If you would like to conduct your own highly unscientific yet wildly entertaining research into image recognition APIs, it may be helpful to know that the chihuahua vs. muffin meme originator Karen Zack made a ton of “food vs animal” comparisons that are ripe for API benchmarking!

These are some of my favorites:

Have fun & let me know what results you get in the comments below!

About Mariya Yao

Mariya is the co-author of Applied AI: A Handbook For Business Leaders and former CTO at Metamaven. She "translates" arcane technical concepts into actionable business advice for executives and designs lovable products people actually want to use. Follow her on Twitter at @thinkmariya to raise your AI IQ.

Comments

Ryan says

September 27, 2017 at 12:46 pm

These are great Mariya. Hope you can do the same run of tests with language APIs. Is there such a thing as an ‘intent’ classifier?

Pirm says

October 4, 2017 at 10:17 am

I would want to see which of the api’s was more likely to know the correct answer than how many tag’s they’ve found and what not.

Tae Jun Park says

June 3, 2019 at 9:45 am

The link in use ‘open source code’ written by engineer Gaurav does not exist.
Could you tell me where it is? Thanks.

betjee says

November 30, 2023 at 12:44 am

Cool ideas on the APIs. Honestly , we are implementing now a project and some have pesky API tech issues. This is a welcome read for me.
