You’ve probably seen this internet meme demonstrating the alarming resemblance of chihuahuas and muffins. Everyone in the AI industry (including myself) loves putting the image in their presentations.
But, one question I haven’t seen anyone answer rigorously is: just how good IS modern AI at disambiguating between a chihuahua and a muffin? For your entertainment and education, I’ll be investigating this question today.
Binary classification has been possible ever since the perceptron algorithm was invented in 1957. If you think AI is hyped now, the New York Times reported in 1958 that the invention was the beginning of a computer that would “be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” But while perceptron machines like the Mark 1 were designed for image recognition, in reality they could only discern patterns that are linearly separable, preventing them from learning the complex patterns that underlie most visual media.
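To make the linear-separability limit concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python. The function names and the toy AND/XOR data are my own illustrative assumptions, not a reconstruction of the Mark 1. It should fit AND, which a single straight line can separate, but it cannot reach perfect accuracy on XOR, which no single line can separate.

```python
def predict(w, b, x1, x2):
    """Threshold unit: fire (1) when the weighted sum is positive."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=50, lr=1.0):
    """Classic perceptron rule on 2-D inputs with labels 0 or 1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            delta = target - predict(w, b, x1, x2)
            if delta != 0:
                # Nudge the decision line toward the misclassified point.
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
                errors += 1
        if errors == 0:  # converged: every sample is classified correctly
            break
    return w, b


def accuracy(samples, w, b):
    hits = sum(predict(w, b, x1, x2) == target for (x1, x2), target in samples)
    return hits / len(samples)


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND_DATA, *train_perceptron(AND_DATA))
xor_acc = accuracy(XOR_DATA, *train_perceptron(XOR_DATA))
print(f"AND accuracy: {and_acc:.2f}")
print(f"XOR accuracy: {xor_acc:.2f}")
```

However many epochs you allow, the XOR accuracy stays below 1.00, which is the same wall the early perceptron machines hit on real images.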
No wonder the world was disillusioned and an “AI Winter” ensued. Since then, multi-layer perceptrons (popular in the 1980s) and convolutional neural networks (pioneered by Yann LeCun in 1998) have greatly outperformed single-layer perceptrons in image recognition tasks. With the advent of large labeled data sets like ImageNet and powerful GPU computing, increasingly advanced neural network architectures like AlexNet, VGG, Inception, and ResNet have achieved state-of-the-art performance in computer vision.
Computer Vision & Image Recognition APIs
If you’re a machine learning engineer, it’s easy to start experimenting with and fine-tuning these models by using pre-trained models and weights in either Keras / TensorFlow or PyTorch. If you’re not comfortable tweaking neural networks on your own, you’re in luck because virtually all the leading technology giants and promising startups claim to “democratize AI” by offering easy-to-use computer vision APIs:
- Amazon Rekognition
- Microsoft Computer Vision
- Google Cloud Vision
- IBM Watson Visual Recognition
- Cloudsight
- Clarifai
Which one is the “best”? To truly answer that question, you’d have to clearly define your business goals, product use cases, test data sets, and metrics of success before you benchmark the solutions against each other.
In lieu of a serious inquiry, we can at least get a high-level sense of the different behaviors of each platform by testing them with our toy problem of chihuahua vs. muffin.
Conducting The Test
To do this, I split the canonical meme above into 16 separate test images and used open source code written by engineer Gaurav Oberoi to consolidate results from the different APIs. Each image is pushed through the 6 APIs listed above, which return high-confidence labels as predictions. The only exceptions are Microsoft, which returns both labels and a caption, and Cloudsight, which uses human-AI hybrid technology to return only a single caption. This is why Cloudsight can return eerily accurate captions for complex images, but takes 10-20x longer to process.
Below is an example of the output. You can see the full list of results on all 16 chihuahua vs. muffin images by clicking here.
How well did the APIs do? Other than Microsoft, which confused this muffin for a stuffed animal, every API recognized that the image was of food, but there wasn’t agreement as to whether the food was bread, cake, cookies, or muffins. Google was the only API to successfully identify “muffin” as the highest probability label.
Let’s look at a chihuahua example:
Again, the APIs did rather well. All of them realized the image was of a dog, although a few of them missed the exact breed.
There were definite failures, though. Microsoft returned a blatantly wrong caption three separate times describing muffins as either stuffed animals or a teddy bear.
Google was the ultimate muffin identifier, returning “muffin” as its highest confidence label for 6 out of the 7 muffin images in our test set. The other APIs never returned “muffin” as the first label for any muffin picture, but instead returned related but less relevant labels like “bread”, “cookie”, or “cupcake”.
However, despite the string of successes, Google did fail on this specific muffin image, returning “snout” and “dog breed group” as predictions.
Even the world’s most advanced machine learning platforms are tripped up by our facetious chihuahua vs. muffin challenge! A human toddler beats deep learning when it comes to figuring out what’s food and what’s Fido.
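The “highest confidence label” comparison in this section can be made explicit as a top-k hit rate: the fraction of images whose true label shows up among an API’s first k predictions. Here is a rough sketch; the `hit_rate` helper and the sample rankings are made up for illustration and are not the measurements from this test.

```python
def hit_rate(results, k=1):
    """Fraction of images whose true label appears in the top-k predictions.

    results: list of (true_label, ranked_labels) pairs, where ranked_labels
    is ordered from highest to lowest confidence.
    """
    if not results:
        return 0.0
    hits = sum(
        true.lower() in (label.lower() for label in ranked[:k])
        for true, ranked in results
    )
    return hits / len(results)


# Made-up rankings for illustration only; not the article's measurements.
SAMPLE = [
    ("muffin", ["muffin", "food", "baked goods"]),
    ("muffin", ["bread", "muffin", "food"]),
    ("chihuahua", ["dog", "mammal", "chihuahua"]),
    ("chihuahua", ["dog", "pet", "canine"]),
]

print(hit_rate(SAMPLE, k=1))  # 0.25
print(hit_rate(SAMPLE, k=3))  # 0.75
```

Reporting both k=1 and a larger k separates an API that never sees the muffin from one that sees it but ranks “bread” higher.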
Testing With Real-World Images
As a further test, I’d like to know how well the APIs perform on more real-world images of chihuahuas and muffins, not just ones carefully curated to resemble each other. ImageNet happens to have 1750 images of chihuahuas and 1335 images of various types of muffins.
Some of the images turned out to be pretty easy for our APIs to recognize because they exhibit very distinct class features, such as buggy eyes and pointy ears in the case of this chihuahua:
Other images, on the other hand, proved tricky. APIs often miss identifying objects in photos if there are multiple subjects within the same photo or if the subject is costumed or otherwise obstructed:
In the above case, the costume on the dog may have prevented the APIs (and likely many human classifiers) from correctly identifying the breed. IBM Watson manages to tag just the hats but not the dog or the person wearing them.
Handling Noisy Labels
With unstructured real-world data, including images, human-tagged labels are not always “ground truth” and labels can be incorrect or “noisy”. Here’s an example of an image that was included in the “muffin” category on ImageNet:
We humans would likely identify this “muffin in disguise” more accurately as a “cupcake”. Fortunately many of our APIs did return “cake”, “cupcake”, or “cookie” as predictions that are more relevant than the ImageNet category. Cloudsight’s human labeling produced the most accurate result of “cookie monster cupcake” for what is indeed a strange human invention for machines to interpret.
Utilizing multiple different models and APIs could be one interesting way to assess the “noisiness of labels”. In the case of ImageNet’s “muffin” category, the muffin varieties (e.g. bran, corn, popover, etc.) can appear quite visually distinct and many are actually mislabeled cupcakes or other non-muffin types of baked goods.
Running large numbers of images through a number of different image recognition APIs and tracking the common overlaps and divergent one-offs can help you systematically flag images which might have noisy or incorrect labels.
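One way to sketch that flagging step: count how many APIs return the dataset’s own label for each image, and flag the image when too few agree. The function name, the `min_votes` cutoff, and the sample data below are all hypothetical; real API responses would first need to be normalized into plain label lists.

```python
from collections import Counter


def flag_noisy_labels(dataset_labels, api_predictions, min_votes=2):
    """Flag images whose dataset label is returned by fewer than min_votes APIs.

    dataset_labels:  {image_id: label}
    api_predictions: {api_name: {image_id: [labels]}}
    Returns {image_id: up to three most common labels the APIs returned}.
    """
    flagged = {}
    for image_id, expected in dataset_labels.items():
        votes = Counter()
        for per_image in api_predictions.values():
            # Count each label once per API so a single API cannot dominate.
            votes.update({label.lower() for label in per_image.get(image_id, [])})
        if votes[expected.lower()] < min_votes:
            flagged[image_id] = [label for label, _ in votes.most_common(3)]
    return flagged


# Hypothetical data for illustration only.
dataset_labels = {"img_001": "muffin", "img_002": "muffin"}
api_predictions = {
    "api_a": {"img_001": ["muffin", "food"], "img_002": ["cupcake", "cake"]},
    "api_b": {"img_001": ["bread", "muffin"], "img_002": ["cupcake", "dessert"]},
    "api_c": {"img_001": ["food", "baked goods"], "img_002": ["cupcake", "muffin"]},
}

flagged = flag_noisy_labels(dataset_labels, api_predictions)
print(flagged)
```

In this toy data only `img_002` is flagged, with “cupcake” as the consensus alternative. A flag is a prompt for human review rather than proof of a bad label, since the APIs can all be wrong in the same way.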
Weird side note: in searching for different muffin categories on ImageNet, I happened across an unexpected category called “muffin man”, which ImageNet defines as “Formerly an itinerant peddler of muffins”. If you’re ever looking for photos of dudes presenting muffins, now you know where to go.
Playing Trickster
Just for fun, I tried to fool the APIs with these types of tricky photos:
- Photos of both a chihuahua AND a muffin
- Photos of dog-shaped cupcakes
Here’s how the APIs did on one of the photos featuring both a chihuahua AND a muffin:
IBM and Cloudsight were the only two APIs that acknowledged any food was present in the image, although IBM got a bit creative with its guesses of “takoyaki”, “gyoza”, and “cannoli”.
There was also confusion caused by the dog-shaped cupcakes:
Microsoft, in traditional fashion, captioned the image as “a bunch of stuffed animals.” Google predicted the photo was more likely to be of a “dog like mammal” (0.89) than “cake” (0.79). Clarifai seemed to think the image contained both “food” (0.99) and a “mammal” (0.96) with very high confidence.
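Scores like these show why multi-label output needs a decision rule on your side. Below is a minimal sketch using only the scores quoted above; the helper names and the 0.75 and 0.80 thresholds are my own assumptions, not values any of the APIs recommend.

```python
# Label scores quoted in the article for the dog-shaped cupcake photo.
PREDICTIONS = {
    "Google": {"dog like mammal": 0.89, "cake": 0.79},
    "Clarifai": {"food": 0.99, "mammal": 0.96},
}


def top_label(scores):
    """Single best guess: the label with the highest score."""
    return max(scores, key=scores.get)


def labels_above(scores, threshold):
    """All labels whose score meets the threshold, highest score first."""
    kept = [(label, score) for label, score in scores.items() if score >= threshold]
    return [label for label, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]


for api, scores in PREDICTIONS.items():
    print(api, "| top:", top_label(scores), "| >= 0.75:", labels_above(scores, 0.75))
```

With a 0.75 threshold both Google labels survive, so the photo is tagged as dog and cake at once; raising the threshold to 0.80 keeps only “dog like mammal”. Neither setting is right in general, which is why the threshold has to be tuned against your own test set.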
In these complex or unusual cases, Cloudsight’s human captioning demonstrated superior results, with this last image tagged very specifically as “12-piece West Highland White Terrier cupcakes” and the previous image even being recognized as being a popular meme!
So, Which Computer Vision API Is The Best?
While we can’t determine conclusively that one API is “better” than another just by these joke examples, you can definitely observe qualitative differences in how they perform.
Amazon Rekognition
Amazon’s Rekognition is not just good at identifying the primary object, but also many of the objects around the scene, such as when a human, bird, or piece of furniture is also in the image. It also includes qualitative judgements, such as “cute” or “adorable”. There’s a nice balance of objective and subjective labels in their top predictions.
Google & IBM
Google’s Vision API and IBM Watson Vision are both very literal and never seem to return anything other than straightforward descriptive labels. The performance seems comparable between the two, with IBM typically returning slightly more labels on average for any given photo.
Microsoft
Microsoft’s tags were usually too high level, e.g. “dog”, “canine”, “mammal”, and they never once specified “chihuahua” or “muffin”, which is a huge surprise. They also seemed to be very trigger-happy with identifying muffins as “stuffed animals” in their automatically generated captions. You’d think that the company behind ResNet would have better performance to show for it, but this may be a quirk of this dataset so I encourage more robust testing on your own.
Cloudsight
Cloudsight is a hybrid between human tagging and machine labeling, so the API is much slower than the others as you can see from the speed stats below. That said, for difficult or strange photos, the Cloudsight description tends to be the most accurate, e.g. “12-piece West Highland White Terrier cupcakes.”
Clarifai
Clarifai returns by far the most tags (20), yet never once correctly identified the breed of the dog images as “chihuahua”. Instead they resorted to more generic tags like “dog”, “mammal”, or “animal”. What Clarifai does do well is add a lot of qualitative and subjective labels, such as “cute”, “funny”, “adorable”, “delicious”, etc. They also sometimes return abstracted concepts like “facial expression” or “no person”. These can be useful if you’re looking for a richer description of images for use in advertising or other consumer-facing purposes.
Other Considerations
As stated before, actual assessment of these APIs would require you to define clear business and product goals, an appropriate test data set, and metrics for success. You’d likely also need to consider factors such as cost, speed, and number of tags returned.
Here’s the summary for these additional metrics based on the 16 images from the classic chihuahua vs. muffin meme. Amazon Rekognition regularly performs a smidge faster than the other fully automated APIs. Cloudsight, as expected, is materially slower because of the human / AI hybrid structure and only returns a single caption. Clarifai returns 20 labels by default.
Pricing for all of the APIs can be found on their pricing pages which are linked below. Most of the APIs offer a free tier and then charge based on monthly processing volume. These are approximate starting prices per image as of the date of this article, but pricing is constantly in flux so you’ll need to check for updates before you commit to any platform.
- Amazon – $0.001
- Microsoft – $0.001
- IBM Watson – $0.002
- Google Cloud – $0.0015
- Cloudsight – $0.02
- Clarifai – $0.0015
Most of the APIs charge between $0.001 and $0.002 per image for a few million images, but Cloudsight is notably more expensive at $0.02 an image, with pricing based on 30,000 images per month. Lower volume accounts can pay up to $0.07 an image!
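For a quick back-of-the-envelope comparison, the listed starting prices can be turned into a monthly estimate. This sketch assumes a flat per-image rate and an optional free allowance, both simplifications on my part; real pricing is tiered and changes often, so treat the output as a rough guide only.

```python
# Approximate starting prices per image (USD), as listed in this article.
PRICE_PER_IMAGE = {
    "Amazon": 0.001,
    "Microsoft": 0.001,
    "IBM Watson": 0.002,
    "Google Cloud": 0.0015,
    "Cloudsight": 0.02,
    "Clarifai": 0.0015,
}


def monthly_cost(images_per_month, free_images=0):
    """Flat-rate monthly estimate per API; ignores volume tiers."""
    billable = max(images_per_month - free_images, 0)
    return {api: round(billable * price, 2) for api, price in PRICE_PER_IMAGE.items()}


# Print the estimate for 30,000 images per month, cheapest first.
for api, cost in sorted(monthly_cost(30_000).items(), key=lambda pair: pair[1]):
    print(f"{api:<13} ${cost:,.2f}")
```

At 30,000 images a month the flat-rate estimate comes to $30 for Amazon or Microsoft versus $600 for Cloudsight, which is the 20x gap implied by the per-image prices.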
Further Research
If you would like to conduct your own highly unscientific yet wildly entertaining research into image recognition APIs, it may be helpful to know that the chihuahua vs. muffin meme originator Karen Zack made a ton of “food vs animal” comparisons that are ripe for API benchmarking!
These are some of my favorites:
Have fun & let me know what results you get in the comments below!
Ryan says
These are great Mariya. Hope you can do the same run of tests with language APIs. Is there such a thing as an ‘intent’ classifier?
Pirm says
I would want to see which of the api’s was more likely to know the correct answer than how many tag’s they’ve found and what not.
Tae Jun Park says
The link in use ‘open source code’ written by engineer Gaurav does not exist.
Could you tell me where it is? Thanks.
betjee says
Cool ideas on the APIs. Honestly , we are implementing now a project and some have pesky API tech issues. This is a welcome read for me.