Understand the difference in image recognition platforms
AI is on fire, and so are the services delivering different forms of artificial intelligence. In this post, I would like to focus on the visual segment and compare two different approaches: general vision and custom vision.
What is the difference and what works better for different visual tasks?
General vision platforms
Here are some examples of general vision platforms:
- Google Cloud Vision API
- Amazon Rekognition
- Microsoft Computer Vision
- IBM Watson Visual Recognition
These platforms are built for understanding everyday objects (dogs, airplanes, faces, tables…). They mostly provide photo tagging, which means recognising as many objects and abstract concepts in the image as possible.
The main goal of general platforms is human-level understanding of images. They are reaching for more than object understanding: the idea is to understand the abstract interactions between objects, moods and contexts. In video, they are trying to understand actions, their impact and their continuity over time. These are very complex tasks, and they need a lot of labeled data to learn.
General vision providers collect millions of images from different sources. Users upload images into Google Photos and OneDrive, and Google indexes images from across the web for Google Images.
How do users benefit from these never-ending data sources?
Machine learning needs a lot of data for training. In this case, users don’t have to provide any. General models have learned thousands of everyday objects, facial emotions, landmarks, car types… We can start using this treasure right away, without the pain of gathering relevant training data.
This is an amazing benefit, and general models keep learning more and more categories. We can build many cool apps on top of general models because our apps often look at everyday objects.
Another benefit of general solutions is the extra functionality they provide out of the box: generating a thumbnail, reading text or finding a celebrity’s name. All with no training data and for a reasonable price.
How to choose?
The most important factor we are trying to maximise is accuracy on our specific task. Each provider has a different number of training images, a different deep learning architecture and a different set of supported tasks. These details are company secrets we will never have access to.
Generally, in AI, we want to find all the providers who offer the functionality we need and test them to find the best-performing solution. For complex tasks, it is common to combine the few providers with the best results.
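The testing described above can be sketched in a few lines of Python. This is only a minimal illustration, not any vendor's API: each "provider" is assumed to be a function you write yourself that wraps one service and returns a single label for an image path, and the names accuracy, rank_providers and majority_vote are made up for this example.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Assumption: a provider is any function that takes an image path and
# returns one label. In practice it would wrap a call to a vendor SDK.
Provider = Callable[[str], str]
TestSet = List[Tuple[str, str]]  # (image_path, expected_label) pairs


def accuracy(provider: Provider, test_set: TestSet) -> float:
    """Fraction of test images the provider labels correctly."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    correct = sum(1 for path, expected in test_set if provider(path) == expected)
    return correct / len(test_set)


def rank_providers(providers: Dict[str, Provider], test_set: TestSet) -> List[Tuple[str, float]]:
    """Score every provider on the same test set, best first."""
    scores = [(name, accuracy(fn, test_set)) for name, fn in providers.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def majority_vote(providers: Dict[str, Provider], path: str) -> str:
    """Combine providers by taking the most common label.

    On a tie, the label returned first (in dictionary order) wins.
    """
    votes = Counter(fn(path) for fn in providers.values())
    return votes.most_common(1)[0][0]
```

The important part is that every provider is scored on the same labeled images taken from your own task, not on a public benchmark.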
Who is this for?
General vision best suits applications that need to recognise everyday objects. Robots reading human faces, e-shop image captioning for better SEO performance and helping blind people to understand new environments are great examples of general vision. When reaching for a vision solution, the first question should be: is this something I could find online? If yes, then it is worth trying general models.
Services like Google Vision provide the power of millions of images to everyone.
But what happens when we step outside the everyday space? What if we have scientific data available only to a few universities? Here comes custom vision.
Custom vision solutions
One example (here I promote the solution we are developing):
and some more:
- Microsoft custom vision AI
- Clarifai custom image recognition
- Imagga computer vision
In custom vision, users create their own rules for sorting images.
Rather than asking for the type of the flower, you may want to know whether the sun is shining on it. Sometimes you want an alert when your security camera spots a human, but your neighbour on a mower tractor is all right.
You could make sure that all the product thumbnails you display show an unboxed product on a white background. Someone else wants to make sure that the product at the end of the line is not damaged. This is something that off-the-shelf solutions are not built for.
About a year ago there was only one solution: hire an AI team to deliver an expensive on-premises solution. Custom vision opens up whole new possibilities in visual AI. Compared to general platforms, there is a practically unlimited number of tasks we can solve by defining custom objects. We can also detect different states of one object or environment. Custom vision is a little machine learning lab where everyone can test their idea. As a result, we can automate boring human tasks and save some time.
The goal of custom vision is not general image understanding but the most accurate possible understanding of a specific task. This is very close to what the market needs, but it comes with one disadvantage: users have to gather their own training data. This can be painful and time-consuming, but it is a competitive advantage too.
At this moment, all the custom services offer image classification. This means sorting images into classes while looking at the whole image. The task can be as simple as deciding “ok” or “broken”, or it can consist of many classes (e.g. several terrain appearances).
Custom vision can also come in handy when we need high accuracy on a smaller set of categories. We don’t always need to recognise thousands of categories; we may only want to find the 10 that are interesting for us. In such cases, custom vision can often supply better accuracy even for general tasks.
The key question for the user is the number of images needed for training. This is very hard to estimate in general, but it can be as few as 20 images per class. Read more about custom datasets in this post. The smaller the visual difference between classes, the more images we need.
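Before uploading anything, it can help to check how many images each class actually has. The Python sketch below assumes a hypothetical folder layout with one sub-folder per class (for example dataset/ok and dataset/broken). The function names and the list of file extensions are illustrative, and the default of 20 is just the rough lower bound mentioned above, not a rule from any particular service.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: these extensions cover the images in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def images_per_class(dataset_dir: str) -> Dict[str, int]:
    """Count image files in each class sub-folder of dataset_dir."""
    counts: Dict[str, int] = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def underfilled_classes(counts: Dict[str, int], minimum: int = 20) -> List[str]:
    """Names of classes that have fewer than `minimum` images."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

Classes that look very similar to each other will likely need a higher minimum than classes that are easy to tell apart.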
How to choose?
We are looking for a solution that is easy to use, provides a simple user interface and gives the best accuracy for our task. We should test all the available solutions before we commit to one. Before we start testing our idea, I would also recommend saving some time by discussing the project with the support team.
Who is this for?
Custom vision suits applications that need to recognise very specific images or object states. It also fits images that are not available online or are not mass-produced by web users. It can solve many scientific, industrial, medical and laboratory tasks. General models are often made for the needs of online business, whereas custom vision can help in a variety of industries: agriculture, production lines, security and many others.
Custom vision opens new possibilities in visual automation. All made simple for users with no technical background.
Machine learning is a technology that saves people hours of time. Vision is one of the human abilities that it is now possible to automate. There is no universal approach to vision tasks, so we have to decide what type of task we are facing.
General vision is here to organise and structure images that are available on the web. It is very simple to use and needs no training data.
Custom vision makes sense for images that are very specific to the task or not available online. It takes some effort to gather training data, but it provides very accurate results for specific vision tasks.
Which one is the best one for you? Let me know in the comments below!
This article was originally published by David Rajnoch.