Human analogy to describe machine learning in image classification
I could point to dozens of articles about machine learning and convolutional neural networks. Each one covers different details, and sometimes too many of them, so I decided to write my own post built on the parallel between machine learning and the human brain. I will not touch any mathematics or deep learning details. The goal is to stay simple and help people experimenting with Ximilar (Vize.ai) reach their goals.
Machine learning provides computers with the ability to learn without being explicitly programmed.
For images: we want something that can look at a set of images and remember the patterns. When we show a new image to our smart “model”, it will “guess” what is in the image. That’s how people learn!
I mentioned two important words:
- Model: this is what we call a machine learning algorithm. It is no longer a hand-coded rule (if green, then grass). It is a structure that can learn and generalize (small, round, green means apple).
- Guess: we are no longer in a binary world. We have moved into the domain of probability: the model returns the likelihood that an image shows an apple.
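To make the idea of a “guess” concrete, here is a minimal sketch of how raw model scores can be turned into likelihoods. It assumes a softmax function, which is a common choice but not necessarily what any particular service uses. The scores and category names are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    biggest = max(scores.values())  # subtracting the maximum keeps exp() stable
    exps = {label: math.exp(s - biggest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores a model might produce for one image.
raw_scores = {"apple": 2.0, "ball": 1.0, "grass": -1.0}

probabilities = softmax(raw_scores)
# The "guess" is simply the category with the highest likelihood.
best_guess = max(probabilities, key=probabilities.get)
print(best_guess, round(probabilities[best_guess], 2))
```

The model never says “this is an apple” with certainty. It says how likely each category is, and we take the most likely one as its answer.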
Deeper but still simple
A model is like a child’s brain. You show an apple to a kid and say “this is an apple”. After you repeat this 20 times, a connection forms in the child’s brain and the child can recognize apples. The important point is that at the beginning the child cannot distinguish small details. A small ball in your hand is going to be an apple because it follows the same pattern (small, round, green). The apple is the only thing rooted in the little brain so far.
The set of images shown to the kid is called the training dataset.
The brain is the model, and it can recognize only the categories present in the image dataset. A model is made of layers and connections, which makes it similar in structure to our brain. Different parts of the network learn different abstract patterns.
Supervised learning means we have to say “this is an apple” while showing the visual information. In other words, we add a label to each image.
Evaluation — model accuracy
In human terms this is like exam time. At school, we learn a lot of information and general concepts. To find out how much we actually know, the teacher prepares a set of questions we have not seen in the study books. Then our brain is evaluated, and we learn that, say, 9 out of 10 questions were answered correctly.
The teacher’s questions are what we call the testing dataset. It is usually split off from the training dataset before training (20% of the provided pictures in our case).
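Setting aside the exam questions can be sketched in a few lines. This is only an illustration of the idea, not the actual pipeline: the `split_dataset` helper, the file names, and the fixed random seed are all assumptions made for the example.

```python
import random

def split_dataset(labeled_images, test_fraction=0.2, seed=42):
    """Shuffle labeled images and set aside a fraction for testing."""
    shuffled = list(labeled_images)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    test_size = int(len(shuffled) * test_fraction)
    # The first part becomes the exam, the rest is used for learning.
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical labeled dataset: (file name, label) pairs.
dataset = [(f"image_{i}.jpg", "apple" if i % 2 == 0 else "ball") for i in range(50)]

training_set, testing_set = split_dataset(dataset)
print(len(training_set), len(testing_set))
```

Shuffling before the split matters: if the images were sorted by category, the exam could end up containing only one kind of question.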
Accuracy is the percentage of images the model answers correctly. The important point: we do not care how sure the model was about its answer. We only care about the final answers.
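The exam with 9 of 10 right answers can be written down as a small sketch. The predictions, confidences, and labels below are invented for illustration. Note that the confidence value is carried along but deliberately ignored when counting correct answers.

```python
def accuracy(predictions, true_labels):
    """Percentage of images whose final answer matches the true label."""
    correct = sum(
        1
        for (guess, _confidence), truth in zip(predictions, true_labels)
        if guess == truth  # the confidence is not used at all
    )
    return 100.0 * correct / len(true_labels)

# Hypothetical exam: the true labels of 10 test images.
true_labels = ["apple", "apple", "ball", "apple", "ball",
               "ball", "apple", "ball", "apple", "ball"]

# Hypothetical model answers as (final answer, confidence) pairs.
predictions = [("apple", 0.99), ("apple", 0.51), ("ball", 0.80), ("apple", 0.95),
               ("ball", 0.60), ("ball", 0.75), ("apple", 0.55), ("ball", 0.90),
               ("apple", 0.85), ("apple", 0.70)]  # the last answer is wrong

print(accuracy(predictions, true_labels))
```

An answer given with 51% confidence counts exactly as much as one given with 99%, as long as it is right.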
Limits of computers
Why don’t we have computers with human-level skills yet? Because the brain is the most powerful computer. It has amazing processing power, huge memory, and some magic sauce we don’t even understand.
Our computer models are limited by memory and computational power. We have plenty of storage, but we are short of the super-fast memory that processors can access directly. Computational power is limited by heat, technology, price, etc.
Bigger models can hold more information but take longer to train. This is why AI development in 2017 focuses on making models:
- smaller,
- less computationally intensive,
- able to learn more information.
Connection to custom image recognition – Recognition / Vize.ai
This technology is what drives our custom image classification API. People can build an image recognizer in a few minutes without deep knowledge. Clients have sometimes asked me whether we can recognize 10 000 categories with one training image for each. Imagine a kid’s brain learning this. It is nearly impossible. The idea is that the more categories you want your child to know, the more images the child has to see. It takes ages for our brain to develop and understand the world. Just as a child starts with basic objects, you should start with basic categories.
What a child is confident about is telling good from bad. Models taught to distinguish good from bad are very accurate and do not need many images.
I tried to simplify machine learning to a visual task only and compare it with something we all know. At Ximilar (Vize.ai) we often think of the human brain while experimenting with new models and processing pipelines. I will be happy to hear some feedback from you.
This article was originally published by David Rajnoch.