How does machine learning work? Like a brain!

David Novák, Ximilar
David Novak
23. July 2017

Human analogy to describe machine learning in image classification

I could point to dozens of articles about machine learning and convolutional neural networks. Each one covers different details, and sometimes too many of them, so I decided to write my own post based on a parallel between machine learning and the human brain. I will not touch any mathematics or deep learning details. The goal is to stay simple and help people experimenting with Ximilar (Vize.ai) meet their goals.

Introduction

Machine learning provides computers with the ability to learn without being explicitly programmed.

For images: we want something that can look at a set of images and remember the patterns. When we show a new image to our smart “model”, it will “guess” what is in the image. That’s how people learn!

I mentioned two important words:

  • Model: this is what we call a machine learning algorithm. It is no longer hand-coded with rules (if green, then grass). It is a structure that can learn and generalize (small, round, green means apple).
  • Guess: we are no longer in a binary world. We have moved into the probability domain, so we receive the likelihood that an image shows an apple.
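
To make the "guess" concrete, here is a minimal sketch of how raw scores can be turned into probabilities. It uses softmax, a common choice for classifiers; this article does not say which function the Vize.ai models use, and the labels and scores below are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())  # subtracting the maximum keeps exp() stable
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical raw scores a model might produce for one image.
raw_scores = {"apple": 2.0, "ball": 1.0, "grass": -1.0}

probabilities = softmax(raw_scores)
best_guess = max(probabilities, key=probabilities.get)
print(best_guess, round(probabilities[best_guess], 2))
```

The answer is not "apple: yes or no" but a likelihood for every category, and the highest one becomes the guess.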

Deeper but still simple

A model is like a child’s brain. You show an apple to a kid and say “this is an apple”. After you repeat it 20 times, a connection forms in the child’s brain and the child can now recognize apples. Importantly, at the beginning the child cannot distinguish small details. A small ball in your hand is going to be an apple because it follows the same pattern (small, round, green). Only the apple is rooted in the little brain.

The set of images shown to the kid is called the training dataset.

The brain is the model, and it can recognize only the categories present in the training dataset. It is made of layers and connections, which makes it similar in structure to our brain. Different parts of the network learn different abstract patterns.

Supervised learning means we have to say “this is an apple” while showing the visual information. We add a label to each image.
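
The labeling idea can be sketched with a toy learner. This is not a neural network and not how Ximilar's models work: it simply averages hand-made feature numbers (size, roundness, greenness, all hypothetical) per label and picks the closest average. Real models learn their features from the pixels themselves.

```python
# Each training example pairs features with a label, as in supervised learning.
# Features are hypothetical hand-made numbers: (size, roundness, greenness).
training_dataset = [
    ((0.2, 0.9, 0.8), "apple"),
    ((0.3, 0.8, 0.9), "apple"),
    ((0.9, 0.1, 0.9), "grass"),
    ((0.8, 0.2, 0.8), "grass"),
]

def train(examples):
    """Average the feature vectors of each label into one prototype."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose prototype is closest to the features."""
    def distance(prototype):
        return sum((a - b) ** 2 for a, b in zip(prototype, features))
    return min(model, key=lambda label: distance(model[label]))

model = train(training_dataset)

# A small green ball was never shown, yet it matches the apple pattern.
print(predict(model, (0.2, 1.0, 0.7)))
```

Like the kid, this toy model can only answer with a category it has seen, so the small green ball comes out as an apple.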

Simple deep learning network

Evaluation — model accuracy

In human terms, this is like exam time. At school, we learn a lot of information and general concepts. To find out how much we actually know, the teacher prepares a set of questions we have not seen in the study books. Then our brain is evaluated, and we learn that, say, 9 of 10 questions were answered correctly.

The teacher’s questions are what we call the testing dataset. It is usually split off from the training dataset before training (20% of the provided pictures in our case).
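
A minimal sketch of such a split, assuming a simple random holdout. The function name, the file names and the fixed seed are hypothetical; only the 20% share comes from the text above.

```python
import random

def split_dataset(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = round(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical labeled images: (file name, label).
examples = [(f"image_{i}.jpg", "apple" if i % 2 else "ball") for i in range(50)]

training_set, testing_set = split_dataset(examples)
print(len(training_set), len(testing_set))
```

The key point is that the held-out images are set aside before training, so the model is examined on pictures it has never seen.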

Accuracy is the percentage of test images the model answers correctly. Importantly, we do not care how sure the model was about its answer. We care only about the final answers.
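
As a sketch, accuracy can be computed like this. The predictions and their confidence values are made up; note that the confidence is carried along but never used in the calculation.

```python
def accuracy(predictions, true_labels):
    """Percentage of predictions whose top label matches the true label."""
    correct = sum(
        1
        for (label, _confidence), truth in zip(predictions, true_labels)
        if label == truth  # the confidence value is deliberately ignored
    )
    return 100.0 * correct / len(true_labels)

# Hypothetical exam: (predicted label, confidence) for 10 test images.
predictions = [
    ("apple", 0.99), ("apple", 0.51), ("ball", 0.80), ("apple", 0.60),
    ("ball", 0.95), ("ball", 0.55), ("apple", 0.70), ("ball", 0.90),
    ("apple", 0.65), ("ball", 0.85),
]
true_labels = [
    "apple", "apple", "ball", "apple", "ball",
    "ball", "apple", "ball", "apple", "apple",
]

print(accuracy(predictions, true_labels))  # 9 of 10 correct, prints 90.0
```

A correct answer given with 51% confidence counts exactly as much as one given with 99%.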

Limits of computers

Why don’t we have computers with human-level skills yet? Because the brain is the most powerful computer. It has amazing processing power, huge memory, and some magical sauce we don’t even understand.

Our computer models are limited by memory and computational power. We have plenty of storage but are short of the super-fast memory accessible to processors. Power is limited by heat, technology, price, etc.

Bigger models can hold more information but take longer to train. This is why AI development in 2017 focuses on making models:

  • smaller,
  • less computationally intensive,
  • able to learn more information.

Connection to custom image recognition – Recognition / Vize.ai

This technology is what drives our custom image classification API. People can build an image recognizer in a few minutes without deep knowledge. Clients have sometimes asked me whether we can recognize 10 000 categories with one training image for each. Imagine a kid’s brain learning this. It is nearly impossible. The idea is that the more categories you want your child to know, the more images it has to see. It takes ages for our brain to develop and understand the world. Just as a child starts with basic objects, start with basic categories.

One thing a child is confident about is good versus bad. Likewise, a model taught to tell good from bad is very accurate and does not need many images.

Summary:

I tried to narrow machine learning down to a visual task and compare it with something we all know. At Ximilar (Vize.ai) we often think of the human brain while experimenting with new models and processing pipelines. I will be happy to hear some feedback from you.

This article was originally published by David Rajnoch.

David Novák CEO & Co-founder

David founded Ximilar after more than ten years of academic research.
He wanted to build smart AI products not only for the corporate sphere, but especially for medium to small businesses. He is a skilled team leader with experience in both business and technology.
