How to plan computer vision features and choose the right provider
Earlier, I wrote a post about the difference between general and custom computer vision platforms. Today I would like to focus on a real-world use case. Let’s dive into planning image recognition features.
Imaginary use-case
We are running a small business that sells colourful socks. We want to add a “Socks Matching Engine” feature to our app. Customers will upload a picture of two different socks, and the app will show them visually similar alternatives, filtered by basic features such as color or pattern. This is sometimes called visual search. This way we are going to gain the respect of Hollywood’s fashion police and increase sales!
Plan visual image recognition task
Before we start, let’s think about what we need.
Vision Model — “Parameter Extractor”
What we want is a model designed to extract information from images. We will focus on several parameters:
- Pattern (dotted, striped, winter, summer)
- Color
- Sock type (ankle length, quarter length, crew length)
Each customer’s image is evaluated and labeled. This tells us each customer’s favourite colours, patterns and types of socks. That’s great because we can now customise the next newsletter to fit the customer’s style.
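To make the idea concrete, here is a minimal sketch of how the labels from a customer's uploads could be turned into a preference profile. The field names (pattern, color, sock_type) and the sample data are assumptions made for illustration; they are not the output format of any particular service.

```python
from collections import Counter

# Hypothetical attribute names; a real model may label images differently.
ATTRIBUTES = ("pattern", "color", "sock_type")

def favourite_attributes(labelled_images):
    """Return the most frequent value of each attribute across a customer's images."""
    profile = {}
    for attribute in ATTRIBUTES:
        counts = Counter(
            image[attribute] for image in labelled_images if attribute in image
        )
        if counts:
            # most_common(1) gives a list with one (value, count) pair
            profile[attribute] = counts.most_common(1)[0][0]
    return profile

# Made-up labels for three uploads by one customer.
uploads = [
    {"pattern": "striped", "color": "blue", "sock_type": "crew"},
    {"pattern": "striped", "color": "orange", "sock_type": "ankle"},
    {"pattern": "dotted", "color": "blue", "sock_type": "crew"},
]
print(favourite_attributes(uploads))
```

A profile like this could then drive the newsletter segmentation, for example by sending striped blue crew socks to this customer first.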
At this point, matching can be as simple as adding a few rules saying that blue and orange socks go together, striped goes with dotted, and so on. This, of course, is a hack that does not bring much value to the fashion field, but it will work at the beginning.
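Such rules could look roughly like the sketch below. The two rules are the examples from the text. Treating identical values as a match, and requiring both color and pattern to agree, are assumptions added here for illustration.

```python
# Starter rules taken from the examples in the text; extend as needed.
COLOR_PAIRS = {frozenset({"blue", "orange"})}
PATTERN_PAIRS = {frozenset({"striped", "dotted"})}

def goes_together(a, b, pairs):
    """True if the two values are identical or listed as a matching pair."""
    # Assumption: a value always matches itself (blue goes with blue).
    return a == b or frozenset({a, b}) in pairs

def socks_match(first, second):
    """Assumption: two socks match only if both color and pattern go together."""
    return (
        goes_together(first["color"], second["color"], COLOR_PAIRS)
        and goes_together(first["pattern"], second["pattern"], PATTERN_PAIRS)
    )

left = {"color": "blue", "pattern": "striped"}
right = {"color": "orange", "pattern": "dotted"}
print(socks_match(left, right))
```

Storing each pair as a frozenset keeps the rules symmetric, so blue with orange and orange with blue are the same rule.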
We can also align the categories with our e-shop categories and recommend socks similar to those the customers already have. Once we have collected enough images, we are ready to build a “Fashion advisor” model. We will also keep the data extraction models to help us understand the customers and make clever suggestions.

Finding the right providers
Now we know what functions we are looking for:
- Extract colour
- Extract pattern
- Extract sock type
- Custom fashion recommender
The most important factor is model accuracy. No solution can provide 100% accuracy, because customers’ images are going to vary so much. Reaching 80–90% accuracy is great!
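When comparing providers, it helps to measure accuracy the same way for each of them. Here is a minimal sketch, assuming we have a small hand-labelled test set and the labels each candidate model predicted for it. The sample labels are made up.

```python
def accuracy(predicted, expected):
    """Share of test images whose predicted label equals the hand-assigned label."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("the test set must not be empty")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

# Made-up pattern labels for five test images.
expected = ["striped", "dotted", "striped", "winter", "summer"]
predicted = ["striped", "dotted", "striped", "winter", "dotted"]
print(f"{accuracy(predicted, expected):.0%}")
```

Running the same test images through every candidate gives numbers that can be compared directly against the 80–90% target.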
Extract colour from image via API
It is easy to identify a colour using a general model. Our Dominant colors service should work in this case. We can extract dominant colours by drag and drop in the demo or via the API. The service can ignore the background and analyse only the colors of the product. We can then use these colors to filter the socks in our shop by a specific color.
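To show what "dominant colours" means in practice, here is a simplified local sketch. It is not how the Dominant colors service works internally and it does not call any API. It assumes the image is already available as a list of (R, G, B) tuples, groups similar colors into coarse buckets, and skips pixels close to a known background color. The function name, parameters, and sample pixels are all hypothetical.

```python
from collections import Counter

def dominant_colors(pixels, top_n=3, bucket=32, background=None, tolerance=30):
    """Return up to top_n (color, share) pairs, most frequent first.

    pixels     -- iterable of (r, g, b) tuples with values 0-255
    bucket     -- width of each quantisation bucket per channel
    background -- optional (r, g, b) color to ignore
    tolerance  -- per-channel distance within which a pixel counts as background
    """
    counts = Counter()
    for r, g, b in pixels:
        if background is not None and all(
            abs(channel - bg) <= tolerance
            for channel, bg in zip((r, g, b), background)
        ):
            continue  # skip background pixels
        counts[(r // bucket, g // bucket, b // bucket)] += 1

    total = sum(counts.values())
    return [
        # report the centre of each bucket as the representative color
        (tuple(index * bucket + bucket // 2 for index in key), n / total)
        for key, n in counts.most_common(top_n)
    ]

# Made-up image: white background, a blue sock and an orange sock.
pixels = [(250, 250, 250)] * 50 + [(10, 20, 200)] * 30 + [(240, 120, 10)] * 20
print(dominant_colors(pixels, background=(255, 255, 255)))
```

The returned shares only count product pixels, so they can be used directly when filtering the shop by color.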

Extract pattern with AI
This might be the hardest part to recognize. I recommend training a custom model for patterns because we want every image to have a pattern label. General models can detect strong patterns but do not return a pattern for every image. Having 20 pattern categories means we need only about 400 images of socks (roughly 20 per category) for custom model training. More information about custom vision datasets is available in this post. A custom image recognition model can be trained online in the browser: just log in to app.ximilar.com and build your own models.
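The numbers above work out to about 20 images per pattern category (400 / 20). Before uploading a dataset, a quick check like the following sketch can show which categories are still short of that. The threshold comes from that arithmetic and is not a documented platform requirement. The label list is made up.

```python
from collections import Counter

# 400 images / 20 categories, derived from the figures in the text.
MIN_IMAGES_PER_CLASS = 20

def underfilled_classes(labels, minimum=MIN_IMAGES_PER_CLASS):
    """Map each category with too few images to the number of images still missing."""
    counts = Counter(labels)
    return {label: minimum - n for label, n in counts.items() if n < minimum}

# One label per collected image.
labels = ["striped"] * 20 + ["dotted"] * 12 + ["winter"] * 25
print(underfilled_classes(labels))
```

Categories that do not appear in the label list at all are not reported, so it is worth comparing the result against the full list of planned categories.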

Identify type of sock
A general vision model can also detect the type of sock. I tested a few socks and got mostly “outdoor shoe” results, which are not very accurate. I prefer to spend one more hour getting images from my e-shop database and sorting them into classes, rather than having blank spaces in my image recognition engine. Using a custom model also leads to higher classification accuracy.
Having three parameters extracted from an image and saved in the database, we are now able to create the matching logic.

Searching and recommending socks
Building your own image recommendation engine can be hard. Luckily, Ximilar offers a solution that can easily find products in your socks database. We can use the previous models to filter the results by sock type, pattern and color.
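The filtering step could be sketched as follows. This assumes the visual search returns a ranked list of products and that each product already carries the three attributes extracted by the earlier models. The field names and products are hypothetical and do not reflect the response format of any real search API.

```python
def filter_results(results, **required):
    """Keep only products whose attributes equal every required value, preserving rank."""
    return [
        product
        for product in results
        if all(product.get(name) == value for name, value in required.items())
    ]

# Made-up ranked search results, best visual match first.
results = [
    {"id": 101, "sock_type": "crew", "pattern": "striped", "color": "blue"},
    {"id": 102, "sock_type": "ankle", "pattern": "striped", "color": "blue"},
    {"id": 103, "sock_type": "crew", "pattern": "dotted", "color": "orange"},
    {"id": 104, "sock_type": "crew", "pattern": "striped", "color": "blue"},
]
matches = filter_results(results, sock_type="crew", color="blue")
print([product["id"] for product in matches])
```

Because the list order is preserved, the most visually similar products that pass the filter still come first.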

Summary
Most computer vision tasks are more complex than what one provider can deliver. They often need some insight into the business, so it is necessary to take time and think about the path to our goals. Most solutions, such as Google Vision or Azure AI services, are very expensive for training and deploying your custom models.
Ximilar provides an easy-to-use and powerful solution for training and deploying your custom machine learning models for images. Training and deploying image recognition models is free, which saves you money while you develop your business idea. If you have business ideas that require a customized computer vision solution, contact us. We have experts and tools that can help you grow your project …