Computer Vision Platform FAQ
About Computer Vision Platform
Image Recognition is an AI-powered technology that recognizes images and assigns categories to them. You can build and deploy custom image recognition models on Ximilar’s computer vision platform. Specifically, you can train and combine classification models (categorization & tagging), image regression models (value prediction), and object detection models.
Both custom and ready-to-use services by Ximilar can work together in modular, hierarchical structures built with Flows.
Object Detection requires manual annotation of training data, which can be done both in our App and in the dedicated interface Annotate.
- Custom Image Recognition
- Read more about Flows
- Explainable AI: What is My Image Recognition Model Looking At?
- Which services does Ximilar provide, and what are the differences between them?
- How does Ximilar technology work?
- How can I train my own categorization & tagging model?
- Can I combine machine learning models or put them in a sequence?
- Can I combine visual search services with solutions created with Ximilar platform?
Categorization & Tagging
What do categorization, tagging, tags, and labels mean? What is the difference between categorization and tagging?
Automatic categorization is a process in which every image is assigned to a category. It is powered by AI which was trained to recognize the categories of images. Each image belongs only to one, visually distinctive, category – for example, dress vs. trousers. E-shops typically use a hierarchical taxonomy of products, so they work with both categories and sub-categories.
Automatic image tagging is a process in which each image is tagged by AI based on attributes recognized in the image. For example, when you upload a photo of a dress on a model, the dress will be put into the category dresses and subcategory casual dresses, and tagged with tags describing its colour, pattern, design, style, material, length, and other attributes.
In the case of categorization & tagging, a label is a term describing both categories and tags. Object Detection, on the other hand, works with detection labels, labelling the detected objects, people, or animals in the images.
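To make the distinction concrete, here is what a categorization & tagging result for the dress example above might look like. The field names below are illustrative assumptions, not the actual API response schema:

```python
# Hypothetical categorization & tagging result for one product photo.
# The keys ("category", "subcategory", "tags") are illustrative only --
# check the API documentation for the real response schema.
result = {
    "category": "dresses",            # exactly one category per image
    "subcategory": "casual dresses",  # one node deeper in the taxonomy
    "tags": ["blue", "floral", "midi", "cotton"],  # any number of tags
}
print(result["category"], result["tags"])
```

Note that categories are mutually exclusive, while an image can carry any number of tags.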
Ximilar provides several categorization and tagging services. You can either use our ready-to-use Image Recognition services (such as Fashion Tagging or Home Decor & Furniture Tagging) or train your own from scratch on Ximilar computer vision platform. Both ready-to-use services and custom tasks can cooperate through Flows.
The training of machine learning models is available to everyone. First, log in to the Ximilar computer vision platform, then check our comprehensive guide on How to Build Your Own Image Recognition API.
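Once a task is trained, it is typically called over the REST API. The sketch below shows the general shape of such a call in Python; the endpoint path, header format, and payload keys are assumptions for illustration, so consult the API documentation for the exact contract:

```python
import json
import urllib.request

# Assumed endpoint and token format -- verify against the Ximilar API docs.
API_URL = "https://api.ximilar.com/recognition/v2/classify"
TOKEN = "YOUR_API_TOKEN"

payload = {
    "task_id": "YOUR_TASK_ID",  # the trained task to call
    "records": [{"_url": "https://example.com/dress.jpg"}],
}
request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Authorization": f"Token {TOKEN}", "Content-Type": "application/json"},
)
# Sending the request would return the predicted labels for each record:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```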
- How does Ximilar technology work?
- Is there a difference between a task and a model?
- Can I use one task (model) in multiple Flows?
- How do I connect to Ximilar API?
- How do I test the accuracy of my Image Recognition and Object Detection models? What are evaluation metrics?
- What is A/B testing of machine learning models?
- What is a machine learning loop?
Technically, there is no limit on the number of labels per task. However, using hundreds of labels or more would require a very large volume of training data. In such cases, we recommend creating a hierarchy of models and combining them with Flows.
Yes! The basic principle of our solutions lies in separate machine learning models, which work together or in a modular structure of Flows. Flows enable you to chain your models in a sequence, combine them, put them in a hierarchical structure, and implement conditional processing.
Training object detection models requires an annotated training dataset. Annotating images means marking the objects to be detected with bounding boxes. Precise and consistent image annotation is key to a successful object detection solution. Annotation is done manually, but it can be assisted by AI. On the Ximilar platform, users can annotate their training images both in the App and in the dedicated interface Annotate.
Annotate is an advanced image annotation tool by Ximilar, built to annotate large volumes of training data effectively, precisely, and fast. It is web-based and connected to the same back-end & database as the Ximilar App, so all changes you make in Annotate are also visible in your workspace in the App. Annotate is also connected to the REST API with a Python SDK, through which the annotated data can be queried.
You can upload images through the App, and annotate them in Annotate. You can work in a team, assign annotating jobs to your teammates, and set how many times each image should be checked.
The basic principle of image annotation in Ximilar App and Annotate is the same: you view an image, pick which object detection task should predict the objects in the image, and then check the position of bounding boxes or draw new ones. You can check the assigned labels (categories and tags) and add new ones from the hierarchy (in Annotate) or even create new ones (in App). You can use both interfaces to create and train your tasks.
Ximilar App is great for creating entities such as labels and tasks, uploading data, model training, task management, and batch management of images (bulk actions for labelling and filtering). If you need to add new labels (tags and categories) even while annotating your images, you can do it in App.
If you need to annotate a lot of images and you already have a hierarchy of labels (in the App), we recommend uploading the images there and then working in Annotate. The main difference is that Annotate enables you to process large amounts of training images precisely, yet fast, with several advanced features.
Your company account can have multiple workspaces, each for one project. Your team members can get access to different workspaces, and everyone can switch between workspaces both in the App and in Annotate (top right, next to the user icon). Did you know that the workspaces are also accessible via the API? Check out our documentation and learn how to connect to the API.
Flows are a technology for chaining and combining tasks, and for building hierarchies of tasks, in complex visual AI systems. Ximilar developed Flows to make building visual AI solutions easier and accessible without the need for coding. We use Flows to streamline complex image processing.
Flows were made to combine different tasks into complex image processing systems. A Flow is built by adding the following actions:
- Branch Selector – contains an image recognition task; based on the recognition, your images are then processed by one or more branches of your Flow
- Recognition – performs an image recognition task and provides an output as specified by the user
- Detection – performs an object detection task, provides bounding boxes, and other outputs as specified by the user
- Object Selector – performs an object detection task; the detected objects are then analyzed separately by other actions
- Ximilar Service – calls any of the Ready-to-use Image Recognition services
- List – performs a list of actions with images (in sequence or simultaneously)
- Nested flow – calls another Flow
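As a mental model, a Branch Selector routing images into per-category tagging tasks could be sketched like this in plain Python. This is only an illustration of the idea; Flows themselves are configured in the App without code, and all task names below are made up:

```python
# Toy stand-ins for trained recognition tasks (not real Ximilar models).
def top_category(image):
    return "fashion" if "dress" in image else "home_decor"

def fashion_tagging(image):
    return {"category": "dresses", "tags": ["casual", "floral"]}

def furniture_tagging(image):
    return {"category": "sofas", "tags": ["fabric", "grey"]}

def branch_selector(image, classify, branches):
    """Route the image to a branch based on the recognition result."""
    label = classify(image)
    return branches[label](image)

result = branch_selector(
    "photo_of_a_dress.jpg",
    classify=top_category,
    branches={"fashion": fashion_tagging, "home_decor": furniture_tagging},
)
print(result)  # handled by the fashion branch
```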
Yes, one task (machine learning model) can be used in multiple Flows. Just add the right type of action in your Flow and then pick your task from the selection list.
Flows are available to the users of all pricing plans. Check our Pricing for details.
Need help with setup? Watch the video tutorial or read the guide. If you need a solution tailored to your business, feel free to contact us. We can prepare a demo on your data and deploy the system for you.
Using Ximilar services consumes API credits. Ximilar provides three pricing plans with different monthly API credit supplies, including a free scheme with 3,000 free credits. Creating a Flow and chaining tasks don’t cost any API credits. There are, however, limits on how many Flows you can have.
Models – Training & History
A task is the latest and most accurate version of the machine learning model you trained. Read How does Ximilar technology work? for details.
Any image processed by a deployed machine learning model can be saved to the workspace and used to retrain the model and improve it. Retraining is done manually after annotators check these new images; then the new, more accurate version of the model is deployed. This loop improves the accuracy of your model in the long term, especially if the character of the data changes over time (e.g. the lighting of the scene changes dramatically). See pricing for details about availability.
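A minimal sketch of this loop, with made-up stand-in functions for the model, the annotator review, and the retraining step:

```python
# A minimal sketch of the machine learning loop described above:
# production predictions are saved, reviewed by annotators, and fed
# back into training. All function names here are illustrative.
def ml_loop(model, incoming_images, review, retrain):
    saved = [(image, model(image)) for image in incoming_images]  # save predictions
    corrected = [review(image, pred) for image, pred in saved]    # annotators check
    return retrain(model, corrected)                              # deploy new version

model_v1 = lambda image: "dress"

def review(image, prediction):
    # an annotator corrects one wrong prediction
    return (image, "trousers" if image == "img_3.jpg" else prediction)

def retrain(model, labelled):
    corrections = sum(1 for image, label in labelled if label != model(image))
    return f"model_v2 (retrained with {corrections} corrected label(s))"

new_model = ml_loop(model_v1, ["img_1.jpg", "img_2.jpg", "img_3.jpg"], review, retrain)
print(new_model)
```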
No, training custom image recognition models on the Ximilar platform is completely free, as is deploying your solutions and making them accessible to end users. Also, unlike the competition, Ximilar does not charge customers for idle time or training time, and the basic scheme for our App users is Free. With Ximilar, you only pay for the actual usage of the services – read about API calls & credits.
Accuracy, Performance & Evaluation Metrics
How do I test the accuracy of my Image Recognition and Object Detection models? What are evaluation metrics?
Each of your models automatically goes through testing on an evaluation dataset during the training phase. You can also add a separate test set.
You can test your models both via standard Rest API endpoint and in Ximilar App. Click on Tasks, scroll down to Models and click on a magnifying glass to view the details about a particular model. There you can check the evaluation metrics (accuracy, precision, recall, confusion matrix, and failed images). Based on these metrics, you can further iterate and improve your training dataset to make your model more robust. We are also able to change the architecture of the neural network if needed and improve the robustness for your specific data.
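For intuition, the listed metrics all derive from the counts in a confusion matrix; a plain-Python illustration for the binary case:

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    return accuracy, precision, recall

# Example: a "dress vs. not-dress" model evaluated on 100 images.
acc, prec, rec = metrics(tp=40, fp=10, fn=5, tn=45)
print(acc, prec, rec)  # 0.85, 0.8, 0.888...
```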
You can also test the accuracy of your model on a particular image. Drag & drop the image or copy its URL to Test under your service.
- Evaluation on an Independent Dataset
- Guide: Inspect the Results and Errors
- Guide: Reliability of the Image Recognition Results
A/B testing is a technique for deciding whether a new version of your machine learning model performs better than the previous one, based on selected metrics.
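In its simplest form, you run both versions on the same evaluation set and compare the chosen metric. A toy sketch with stand-in models:

```python
# Toy A/B comparison: both model versions score the same evaluation set.
# The models and data below are stand-ins, not real Ximilar objects.
def accuracy(model, dataset):
    correct = sum(1 for image, truth in dataset if model(image) == truth)
    return correct / len(dataset)

eval_set = [("img_a", "dress"), ("img_b", "trousers"), ("img_c", "dress")]
model_a = lambda image: "dress"                                      # old version
model_b = lambda image: "trousers" if image == "img_b" else "dress"  # new version

better = "B" if accuracy(model_b, eval_set) > accuracy(model_a, eval_set) else "A"
print(better)  # B: the new version fixes the one miss
```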
The mAP, or mean average precision, is an evaluation metric describing the precision of your object detection models. It is calculated from IoU (Intersection over Union, which determines whether a bounding box was correctly predicted), precision, recall, the precision-recall curve, and the average precision (AP) per label. You can find the mAP of your object detection model per label in your workspace under Object Detection: Tasks: Models, in the detail of the particular model.
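For intuition, IoU measures how much a predicted box overlaps the ground-truth box; a minimal sketch with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle (empty if the boxes do not intersect).
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# A prediction shifted halfway off the ground truth:
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150 = 0.333...
```

A detection typically counts as correct when its IoU with the ground-truth box exceeds a chosen threshold (0.5 is a common choice).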