Building superfast Image Similarity for your website

Michal Lukac
6. January 2020

With the Image Similarity service in the Ximilar App, you can build your own AI-powered visual similarity engine with a few clicks and several lines of code. Such a service can improve the user experience of your applications and help increase revenue. The technology behind it is robust, reliable, and fast.

Ximilar uses state-of-the-art deep learning models. Some of our customers have tens of millions of images in their collections and make more than 100 million requests per month. Every collection can be tuned either for generic photos (tools, landscapes, animals, people) or for product photos, for example fashion.

Features of the Image Similarity

  • easy to use through the Ximilar App and API
  • can handle collections with tens of millions of images
  • ultra-fast and reliable engine
  • can handle hundreds of requests per second
  • collections can be optimized for generic or product (fashion) photos
  • every image can have JSON meta-data that you can use for query filtering
  • combination of visual similarity with the similarity of keywords

Applications Using Similarity

  • expand your e-commerce/shops with visually related images
  • stock photo/video similarity search
  • show visually similar images for real estate (room design, interiors, …)
  • similar recommendations for home decor, fashion/apparel/clothing, footwear, furniture, jewellery, watches, luxury retail products, …
  • show similar art designs, wall art, … on your website
  • detect near-duplicate photos (image matching)
  • rank the images

Building Real-Estate Visual Search

Let’s explain how easy it is to build your similarity search engine with Ximilar. The first step is to log in to the Ximilar App at app.ximilar.com. If you don’t have an account, simply sign up; it takes just a minute. On the dashboard, click on the Launch button of the Image Similarity service.

Then go to Collections in the left menu and click on Create New Collection. You will see the Generic Photo Collection and Product Photo Collection cards. Click on one of the cards and your collection will be created. For the following example, I will use the Generic Photo Collection.

In this example, we will build a similarity search on a real estate photo collection. Let us first prepare a text file with JSON records corresponding to images to be inserted into our collection. The key field is "_url" with the image URL.

{"_id": "1_1", "_url": "_URL_IMAGE_PATH_", "estate_id": "1", "category": "indoor", "subcategory": "kitchen", "tags": []}
{"_id": "1_2", "_url": "_URL_IMAGE_PATH_", "estate_id": "1", "category": "indoor", "subcategory": "kitchen", "tags": []}
{"_id": "2_1", "_url": "_URL_IMAGE_PATH_", "estate_id": "2", "category": "outdoor", "subcategory": "garden", "tags": []}
...
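If your photo metadata lives in a database or a spreadsheet, a file like the one above can be generated rather than typed by hand. The following is a minimal sketch, assuming one JSON record per line as in the listing above. The `photos` list, the example.com URLs, and the helper names are made up for illustration; only the field names come from the example records.

```python
import json

# Hypothetical source data; the URLs are placeholders, not real images.
photos = [
    {"estate_id": "1", "url": "https://example.com/1/a.jpg",
     "category": "indoor", "subcategory": "kitchen"},
    {"estate_id": "1", "url": "https://example.com/1/b.jpg",
     "category": "indoor", "subcategory": "kitchen"},
    {"estate_id": "2", "url": "https://example.com/2/a.jpg",
     "category": "outdoor", "subcategory": "garden"},
]


def to_records(photos):
    """Build records with an _id of the form <estate_id>_<n>, counting photos per estate."""
    counters = {}
    records = []
    for photo in photos:
        estate = photo["estate_id"]
        counters[estate] = counters.get(estate, 0) + 1
        records.append({
            "_id": f"{estate}_{counters[estate]}",
            "_url": photo["url"],
            "estate_id": estate,
            "category": photo["category"],
            "subcategory": photo["subcategory"],
            "tags": [],
        })
    return records


def dump_json_lines(records):
    """Serialize the records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


records = to_records(photos)
text = dump_json_lines(records)
print(text)  # write this string to a file to get the input for the insert script
```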

The next step requires a few lines of code. We are going to insert the prepared images into our collection using our python-client library. You can install the library using pip or directly from GitLab. Using the client is straightforward; you can simply run the script tools/collections/insert_json_records.py:

python insert_json_records.py --type generic --auth_token __YOUR_TOKEN__ --collection_id __COLLECTION_ID__ --path /path/to/the/file.json

You will find the collection ID and the Authorization token on the “collection page” in the Ximilar App.

If you don’t have image URLs, you can pass the image data in either the "_file" or the "_base64" field (locally stored "_file" images are automatically converted to base64 by the Python client). The image similarity engine indexes every record of the collection by extracting a representation of the image with a neural network model. Be aware that we do not store the images in our engine, so only records that contain "_url" will be visualized in the Ximilar App.

We recommend storing your own unique identifier of each image in the "_id" field so that you can identify your images in the collection. You can also store additional fields in every JSON record and then use these fields for filtering, grouping, and tuning the similarity function (see below).
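To make the "_base64" option concrete, here is a small sketch of what such a conversion amounts to using only the Python standard library. It assumes the field takes a plain base64 string without any data-URI prefix, which this post does not spell out, so check the API documentation before relying on it. The helper name and the record ID are hypothetical.

```python
import base64


def record_from_bytes(record_id, image_bytes):
    """Build a record that carries the image inline as a base64 string.

    Assumption: "_base64" holds plain base64 text (no "data:image/..." prefix).
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"_id": record_id, "_base64": encoded}


# A few bytes standing in for the contents of a real image file.
# With a real file you would read them via open(path, "rb").read().
fake_image = b"\x89PNG\r\n\x1a\n"
record = record_from_bytes("3_1", fake_image)
print(record)
```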

That’s it! Pretty easy, right? If you now go to the collections page, you can see your new collection there.

All images from the JSON file were indexed and now you can inspect the collection in the Ximilar App. Select Similarity Search in the left menu of the Image Similarity service and test how the similarity works. You can specify the query image by upload, by URL, by its unique ID, or by choosing one of the randomly selected images from the collection. Even though we have indexed just several hundred images, the similarity engine works pretty well. In the results, the first image is the query image and the following images are its k nearest neighbors.
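For readers new to the term, k-nearest search can be illustrated with a few lines of Python. This is a toy brute-force sketch, not the engine's implementation: the three-dimensional vectors are invented stand-ins for the representations a neural network would extract, and the use of Euclidean distance is an assumption made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, index, k):
    """Return the k (record_id, distance) pairs closest to the query, nearest first."""
    scored = [(record_id, euclidean(query, vector)) for record_id, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Invented feature vectors keyed by record ID.
index = {
    "1_1": [0.9, 0.1, 0.0],
    "1_2": [0.8, 0.2, 0.1],
    "2_1": [0.0, 0.1, 0.9],
}

# Query by an image that is already in the collection: it comes back first
# with distance 0, followed by its closest neighbor.
result = k_nearest(index["1_1"], index, k=2)
print(result)
```

A scan over every record like this only works for small collections; an engine serving millions of images needs an index structure to answer such queries quickly.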

API Integration

The next step might be to integrate the service via API into your application. You can either directly use the REST API for searching visually similar images or, if you are using Python, we recommend our Python client like this:

# pip install ximilar-client
from ximilar.client import SimilarityPhotosClient

client = SimilarityPhotosClient("_API_TOKEN_", "_COLLECTION_ID_")
# search k nearest items
client.search({"_id": "1_1"}, k=3)

# search by external image
client.search({"_url": "_URL_PATH_"})

Filtering, Grouping and Combining with Tags

The search for visually similar images can be combined with filtering on meta-data. This meta-data can be stored in the JSON records, as in our example with the "category" and "subcategory" fields. In the API, the filtering is specified using a MongoDB-like syntax (see the documentation). Let’s say that we want to search for images similar to the image with ID 1_1 that are indoor photos taken in a kitchen. We assume that this meta-information is stored in the "category" and "subcategory" fields of every JSON record, with the same values as in the records we inserted. The query will look like this:

client.search({"_id": "1_1"}, filter={"category": "indoor", "subcategory": "kitchen"})

If we know that we will often filter on some fields, we can specify them in the “Fields to index” option of the collection to make the query processing more efficient.
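The effect of such an equality filter can be pictured locally with plain Python. The sketch below only mimics the simplest form of a MongoDB-like filter (every listed field must equal the given value) and is not how the engine evaluates filters. Note that in this sketch the comparison is exact, so "Indoor" would not match a record that stores "indoor"; whether the service behaves the same way is an assumption worth verifying against the documentation.

```python
def matches(record, flt):
    """True when every field in the filter equals the same field in the record."""
    return all(record.get(field) == value for field, value in flt.items())


records = [
    {"_id": "1_1", "category": "indoor", "subcategory": "kitchen"},
    {"_id": "1_2", "category": "indoor", "subcategory": "kitchen"},
    {"_id": "2_1", "category": "outdoor", "subcategory": "garden"},
]

flt = {"category": "indoor", "subcategory": "kitchen"}
kept = [r["_id"] for r in records if matches(r, flt)]
print(kept)

# Exact string comparison: a differently capitalized value matches nothing here.
capitalized = [r["_id"] for r in records if matches(r, {"category": "Indoor"})]
print(capitalized)
```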

Often, your data contain several photos of one “object”: a product or, in our example, a real estate listing. Our service can group the search results not by individual photos but by real estate IDs. You can set this in the advanced options of the collection by specifying the name of the Product ID field, and the magic will happen.

Our image similarity is based purely on the visual content of the image. If you have some tags (labels, keywords) for each image, you might want to exploit these tags to enhance the similarity search. To do this, put your tags into the "tags" field of every record. When you then use the method /v2/visualTagsKNN, your search results will be based on a combination of visual similarity and similarity of keywords.
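To show what grouping means for the result list, here is a local sketch that keeps only the best-ranked photo of each listing. It is an illustration of the idea rather than the service's implementation; the `distance` values and the function name are invented, and the input is assumed to be sorted from the most to the least similar hit.

```python
def group_by_field(ranked_hits, field, k):
    """Keep the first (best) hit for each distinct value of `field`, up to k groups.

    Assumes ranked_hits is already sorted from most to least similar.
    """
    seen = set()
    grouped = []
    for hit in ranked_hits:
        key = hit[field]
        if key in seen:
            continue  # a better photo of this listing is already in the result
        seen.add(key)
        grouped.append(hit)
        if len(grouped) == k:
            break
    return grouped


ranked = [
    {"_id": "1_2", "estate_id": "1", "distance": 0.10},
    {"_id": "1_1", "estate_id": "1", "distance": 0.15},
    {"_id": "3_1", "estate_id": "3", "distance": 0.22},
    {"_id": "2_1", "estate_id": "2", "distance": 0.40},
]

top = group_by_field(ranked, "estate_id", k=2)
print([hit["_id"] for hit in top])
```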
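This post does not describe how the service weighs the two signals, so the following is only a generic sketch of one common way to blend them: a weighted sum of a visual similarity score and the Jaccard overlap of the tag sets. The `tag_weight` parameter, the score range of 0 to 1, and the candidate data are all assumptions made for illustration.

```python
def jaccard(tags_a, tags_b):
    """Overlap of two tag lists: |intersection| / |union|, 0.0 if both are empty."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def combined_score(visual_similarity, query_tags, record_tags, tag_weight=0.3):
    """Blend visual similarity (assumed in [0, 1]) with tag overlap."""
    return (1 - tag_weight) * visual_similarity + tag_weight * jaccard(query_tags, record_tags)


query_tags = ["pool", "garden"]
candidates = [
    {"_id": "A", "visual": 0.80, "tags": ["garage"]},
    {"_id": "B", "visual": 0.75, "tags": ["pool", "garden"]},
]

# B looks slightly less similar than A, but its matching tags move it to the top.
ranking = sorted(
    candidates,
    key=lambda c: combined_score(c["visual"], query_tags, c["tags"]),
    reverse=True,
)
print([c["_id"] for c in ranking])
```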

Enhancing Similarity with Custom Tagger

In this example, we have assumed that your data contain categories, subcategories, and tags. If you don’t have this information, you can create your own Real Estate photo tagger through our Recognition service and enrich your image data automatically before indexing. You can build several models:

  • One classifier for categorizing Indoor/Outdoor/Floor plan photos
  • One classifier for getting room type (Bedroom, Kitchen, Living room, …)
  • One tagger for outdoor tags (Pool, Garden, Garage, Terrace, House view, …)



You can build even more models, for example, if you need to filter photos that are taken at night or you need to distinguish modern and luxury houses from the older ones. Our blog contains several posts on how to create such recognition tasks with high accuracy.
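The enrichment step before indexing could look roughly like the sketch below. The three callables are hypothetical stand-ins for whatever models you train in the Recognition service; here they are replaced by stubs that return fixed values, so the example shows only the data flow, not real API calls.

```python
def enrich(record, classify_category, classify_room, tag_outdoor):
    """Return a copy of the record with category, subcategory, and tags filled in.

    The three callables are placeholders for your own recognition models.
    """
    enriched = dict(record)
    enriched["category"] = classify_category(record["_url"])
    if enriched["category"] == "indoor":
        enriched["subcategory"] = classify_room(record["_url"])
    elif enriched["category"] == "outdoor":
        enriched["tags"] = tag_outdoor(record["_url"])
    return enriched


# Stub models with fixed answers, used here only to make the sketch runnable.
def stub_category(url):
    return "outdoor" if "garden" in url else "indoor"


def stub_room(url):
    return "kitchen"


def stub_outdoor_tags(url):
    return ["garden", "terrace"]


raw = [
    {"_id": "1_1", "_url": "https://example.com/1/kitchen.jpg"},
    {"_id": "2_1", "_url": "https://example.com/2/garden.jpg"},
]
enriched = [enrich(r, stub_category, stub_room, stub_outdoor_tags) for r in raw]
print(enriched)
```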

To sum up…

Real estate photo similarity search is only one of many use cases (Fashion, E-commerce, Stock Photos, Healthcare, …). We hope that you will enjoy working with the Image Similarity service, and we are looking forward to seeing your projects based on it. Thanks to our developers Libor and Ludovit, you can use this service through the frontend app.

The service stands out in search quality, speed, and the possibilities of its API. Our engineers are constantly improving the quality of the search, so you don’t have to. With multiple collections, you can even A/B test the performance on your websites. The engine can run as a service or on your own infrastructure. If you have questions about pricing or technical details, or if you would like to run the similarity search engine on your own machines, contact us. In the Free plan, you can upload only several thousand images; see pricing.
