
Google announced TensorFlow Similarity, a Python package for training similarity models with the company’s TensorFlow machine learning framework. Similarity models learn to spot similar or related items, for use in tasks such as identifying matching songs or finding similar products.

The search giant says the similarity models are trained using a technique called contrastive learning.

Contrastive learning relies on clustering algorithms that identify patterns in data automatically, operating on the theory that data points in the same group should have similar features. Applied to a dataset, the method lets a model project items into an ‘embedding space’ such that the distance between embeddings indicates how similar the input samples are.
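To make the idea of distance in an embedding space concrete, here is a minimal sketch in plain Python. It uses the classic pairwise contrastive loss with a margin, which is an illustrative choice and not necessarily the loss TensorFlow Similarity applies. The function names, the `margin` parameter, and the 2-D embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_pair_loss(a, b, same_class, margin=1.0):
    """Pairwise contrastive loss (illustrative, not the library's own loss)."""
    d = euclidean(a, b)
    if same_class:
        # Similar items are penalized for being far apart.
        return d ** 2
    # Dissimilar items are penalized only if closer than `margin`.
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings, invented for this example.
cat_a = [0.0, 0.0]
cat_b = [0.3, 0.4]
dog = [3.0, 4.0]

print(euclidean(cat_a, cat_b))                      # small distance: similar items
print(euclidean(cat_a, dog))                        # large distance: dissimilar items
print(contrastive_pair_loss(cat_a, dog, False))     # already beyond the margin
```

Minimizing a loss of this shape over many pairs is what pulls similar items together and pushes dissimilar ones apart.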

What can it do?

Training with TensorFlow Similarity produces mathematical representations of the items, known as embeddings, in which similar items sit a small distance apart and dissimilar items sit farther apart.

An example would be training a similarity model on the Oxford-IIIT Pet dataset. Training would produce clusters in which breeds that resemble one another sit close together while dogs and cats are separated. After model training, TensorFlow Similarity builds an index that contains the embeddings of the various items, making them searchable.

According to Google, the library enables searches over millions of indexed items, displaying the top results in fractions of a second.
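The search step can be pictured as a nearest-neighbor lookup over stored embeddings. The sketch below is a brute-force version in plain Python, not the library's implementation: the helper names, labels, and vectors are hypothetical, and a linear scan like this would presumably give way to an approximate nearest-neighbor method at the scale of millions of items.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(index, query, k=2):
    """Return the k (label, score) pairs most similar to `query`."""
    scored = [(label, cosine_similarity(vec, query)) for label, vec in index]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical index of (label, embedding) pairs.
index = [
    ("beagle", [0.9, 0.1]),
    ("basset hound", [0.8, 0.2]),
    ("siamese", [0.1, 0.9]),
]

results = top_k(index, [1.0, 0.0], k=2)
for label, score in results:
    print(label, round(score, 3))
```

Here the two dog breeds rank above the cat because their embeddings point in nearly the same direction as the query.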

Real-world application

In addition, TensorFlow Similarity can add an unlimited number of new classes to the index without retraining; only the embeddings for items in the new classes need to be computed.
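The following sketch illustrates why new classes need no retraining: the embedding function stays fixed, and adding a class only means embedding its items and appending them to the index. `EmbeddingIndex` and `toy_embed` are hypothetical stand-ins written for this example, not part of the TensorFlow Similarity API.

```python
class EmbeddingIndex:
    """Toy index that stores (label, embedding) pairs for lookup."""

    def __init__(self, embed):
        self.embed = embed   # frozen, already-trained embedding function
        self.entries = []    # list of (label, embedding)

    def add(self, label, item):
        # Only the new item is embedded; the model itself is untouched.
        self.entries.append((label, self.embed(item)))

    def nearest_label(self, item):
        query = self.embed(item)

        def squared_distance(entry):
            return sum((x - y) ** 2 for x, y in zip(entry[1], query))

        return min(self.entries, key=squared_distance)[0]


def toy_embed(item):
    """Hypothetical stand-in for a trained model: scales two raw features."""
    weight_kg, ear_cm = item
    return [weight_kg / 10.0, ear_cm / 10.0]


index = EmbeddingIndex(toy_embed)
index.add("cat", (4.0, 6.0))
index.add("dog", (30.0, 12.0))

query = (2.0, 10.0)                    # an item from a class not yet indexed
before = index.nearest_label(query)    # falls back to the closest known class

index.add("rabbit", (2.0, 9.0))        # new class added, no retraining
after = index.nearest_label(query)

print(before, after)
```

The same query resolves to the new class as soon as one of its items is indexed, while `toy_embed` never changes.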

The initial release of the library focuses on providing the components to build contrastive learning-based similarity models. However, Google plans to add support for additional models in the future.

Elie Bursztein and Owen S. Vallis of Google said that the ability to search for related items has real-world applications and is a vital part of many core information systems like clustering pipelines, recommender systems, and multimedia searches.