Optical sensors such as cameras and lidar (Light Detection and Ranging) have become essential to spatial perception in modern robotics. They have one notable weakness, however: transparent objects, such as drinking glasses and bottles, tend to ‘confuse’ the sensors. Google claims to have a solution to this problem with its ClearGrasp AI.

Most algorithms that analyse sensor data assume that all surfaces are opaque and reflect light evenly in all directions and from all angles. Transparent objects, however, refract some of the light as well as reflecting it, which makes the sensor data appear invalid and often riddles a dataset with noise and missing values.
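To illustrate the kind of corruption involved, here is a minimal sketch (with made-up numbers, not real sensor output) of how a depth map from a consumer depth camera might look around a glass object: some pixels return no depth at all, others return a spurious value from a reflection, and both need to be flagged before any reconstruction can happen.

```python
import numpy as np

# Hypothetical 4x4 depth map in metres. Transparent surfaces typically
# yield missing readings (0.0) or spurious values from reflections.
depth = np.array([
    [1.20, 1.21, 1.19, 1.22],
    [1.20, 0.00, 0.00, 1.21],   # 0.0 = no return (e.g. glass)
    [1.19, 0.00, 3.75, 1.20],   # 3.75 = reflection artefact
    [1.21, 1.20, 1.19, 1.22],
])

# Flag readings that are missing or wildly inconsistent with the rest
# of the (mostly flat) scene. The 0.5 m tolerance is an assumption.
median = np.median(depth[depth > 0])
invalid = (depth == 0) | (np.abs(depth - median) > 0.5)

print(invalid.sum())  # → 4 unreliable pixels
```

This simple thresholding only works for a near-flat scene; the point is that the unreliable pixels cluster exactly where the transparent object sits, which is what ClearGrasp's learned mask exploits.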

A team of Google researchers worked with Columbia University and Synthesis AI, a data generation platform for computer vision, to develop ClearGrasp: a combination of algorithms capable of estimating accurate 3D geometry for transparent objects from RGB-D images. Notably, the approach works with input from any standard RGB-D camera, using AI to reconstruct the depth of transparent objects.

Three algorithms

ClearGrasp combines three machine learning networks. The first estimates surface normals, the second detects ‘occlusion boundaries’ (abrupt discontinuities in depth), and the third produces a mask of the transparent objects. This ‘mask’ removes all pixels belonging to transparent objects, so the erroneous depth values initially reported by the ‘confused’ sensor can be discarded rather than trusted. A global optimisation module then fills in the missing depth, extending it from the surrounding surfaces and using the predicted surface normals to guide the shape of the reconstruction, while the predicted occlusion boundaries help to keep separate objects apart.
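The mask-then-complete step can be sketched in a few lines. The version below is a deliberately crude stand-in: it drops the masked depth and fills the hole by iteratively averaging valid neighbours, whereas ClearGrasp's actual module solves a global optimisation that also uses the predicted surface normals and occlusion boundaries as constraints. The function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def complete_depth(depth, transparent_mask, iterations=200):
    """Crude depth-completion sketch: discard the sensor's unreliable
    readings inside the transparent-object mask, then fill the hole by
    repeatedly averaging each masked pixel's four neighbours while
    keeping all valid pixels fixed at their measured values."""
    filled = np.where(transparent_mask, np.nanmean(depth[~transparent_mask]), depth)
    for _ in range(iterations):
        # Average of the 4-neighbourhood, computed via shifted copies.
        avg = (np.roll(filled, -1, axis=0) + np.roll(filled, 1, axis=0) +
               np.roll(filled, -1, axis=1) + np.roll(filled, 1, axis=1)) / 4.0
        # Only masked pixels are updated; measured depth stays anchored.
        filled = np.where(transparent_mask, avg, depth)
    return filled

# Usage: a smooth ramp with one masked pixel is recovered exactly,
# because neighbour-averaging interpolates linear surfaces.
rows = np.arange(6, dtype=float).reshape(-1, 1)
depth = 1.0 + 0.1 * rows * np.ones((6, 6))   # depth increases row by row
mask = np.zeros((6, 6), dtype=bool)
mask[3, 2] = True
out = complete_depth(depth, mask)
print(round(out[3, 2], 3))  # → 1.3, the true depth at that pixel
```

For a masked region larger than a pixel or two, this neighbour-averaging fill produces an over-smooth surface; that is precisely the gap the learned normals and boundaries are meant to close.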