Google has published a blog post giving technical details of Colossus, the internal file system that powers Google Cloud and many of its consumer services, including the search engine. The software platform manages the storage hardware running in Alphabet's data centers.
It moves information into and out of storage hardware for the applications that depend on it. At Google's scale, ensuring that users can access data reliably requires enormous computational resources, and one of Colossus's main objectives is detecting and repairing technical problems before they affect users.
Failure is natural
At Google's data centers, hardware is failing all the time, not because it is unreliable, but because there is simply so much of it. This was explained by Dean Hildebrand, a technical director in Google Cloud's Office of the Chief Technology Officer, and Denis Serenyi, technical lead for Google's cloud storage.
Failures are inevitable at this scale, so it is essential that the file system provides fault tolerance and transparent recovery. Colossus relies on background programs, which Google engineers have dubbed Custodians, to keep everything operating smoothly.
Custodians and Curators
When one of the drives in the storage system fails, the Custodians can reassemble the lost information from the data remaining on the still-working drives. These programs also handle tasks such as maintaining the durability of Google's storage environments, which reach several exabytes in size and run on thousands of machines.
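Google has not published the encoding Colossus uses internally, but the general principle behind rebuilding a lost drive's data can be sketched with simple XOR parity (production systems typically use stronger schemes such as Reed-Solomon codes). In this hypothetical example, one parity chunk lets us recover any single missing data chunk:

```python
# Illustrative sketch only: this is XOR parity, not Colossus' actual scheme.

def make_parity(chunks):
    """Compute a parity chunk as the byte-wise XOR of all data chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def rebuild(surviving_chunks, parity):
    """Recover the single missing chunk from the survivors plus parity."""
    missing = bytearray(parity)
    for chunk in surviving_chunks:
        for i, b in enumerate(chunk):
            missing[i] ^= b
    return bytes(missing)

# Three equal-sized chunks stand in for data spread across three drives.
data = [b"alpha", b"bravo", b"gamma"]
parity = make_parity(data)

# The second "drive" fails; rebuild its chunk from the survivors and parity.
recovered = rebuild([data[0], data[2]], parity)
assert recovered == b"bravo"
```

Because XOR is its own inverse, XORing the parity with every surviving chunk cancels them out and leaves exactly the missing chunk, which is why the rebuild needs only the remaining drives.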
The system also has Curators, which control the metadata for everything that customers and Google services store in Colossus and determine the best storage options for it. Based on that metadata and the information provided about the data, Colossus can automatically assign it to the most suitable hardware.
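The Curators' actual placement logic is not public, but the idea of metadata-driven placement can be sketched as a simple tiering decision. The `FileMetadata` fields and tier names below are hypothetical, chosen only to illustrate routing hot data to fast media and cold data to cheap media:

```python
# Hypothetical sketch of metadata-driven placement; not Colossus' real logic.
from dataclasses import dataclass

@dataclass
class FileMetadata:
    size_bytes: int
    reads_per_day: float  # assumed access-frequency signal

def choose_tier(meta: FileMetadata) -> str:
    """Pick a storage tier from the metadata: hot data on flash, cold on disk."""
    if meta.reads_per_day >= 100:
        return "flash"    # frequently read: low-latency media
    if meta.reads_per_day >= 1:
        return "disk"     # warm data: spinning disks
    return "archive"      # rarely touched: cheapest media

assert choose_tier(FileMetadata(1 << 20, 5000)) == "flash"
assert choose_tier(FileMetadata(1 << 30, 0.01)) == "archive"
```

The point of the sketch is that the decision is driven entirely by metadata, so it can be made automatically without the application describing the hardware it wants.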
These complexities are hidden behind an abstraction layer, so even non-technical users can make extensive use of Colossus while it handles tasks like data placement behind the scenes.