Google details what its exabyte-scale Colossus file system can do

Google has published a blog post with technical details about Colossus, the internal file system that powers Google Cloud and many of its consumer services, including the search engine. The software platform manages the storage hardware running in the Alphabet subsidiary's data centres.

Colossus moves information into and out of storage hardware on behalf of the applications that depend on it. At this scale, Google has to ensure that users can access data reliably, which takes a great deal of computational power. One of the main objectives of Colossus is to limit technical issues and fix them when they occur.

Failure is natural

In Google's data centres, hardware fails all the time, not because it is unreliable but because there is simply so much of it. That is the explanation given by Dean Hildebrand, a technical director at Google Cloud's Office of the Chief Technology Officer, and Denis Serenyi, the tech lead for Google's cloud storage.

Failures are inevitable at this scale, so a file system has to offer fault tolerance and transparent recovery. Colossus uses several programs that Google engineers have dubbed Custodians to keep things running smoothly.

Custodians and Curators

When one of the drives in the storage system fails, the Custodians can reassemble the lost information from the data remaining on the drives that still work. They also handle tasks such as increasing the durability of Google's storage environments, which reach several exabytes and run on thousands of machines.
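To make the idea of rebuilding lost data from the surviving drives concrete, here is a minimal Python sketch based on simple XOR parity. This is an illustrative assumption only: the article does not say which encoding Colossus uses, and the names `xor_blocks`, `make_parity` and `rebuild_missing` are hypothetical, not part of any Google API.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block that protects against the loss of any single block."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical example: three "drives" each hold one block of data.
drives = [b"colo", b"ssus", b"data"]
parity = make_parity(drives)

# Simulate the failure of drive 1 and rebuild its contents from the rest.
lost_index = 1
survivors = [block for i, block in enumerate(drives) if i != lost_index]
recovered = rebuild_missing(survivors, parity)
```

A single parity block only covers one failed drive at a time. Systems that must survive several simultaneous failures generally rely on stronger erasure codes or on full replicas.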

The system also has Curators, which determine the best storage options and manage the metadata for the data that customers and Google services store in Colossus. Based on that metadata and other information provided about the data, Colossus can automatically assign it to the most suitable hardware.
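Metadata-driven placement of this kind can be pictured with a small Python sketch. Everything in it is assumed for illustration: the tier names `flash` and `disk`, the fields on `FileHints` and the read threshold are hypothetical, and the article does not describe the rules Colossus actually applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHints:
    """Hypothetical metadata an application might supply about a file."""
    name: str
    reads_per_day: int
    latency_sensitive: bool


def choose_tier(hints, hot_read_threshold=1000):
    """Pick a storage tier from the hints (assumed rule, not Google's policy)."""
    if hints.latency_sensitive or hints.reads_per_day >= hot_read_threshold:
        return "flash"  # fast, more expensive media for hot data
    return "disk"       # cheaper media for colder data


files = [
    FileHints("search-index-shard", reads_per_day=50_000, latency_sensitive=True),
    FileHints("monthly-archive", reads_per_day=2, latency_sensitive=False),
]

# Map each file name to the tier selected from its metadata.
placement = {f.name: choose_tier(f) for f in files}
```

The point of the sketch is that the caller only describes its data and the system picks the hardware.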

The system's complexities are hidden behind an abstraction layer, so even less technical users can make extensive use of Colossus and rely on it for things like assigning data to the most suitable hardware.
