Nvidia acquires SchedMD, developer of workload manager Slurm

Nvidia has acquired SchedMD, the company behind the development and maintenance of Slurm, an open-source workload manager that plays a central role in high-performance computing (HPC) and large-scale AI environments.

Slurm is used to schedule computing tasks and allocate resources within large server clusters in research, industry, and government.

SchedMD was founded in 2010 by the original developers of Slurm. In addition to developing the software further, the company provides commercial support and consulting to organizations that run Slurm in production. According to SiliconANGLE, SchedMD serves several hundred customers, including government agencies, banks, and organizations in the healthcare sector.

Slurm is designed for environments that run large numbers of parallel tasks. It decides which computing resources each job uses and when, so that workloads are not slowed down unnecessarily by poorly distributed resources. In practice, this means, among other things, that some GPUs do not sit idle while others are overloaded. Slurm can manage clusters with more than 100,000 GPUs, making it suitable for both supercomputers and large-scale AI training.
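To make the idea of even resource distribution concrete, here is a minimal Python sketch of a greedy "least-loaded first" placement. It is an illustration only and not Slurm's actual scheduling algorithm; the function names `assign_tasks` and `gpu_loads` and the notion of a single numeric cost per task are assumptions made for this example.

```python
import heapq


def assign_tasks(task_costs, num_gpus):
    """Greedy placement: each task goes to the GPU with the smallest load so far.

    Hypothetical helper for illustration; not part of Slurm.
    task_costs is a list of estimated costs, indexed by task id.
    """
    # Min-heap of (current_load, gpu_id), so the least-loaded GPU is popped first.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}

    # Placing the most expensive tasks first tends to give a more even spread.
    ordered = sorted(enumerate(task_costs), key=lambda item: -item[1])
    for task_id, cost in ordered:
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(task_id)
        heapq.heappush(heap, (load + cost, gpu))
    return placement


def gpu_loads(task_costs, placement):
    """Total cost assigned to each GPU under the given placement."""
    return {gpu: sum(task_costs[t] for t in tasks) for gpu, tasks in placement.items()}


if __name__ == "__main__":
    costs = [8, 7, 6, 5, 4]
    plan = assign_tasks(costs, num_gpus=2)
    print(plan)
    print(gpu_loads(costs, plan))
```

A production scheduler also has to weigh priorities, queues, memory, and job time limits; the sketch only shows why tracking per-device load keeps one GPU from idling while another is overloaded.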

How Slurm and Kubernetes differ

In AI environments, Slurm is often compared to Kubernetes, which is also used for cluster management. Both platforms can schedule and distribute workloads, but Slurm is more focused on HPC-like scenarios with strict requirements for performance and scalability. For example, Slurm offers more options for fine-grained scheduling, such as placing tasks that exchange a lot of data physically close to each other on the cluster. Kubernetes can perform similar optimizations, but often requires additional extensions to do so.
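The placement of communicating tasks close to each other can be illustrated with a short Python sketch. It assumes a simplified cluster model in which free nodes are grouped by the network switch they hang off; the function `place_job`, the switch names, and the node names are hypothetical and do not reflect Slurm's real implementation or configuration.

```python
def place_job(free_nodes_by_switch, nodes_needed):
    """Pick nodes for a job, preferring a single switch so traffic stays local.

    Hypothetical helper for illustration; not Slurm's algorithm.
    free_nodes_by_switch maps a switch name to a list of free node names.
    Returns a list of node names, or None if the job cannot be placed.
    """
    # Best fit: the smallest switch group that can still hold the whole job.
    candidates = [
        (len(nodes), switch)
        for switch, nodes in free_nodes_by_switch.items()
        if len(nodes) >= nodes_needed
    ]
    if candidates:
        _, switch = min(candidates)
        return sorted(free_nodes_by_switch[switch])[:nodes_needed]

    # Fallback: span switches, largest groups first, to touch as few as possible.
    chosen = []
    by_size = sorted(free_nodes_by_switch.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _switch, nodes in by_size:
        chosen.extend(sorted(nodes)[: nodes_needed - len(chosen)])
        if len(chosen) == nodes_needed:
            return chosen
    return None  # not enough free nodes in the whole cluster


if __name__ == "__main__":
    free = {"sw1": ["n1", "n2"], "sw2": ["n3", "n4", "n5"], "sw3": ["n6"]}
    print(place_job(free, 2))
    print(place_job(free, 4))
```

Choosing the smallest group that fits leaves larger groups free for bigger jobs later, which is one common reason to prefer a best-fit rule in this kind of placement.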

In addition to Slurm, SchedMD maintains a second open-source project, Slinky, which makes it possible to run Slurm on top of Kubernetes. As a result, organizations do not need to manage separate clusters for cloud-native workloads and traditional HPC tasks.

Nvidia states that Slurm will remain open source and vendor-neutral after the acquisition. The software will therefore continue to be suitable for heterogeneous environments in which hardware from different vendors is combined. For existing users, little will change in the short term. Nvidia will take over support, training, and further development for SchedMD’s customers.

The acquisition is part of a broader movement in which large technology companies are playing a greater role in maintaining essential open-source infrastructure. Slurm is a fundamental part of many HPC and AI environments. How far the project retains its independence will only become apparent over time, in the governance and development choices made around the software. For the time being, Slurm remains a freely available and widely used scheduler within large-scale computing environments.