Joint AI training without sharing data: FlexOlmo makes it possible

Researchers at the Allen Institute for Artificial Intelligence (AI2) have presented a new framework for training LLMs in which organizations collaborate without sharing the underlying training data.

LLMs typically rely on huge amounts of high-quality training data. In sectors such as healthcare or financial services, such datasets are often spread across multiple institutions or departments. In theory, combining this data would lead to more powerful models. In practice, however, this proves difficult: regulations and privacy requirements make it impossible to simply move or centralize the data.

FlexOlmo offers a solution to this problem. It enables organizations to train AI models locally on their own data and then combine those models without the data itself leaving the organization’s network.

Joint anchor model

The FlexOlmo framework starts from a shared anchor model that each participant trains locally on its own data. Unlike traditional federated learning methods, the resulting models are not averaged or merged linearly. Instead, they are combined into a Mixture of Experts (MoE) architecture: an AI model made up of multiple specialized neural networks.
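To make that difference concrete, here is a minimal sketch in PyTorch (our illustration, not FlexOlmo's actual code): each participant's locally trained feed-forward network is kept intact as a separate expert module, rather than being averaged with the others.

```python
import torch
import torch.nn as nn

class Expert(nn.Module):
    """One participant's locally trained feed-forward submodel."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Each organization trains an Expert on its own data, locally.
# Combining them means collecting modules, not data, and crucially
# no weight averaging takes place:
local_experts = [Expert(d_model=512, d_hidden=2048) for _ in range(3)]
moe_experts = nn.ModuleList(local_experts)
```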

Each participant therefore contributes a trained submodel together with its own router: a module that decides whether a given input should be handled by that submodel. When the models are combined, the routers are merged as well. This ensures that the combined MoE model can still assign each input to the right expert efficiently and accurately.
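A hypothetical sketch of how such router integration could look: each expert arrives with one learned router vector, the vectors are stacked into a routing matrix, and each token is dispatched to its top-scoring experts. The names here (CombinedMoELayer, top_k) are our own for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CombinedMoELayer(nn.Module):
    """Merged MoE layer: experts plus their stacked per-participant routers."""
    def __init__(self, experts: nn.ModuleList, router_vectors: list, top_k: int = 1):
        super().__init__()
        self.experts = experts
        # Each participant ships one router vector; stacking them yields
        # a (num_experts, d_model) routing matrix for the combined model.
        self.router = nn.Parameter(torch.stack(router_vectors))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); score every token against every router.
        scores = x @ self.router.T                  # (tokens, num_experts)
        weights = F.softmax(scores, dim=-1)
        top_w, top_i = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_i[:, k] == e             # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: three toy experts with random router vectors, routing 8 tokens.
experts = nn.ModuleList(nn.Linear(512, 512) for _ in range(3))
routers = [torch.randn(512) for _ in range(3)]
layer = CombinedMoELayer(experts, routers, top_k=1)
y = layer(torch.randn(8, 512))
```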

According to the team behind FlexOlmo, this architecture allows models to be developed asynchronously: organizations can join at different times and add their models without retraining the whole system.
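Asynchronous joining can be illustrated by extending the sketch above: a new participant hands over a trained expert and its router vector, and both are appended without touching the existing experts. Again, this is a hypothetical helper, not the paper's API.

```python
def add_expert(layer: CombinedMoELayer, expert: nn.Module, router_vec: torch.Tensor) -> None:
    """Append a newly contributed expert and its router row; no retraining."""
    layer.experts.append(expert)
    layer.router = nn.Parameter(torch.cat([layer.router.data, router_vec.unsqueeze(0)]))

add_expert(layer, nn.Linear(512, 512), torch.randn(512))
y = layer(torch.randn(8, 512))  # the router now scores four experts
```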

Security and performance

In a paper, the researchers investigated whether training data could be recovered from the final FlexOlmo model. Their evaluation showed that the risk of data extraction is very low: only 0.7% in their test scenario, while an overfitted model used as a control revealed 60% of its data.

FlexOlmo also holds up in terms of performance. In a benchmark with models of up to 37 billion parameters, the researchers measured a 10.1% improvement in accuracy over previous model-merging methods. The result comes close to that of a centrally trained model, even though no data was shared.

The researchers see broad applications for FlexOlmo, especially in sectors with sensitive or regulated data. Think of collaborations between hospitals, financial institutions, or government agencies that want to use AI without compromising privacy and compliance.