Google launches VaultGemma: privacy-preserving AI without compromising performance

Google presents VaultGemma, an AI model that protects sensitive data without compromising performance. The 1 billion-parameter model uses differential privacy and will be available as open source.

Google Research and Google DeepMind are behind VaultGemma, a language model that addresses the privacy problems that plague traditional AI. The model builds on Google's Gemma architecture and demonstrates that differential privacy does not necessarily mean reduced performance.

Differential privacy works by adding calibrated noise during training. This mathematically bounds how much any single training example can influence the model, making it practically impossible to retrieve specific records while preserving overall usability. VaultGemma was built from the ground up and trained within a differential-privacy framework to ensure that it cannot memorize or leak sensitive data.
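Differentially private training is commonly implemented as DP-SGD: each example's gradient is clipped to a fixed norm, then Gaussian noise is added to the aggregate. The sketch below illustrates that mechanism with toy numbers; it is not Google's actual training code, and the parameter values are assumptions for illustration.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One differentially private gradient step (DP-SGD sketch).

    Each example's gradient is clipped to a fixed norm so no single
    example can dominate the update; Gaussian noise is then added to
    the sum so individual contributions cannot be recovered from it.
    """
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        scale = min(1.0, clip_norm / (norm + 1e-12))  # clip, never amplify
        clipped.append(g * scale)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Toy per-example gradients: the first would dominate without clipping.
grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2])]
update = dp_sgd_step(grads)
```

The clipping bound is what lets the noise scale be calibrated: because no example contributes more than `clip_norm`, a fixed amount of noise hides any individual's presence in the batch.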

New scaling laws break through old limitations

Traditional scaling laws for AI models do not apply when differential privacy is applied. Google therefore developed new “DP Scaling Laws” that take into account added noise and larger batch sizes. This breakthrough enables the development of larger and more powerful private language models.

The team adapted the training protocols to counteract the instability caused by the added noise. Private models require batch sizes of millions of examples to train stably. Google found ways to reduce these computational costs without undermining the privacy guarantees.
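The reason huge batches help is simple arithmetic: the Gaussian noise added per step has a fixed scale, so averaging over more examples shrinks the noise relative to the signal. A back-of-the-envelope illustration (the numbers are illustrative, not Google's):

```python
# Per-step DP noise has a fixed standard deviation; after averaging the
# noisy gradient sum over the batch, the residual noise per example
# shrinks linearly with batch size.
clip_norm = 1.0
noise_multiplier = 1.0

def noise_per_example(batch_size):
    # Std of the Gaussian noise after dividing the gradient sum by batch size.
    return noise_multiplier * clip_norm / batch_size

for b in (1_000, 100_000, 1_000_000):
    print(f"batch {b:>9,}: residual noise std {noise_per_example(b):.1e}")
```

Going from a batch of a thousand to a million cuts the effective noise by three orders of magnitude, which is why the DP scaling laws trade larger batches against smaller models and fewer steps.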

Performance comparable to public models

In evaluations on benchmarks such as MMLU and Big-Bench, VaultGemma performs comparably to non-private Gemma models with the same number of parameters. This is remarkable because previous differentially private models always performed significantly worse.

VaultGemma uses a decoder-only transformer architecture with 26 layers and Multi-Query Attention. The sequence length is limited to 1,024 tokens to keep the intensive computational requirements of private training manageable.
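In Multi-Query Attention, every head gets its own query projection, but all heads share a single key head and a single value head, which shrinks the key/value cache that dominates inference memory. A minimal numpy sketch of the idea, with illustrative dimensions rather than VaultGemma's actual ones:

```python
import numpy as np

def multi_query_attention(x, Wq, Wk, Wv, n_heads):
    """Multi-Query Attention sketch: per-head queries, one shared
    key head and one shared value head across all heads."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q = (x @ Wq).reshape(seq, n_heads, d_head)  # one query per head
    k = x @ Wk                                  # single shared key head
    v = x @ Wv                                  # single shared value head
    outs = []
    for h in range(n_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
        outs.append(weights @ v)
    return np.concatenate(outs, axis=-1)

rng = np.random.default_rng(0)
seq, d_model, n_heads = 8, 16, 4
x = rng.normal(size=(seq, d_model))
out = multi_query_attention(
    x,
    rng.normal(size=(d_model, d_model)),            # query projection
    rng.normal(size=(d_model, d_model // n_heads)), # shared key projection
    rng.normal(size=(d_model, d_model // n_heads)), # shared value projection
    n_heads,
)
```

With standard multi-head attention the cache holds `n_heads` key/value pairs per token; here it holds one, a saving that compounds with the compute-heavy private training the article describes.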

Open source for wider adoption

Google is making VaultGemma fully open source via Hugging Face and Kaggle. This contrasts with proprietary models such as Gemini Pro. The new scaling laws should be applicable to much larger private models, potentially up to trillions of parameters. Google envisions collaboration with healthcare providers, with VaultGemma analyzing sensitive patient data without privacy risks.

Because the model cannot reproduce its training data, it also reduces the risks of misinformation and bias amplification, the researchers say.

Tip: Google puts Gemini largely behind a paywall