JetBrains introduces Developer Productivity AI Arena (DPAI Arena), the first open benchmark platform that measures the effectiveness of AI coding agents. The platform is being donated to the Linux Foundation and aims to bring transparency and standardization to the evaluation of AI tools for software development.
JetBrains has 25 years of experience building development tools used by millions of developers. That knowledge is now being applied to a persistent problem: there is no neutral standard for measuring how much AI coding agents actually contribute to productivity.
According to JetBrains, existing benchmarks fall short. They rely on outdated datasets, cover only a handful of programming languages, and focus almost exclusively on issue-to-patch workflows. While AI tools are advancing rapidly, there is no shared framework for objectively determining their impact.
DPAI Arena aims to fill this gap. The platform takes a multi-language, multi-framework, and multi-workflow approach, covering workflows such as patching, bug fixing, PR review, test generation, and static analysis. It uses a track-based architecture that enables fair comparisons across different development environments.
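The announcement does not detail DPAI Arena's actual task schema or APIs, so the sketch below is purely illustrative: a minimal Java model of a multi-workflow benchmark task and the kind of per-workflow pass-rate summary a track-based benchmark could report. All type names and fields are hypothetical assumptions, not taken from the platform.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DpaiArenaSketch {

    // Hypothetical workflow categories, mirroring those named in the article.
    enum Workflow { PATCHING, BUG_FIXING, PR_REVIEW, TEST_GENERATION, STATIC_ANALYSIS }

    // Hypothetical task record: not DPAI Arena's actual schema.
    record BenchmarkTask(String id, Workflow workflow, String language, String repository) {}

    // Hypothetical outcome of running one agent on one task.
    record TaskResult(BenchmarkTask task, boolean passed) {}

    // Aggregate a per-workflow pass rate: the kind of track-level summary that
    // would allow comparisons across workflows and development environments.
    static Map<Workflow, Double> passRateByWorkflow(List<TaskResult> results) {
        return results.stream().collect(Collectors.groupingBy(
                r -> r.task().workflow(),
                Collectors.averagingDouble(r -> r.passed() ? 1.0 : 0.0)));
    }

    public static void main(String[] args) {
        var tasks = List.of(
                new BenchmarkTask("task-1", Workflow.BUG_FIXING, "java", "example/repo-a"),
                new BenchmarkTask("task-2", Workflow.TEST_GENERATION, "kotlin", "example/repo-b"),
                new BenchmarkTask("task-3", Workflow.BUG_FIXING, "java", "example/repo-c"));

        var results = List.of(
                new TaskResult(tasks.get(0), true),
                new TaskResult(tasks.get(1), false),
                new TaskResult(tasks.get(2), true));

        // Prints e.g. {BUG_FIXING=1.0, TEST_GENERATION=0.0}
        System.out.println(passRateByWorkflow(results));
    }
}
```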
Transparency and reproducibility are key
Kirill Skrygan, CEO of JetBrains, argues that evaluating AI coding agents requires more than simple performance measurements. “We see firsthand how teams are trying to reconcile productivity gains with code quality, transparency, and trust – challenges that take more than performance benchmarks to address.”
DPAI Arena emphasizes transparent evaluation pipelines, reproducible infrastructure, and datasets that the community can extend. Developers can also bring their own datasets and reuse them across evaluations.
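As a rough illustration of how a bring-your-own dataset can stay reproducible, the hypothetical sketch below loads tasks from a plain CSV file, so the exact same file can be fed into repeated evaluation runs. The file format and class names are assumptions for illustration, not DPAI Arena's real dataset format.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CustomDatasetSketch {

    // Hypothetical row: one task contributed by a team.
    record CustomTask(String id, String workflow, String repository) {}

    // Load a plain CSV file (id,workflow,repository); reusing the same file
    // verbatim across runs is what keeps the evaluation reproducible.
    static List<CustomTask> load(Path csv) throws Exception {
        return Files.readAllLines(csv).stream()
                .filter(line -> !line.isBlank())
                .map(line -> line.split(","))
                .map(cols -> new CustomTask(cols[0].trim(), cols[1].trim(), cols[2].trim()))
                .toList();
    }

    public static void main(String[] args) throws Exception {
        Path csv = Files.createTempFile("tasks", ".csv");
        Files.writeString(csv, """
                pr-101,pr_review,example/service-a
                bug-17,bug_fixing,example/service-b
                """);
        load(csv).forEach(System.out::println);
    }
}
```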
The platform launches with the Spring Benchmark as its technical standard. This benchmark demonstrates how datasets should be constructed, which evaluation formats are supported, and which rules apply. Spring AI Bench is also being considered as a way to further expand the Java ecosystem with varied, multi-track benchmarks.
For everyone in the AI chain
The added value differs for each group of users. AI tool vendors can benchmark and refine their products on real-world tasks. Technology companies can keep their ecosystems up to date by contributing domain-specific benchmarks. Enterprises get a reliable way to evaluate tools before deployment. And developers gain transparent insight into what actually increases productivity.
JetBrains is donating the platform to the Linux Foundation, which is setting up a diverse Technical Steering Committee to determine its future direction. Vendors of coding agents and frameworks are invited to participate, and end users can contribute by validating AI tools on their own workloads. In this way, the ecosystem grows on the basis of openness, trust, and measurable impact.