OpenAI is making reinforcement fine-tuning (RFT) available to external developers using the o4-mini reasoning model. This gives companies the opportunity to adapt this compact AI engine to their own business context for the first time, without having to set up complex machine learning infrastructure themselves.
The announcement appeared on OpenAI’s developer account on X. Developers can now train a customized version of the o4-mini model via the OpenAI platform. This can be tailored to specific internal needs such as products, processes, terminology, or safety standards. Companies can then deploy the customized model via the OpenAI API and link it to internal systems such as databases, business applications, or custom chatbots.
More precise tuning
Reinforcement fine-tuning differs fundamentally from traditional supervised training. Where classic supervised fine-tuning trains on fixed question-answer pairs, RFT uses a grader model that scores multiple candidate answers per prompt and adjusts the language model toward the higher-scoring ones. This enables much finer tuning to subtle requirements, such as a specific communication style, policy guidelines, or domain-specific expertise.
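The core idea can be sketched in a few lines: sample several answers to the same prompt, score each with a grader, and weight training toward the best ones. The toy grader below (keyword overlap with a reference answer) is purely illustrative and is not OpenAI's actual grader or training loop.

```python
import string

# Toy illustration of the RFT idea: a grader scores several candidate
# answers to one prompt; training then favors higher-scoring candidates.
# The grading function here is a made-up example, not OpenAI's grader.

def _terms(text: str) -> set[str]:
    """Lowercase, strip punctuation, split into a set of words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def grade(candidate: str, reference: str) -> float:
    """Hypothetical grader: fraction of reference terms the candidate covers."""
    ref = _terms(reference)
    if not ref:
        return 0.0
    return len(ref & _terms(candidate)) / len(ref)

def rank_candidates(candidates: list[str], reference: str) -> list[tuple[str, float]]:
    """Score every sampled answer; higher scores get more training weight."""
    scored = [(c, grade(c, reference)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

candidates = [
    "The deduction applies to capital expenses only.",
    "Deductions apply to both capital and operating expenses.",
    "No deduction is available.",
]
reference = "The deduction applies to capital and operating expenses."
ranked = rank_candidates(candidates, reference)
```

In a real RFT run the grader can be another model or custom scoring logic, which is what lets the tuning target subtle qualities like tone or policy compliance rather than exact string matches.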
According to OpenAI, developers can go through the entire process relatively easily. Via the dashboard or an API, they can start a training session, upload datasets, set up assessment logic, and monitor progress in real time. RFT is currently only available for models in the o series, specifically for the o4-mini model.
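A training run of this kind boils down to a job configuration: a training file, a base model, and grader logic. The sketch below assembles such a payload as a plain dictionary; the field names follow the general pattern of OpenAI's published fine-tuning examples, but the exact schema (grader types, template variables like `{{sample.output_text}}`) should be verified against the current API reference before use.

```python
# Illustrative sketch of a reinforcement fine-tuning job configuration:
# a training file, a base model, and a grader that scores each sampled
# answer. Field names are assumptions to be checked against OpenAI's docs.

def build_rft_job(training_file_id: str, model: str = "o4-mini") -> dict:
    """Assemble a hypothetical RFT job payload."""
    return {
        "training_file": training_file_id,
        "model": model,
        "method": {
            "type": "reinforcement",
            "reinforcement": {
                # Simple string-check grader: full credit when the model's
                # output exactly matches the item's reference answer.
                "grader": {
                    "type": "string_check",
                    "name": "exact_match",
                    "input": "{{sample.output_text}}",
                    "reference": "{{item.reference_answer}}",
                    "operation": "eq",
                },
            },
        },
    }

job = build_rft_job("file-abc123")
```

In practice this configuration would be submitted through the fine-tuning endpoint (or assembled in the dashboard), after which progress can be monitored while the job runs.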
The initial results from the business world are promising. For example, Accordance AI was able to improve the performance of a tax analysis model by almost 40 percent. Ambience Healthcare increased the accuracy of medical coding, and Harvey achieved success in legal document analysis. Other applications, such as generating Stripe API code (Runloop), complex planning scenarios (Milo), and content moderation (SafetyKit), showed similar improvements.
More transparent cost model
RFT’s cost model is more transparent than previous fine-tuning options. Instead of charging per token processed, billing is based on seconds of active training time at a rate of $100 per hour. Only actual model adjustments are charged; preparatory phases or waiting times are not. Those who use OpenAI models to evaluate answers pay for this separately via the regular API rates, but can also opt for cheaper, external evaluators. Organizations that share their training data with OpenAI also receive a 50 percent discount, a clear incentive for collaboration and further improvement of the models.
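The pricing above is simple enough to compute directly: $100 per hour of active training, billed by the second, halved with the data-sharing discount. A quick back-of-the-envelope helper (evaluation costs via the API are separate and not modeled here):

```python
# Back-of-the-envelope cost check for the pricing described above:
# $100 per hour of active training time, billed per second, with an
# optional 50% discount for sharing training data with OpenAI.
# Grader/evaluation calls billed at regular API rates are not included.

HOURLY_RATE_USD = 100.0

def training_cost(active_seconds: int, data_sharing_discount: bool = False) -> float:
    """Cost of active training time only; setup and waiting are not billed."""
    cost = active_seconds * HOURLY_RATE_USD / 3600
    if data_sharing_discount:
        cost *= 0.5
    return round(cost, 2)
```

For example, 90 minutes of active training (5,400 seconds) comes to $150, or $75 with the data-sharing discount.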
With RFT, OpenAI offers organizations more control and expressiveness in the use of AI, without the need for specialized AI teams or proprietary infrastructure. For companies with well-defined tasks and measurable objectives, this represents a new way to accurately tailor language models to real-world applications.
Interested developers can get started right away with OpenAI’s fine-tuning documentation.