OpenAI has developed the new CriticGPT model to identify errors in code generated by ChatGPT. This should help make the output of large language models (LLMs) more accurate.
Reinforcement Learning from Human Feedback (RLHF) is usually used to improve that output: human reviewers rate the model’s responses, and those ratings guide further refinement. This can be time-consuming and error-prone, especially with very large models, where the number of incorrect or unwanted responses can be substantial.
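To make the idea concrete, here is a minimal sketch of the kind of preference labeling RLHF depends on. The data structure and function names are illustrative assumptions for this article, not OpenAI’s actual tooling.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str          # the user request shown to the model
    response_a: str      # one candidate answer from the model
    response_b: str      # a second candidate answer
    human_choice: str    # "a" or "b": which answer the reviewer preferred

def collect_label(prompt: str, response_a: str, response_b: str) -> PreferencePair:
    """Ask a human reviewer which of two model responses is better.

    Each comparison becomes one training signal for the reward model
    that later steers the LLM during reinforcement learning."""
    choice = input(f"Prompt: {prompt}\nA) {response_a}\nB) {response_b}\nBetter answer (a/b)? ")
    return PreferencePair(prompt, response_a, response_b, choice.strip().lower())
```

Collecting and checking thousands of such judgments by hand is exactly the step OpenAI wants to support with automated critiques.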
OpenAI wants to change that with CriticGPT, which is itself based on GPT-4. “When people get help from CriticGPT to review ChatGPT code they outperform those without help 60% of the time,” OpenAI writes. CriticGPT is also said to catch hallucinations that human reviewers sometimes miss on their own.
The new model was trained on a dataset of code samples into which bugs were deliberately inserted, paired with example feedback describing those bugs. As a result, CriticGPT can detect common bugs as well as rarer ones.
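As an illustration, a single training example for such a critique model could be structured roughly as below. The field names and contents are hypothetical, not OpenAI’s published data format.

```python
# Hypothetical training example: a code sample with a deliberately
# inserted bug plus the reference feedback describing it.
training_example = {
    "task": "Return the average of a list of numbers.",
    "buggy_code": (
        "def average(values):\n"
        "    return sum(values) / (len(values) - 1)  # bug inserted on purpose\n"
    ),
    "reference_critique": (
        "The divisor should be len(values), not len(values) - 1; "
        "the current code overestimates the average and crashes on "
        "single-element lists."
    ),
}

print(training_example["reference_critique"])
```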
Performance
To demonstrate how CriticGPT performs, OpenAI compared the model against human reviewers, and it proved more capable than the average human code reviewer. Its critiques, meaning its observations and descriptions of errors, were preferred over critiques written by humans in 63 percent of cases. According to OpenAI, this is because the model nitpicks less and produces fewer false positives than human reviewers do.
OpenAI plans to integrate CriticGPT-like models into its RLHF labeling pipeline to assist model trainers. For now, however, the results OpenAI has shown largely come from a research phase.