ChatGPT produces incorrect code about half the time. Researchers at Purdue University reached this conclusion after analyzing the OpenAI tool’s answers to coding questions from Stack Overflow.
In the study, the researchers tested ChatGPT’s answers to 517 Stack Overflow questions. The goal was to find out whether ChatGPT or Stack Overflow provides better answers to coding questions.
The tests covered the accuracy, consistency, understandability and awareness of the answers. The answers were also subjected to linguistic and sentiment analysis, and developers were asked for their opinions on them.
Answers often incorrect
The results show that 52 percent of ChatGPT’s answers are incorrect. In addition, 77 percent of the answers are too elaborate. Even so, developers prefer ChatGPT’s answers for almost 40 percent of the coding questions because of their understandability and highly articulate language style. Yet 77 percent of these preferred answers are wrong.
The researchers also found that developers immediately recognize a ChatGPT answer that is clearly wrong. When checking an answer requires more research, however, users find it harder to recognize that the answer is wrong, or they underestimate the degree of error.
Wrong answers were also accepted more often because they contained more text, were more detailed and insightful, used more polite language and often promised a solution.
More conceptual errors
In addition, ChatGPT answers were found to have more conceptual errors than factual errors. The study states that this is because the AI tool cannot understand the underlying context of a Stack Overflow question.
The analysis also indicated that ChatGPT’s answers are more formal and analytical, are more focused on achieving goals and show fewer negative emotions.