
Developers are unlikely to agree on which AI coding tools are up to scratch, if any are at all. Despite the technology’s great potential, robust AI-generated code of any real complexity is still seen as a rare occurrence. Moreover, nobody wants these tools shoved in their face: a desire for artificial assistance can’t simply be assumed.

This week, Meta launched a new Code Llama model with 70 billion parameters based on Llama 2. In benchmarks, it rivals GPT-4, the LLM that underlies the successful GitHub Copilot. Following in the footsteps of previous AI models, Code Llama 70B is completely free to use (but not truly open-source, as has been pointed out previously).

Such a model can be run on local hardware, provided quite a few modifications are made (a sketch of what that can look like follows below). The fact that over 100,000 users on Reddit are interested in running Llama models themselves shows a clear level of organic enthusiasm. Nevertheless, the reaction to JetBrains’ new ubiquitous AI Assistant shows that many developers aren’t keen on its forced inclusion. Spurred on by the “dubious licensing/IP/legal situation,” as one JetBrains user put it, many want the option to remove the functionality, which was introduced in December.
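The following is a minimal sketch of such a local setup, assuming the llama-cpp-python bindings and a hypothetical quantized GGUF build of the model; the file name and parameters are illustrative, not an official recipe.

# Minimal sketch: running a quantized Code Llama build locally via
# llama-cpp-python (pip install llama-cpp-python). The model file and
# settings below are illustrative assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./codellama-70b-instruct.Q4_K_M.gguf",  # hypothetical quantized build
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

response = llm(
    "Write a Python function that reverses a linked list.",
    max_tokens=256,
)
print(response["choices"][0]["text"])

A 70-billion-parameter model will not fit in consumer memory without aggressive quantization, which is exactly the kind of modification the enthusiasts mentioned above are making.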

Vastly different capabilities

AI coding tools manifest themselves in many ways. ChatGPT and Google Bard can be interesting to beginners at first, but prove repeatedly unreliable. OpenAI’s chatbot is highly conversational but, like all LLMs, has no concept of facts. It is therefore not surprising that ChatGPT regularly generates unsafe code: it has no inherent safeguards against this beyond the patchwork of fixes that OpenAI must constantly extend. Google Bard is a similar story, with similarly volatile performance. Given that neither Google nor OpenAI presents these chatbots as coding geniuses, users should look elsewhere for professional aid.

That is different with GitHub Copilot and, say, CodiumAI. In the latter case, one of the promises is to help write “code that works.” AWS CodeWhisperer, Tabnine, Divi AI and Sourcegraph Cody are other AI-powered tools for which programming help is the explicit application. What stands out is the wide variety of features offered – and, at times, absent. For example, AWS CodeWhisperer has no chat feature, whereas many competitors do. In fact, GitHub Copilot recently expanded exactly this capability with Copilot Chat.

A bigger feature set doesn’t mean a better product. For example, CodiumAI is often praised for its generated tests for existing code. In the words of one user, “it doesn’t do more than what it’s capable of.” At the same time, the wide range of offerings within GitHub Copilot need not be a disadvantage: those who consider its limitations can be far more productive programmers.

Actual use, practical limitations

Those familiar with the limitations of LLMs in textual applications will recognize many of the same issues when using them for programming help. As mentioned earlier, there is no inherent control over the security or correctness of generated code. A popular line of thought holds that LLMs are nothing more than glorified autocomplete models, tools that simply try to guess the next word. It has been known for some time that there is more going on behind the scenes. Still, generative AI solutions can certainly be used to simply predict the next phrase or piece of code, and several tools appear to be successful at exactly that. For example, CodeWhisperer offers “essentially an auto-complete” within VSCode, as does GitHub Copilot.
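To make the “guess the next token” idea concrete, here is a minimal sketch using the Hugging Face transformers library; the checkpoint and the unfinished snippet are illustrative choices, not how any of the commercial tools are actually wired up.

# Minimal sketch of code completion as next-token prediction, using the
# Hugging Face transformers library. The checkpoint choice is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "codellama/CodeLlama-7b-hf"  # a small Code Llama checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# An unfinished snippet; the model simply predicts what comes next.
prefix = "def is_palindrome(s: str) -> bool:\n    return "
inputs = tokenizer(prefix, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Tools like Copilot add context gathering, filtering and IDE integration on top, but the core mechanic is this one: continue the text.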

That application is not the most ambitious, but it can bail out programmers who are momentarily stuck. More complex code typically yields poorer predictions, so such deployment only scales up to a point. Explaining code is also a popular application, with the potential for impressive results. Code Llama 34B, for example, has reportedly been able to fathom 166 lines of Python code in depth. Others, however, have yet to experience such a “wow-worthy” moment, as one somewhat skeptical user suggests.

As with text-based applications, coding LLMs have their limitations. Time and again, one reads that the technology produces variable results. That is why even in this article we must continually talk about LLMs that “can” do something. Prompt engineering may offer a higher success rate, but since parties such as GitHub, AWS and JetBrains already build this kind of steering into their tools, it is no guarantee. The technology seems to be fundamentally limited.
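As a rough illustration of what such steering amounts to, compare a bare request with a more constrained one; both prompts are invented examples and reflect no vendor’s actual built-in instructions.

# Rough illustration of prompt engineering. Both prompts are invented
# examples; they do not reflect any vendor's built-in instructions.
bare_prompt = "Write a function to parse dates."

engineered_prompt = (
    "You are a careful Python programmer. "
    "Write a function parse_date(text: str) -> datetime.date that "
    "accepts ISO 8601 input only, raises ValueError on malformed input, "
    "and includes type hints and a docstring."
)

The second prompt tends to yield more predictable output, yet it does not make the underlying model any less probabilistic.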

Productivity gains

There is one point that cannot be ignored: the extraordinarily large productivity gains these tools bring. AI-assisted developers show a 25 to 30 percent greater success rate at completing a complex task on time versus colleagues who have to make do without. Other results offer an even greater contrast between the two groups: a coding task that would normally take 160 minutes can now be completed in 71 minutes. There is a learning curve, though, as novices actually take 7 to 10 percent longer to complete a task with AI than without.

Not all survey results paint precisely the same picture, which supports the subjective impression that there is widespread disagreement on AI coding’s worth. Nor is that too surprising for a technology that has emerged so quickly. On the one hand, DIY coders hasten to make LLMs deployable at home. At the same time, many users react with great dismay when AI features cannot simply be turned off. JetBrains has presumably seen the stunning theoretical time savings for developers and thus enthusiastically enabled the AI Assistant across its IDEs. Irrespective of developer adoption, there are persistent problems with LLMs generally, from questionable usage rights to volatile (and potentially dangerous) results.

Just as fundamental as those limitations, however, is the potential productivity gain. Linux founder Linus Torvalds was recently asked what he thought of AI-generated code being submitted to him and came up with an illuminating answer. “It’s clearly something where automation has always helped people write code. This is not anything new at all.” He also believes that the mistakes AI can make should not be exaggerated. “I see the bugs that happen without AI every day. So that’s why I’m not so worried. I think we’re doing just fine at making mistakes on our own.”

Also read: GitHub Copilot Chat makes AI programming assistance even more nimble