“Our largest and most capable AI model” is the promising description Google gives Gemini. It sets high expectations for the model, and those expectations are not entirely unjustified, because Gemini is indeed a “next-gen AI model”. But what do those promising words mean?

Does Google wipe OpenAI’s models off the map with the launch of Gemini? On paper, Google can show that it does with Gemini Ultra, the most advanced version of Gemini. Gemini Pro and Gemini Nano are the smaller variants, which Google does not pit against GPT-4 in its benchmark tests.

Multiple forms of data input are possible

The AI model Gemini can handle multiple types of data, although text prompts remain the most practical. “With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), which combines 57 subjects, including math, physics, history, law, medicine and ethics, for testing both general knowledge and problem-solving ability.” According to Google, Gemini achieves this score because it does not simply go with the first answer that comes to mind.

As a next-generation AI model, Gemini can also handle other forms of data input. For an AI chatbot, that means a prompt can also take the form of an image, audio, video or code. That is no longer an exceptional feature. ChatGPT can hold spoken conversations with users in five different voices, thanks to a new text-to-speech model, while OpenAI’s speech recognition model Whisper transcribes what users say. At the same time, GPT-3.5 and GPT-4 were made capable of processing images. This allows ChatGPT Plus and Enterprise to see, hear and speak.

Supplying code as input is the only area where ChatGPT lags behind. You can still use a text prompt to ask it for code for a specific task, but check the results carefully: tests have shown in the past that ChatGPT is a sloppy programmer.

Also read: ChatGPT writes incorrect (but convincing) code half the time

ChatGPT can do everything, but not GPT-4

For Google, the expansion to multiple types of data input is still big news, though. Its chatbot Bard can only handle text input to this day. That alone is not enough to take ChatGPT’s crown as the most powerful chatbot. Bard may still deserve that crown, however, because of the way Gemini Ultra was trained. Attentive readers will have noticed that ChatGPT can process audio input thanks to an additional model. That capability does not come from GPT-4 itself; instead, ChatGPT calls on multiple models to turn different types of input into something meaningful. Gemini Ultra, on the other hand, was trained from the ground up to handle multiple types of data, so a single model drives Bard.

The result of Google’s distinctive approach is a model with “sophisticated multimodal reasoning and advanced programming capabilities.” To measure the model’s reasoning on questions that combine text and images, Google uses the MMMU benchmark. Here, Gemini Ultra achieved a score of 59.4 percent, surpassing GPT-4V (56.8%). Keep in mind that GPT-4V is the vision-capable version of GPT-4; in ChatGPT, audio input is handled by a separate model, Whisper.

Waiting for Bard Advanced

For now, all of this remains a plan for the future. On paper, Google can argue that Gemini Ultra entitles it to take over the leading position in the AI world from OpenAI. But it cannot claim that title until Gemini Ultra becomes available. According to the tech giant, the model has not yet been tested thoroughly, which puts the launch on hold. Google expects to change that early next year, when a second variant of Bard should become available under the name Bard Advanced.

Until Bard Advanced is available, we will have to compare Gemini Pro with OpenAI’s models to see where it stands. Here, Google offers a comparison with GPT-3.5. “In six of the eight benchmarks, Gemini Pro outperformed GPT-3.5, including in MMLU (Massive Multitask Language Understanding), one of the main leading standards for measuring large AI models, and GSM8K, which measures mathematical reasoning in elementary school.”

Again, we cannot go beyond the comparison Google provides, because Bard with Gemini Pro is not available in Europe. To show the upgraded chatbot at work, Google did create a video in collaboration with YouTuber Mark Rober. He puts the chatbot’s reasoning ability to a creative test and is very pleased with the results: a project that would normally take a year could be built in three weeks.

Also read: Bard with Gemini Pro is not yet available in Europe

We are left with promises

Gemini once again promises to give Google the leading role in the AI field. For now, those remain promises, as the most advanced version of Gemini is still in the testing phase. Among free chatbots, Bard with Gemini Pro may well win over users who previously relied on GPT-3.5, thanks to its stronger ability to reason, summarise, understand, code and plan. But again, Europe is left out for now, as Bard will not get Gemini Pro there immediately. So until next year at least, Europeans may be just as well served by the products of competitor OpenAI.