Since OpenAI unveiled GPT-4’s visual capabilities in late 2023, the AI arms race has centered on multimodality. However, Nvidia researchers argue this is too often happening behind closed doors. Their latest creation, a family of models called NVLM 1.0, is both advanced and open – though not truly open-source.
The most powerful NVLM variant, dubbed NVLM-D-72B, has 72 billion parameters. It performs strongly on established AI benchmarks, though competing models often edge ahead. Where NVLM truly shines is image understanding. That alone is impressive, but the real academic breakthrough is that the model maintains strong text-only performance as well – multimodal training tends to degrade a model’s text skills. The complete testing methodology and model architecture are detailed in the accompanying scientific paper.
In it, Nvidia takes a different stance than Google (Gemini), Anthropic (Claude), and Meta (Llama) in how it presents top benchmark scores. The benchmarks can be verified independently, but they don’t tell the whole story: good benchmark results are one thing, and real-world usage consistently reveals each model’s individual character.
Open isn’t open-source
While GPT-4 (and its variants and successors) remains completely closed, Meta’s Llama models offer more transparency – though with limitations. They provide open access, but without full visibility into the training data and with significant usage restrictions. As with its predecessors, Meta isn’t giving Llama 3.2 away freely.
Nvidia has released the model weights on Hugging Face. These values – the learned parameters – are essential to the model’s operation, but there’s more that could be shared: Nvidia has promised to eventually release the training code, while the inference code is already available.
Nvidia’s move toward openness is well-intentioned, but falls short of being open-source. Nevertheless, it has converted some already. VentureBeat, for instance, enthusiastically praises this “bombshell” release from the AI chipmaker. “This decision [to share model weights and code] grants researchers and developers unprecedented access to cutting-edge technology,” the author notes. However, calling this an open-source initiative – as the article does – goes too far. Nvidia itself has benefited from open-source resources in creating NVLM 1.0, from insights drawn from other AI models to training data.
However, NVLM-D-72B cannot be used commercially, nor can it be modified for resale, as stated in the licensing terms. In essence, Nvidia is sharing the model solely for research purposes – and for hobbyists looking to push their high-end graphics cards to the limit. The researchers’ use of “open” is therefore carefully calculated. While Nvidia’s findings certainly offer value, the commercial-use restrictions prevent the release from being truly open-source, which would require the freedom to use, modify, and share. In the AI field, the major players – whether OpenAI, Meta, Google, Anthropic, or Nvidia – aren’t ready for that level of openness yet.