Elon Musk’s AI developer xAI has released the base model weights and network architecture of its Grok-1 LLM as open source, as it had previously announced.
In a short blog post, the startup gives more detail on Grok-1’s network architecture, such as how the model’s layers and nodes are set up and connected to process data.
It also revealed how many parameters the base model has: 314 billion in the case of Grok-1. The model was trained with a custom training stack built on JAX and Rust.
MoE model
More specifically, the base model is a so-called Mixture-of-Experts model that xAI trained from scratch. A Mixture-of-Experts (MoE) model combines the outputs of several specialized sub-models (the ‘experts’). A routing component decides which experts handle a given input, so the final prediction draws on the experts best suited to that task or data subset.
In addition, an MoE model activates only some of its experts for each input, so it needs less computational power to train than a dense model of the same size. This makes it easier to scale up the model or the training data within a fixed compute budget. An MoE model also offers more efficient pre-training and faster inference than a dense model with the same number of parameters.
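To make the routing idea concrete, here is a minimal sketch of a single MoE layer in plain Python. It is not xAI’s implementation: the sizes, the random weights and the helper names (`make_expert`, `moe_layer`) are all assumptions chosen for illustration. A router scores every expert for an incoming token, only the `top_k` best-scoring experts are actually run, and their outputs are blended using the renormalized router weights.

```python
import math
import random

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(weights, x):
    return sum(w * v for w, v in zip(weights, x))

def make_expert(rng, dim):
    """A hypothetical 'expert': one small linear layer (dim x dim)."""
    matrix = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [dot(row, x) for row in matrix]

def moe_layer(x, router, experts, top_k=2):
    """Route one token vector x to its top_k experts and mix their outputs."""
    probs = softmax([dot(row, x) for row in router])
    # Keep only the top_k highest-scoring experts; the others are never run.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm  # renormalize so the kept weights sum to 1
        expert_out = experts[i](x)
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, chosen

rng = random.Random(0)
DIM, NUM_EXPERTS = 4, 8  # illustrative sizes only, not Grok-1's real dimensions
experts = [make_expert(rng, DIM) for _ in range(NUM_EXPERTS)]
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router, experts, top_k=2)
print("experts used:", chosen, "of", NUM_EXPERTS)
```

In this toy setup only two of the eight experts do any work for the token, which is where the compute saving over a dense layer of the same total size comes from.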
xAI is not the only AI developer using a MoE model. Mistral AI’s Mixtral 8x7B LLM is also based on this method.
Not yet suitable for applications
The Grok-1 model now open-sourced is the “raw” base model from the pre-training phase, which concluded in October 2023. According to the AI developer, this means the model has not yet been fine-tuned for specific applications, such as dialogue. As a result, the open-source release of Grok-1 is not yet directly usable for truly interactive GenAI solutions and applications.
No further details about Grok-1 have been disclosed. The model is available under an Apache 2.0 license. The release notes on GitHub say that testing it with the example code requires a machine with enough GPU memory, because of the model’s large number of parameters.
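A rough back-of-the-envelope calculation shows why the parameter count translates into a heavy memory requirement. The sketch below only multiplies the 314 billion parameters by a few assumed storage widths. The article does not state which precision the released weights use, so the formats listed are illustrative assumptions, and the figures cover the weights alone, not activations or other runtime overhead.

```python
PARAMS = 314e9  # parameter count reported for Grok-1

# Assumed storage widths in bytes; the released weights may use a different format.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}

def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: about {weight_memory_gb(PARAMS, width):,.0f} GB")
# float32: about 1,256 GB
# bfloat16: about 628 GB
# int8: about 314 GB
```

Even at one byte per parameter, the weights would far exceed the memory of a single consumer graphics card, so running the model likely means spreading it across several accelerators.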
Furthermore, the xAI developers note that the implementation of the MoE layer in the GitHub repository is not efficient. It was chosen so that no custom kernels are needed to validate the correctness of the model.