
LLMs have significant system requirements, which means that only advanced hardware can currently handle them. That seems to be changing, if Apple’s research is to be believed.

Apple researchers recently managed to run large AI models with highly limited system memory, as described in their “LLM in a Flash” study. AI inferencing, the calculation work behind a chatbot’s response to a prompt, became possible by playing to the respective strengths of flash storage and DRAM.
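
As a rough illustration of what flash-backed inference means in practice, the sketch below keeps a layer’s full weight matrix in a file (standing in for flash storage) and copies only the rows needed for the current token into DRAM. This is a minimal sketch of the general idea, not Apple’s implementation; the file name, layer sizes and the caller-supplied `active_rows` selection are all assumptions made for the example.

```python
# Minimal sketch: weights stay on "flash" (a memory-mapped file), and only
# the rows actually needed are pulled into DRAM for the matmul.
import numpy as np

D_MODEL, D_FF = 4096, 11008  # hypothetical transformer feed-forward sizes

# Create a dummy weight file so the sketch runs as-is; in practice the
# model's weights would already sit in flash storage.
np.memmap("ffn_layer0.bin", dtype=np.float16, mode="w+",
          shape=(D_FF, D_MODEL)).flush()

# Re-open read-only: memmap reads pages lazily, so untouched rows never
# have to enter DRAM.
weights = np.memmap("ffn_layer0.bin", dtype=np.float16,
                    mode="r", shape=(D_FF, D_MODEL))

def ffn_partial(x: np.ndarray, active_rows: np.ndarray) -> np.ndarray:
    """Compute only the rows expected to matter for this token.

    `active_rows` stands in for the kind of sparsity prediction the study
    describes; here it is simply supplied by the caller.
    """
    w_active = np.asarray(weights[active_rows])  # flash -> DRAM, these rows only
    return w_active @ x                          # partial matmul in DRAM

# Usage: one token vector and a handful of "active" neurons.
x = np.random.randn(D_MODEL).astype(np.float16)
rows = np.random.choice(D_FF, size=512, replace=False)
print(ffn_partial(x, rows).shape)  # (512,)
```

The memory map is doing the real work here: pages of the weight file are only read when touched, which is what keeps the DRAM working set far smaller than the full model.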

Falcon 7B, a popular open-source model widely used for benchmarks, ran 90 percent faster than before as a result of the researchers’ work. Models that normally demand twice a device’s available system memory became deployable on local systems with the new methods.

Software: the secret to AI success

The Apple researchers therefore stress that their findings will be useful for systems with a limited pool of memory, highlighting use cases where either the CPU or the GPU performs the required calculations. Where a model with 7 billion parameters previously took 14GB of memory, an iPhone 15 Pro with 8GB would suddenly be able to run such a model thanks to the new methodology.
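
A quick back-of-envelope calculation makes those figures concrete. The only assumptions are 16-bit weights (2 bytes per parameter) and a hypothetical 50 percent resident fraction used purely to illustrate the “twice the available memory” claim; neither number comes from the study itself.

```python
# Back-of-envelope check of the figures in the article.
params = 7_000_000_000
bytes_per_param = 2  # assumption: fp16/bf16 weights
full_model_gb = params * bytes_per_param / 1e9
print(f"Full 7B model: ~{full_model_gb:.0f} GB of weights")      # ~14 GB

iphone_dram_gb = 8  # iPhone 15 Pro
print(f"Fits entirely in {iphone_dram_gb} GB of DRAM? "
      f"{full_model_gb <= iphone_dram_gb}")                      # False

# If only part of the model is resident in DRAM at any moment (the rest
# staying in flash), the working set shrinks accordingly. The 50% here is
# an illustrative assumption, not a figure from the paper.
resident_fraction = 0.5
print(f"DRAM working set at {resident_fraction:.0%} resident: "
      f"~{full_model_gb * resident_fraction:.0f} GB")            # ~7 GB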

It shows that there is still much to be gained in the AI sphere by making smarter use of hardware that is already available. In addition, it has repeatedly been noted that progress in AI development depends heavily on software, something Intel VP & GM of Client AI John Rayfield reiterated earlier this year.

While there is now plenty of AI hardware in development for PCs, laptops and smartphones, there are still many hurdles to overcome before the technology runs in a fully matured form on local devices. For example, training and fine-tuning LLMs is completely impractical for anyone who does not have a well-equipped server with multiple GPUs on hand. Even then, the training process can take weeks to complete. Therefore, for now, manufacturers are concentrating on AI accelerators that enable inferencing. Even that can involve immense system requirements, meaning a lot of optimization is sorely needed.

Limitations remain

Running Llama 2 70B, the largest and most capable variant that Meta has released as open source, requires an extremely expensive consumer graphics card with plenty of memory, even with all kinds of optimizations. AI models suitable for local inferencing can therefore only be of limited size.
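
To put that in numbers, the snippet below estimates the weight-only memory footprint of a 70-billion-parameter model at a few common precisions. The precision choices are illustrative assumptions, and activations plus the KV cache would add further overhead on top of these figures.

```python
# Rough weight-memory footprint of a 70B-parameter model at common precisions.
# Weights only; activations and the KV cache come on top.
params = 70_000_000_000
for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gb = params * bits / 8 / 1e9
    print(f"{label}: ~{gb:.0f} GB of weights")
# fp16: ~140 GB, int8: ~70 GB, int4: ~35 GB -- all of which strain or exceed
# the memory of even top-end consumer graphics cards.
```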

Still, Apple’s revelation is significant. After all, the company offers significantly less system memory in Macs and MacBooks than comparably priced Windows desktops and laptops, and says this is primarily because its chips use the available memory more efficiently. At the same time, it is crucial for the company to make AI computations possible for as many users as possible. If large parts of the tech industry are to be believed, we’re heading into a future where virtually every application will be AI-driven. Apple needs to ensure that those applications also run competitively on its own hardware, especially now that its self-designed M-series chips have delivered significant performance improvements in recent years.