Users of AI hub Hugging Face can now more easily run their LLMs on AWS Inferentia2 AI accelerators, either through Amazon SageMaker or on dedicated EC2 instances. According to both parties, this increases efficiency and lowers operational costs for AI developers.
Thanks to the recently announced collaboration, AI developers who build on the many LLMs hosted on Hugging Face can now also run those models on AWS’ Inferentia2 AI accelerators in the production phase.
According to Hugging Face and AWS, this primarily offers AI developers greater efficiency and cost savings. The AWS Inferentia2 processors are said to be particularly well suited to the many inference operations LLMs perform in their production phase.
Additionally, AWS hopes that more AI developers will use its cloud environment to develop such models. The promise, at least, is that this benefits all parties involved.
Building on previous collaboration
Hugging Face users can access these dedicated AI accelerators in two ways. The first builds on the collaboration between the two parties announced earlier this year around the cloud-based machine-learning platform Amazon SageMaker. Through this tool, an LLM like Meta’s Llama 3 can run on AWS Inferentia2 accelerators for inference tasks. This functionality has now been extended to more than 100,000 public LLMs, spanning 14 LLM architectures.
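For illustration, deploying such a model from SageMaker onto an Inferentia2 instance could look roughly like the sketch below. It assumes the Hugging Face TGI container for AWS Neuron available through the SageMaker Python SDK; the model ID, environment values and instance type are illustrative, not official recommendations.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# IAM role with SageMaker permissions (assumes a SageMaker execution context)
role = sagemaker.get_execution_role()

# Retrieve the Hugging Face TGI container built for AWS Neuron (Inferentia2)
image_uri = get_huggingface_llm_image_uri("huggingface-neuronx")

# Illustrative configuration for Llama 3 8B; values are assumptions
model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B",
        "HF_NUM_CORES": "2",           # Neuron cores to shard the model across
        "HF_AUTO_CAST_TYPE": "fp16",
        "MAX_BATCH_SIZE": "4",
        "MAX_INPUT_LENGTH": "3686",
        "MAX_TOTAL_TOKENS": "4096",
    },
)

# Deploy to an Inferentia2-backed SageMaker endpoint
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.inf2.xlarge",
)

print(predictor.predict({"inputs": "What does AWS Inferentia2 accelerate?"}))
```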
The second option, tailored to Llama 3, is deploying LLMs via Hugging Face’s own Inference Endpoints solution. There, end users can deploy their LLMs on dedicated AWS Inferentia2-based EC2 instances.
With this option, the Hugging Face Inference Endpoints solution uses Text Generation Inference (TGI) for Neuron to run Llama 3 on the AWS Inferentia2 accelerator. This technology is specifically designed to serve LLMs in production workloads at scale, with support for continuous batching and streaming. Usage is billed per second of capacity consumed, and endpoints scale up and down with demand.
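Programmatically, spinning up such an endpoint might look like the sketch below, using the `create_inference_endpoint` helper from the `huggingface_hub` library. The endpoint name, region and instance identifiers are assumptions; the exact Inferentia2 options are listed in the Inference Endpoints catalog.

```python
from huggingface_hub import create_inference_endpoint

# Instance identifiers below are illustrative assumptions; consult the
# Inference Endpoints catalog for the exact Inferentia2 options.
endpoint = create_inference_endpoint(
    "llama-3-8b-neuron",                   # endpoint name (hypothetical)
    repository="meta-llama/Meta-Llama-3-8B",
    framework="pytorch",
    task="text-generation",
    accelerator="neuron",                  # AWS Inferentia2 / Neuron
    vendor="aws",
    region="us-east-1",
    instance_type="inf2",                  # assumed identifier
    instance_size="x1",                    # assumed identifier
    type="protected",
)

# Block until the endpoint is live, then send a test prompt
endpoint.wait()
print(endpoint.client.text_generation("What does AWS Inferentia2 accelerate?"))
```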
Two flavours
For the AWS EC2-based option with Llama 3, users again get two choices for deploying their LLMs: one cheaper, one more expensive. The first is an Inf2 small instance with two Neuron cores and 32 GB of memory. According to Hugging Face, this is well suited to Llama 3 8B and costs $0.75 per hour.
The second option is an Inf2 xlarge instance with 24 cores and 384 GB of memory. This option is suited to the much larger Llama 3 70B and costs $12 per hour of use.
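As a rough, illustrative calculation: an endpoint running around the clock for a 30-day month totals 720 hours, which comes to about $540 per month for the small instance (0.75 × 720) and about $8,640 for the xlarge one (12 × 720). Since billing is per second of capacity used and endpoints scale down automatically, bursty workloads should come in well below those figures.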
Also read: Meta unveils powerful open-source model Llama 3 and chatbot Meta AI