FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs

Jeremy Howard and collaborators released a new tool that combines FSDP, QLoRA, and HQQ to train 70b-parameter models on affordable consumer GPUs such as the RTX 4090, which has only 24GB of memory. This overcomes the memory constraints that previously required expensive data center GPUs costing over $150k. The approach shards the quantized model across multiple GPUs and uses techniques such as gradient checkpointing and CPU offloading to train efficiently on desktop-class hardware. The blog post details the challenges of integrating these methods and the solutions found, and highlights a cost reduction from $150k to under $2.5k for training large language models.

Additionally, the Twitter recaps mention Inflection AI's Inflection-2.5 model rivaling GPT-4 on benchmarks with less compute, and Grok improving its speed by 3x. Yann LeCun discusses multi-step reasoning training for LLMs.
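To make the memory constraint on 70b-scale training concrete, here is a rough back-of-the-envelope sketch of the weight memory involved. It counts only the model weights. The 16-bit and 4-bit widths, the two-GPU setup, and the helper name `weights_gb` are illustrative assumptions rather than figures from the post. It also ignores quantization metadata, LoRA adapters, activations, and optimizer state, which is where gradient checkpointing and CPU offloading come in.

```python
def weights_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 70e9        # 70b-parameter model
GPU_MEMORY_GB = 24     # memory of one RTX 4090-class card
N_GPUS = 2             # assumption: the source only says "multiple GPUs"

fp16_gb = weights_gb(N_PARAMS, 16)   # unquantized 16-bit weights (assumed baseline)
int4_gb = weights_gb(N_PARAMS, 4)    # 4-bit quantized weights (assumed bit width)
per_gpu_gb = int4_gb / N_GPUS        # even shard of the quantized weights per GPU

print(f"16-bit weights:       {fp16_gb:.1f} GB")
print(f"4-bit weights:        {int4_gb:.1f} GB")
print(f"4-bit shard per GPU:  {per_gpu_gb:.1f} GB (of {GPU_MEMORY_GB} GB)")

# Quantization alone does not fit one card; quantization plus sharding does.
fits_one_gpu = int4_gb <= GPU_MEMORY_GB
fits_when_sharded = per_gpu_gb <= GPU_MEMORY_GB
print(f"Fits on one GPU: {fits_one_gpu}, fits when sharded: {fits_when_sharded}")
```

Under these assumptions, neither quantization nor sharding is sufficient by itself on 24GB cards, which is why the approach combines the two.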
