Back to Blog

Too Cheap To Meter: AI prices cut 50-70% in last 30 days

Too Cheap To Meter: AI prices cut 50-70% in last 30 days

Gemini 1.5 Flash has cut prices by approximately 70%, offering a highly competitive free tier of 1 million tokens per minute at $0.075/mtok, intensifying the AI model price war. Other significant price reductions include GPT-4o (~50% cut to $2.50/mtok), GPT-4o mini (70-98.5% cut to $0.15/mtok), Llama 3.1 405b (46% cut to $2.7/mtok), and Mistral Large 2 (62% cut to $3/mtok). Deepseek v2 introduced context caching, reducing input token costs by up to 90% to $0.014/mtok. New model releases include Llama 3.1 405b, Sonnet 3.5, EXAONE-3.0 (7.8B instruction-tuned by LG AI Research), and MiniCPM V 2.6 (vision-language model combining SigLIP 400M and Qwen2-7B). Benchmarks show Mistral Large performing well on ZebraLogic and Claude-3.5 leading LiveBench. FlexAttention, a new PyTorch API, simplifies and optimizes attention mechanisms. Andrej Karpathy analyzed RLHF, highlighting its limitations compared to traditional reinforcement learning. Google DeepMind research on compute-optimal scaling was also summarized.

Read original post

Turn insight into implementation

Want help turning this idea into a production system?

xAGI Labs helps teams scope, build, and deploy AI products, agent workflows, voice systems, and enterprise rollouts.

If this topic is relevant to your roadmap, we can translate "Too Cheap To Meter: AI prices cut 50-70% in last 30 days" into a concrete build plan and launch path.