Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch

11/6/2025

Moonshot AI launched Kimi K2 Thinking, a 1 trillion parameter mixture-of-experts (MoE) model with 32 billion active experts, a 256K context window, and native INT4 quantization-aware training. It achieves state-of-the-art results on benchmarks like HLE (44.9%), BrowseComp (60.2%), and agentic tool use with 200-300 sequential tool calls. The model is deployed with vLLM support and OpenAI-compatible APIs, available on platforms like Arena, Baseten, and Yupp. Early user reports note some API instability under launch load. Meanwhile, Google announced the TPU v7 (Ironwood) with a 10× peak performance improvement over TPU v5p, aimed at training and agentic inference for models like Gemini. Apple added support for M5 Neural Accelerators in llama.cpp for inference acceleration.

Read original post

Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch

Want help turning this idea into a production system?