QwQ-32B claims to match DeepSeek R1-671B

Alibaba's Qwen team released QwQ-32B, a 32-billion-parameter reasoning model trained with a novel two-stage reinforcement learning approach: first scaling RL on math and coding tasks, using accuracy verifiers and code-execution servers as reward signals, then applying RL for general capabilities such as instruction following and alignment. Despite its comparatively small size, the model aims to compete with much larger MoE models like DeepSeek-R1. Meanwhile, OpenAI rolled out GPT-4.5 to Plus users, drawing mixed feedback on coding performance alongside noted inference cost improvements. *"GPT-4.5 is unusable for coding"* was a notable user critique, while others praised its reasoning improvements attributed to scaled-up pretraining.
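To make the two-stage setup concrete, here is a minimal Python sketch of how such rewards might be wired up. Every name and field below is hypothetical (Qwen has not published the training code for this pipeline); the sketch only illustrates the idea of verifier- and execution-based rewards in stage one, followed by a general-purpose reward model in stage two.

```python
import subprocess
import tempfile

# Hypothetical sketch of the two-stage reward setup described for QwQ-32B.
# None of these names come from Qwen's codebase; they illustrate outcome-based
# rewards (stage 1) followed by general-capability rewards (stage 2).

def math_reward(model_answer: str, reference_answer: str) -> float:
    """Stage 1: accuracy verifier -- reward only an exactly correct final answer."""
    return 1.0 if model_answer.strip() == reference_answer.strip() else 0.0

def code_reward(generated_code: str, test_snippet: str) -> float:
    """Stage 1: code-execution reward -- run the candidate plus its unit tests
    in a subprocess and reward a clean exit."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code + "\n" + test_snippet)
        path = f.name
    result = subprocess.run(["python", path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

def general_reward(prompt: str, response: str) -> float:
    """Stage 2: stand-in for a learned reward model scoring instruction
    following and alignment; a real pipeline would call a trained RM here."""
    return 0.5  # stub value for illustration only

def reward(stage: int, sample: dict) -> float:
    """Dispatch to the reward appropriate for the current training stage."""
    if stage == 1 and sample["task"] == "math":
        return math_reward(sample["response"], sample["reference"])
    if stage == 1 and sample["task"] == "code":
        return code_reward(sample["response"], sample["tests"])
    return general_reward(sample["prompt"], sample["response"])
```

The design choice this illustrates: stage one relies on cheap, objective pass/fail signals (answer checking, test execution) that scale without a reward model, while stage two layers on a learned reward for behaviors that verifiers cannot check.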
