Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing

Cursor, an AI-native IDE, announced a speculative edits algorithm for code editing that surpasses GPT-4 and GPT-4o in both accuracy and latency, reaching speeds of over 1000 tokens/s on a 70B model. OpenAI released GPT-4o with multimodal capabilities spanning audio, vision, and text; it is reported to be 2x faster and 50% cheaper than GPT-4 Turbo, though its coding performance is mixed. Anthropic introduced streaming, forced tool use, and vision features for developers. Google DeepMind unveiled Imagen Video and Gemini 1.5 Flash, a small model with a 1M-token context window. Hugging Face is distributing $10M in free GPUs for open-source AI models like Llama, BLOOM, and Stable Diffusion. Evaluation discussions highlighted that LLMs struggle on novel problems and that existing benchmarks are saturating, with new benchmarks like MMLU-Pro showing significant drops in top-model performance.
