Blog

6/17/2026

Midjourney Medical: scan your organs like you step on a scale

**Midjourney** unveiled a new **medical imaging/scanning system** called the **Midjourney Scanner**, described as **radiation-free, magnet-free, fast, and low-c...

6/16/2026

GLM 5.2: the top Frontend Coding model in the world, IndexShare reduces costs

**Z.ai released GLM-5.2**, an MIT-licensed open-weight frontier model targeting **coding and long-horizon agentic tasks** with a **1M-token context window** and...

6/9/2026

Anthropic Claude Fable 5

**Anthropic** released two major models: **Claude Fable 5** for general availability and **Claude Mythos 5** for restricted access, with fallback to **Claude Op...

6/2/2026

Microsoft Build: MAI-Thinking-1 and MAI Family models, Surface RTX Spark Dev Box, and OpenClaw in Windows

**Microsoft** introduced **MAI-Thinking-1**, a **35B parameter MoE model** with **256K context**, achieving **97% on AIME 2025** and outperforming **Sonnet 4.6*...

5/28/2026

Anthropic raises $65B in Series H at a $965B post-money valuation, releases Opus 4.8 and Dynamic Workflows

**Anthropic** announced a massive **$65B Series H financing** at a **$965B valuation**, led by **Altimeter, Dragoneer, Greenoaks, and Sequoia**, with run-rate r...

5/18/2026

Google I/O 2026: Gemini 3.5 Flash, Omni, and Google’s Agent Stack

**Google** announced at I/O the repositioning of **Gemini** as a consumer AI and developer/agent platform with three key releases: **Gemini 3.5 Flash** for fast...

5/7/2026

GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs

**OpenAI** released **GPT-Realtime-2**, a voice model with **GPT-5-class reasoning**, tool use, interruption handling, and extended context windows up to **128K...

5/6/2026

Anthropic-SpaceXai's 300MW/$5B/yr deal for Colossus I, ARR growth is 8000% annualized

**Anthropic** announced a new **SpaceX compute partnership** to significantly increase capacity for **Claude** products, doubling **Claude Code's 5-hour rate li...

4/24/2026

DeepSeek v4

**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compres...

4/23/2026

GPT 5.5

**OpenAI launched GPT-5.5** as its new flagship model for "real work and powering agents," immediately available in ChatGPT and Codex but with delayed API acc...

4/21/2026

GPT-Image-2

**OpenAI** launched **GPT-Image-2**, enhancing image generation with improved text rendering, layout fidelity, editing, multilingual support, and "thinking" c...

4/16/2026

Anthropic's Claude Opus 4.7

**Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved lo...

4/7/2026

Anthropic

**Anthropic** is highlighted for its impressive business growth with a claimed **15x revenue run-rate increase in one year**, a valuation around **$380B**, and ...

4/7/2026

Anthropic @ $30B ARR, Project GlassWing and Claude Mythos Preview — first model too dangerous to release since GPT-2

**Anthropic** strategically challenges **OpenAI** amid its upcoming IPO concerns by announcing a jump from **$19B ARR in March** to **$30B ARR in April**, highl...

4/4/2026

OpenClaw for GTM

AutoClaw turns your positioning and website into an always-on GTM engine that finds best-fit accounts, researches context, and starts high-quality conversations across channels.

4/2/2026

Gemma 4

**Google DeepMind** released **Gemma 4**, a family of open-weight, multimodal models with long-context support up to **256K tokens** under an **Apache 2.0 licen...

3/26/2026

How to Run a Pilot with xAGI Labs

A practical guide to running a successful AI pilot with xAGI Labs: kickoff, KPIs, timeline, limited production rollout, and expansion planning.

3/26/2026

From Vapi to LiveKit: Hard-Won Lessons Building Production Voice AI Agents

Hard-won production lessons from migrating voice AI agents from Vapi + n8n to LiveKit Agents, with practical guidance on latency, IVR, prompts, and post-call extraction.

3/25/2026

OpenClaw Guide (2026): From Zero to Production

A detailed OpenClaw guide for setup, hosting, model providers, channels, security, workflows, and production launch in 2026.

3/24/2026

The Claude Code Source Leak

**Anthropic's** closed-source coding product **Claude Code** experienced a significant source leak exposing over **500k lines** of orchestration logic, includin...

3/18/2026

MiniMax 2.7: GLM-5 at 1/3 cost SOTA Open Model

**MiniMax M2.7** is the headline model release, described as a "self-evolving agent" with strong performance metrics including **56.22% on SWE-Pro**, **57.0% ...

3/10/2026

Yann LeCun’s AMI Labs launches with a $1.03B seed to build world models around JEPA

**Yann LeCun** launched **Advanced Machine Intelligence (AMI Labs)** with a record **$1.03B seed round** at a **$3.5B pre-money valuation**, aiming to build AI ...

3/9/2026

Autoresearch: Sparks of Recursive Self Improvement

**RSI** covers AI developments from 3/5/2026 to 3/9/2026, highlighting the emergence of **LLMs autonomously training smaller LLMs**, marking a significant "Aut...

3/5/2026

GPT 5.4: SOTA Knowledge Work -and- Coding -and- CUA Model, OpenAI is so very back

**OpenAI** launched **GPT-5.4** and **GPT-5.4 Pro** with unified mainline and Codex models, featuring **native computer use**, up to **~1M token context**, and ...

2/27/2026

OpenAI closes $110B raise from Amazon, NVIDIA, SoftBank in largest startup fundraise in history @ $840B post-money

**OpenAI** has closed a major funding round totaling **$110 billion** at a **$730 billion pre-money valuation**, with investments from **SoftBank ($30B)**, **NV...

2/26/2026

Nano Banana 2 aka Gemini 3.1 Flash Image Preview: the new SOTA Imagegen model

**Google and DeepMind** launched **Nano Banana 2** (aka **Gemini 3.1 Flash Image Preview**), a leading image generation and editing model integrated across mult...

2/25/2026

Agentic Engineering: WTF Happened in December 2025?

**Perplexity** launched **Computer**, an orchestration-first agent platform featuring multi-model routing, usage-based pricing, and parallel asynchronous sub-ag...

2/24/2026

Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2

**Alibaba** launched the **Qwen 3.5 Medium Model Series** featuring models like **Qwen3.5-Flash**, **Qwen3.5-35B-A3B (MoE)**, and **Qwen3.5-122B-A10B (MoE)** em...

2/24/2026

Anthropic accuses DeepSeek, Moonshot, and MiniMax of "industrial-scale distillation attacks"

**Anthropic** alleges *industrial-scale* distillation attacks on its **Claude** model by **DeepSeek**, **Moonshot AI**, and **MiniMax**, involving **~24,000 fra...

2/19/2026

Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2

**Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**,...

2/17/2026

Claude Sonnet 4.6: clean upgrade of 4.5, mostly better with some caveats

**Anthropic** launched **Claude Sonnet 4.6**, an upgrade over Sonnet 4.5, featuring broad improvements in **coding, long-context reasoning, agent planning, know...

2/16/2026

Qwen3.5-397B-A17B: the smallest Open-Opus class, very efficient model

**Alibaba** released **Qwen3.5-397B-A17B**, an open-weight model featuring **native multimodality**, **spatial intelligence**, and a **hybrid linear attention +...

2/13/2026

MiniMax-M2.5: SOTA coding, search, toolcalls, $1/hour

**MiniMax-M2.5** is now open source, featuring an "agent-native" reinforcement learning framework called **Forge** trained across **200k+ RL environments** fo...

2/12/2026

new Gemini 3 Deep Think, Anthropic $30B @ $380B, GPT-5.3-Codex Spark, MiniMax M2.5

**Google DeepMind** is rolling out the upgraded **Gemini 3 Deep Think V2** reasoning mode to **Google AI Ultra** subscribers and opening early access to the **V...

2/11/2026

Z.ai GLM-5: New SOTA Open Weights LLM

**Zhipu AI** launched **GLM-5**, an **Opus-class** model scaling from **355B to 744B parameters** with **DeepSeek Sparse Attention** integration for cost-effici...

2/10/2026

Qwen-Image 2.0 and Seedance 2.0

**OpenAI** advances its Responses API for multi-hour agent workflows with features like **server-side compaction**, **hosted containers**, and **Skills API**, a...

2/5/2026

OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex

**OpenAI** launched **GPT-5.3-Codex**, emphasizing **token efficiency**, **inference speed**, and hardware/software co-design with **GB200-NVL72** and **NVIDIA*...

2/4/2026

ElevenLabs $500m Series D at $11B, Cerebras $1B Series H at $23B, Vibe Coding -> Agentic Engineering

**Google's Gemini 3** is being integrated widely, including a new **Chrome side panel** and **Nano Banana** UX features, with rapid adoption and a **78% unit-co...

2/3/2026

Context Graphs: Hype or actually Trillion-dollar opportunity?

**Zhipu AI** launched **GLM-OCR**, a lightweight **0.9B** multimodal OCR model excelling in complex document understanding with top benchmark scores and day-0 d...

2/2/2026

OpenAI Codex App: death of the VSCode fork, multitasking worktrees, Skills Automations

**OpenAI** launched the **Codex app** on macOS as a dedicated agent-native command center for coding, featuring **multiple agents in parallel**, **built-in work...

1/30/2026

MoltBook takes over the timeline

**Moltbook** and **OpenClaw** showcase emergent multi-agent social networks where AI agents autonomously interact, creating an AI-native forum layer with comple...

1/29/2026

xAI Grok Imagine API - the #1 Video Model, Best Pricing and Latency - and merging with SpaceX

**Google DeepMind** launched **Project Genie (Genie 3 + Nano Banana Pro + Gemini)**, a prototype for creating interactive, real-time generated worlds from text ...

1/27/2026

Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager

**MoonshotAI's Kimi K2.5** is a **32B active-1T parameter open-weights model** featuring **native multimodality** with image and video understanding, built thro...

1/26/2026

Anthropic launches the MCP Apps open spec, in Claude.ai

**Anthropic** has officially absorbed the independent MCP UI project and, collaborating with **OpenAI**, **Block**, **VS Code**, **Antigravity**, **JetBrains**,...

1/21/2026

OpenEvidence, the ‘ChatGPT for doctors,’ raises $250m at $12B valuation, 12x from $1b last Feb

**OpenEvidence** raised **$250 million** at a **$12 billion valuation**, a 12x increase from last year, with usage by 40% of U.S. physicians and over $100 million in annual revenue. **Anthropi...

1/16/2026

ChatGPT starts testing ads on free tier + new $8/mo Go plan in the US

**OpenAI** announced the **ChatGPT Go** tier at **$8/month** with ads testing in the US free tier, emphasizing that ads will not influence responses and will be...

1/15/2026

Open Responses: explicit spec for OpenAI's Responses API supported by OpenRouter, Ollama, Huggingface, vLLM, et al

**OpenAI** launched the **Open Responses** API spec, an open-source, multi-provider standard for interoperable LLM APIs designed to simplify agent stacks and to...

1/13/2026

Anthropic Labs: Cowork, Claude Code, MCP, Skills incubator led by Mike Krieger and Ben Mann

**Anthropic** consolidates its AI agent products under the **Cowork** brand, integrating prior tools like **Claude Code** and **Claude for Chrome** into a unifi...

1/12/2026

Apple picks Google's Gemini to power Siri's next generation

**Apple** has decided to power Siri with **Google's Gemini models** and cloud technology, marking a significant partnership and a setback for **OpenAI**, which ...

1/6/2026

xAI raises $20B Series E at ~$230B valuation

**xAI**, Elon Musk's AI company, completed a massive **$20 billion Series E funding round**, valuing it at about **$230 billion** with investors like **Nvidia**...

12/29/2025

Meta Superintelligence Labs acquires Manus AI for over $2B, at $100M ARR, 9 months after launch

**Manus** achieved a rapid growth trajectory in 2025, raising **$500M** from Benchmark and reaching **$100M ARR** before being acquired by **Meta** for an estim...

12/24/2025

Nvidia buys (most of) Groq for $20B cash; largest execuhire ever

**Groq** leadership team is joining **Nvidia** under a "non-exclusive licensing agreement" in a deal valued at **$20 billion cash**, marking a major acquisiti...

12/18/2025

Claude Skills grows: Open Standard, Directory, Org Admin

**Claude Skills** are gaining significant traction since their launch in October, with a milestone of 100k views in one day for the Claude Skills talk, signalin...

12/17/2025

Gemini 3.0 Flash Preview: 1/4 cost of Pro, but ~as smart, retakes Pareto Frontier

**Google** launched **Gemini 3 Flash**, a pro-grade reasoning model with flash latency, supporting tool calling and multimodal IO, available via multiple platfo...

12/16/2025

OpenAI GPT Image-1.5 claims to beat Nano Banana Pro, #1 across all Arenas, but completely fails Vibe Checks

**OpenAI** released its new image model **GPT Image 1.5**, featuring precise image editing, better instruction following, improved text and markdown rendering, ...

12/15/2025

NVIDIA Nemotron 3: hybrid Mamba-Transformer completely open source models from 30B to 500B

**NVIDIA** has released **Nemotron 3 Nano**, a fully open-source hybrid Mamba-Transformer Mixture-of-Experts (MoE) model with a **30B parameter size** and a **1...

12/11/2025

GPT-5.2 (Instant/Thinking/Pro): 74% on GDPVal, 1.4x cost of GPT 5.1, on 10 Year OpenAI Anniversary

**OpenAI** celebrates its 10 year anniversary with the launch of **GPT-5.2**, featuring significant across-the-board improvements including a rare 40% price inc...

12/9/2025

MCP -> Agentic AI Foundation, Mistral Devstral 2

**OpenAI Engineering** sees a significant collaborative milestone with the launch of the **Agentic AI Foundation** under the Linux Foundation, uniting projects ...

12/4/2025

OpenRouter's State of AI - An Empirical 100 Trillion Token Study

**OpenRouter** released its first survey showing usage trends with 7 trillion tokens proxied weekly, highlighting a 52% roleplay bias. **DeepSeek**'s open model...

12/2/2025

Mistral 3: Mistral Large 3 + Ministral 3B/8B/14B open weights models

**Mistral** has launched the **Mistral 3 family** including **Ministral 3** models (3B/8B/14B) and **Mistral Large 3**, a sparse MoE model with **675B total par...

12/2/2025

DeepSeek V3.2 & 3.2-Speciale: GPT5-High Open Weights, Context Management, Plans for Compute Scaling

**DeepSeek** launched the **DeepSeek V3.2** family including Standard, Thinking, and Speciale variants with up to **131K context window** and competitive benchm...

11/25/2025

Black Forest Labs FLUX.2 [pro|flex|dev|klein]: near-Nano Banana quality but Open Weights

**Black Forest Labs' FLUX.2** release features **Multi-Reference Support** for up to **4 Megapixel** output and up to **10 images** with consistency, including ...

11/24/2025

Claude Opus 4.5: 3rd new SOTA coding model in past week, 1/3 the price of Opus

**Anthropic** launched **Claude Opus 4.5**, a new flagship model excelling in **coding, agents, and tooling** with a significant **3x price cut** compared to Op...

11/21/2025

AI Engineer Code Summit

The recent **AIE Code Summit** showcased key developments including **Google DeepMind's Gemini 3 Pro Image model, Nano Banana Pro**, which features enhanced tex...

11/20/2025

Nano Banana Pro (Gemini Image Pro) solves text-in-images, infographic generation, 2-4k resolution, and Google Search grounding

**Google** launched **Gemini 3 Pro Image (Nano Banana Pro)**, a next-generation AI image generation and editing model with integrated Google Search grounding, m...

11/19/2025

OpenAI fires back: GPT-5.1-Codex-Max (API) and GPT 5.1 Pro (ChatGPT)

**OpenAI** released **GPT-5.1-Codex-Max**, featuring compaction-native training, an "Extra High" reasoning mode, and claims of over 24-hour autonomous operati...

11/18/2025

Gemini 3 Pro — new GDM frontier model 6, Gemini 3 Deep Think, and Antigravity IDE

**Google** launched **Gemini 3 Pro**, a state-of-the-art model with a **1M-token context window**, **multimodal reasoning**, and strong agentic capabilities, pr...

11/17/2025

xAI Grok 4.1: #1 in Text Arena, #1 in EQ-bench, and better Creative Writing

**xAI** launched **Grok 4.1**, achieving a #1 rank on the LM Arena Text Leaderboard with an Elo score of **1483**, showing improvements in creative writing and ...

11/13/2025

minor updates to GPT 5.1 and SIMA 2

**OpenAI** released **GPT-5.1** family models including **5.1-Codex** and **5.1-Codex-Mini** with improved steerability, faster responses, and new tools like ap...

11/12/2025

GPT 5.1 in ChatGPT: No evals, but adaptive thinking and instruction following

**OpenAI** launched **GPT-5.1** with improvements in conversational tone, instruction following, and adaptive reasoning. **GPT-5.0** is being sunset in 3 months...

11/7/2025

Terminal-Bench 2.0 and Harbor

**Terminal-Bench** has fixed task issues and launched version 2.0 with cloud container support via the **Harbor framework**, gaining recognition from models lik...

11/6/2025

Kimi K2 Thinking: 1T-A32B params, SOTA HLE, BrowseComp, TauBench && Soumith leaves Pytorch

**Moonshot AI** launched **Kimi K2 Thinking**, a **1 trillion parameter** mixture-of-experts (MoE) model with **32 billion active parameters**, a **256K context wi...

10/29/2025

Cursor 2.0 & Composer-1: Fast Models and New Agents UI

**Cursor 2.0** launched with **Composer-1**, an agentic coding model optimized for speed and precision, featuring multi-agent orchestration, built-in browser fo...

10/28/2025

OpenAI completes Microsoft + For-profit restructuring + announces 2028 AI Researcher timeline + Platform / AI cloud product direction + next $1T of compute

**OpenAI** has completed a major recapitalization and restructuring, forming a Public Benefit Corporation with a non-profit Foundation holding special voting ri...

10/27/2025

MiniMax M2 230BA10B — 8% of Claude Sonnet's price, ~2x faster, new SOTA open model

**MiniMax M2**, an open-weight sparse MoE model by **Hailuo AI**, launches with **≈200–230B parameters** and **10B active parameters**, offering strong performa...

10/21/2025

ChatGPT Atlas: OpenAI's AI Browser

**OpenAI** launched the **Chromium fork AI browser Atlas** for macOS, featuring integrated **Agent mode** and browser memory with local login capabilities, aimi...

10/20/2025

DeepSeek-OCR finds vision models can decode 10x more efficiently with ~97% accuracy of text-only, 33/200k pages/day/A100

As **ICCV 2025** begins, **DeepSeek** releases a novel **DeepSeek-OCR** 3B MoE vision-language model that compresses long text as visual context with high accur...

10/17/2025

The Karpathy-Dwarkesh Interview delays AGI timelines

The recent AI news highlights the **Karpathy interview** as a major event, alongside significant discussions on reasoning improvements without reinforcement lea...

10/16/2025

Claude Agent Skills - glorified AGENTS.md? or MCP killer?

**Anthropic** achieves a rare feat with back-to-back AI news headlines featuring **Claude's** new **Skills**—a novel way to build specialized agents using Markd...

10/15/2025

Claude Haiku 4.5

**Anthropic** released **Claude Haiku 4.5**, a model that is over 2x faster and 3x cheaper than **Claude Sonnet 4.5**, improving iteration speed and user experi...

10/13/2025

OpenAI Titan XPU: 10GW of self-designed chips with Broadcom

**OpenAI** is finalizing a custom ASIC chip design to deploy **10GW** of inference compute, complementing existing deals with **NVIDIA** (10GW) and **AMD** (6GW...

10/9/2025

Air Street's State of AI 2025 Report

**Reflection** raised **$2B** to build frontier open-weight models with a focus on safety and evaluation, led by a team with backgrounds from **AlphaGo**, **PaL...

10/7/2025

Gemini 2.5 Computer Use preview beats Sonnet 4.5 and OAI CUA

**Google DeepMind** released a new **Gemini 2.5 Computer Use model** for browser and Android UI control, evaluated by Browserbase. **OpenAI** showcased **GPT-5 ...

10/6/2025

OpenAI Dev Day: Apps SDK, AgentKit, Codex GA, GPT‑5 Pro and Sora 2 APIs

**OpenAI** showcased major product launches at their DevDay including the **Apps SDK**, **AgentKit**, and **Codex** now generally available with SDK and enterpr...

10/1/2025

Thinking Machines' Tinker: LoRA based LLM fine-tuning API

**Thinking Machines** recently raised **$2 billion** without shipping a product until now, launching their first product **Tinker**, a managed service API for f...

9/30/2025

Sora 2: new video+audio model and OpenAI's first Social Network

**Sora 2** released with improvements on physical world video modeling and a new "character consistency" feature allowing real-world element injection from a ...

9/29/2025

Anthropic Claude Sonnet 4.5, Claude Code 2.0, new VS Code Extensions

**Anthropic** launched a major update with **Claude Sonnet 4.5**, achieving **77.2% SWE-Bench** verified performance and improvements in finance, law, and STEM....

9/25/2025

GDPVal finding: Claude Opus 4.1 within 95% of AGI (human experts in top 44 white collar jobs)

**OpenAI**'s Evals team released **GDPval**, a comprehensive evaluation benchmark covering 1,320 tasks across 44 predominantly digital occupations, assessing AI...

9/23/2025

Alibaba Yunqi: 7 models released in 4 days (Qwen3-Max, Qwen3-Omni, Qwen3-VL) and $52B roadmap

**Alibaba's Tongyi Qianwen (Qwen) team** launched major updates including the **1T parameter Qwen3-Max**, **Qwen3-Omni**, and **Qwen3-VL** models, alongside spe...

9/22/2025

NVIDIA to invest $100B in OpenAI for 10GW of Vera Rubin rollout

**NVIDIA** and **OpenAI** announced a landmark strategic partnership to deploy at least **10 gigawatts** of AI datacenters using NVIDIA's systems, with NVIDIA i...

9/19/2025

Grok 4 Fast: xAI's distilled, 40% more token efficient, 2M context, 344 tok/s frontier model

**xAI** announced **Grok 4 Fast**, a highly efficient model running at **344 tokens/second**, offering reasoning and nonreasoning modes and free trials on major...

9/18/2025

SoftBank, NVIDIA and US Govt take 2%, 5% and 10% of Intel, will develop Intel x86 RTX SoCs for consumer & datacenters

**Nvidia and Intel** announced a joint development partnership for multiple new generations of x86 products, marking a significant shift in the tech industry. T...

9/15/2025

GPT-5 Codex launch and OpenAI's quiet rise in Agentic Coding

**OpenAI** released **GPT-5-Codex**, an agentic coding model optimized for long-running software engineering tasks with dynamic task-adaptive thinking, multi-ho...

9/11/2025

Qwen3-Next-80B-A3B-Base: Towards Ultimate Training & Inference Efficiency

**MoE (Mixture of Experts) models** have become essential in frontier AI models, with **Qwen3-Next** pushing sparsity further by activating only **3.7% of param...

9/10/2025

Oracle jumps +36% in a day after winning $300B OpenAI contract

**Oracle's OCI division** reported a stunning **+359% revenue bookings growth to $455B** with cloud revenue guidance of **$144B by 2030**, driven significantly ...

9/8/2025

Cognition's $10b Series C; Smol AI updates

**Cognition** raised **$400M** at a **$10.2B** valuation to advance AI coding agents, with **swyx** joining the company. **Vercel** launched an OSS coding platf...

9/5/2025

Kimi K2‑0905 and Qwen3‑Max preview: two 1T open weights models launched

**Moonshot AI** updated their **Kimi K2-0905** open model with doubled context length to **256k tokens**, improved coding and tool-calling, and integration with...

9/2/2025

Anthropic raises $13B at $183B Series F

**Anthropic** achieved a **$183B post-money valuation** in Series F funding by September 2025, growing from about $1B run-rate in January to over **$5B run-rate...

8/28/2025

OpenAI Realtime API GA and new `gpt-realtime` model, 20% cheaper than 4o

**OpenAI** launched the **gpt-realtime** model and **Realtime API** to GA, featuring advanced speech-to-speech capabilities, new voices (**Cedar**, **Marin**), ...

8/27/2025

OpenAI updates Codex, VSCode Extension that can sync tasks with Codex Cloud

**OpenAI Codex** has launched a new IDE Extension integrating with VS Code and Cursor, enabling seamless local and cloud task handoff, sign-in via ChatGPT plans...

8/26/2025

nano-banana is Gemini‑2.5‑Flash‑Image, beating Flux Kontext by 170 Elo with SOTA Consistency, Editing, and Multi-Image Fusion

**Google DeepMind** revealed **Gemini-2.5-Flash-Image-Preview**, a state-of-the-art image editing model excelling in **character consistency**, **natural-langua...

8/21/2025

Cohere Command A Reasoning beats GPT-OSS-120B and DeepSeek R1 0528

**Cohere's Command A Reasoning** model outperforms GPT-OSS in open deep research capabilities, emphasizing agentic use cases for 2025. **DeepSeek-V3.1** introdu...

8/20/2025

DeepSeek V3.1: 840B token continued pretrain, beating Claude 4 Sonnet at 11% of its cost

**DeepSeek** released **DeepSeek V3.1**, a quietly rolled out open model with a **128K context window** and improvements in **token efficiency**, coding, and a...

8/19/2025

Databricks' $100B Series K

**Databricks** reached a **$100 billion valuation**, becoming a centicorn with new Data ([Lakebase](https://www.databricks.com/product/lakebase)) and AI ([Agent...

8/14/2025

Western Open Models get Funding: Cohere $500m @ $6.8B, AI2 gets $152m NSF+NVIDIA grants

**OpenAI's GPT-5** achieved a speedrun of Pokemon Red 3x faster than **o3**. **Perplexity** raised **$200M** at a **$20B valuation**. **AI2** secured **$75M NSF...

8/11/2025

OpenAI's IMO Gold model also wins IOI Gold

**OpenAI** announced placing **#6 among human coders** at the IOI, reflecting rapid progress in competitive coding AI over the past two years. The **GPT-5** lau...

8/7/2025

OpenAI rolls out GPT-5 and GPT-5 Thinking to >1B users worldwide; -mini and -nano help claim Pareto Frontier

**OpenAI** launched **GPT-5**, a unified system featuring a fast main model and a deeper thinking model with a real-time router, supporting up to **400K context...

8/5/2025

OpenAI's gpt-oss 20B and 120B, Claude Opus 4.1, DeepMind Genie 3

**OpenAI** released the **gpt-oss** family, including **gpt-oss-120b** and **gpt-oss-20b**, their first open-weight models since GPT-2, designed for agentic tas...

8/4/2025

Qwen-Image: SOTA text rendering + 4o-imagegen-level Editing Open Weights MMDiT

**Alibaba** surprised with the release of **Qwen-Image**, a **20B MMDiT** model excelling at bilingual text rendering and graphic poster creation, with open wei...

8/1/2025

Gemini 2.5 Deep Think finally ships

**OpenAI** is rumored to soon launch new **GPT-OSS** and **GPT-5** models amid drama with **Anthropic** revoking access to **Claude**. **Google DeepMind** quiet...

7/31/2025

Figma's $50+b IPO

**OpenAI**'s stealth model **horizon-alpha** on **OpenRouter** sparks speculation as a precursor to **GPT-5**, showing strong reasoning and SVG generation capab...

7/28/2025

GLM-4.5: Deeper, Headier, & better than Kimi/Qwen/DeepSeek (SOTA China LLM?)

**Z.ai** (Zhipu AI) released the **GLM-4.5-355B-A32B** and **GLM-4.5-Air-106B-A12B** open weights models, claiming state-of-the-art performance competitive with...

7/24/2025

3x in 3 months: Cursor @ $28b, Cognition + Windsurf @ $10b

**Cursor** is reportedly fundraising at a **$28 billion valuation with $1 billion ARR**, while the combined **Cognition+Windsurf** entity is fundraising at a **...

7/21/2025

OAI and GDM announce IMO Gold-level results with natural language reasoning, no specialized training or tools, under human time limits

**OpenAI** and **Google DeepMind** achieved a major milestone by solving 5 out of 6 problems at the **International Mathematical Olympiad (IMO) 2025** within th...

7/17/2025

ChatGPT Agent: new o* model + unified Deep Research browser + Operator computer use + Code Interpreter terminal

**OpenAI** launched the **ChatGPT Agent**, a new advanced AI system capable of browsing the web, coding, analyzing data, and creating reports, marking a signifi...

7/15/2025

Voxtral - Mistral's SOTA ASR model in 3B (mini) and 24B ("small") sizes beats OpenAI Whisper large-v3

**Mistral** surprises with the release of **Voxtral**, a transcription model outperforming **Whisper large-v3**, **GPT-4o mini Transcribe**, and **Gemini 2.5 Fl...

7/11/2025

Kimi K2 - SOTA Open MoE proves that Muon can scale to 15T tokens/1T params

**Moonshot AI** has released **Kimi K2**, a **1 trillion parameter** Mixture-of-Experts model trained on **15.5 trillion tokens** using the new **MuonClip** opt...

7/10/2025

Grok 4: xAI succeeds in going from 0 to new SOTA LLM in 2 years

**xAI** launched **Grok 4** and **Grok 4 Heavy**, large language models rumored to have **2.4 trillion parameters** and trained with **100x more compute** than ...

7/8/2025

SmolLM3: the SOTA 3B reasoning open source LLM

**HuggingFace** released **SmolLM3-3B**, a fully open-source small reasoning model with open pretraining code and data, marking a high point in open source mode...

6/26/2025

OpenAI releases Deep Research API (o3/o4-mini)

**OpenAI** has launched the **Deep Research API** featuring powerful models **o3-deep-research** and **o4-mini-deep-research** with native support for MCP, Sear...

6/25/2025

Context Engineering: Much More than Prompts

**Context Engineering** emerges as a significant trend in AI, highlighted by experts like **Andrej Karpathy**, **Walden Yan** from **Cognition**, and **Tobi Lut...

6/24/2025

Bartz v. Anthropic PBC — "Training use is Fair Use"

**Anthropic** won a significant fair use ruling allowing the training of **Claude** on copyrighted books, setting a precedent for AI training legality despite c...

6/20/2025

The Quiet Rise of Claude Code vs Codex

**Claude Code** is gaining mass adoption, inspiring derivative projects like **OpenCode** and **ccusage**, with discussions ongoing in AI communities. **Mistral...

6/19/2025

minor ai followups: MultiAgents, Meta-SSI-Scale, Karpathy, AI Engineer

**OpenAI** released a paper revealing how training models like **GPT-4o** on insecure code can cause broad misalignment, drawing reactions from experts like *@s...

6/18/2025

Zuck goes Superintelligence Founder Mode: $100M bonuses + $100M+ salaries + NFDG Buyout?

**Meta AI** is reportedly offering **8-9 figure signing bonuses and salaries** to top AI talent, confirmed by **Sam Altman**. They are also targeting key figure...

6/17/2025

Gemini 2.5 Pro/Flash GA, 2.5 Flash-Lite in Preview

**Gemini 2.5** models are now generally available, including the new **Gemini 2.5 Flash-Lite**, **Flash**, **Pro**, and **Ultra** variants, featuring sparse **M...

6/16/2025

Chinese Models Launch - MiniMax-M1, Hailuo 2 "Kangaroo", Moonshot Kimi-Dev-72B

**MiniMax AI** launched **MiniMax-M1**, a 456 billion parameter open weights LLM with a 1 million token input and 80k token output using efficient "lightning a...

6/13/2025

Cognition vs Anthropic: Don't Build Multi-Agents/How to Build Multi-Agents

Within the last 24 hours, **Cognition**'s Walden Yan advised *"Don't Build Multi-Agents,"* while **Anthropic** shared their approach to building multi-agent s...

6/11/2025

Execuhires Round 2: Scale-Meta, Lamini-AMD, and Instacart-OpenAI

**Meta** hires **Scale AI's Alexandr Wang** to lead its new "Superintelligence" division following a **$15 billion investment** for a 49% stake in Scale. **La...

6/10/2025

Reasoning Price War 2: Mistral Magistral + o3's 80% price cut + o3-pro

**OpenAI** announced an **80% price cut** for its **o3** model, making it competitively priced with **GPT-4.1** and rivaling **Anthropic's Claude 4 Sonnet** and...

6/9/2025

Apple exposes Foundation Models API and... no new Siri

**Apple** released on-device foundation models for iOS developers, though their recent "Illusion of Thinking" paper faced significant backlash for flawed met...

6/5/2025

Gemini 2.5 Pro (06-05) launched at AI Engineer World's Fair

On the second day of **AIE**, **Google's Gemini 2.5 Pro** reclaimed the top spot on the LMArena leaderboard with a score of **1470** and a +24 Elo increase, sho...

6/4/2025

AI Engineer World's Fair Talks Day 1

**Mistral** launched a new **Code** project, and **Cursor** released version **1.0**. **Anthropic** improved **Claude Code** plans, while **ChatGPT** announced ...

5/31/2025

Mary Meeker is so back: BOND Capital AI Trends report

**Mary Meeker** returns with a comprehensive **340-slide report** on the state of AI, highlighting accelerating tech cycles, compute growth, and comparisons of ...

5/29/2025

DeepSeek-R1-0528 - Gemini 2.5 Pro-level model, SOTA Open Weights release

**DeepSeek R1-0528** marks a significant upgrade, closing the gap with proprietary models like **Gemini 2.5 Pro** and surpassing benchmarks from **Anthropic**, ...

5/27/2025

Mistral's Agents API and the 2025 LLM OS

**The LLM OS** concept has evolved since 2023, with **Mistral AI** releasing a new **Agents API** that includes code execution, web search, persistent memory, a...

5/22/2025

Anthropic releases Claude 4 Sonnet and Opus: Memory, Agent Capabilities, Claude Code, Redteam Drama

**Anthropic** has officially released **Claude 4** with two variants: **Claude Opus 4**, a high-capability model for complex tasks priced at **$15/$75 per milli...

5/21/2025

OpenAI buys Jony Ive's io for $6.5b, LMArena lands $100m seed from a16z

**OpenAI** confirmed a partnership with **Jony Ive** to develop consumer hardware. **LMArena** secured a $100 million seed round from **a16z**. **Mistral** laun...

5/20/2025

Google I/O: new Gemini native voice, Flash, DeepThink, AI Mode (DeepSearch+Mariner+Astra)

**Google I/O 2025** showcased significant advancements with **Gemini 2.5 Pro** and **Deep Think** reasoning mode from **Google DeepMind**, emphasizing AI-driven...

5/16/2025

ChatGPT Codex, OpenAI's first cloud SWE agent

**OpenAI** launched **Codex**, a cloud-based software engineering agent powered by **codex-1** (an optimized version of **OpenAI o3**) available in research pre...

5/15/2025

Gemini's AlphaEvolve agent uses Gemini 2.0 to find new Math and cuts Gemini cost 1% — without RL

**DeepMind's AlphaEvolve**, a 2025 update to AlphaTensor and FunSearch, is a Gemini-powered **coding agent for algorithm discovery** that designs faster matrix ...

5/14/2025

Granola launches team notes, while Notion launches meeting transcription

**GPT-4.1** is now available in **ChatGPT** for Plus, Pro, and Team users, focusing on coding and instruction following, with **GPT 4.1 mini** replacing **GPT 4...

5/12/2025

Prime Intellect's INTELLECT-2 and PRIME-RL advance distributed reinforcement learning

**Prime Intellect** released **INTELLECT-2**, a decentralized GPU training and RL framework with a vision for distributed AI training overcoming colocation limi...

5/7/2025

AI Engineer World's Fair: Second Run, Twice The Fun

**The 2025 AI Engineer World's Fair** is expanding with **18 tracks** covering topics like **Retrieval + Search**, **GraphRAG**, **RecSys**, **SWE-Agents**, **A...

5/6/2025

Gemini 2.5 Pro Preview 05-06 (I/O edition) - the SOTA vision+coding model

**Gemini 2.5 Pro** has been updated with enhanced multimodal image-to-code capabilities and dominates the WebDev Arena Leaderboard, surpassing **Claude 3.7 Sonn...

5/5/2025

Cursor @ $9b, OpenAI Buys Windsurf @ $3b

**OpenAI** is reportedly close to closing a deal with Windsurf, coinciding with **Cursor's** $900M funding round at a $9B valuation. **Nvidia** launched the **L...

4/30/2025

ChatGPT responds to GlazeGate + LMArena responds to Cohere

**OpenAI** faced backlash after a controversial ChatGPT update, leading to an official retraction admitting they "focused too much on short-term feedback." Re...

4/29/2025

LlamaCon: Meta AI gets into the Llama API platform business

**Meta** celebrated progress in the **Llama** ecosystem at LlamaCon, launching an AI Developer platform with finetuning and fast inference powered by **Cerebras...

4/28/2025

Qwen 3: 0.6B to 235B MoE full+base models that beat R1 and o1

**Qwen 3** has been released by **Alibaba** featuring a range of models including two MoE variants, **Qwen3-235B-A22B** and **Qwen3-30B-A3B**, which demonstrate...

4/25/2025

Cognition's DeepWiki, a free encyclopedia of all GitHub repos

**Silas Alberti** of **Cognition** announced **DeepWiki**, a free encyclopedia of all GitHub repos providing Wikipedia-like descriptions and Devin-backed chatbo...

4/23/2025

gpt-image-1 - ChatGPT's imagegen model, confusingly NOT 4o, now available in API

**OpenAI** officially launched the **gpt-image-1** API for image generation and editing, supporting features like alpha channel transparency and a "low" conte...

4/19/2025

Grok 3 & 3-mini now API Available

**Grok 3** API is now available, including a smaller version called Grok 3 mini, which offers competitive pricing and full reasoning traces. **OpenAI** released...

4/18/2025

Gemini 2.5 Flash completes the total domination of the Pareto Frontier

**Gemini 2.5 Flash** is introduced with a new "thinking budget" feature offering more control compared to Anthropic and OpenAI models, marking a significant u...

4/17/2025

OpenAI o3, o4-mini, and Codex CLI

**OpenAI** launched the **o3** and **o4-mini** models, emphasizing improvements in **reinforcement-learning scaling** and overall efficiency, making **o4-mini**...

4/16/2025

QwQ-32B claims to match DeepSeek R1-671B

**Alibaba Qwen** released their **QwQ-32B** model, a **32 billion parameter** reasoning model using a novel two-stage reinforcement learning approach: first sca...

4/16/2025

SOTA Video Gen: Veo 2 and Kling 2 are GA for developers

**Google's Veo 2** video generation model is now available in the **Gemini API** with a cost of **35 cents per second** of generated video, marking a significan...

4/15/2025

GPT 4.1: The New OpenAI Workhorse

**OpenAI** released **GPT-4.1**, including **GPT-4.1 mini** and **GPT-4.1 nano**, highlighting improvements in **coding**, **instruction following**, and handli...

4/10/2025

Google's Agent2Agent Protocol (A2A)

**Google Cloud Next** announcements featured the launch of **Google and DeepMind's** full **MCP support** and a new **Agent to Agent protocol** designed for age...

4/9/2025

DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

**Together AI and Agentica** released **DeepCoder-14B**, an open-source 14B parameter coding model rivaling OpenAI's **o3-mini** and **o1** on coding benchmarks...

4/8/2025

Llama 4's Controversial Weekend Release

**Meta** released **Llama 4**, featuring two new medium-size MoE open models and a promised 2 Trillion parameter "behemoth" model, aiming to be the largest op...

4/1/2025

>$41B raised today (OpenAI @ 300b, Cursor @ 9.5b, Etched @ 1.5b)

**OpenAI** is preparing to release a highly capable open language model, their first since GPT-2, with a focus on reasoning and community feedback, as shared by...

3/27/2025

OpenAI adopts MCP

**OpenAI** announced support for **MCP**, a significant technical update. **Google's Gemini 2.5 Pro** leads benchmarks with top scores in **MMLU-Pro (86%)**, **...

3/26/2025

Gemini 2.5 Pro + 4o Native Image Gen

**Gemini 2.5 Pro** from **Google DeepMind** has become the new top AI model, surpassing **Grok 3** by 40 LMArena points, with contributions from **Noam Shazeer*...

3/25/2025

Halfmoon is Reve Image: a new SOTA Image Model from ex-Adobe/Stability trio

**Reve**, a new composite AI model from former Adobe and Stability alums **Christian Cantrell**, **Taesung Park**, and **Michaël Gharbi**, has emerged as the to...

3/22/2025

lots of little things happened this week

**Anthropic** introduced a novel 'think' tool enhancing instruction adherence and multi-step problem solving in agents, with combined reasoning and tool use dem...

3/20/2025

Promptable Prosody, SOTA ASR, and Semantic VAD: OpenAI revamps Voice AI

**OpenAI** has launched three new state-of-the-art audio models in their API, including **gpt-4o-transcribe**, a speech-to-text model outperforming Whisper, and...

3/20/2025

Every 7 Months: The Moore's Law for Agent Autonomy

**METR** published a paper measuring AI agent autonomy progress, showing it has doubled every 7 months since **2019 (GPT-2)**. They introduced a new metric, the...

3/18/2025

Cohere's Command A claims #3 open model spot (after DeepSeek and Gemma)

**Cohere's Command A** model has solidified its position on the LMArena leaderboard, featuring an open-weight **111B** parameter model with an unusually long **...

3/13/2025

Gemma 3 beats DeepSeek V3 in Elo, 2.0 Flash beats GPT4o with Native Image Gen

**Google DeepMind** launched the **Gemma 3** family of models featuring a **128k context window**, **multimodal input (image and video)**, and **multilingual su...

3/12/2025

The new OpenAI Agents Platform

**OpenAI** introduced a comprehensive suite of new tools for AI agents, including the **Responses API**, **Web Search Tool**, **Computer Use Tool**, **File Sear...

3/8/2025

DeepSeek's Open Source Stack

**DeepSeek's Open Source Week** was summarized by PySpur, highlighting multiple interesting releases. The **Qwen QwQ-32B model** was fine-tuned into **START**, ...

3/4/2025

Anthropic's $61.5B Series E

**Anthropic** raised a **$3.5 billion Series E funding round** at a **$61.5 billion valuation**, signaling strong financial backing for the **Claude** AI model....

2/28/2025

GPT 4.5 — Chonky Orion ships!

**OpenAI released GPT-4.5** as a research preview, highlighting its **deep world knowledge**, **improved understanding of user intent**, and a **128,000 token c...

2/27/2025

lots of small launches

**GPT-4o Advanced Voice Preview** is now available for free ChatGPT users with enhanced daily limits for Plus and Pro users. **Claude 3.7 Sonnet** has achieved ...

2/25/2025

Claude 3.7 Sonnet

**Anthropic** launched **Claude 3.7 Sonnet**, their most intelligent model to date featuring hybrid reasoning with two thinking modes: near-instant and extended...

2/22/2025

AI Engineer Summit Day 1

The **AIE Summit** in NYC highlighted key talks including **Grace Isford's Trends Keynote**, **Neo4j/Pfizer's presentation**, and **OpenAI's first definition of...

2/20/2025

The Ultra-Scale Playbook: Training LLMs on GPU Clusters

**Hugging Face** released "The Ultra-Scale Playbook: Training LLMs on GPU Clusters," an interactive blogpost based on **4000 scaling experiments on up to 512 G...

2/18/2025

X.ai Grok 3 and Mira Murati's Thinking Machines

**Grok 3** has launched with mixed opinions but strong benchmark performance, notably outperforming models like **Gemini 2 Pro** and **GPT-4o**. The **Grok-3 mi...

2/18/2025

LLaDA: Large Language Diffusion Models

**LLaDA (Large Language Diffusion Model) 8B** is a breakthrough diffusion-based language model that rivals **LLaMA 3 8B** while training on **7x fewer tokens (2...

2/14/2025

Reasoning Models are Near-Superhuman Coders (OpenAI IOI, Nvidia Kernels)

**o3 model** achieved a **gold medal at the 2024 IOI** and ranks in the **99.8th percentile on Codeforces**, outperforming most humans with reinforcement learning...

2/13/2025

small news items

**OpenAI** announced plans for **GPT-4.5 (Orion)** and **GPT-5**, with GPT-5 integrating the **o3** model and offering unlimited chat access in the free tier. *...

2/7/2025

s1: Simple test-time scaling (and Kyutai Hibiki)

**"Wait" is all you need** introduces a novel reasoning model finetuned from **Qwen 2.5 32B** using just **1000 questions with reasoning traces** distilled fr...

2/6/2025

Gemini 2.0 Flash GA, with new Flash Lite, 2.0 Pro, and Flash Thinking

**Google DeepMind** officially launched **Gemini 2.0** models including **Flash**, **Flash-Lite**, and **Pro Experimental**, with **Gemini 2.0 Flash** outperfor...

2/5/2025

How To Scale Your Model, by DeepMind

**Researchers at Google DeepMind (GDM)** released a comprehensive "little textbook" titled **"How To Scale Your Model"** covering modern Transformer archite...

2/4/2025

OpenAI takes on Gemini's Deep Research

**OpenAI** released the full version of the **o3** agent, with a new **Deep Research** variant showing significant improvements on the **HLE benchmark** and ach...

2/1/2025

o3-mini launches, OpenAI on "wrong side of history"

**OpenAI** released **o3-mini**, a new reasoning model available for free and paid users with a "high" reasoning effort option that outperforms the earlier **...

1/31/2025

Mistral Small 3 24B and Tulu 3 405B

**Mistral AI** released **Mistral Small 3**, a **24B parameter** model optimized for local inference with low latency and **81% accuracy on MMLU**, competing wi...

1/28/2025

DeepSeek #1 on US App Store, Nvidia stock tanks -17%

**DeepSeek** has made a significant cultural impact by hitting mainstream news unexpectedly in 2025. The **DeepSeek-R1** model features a massive **671B paramet...

1/25/2025

TinyZero: Reproduce DeepSeek R1-Zero for $30

**DeepSeek Mania** continues to reshape the frontier model landscape with Jiayi Pan from Berkeley reproducing the *OTHER* result from the DeepSeek R1 paper, R1-...

1/24/2025

OpenAI launches Operator, its first Agent

**OpenAI** launched **Operator**, a premium computer-using agent for web tasks like booking and ordering, available now for Pro users in the US with an API prom...

1/23/2025

Bespoke-Stratos + Sky-T1: The Vicuna+Alpaca moment for reasoning

**Reasoning Distillation** has emerged as a key technique, with Berkeley/USC researchers releasing **Sky-T1-32B-Preview**, a finetuned model of **Qwen 2.5 32B**...

1/22/2025

Project Stargate: $500b datacenter (1.7% of US GDP) and Gemini 2 Flash Thinking 2

**Project Stargate**, a US "AI Manhattan project" led by **OpenAI** and **SoftBank**, supported by **Oracle**, **Arm**, **Microsoft**, and **NVIDIA**, was ann...

1/21/2025

DeepSeek R1: o1-level open weights model and a simple recipe for upgrading 1.5B models to Sonnet/4o level

**DeepSeek** released **DeepSeek R1**, a significant upgrade over **DeepSeek V3** from just three weeks prior, featuring 8 models including full-size 671B MoE m...

1/16/2025

Titans: Learning to Memorize at Test Time

**Google** released a new paper on "Neural Memory" integrating persistent memory directly into transformer architectures at test time, showing promising long-...

1/15/2025

small little news items

**Ollama** enhanced its models by integrating **Cohere's R7B**, optimized for **RAG** and **tool use tasks**, and released **Ollama v0.5.5** with quality update...

1/11/2025

Moondream 2025.1.9: Structured Text, Enhanced OCR, Gaze Detection in a 2B Model

**Moondream** has released a new version that advances VRAM efficiency and adds structured output and gaze detection, marking a new frontier in vision model pra...

1/7/2025

PRIME: Process Reinforcement through Implicit Rewards

**Implicit Process Reward Models (PRIME)** have been highlighted as a significant advancement in online reinforcement learning, trained on a **7B model** with i...

12/31/2024

not much happened to end the year

**Reinforcement Fine-Tuning (RFT)** is introduced as a **data-efficient** method to improve **reasoning in LLMs** using minimal **training data** with strategie...

12/27/2024

DeepSeek v3: 671B finegrained MoE trained for $5.5m USD of compute on 15T tokens

**DeepSeek-V3** has launched with **671B MoE parameters** and trained on **14.8T tokens**, outperforming **GPT-4o** and **Claude-3.5-sonnet** in benchmarks. It ...

12/24/2024

not much happened this weekend

**o3** model gains significant attention with discussions around its capabilities and implications, including an OpenAI board member referencing "AGI." **Lang...

12/21/2024

o3 solves AIME, GPQA, Codeforces, makes 11 years of progress in ARC-AGI and 25% in FrontierMath

**OpenAI** announced the **o3** and **o3-mini** models with groundbreaking benchmark results, including a jump from **2% to 25%** on the **FrontierMath** benchm...

12/20/2024

ModernBert: small new Retriever/Classifier workhorse, 8k context, 2T tokens,

**Answer.ai/LightOn** released **ModernBERT**, an updated encoder-only model with **8k token context**, trained on **2 trillion tokens** including code, with **...

12/19/2024

Genesis: Generative Physics Engine for Robotics (o1-mini version)

**OpenAI** launched the **o1 model** API featuring function calling, structured outputs, vision support, and developer messages, achieving **60% fewer reasoning...

12/19/2024

Genesis: Generative Physics Engine for Robotics (o1-2024-12-17)

**Genesis** is a newly announced **universal physics engine** developed by a large-scale collaboration led by **CMU PhD student Zhou Xian**. It integrates multi...

12/18/2024

OpenAI Voice Mode Can See Now - After Gemini Does

**OpenAI** launched **Realtime Video** shortly after **Gemini**, which blunted its impact given Gemini's earlier arrival with lower cost and fewer rate limits....

12/18/2024

o1 API, 4o/4o-mini in Realtime API + WebRTC, DPO Finetuning

**OpenAI** launched the **o1 API** with enhanced features including vision inputs, function calling, structured outputs, and a new `reasoning_effort` parameter,...

12/17/2024

Meta Apollo - Video Understanding up to 1 hour, SOTA Open Weights

**Meta** released **Apollo**, a new family of state-of-the-art video-language models available in **1B, 3B, and 7B** sizes, featuring "Scaling Consistency" fo...

12/14/2024

Meta BLT: Tokenizer-free, Byte-level LLM

**Meta AI** introduces the **Byte Latent Transformer (BLT)**, a tokenizer-free architecture that dynamically forms byte patches for efficient compute allocation...

12/12/2024

Google wakes up: Gemini 2.0 et al

**Google DeepMind** launched **Gemini 2.0 Flash**, a new multimodal model outperforming Gemini 1.5 Pro and o1-preview, featuring vision and voice APIs, multilin...

12/11/2024

ChatGPT Canvas GA

**OpenAI** launched **ChatGPT Canvas** to all users, featuring **code execution** and **GPT integration**, effectively replacing Code Interpreter with a Google ...

12/10/2024

OpenAI Sora Turbo and Sora.com

**OpenAI** launched **Sora Turbo**, enabling text-to-video generation for ChatGPT Plus and Pro users with monthly generation limits and regional restrictions in...

12/6/2024

Meta Llama 3.3: 405B/Nova Pro performance at 70B price

**Meta AI** released **Llama 3.3 70B**, matching the performance of the 405B model with improved efficiency using *"a new alignment process and progress in onl...

12/6/2024

$200 ChatGPT Pro and o1-full/pro, with vision, without API, and mixed reviews

**OpenAI** launched the **o1** model with multimodal capabilities, faster reasoning, and image input support, marking it as a state-of-the-art model despite som...

12/4/2024

Olympus has dropped (aka, Amazon Nova Micro|Lite|Pro|Premier|Canvas|Reel)

**Amazon** announced the **Amazon Nova** family of multimodal foundation models at AWS re:Invent, available immediately with no waitlist in configurations like ...

11/29/2024

not much happened to end the week

**AI News for 11/29/2024-11/30/2024** covers key updates including the **Gemini multimodal model** advancing in musical structure understanding, a new **quantiz...

11/28/2024

Qwen with Questions: 32B open weights reasoning model nears o1 in GPQA/AIME/Math500

**DeepSeek r1** leads the race for "open o1" models but has yet to release weights, while **Justin Lin** released **QwQ**, a **32B open weight model** that ou...

11/27/2024

OLMo 2 - new SOTA Fully Open LLM

**AI2** has updated **OLMo-2** to roughly **Llama 3.1 8B** equivalent, training with **5T tokens** and using learning rate annealing and new high-quality data (...

11/26/2024

Anthropic launches the Model Context Protocol

**Anthropic** has launched the **Model Context Protocol (MCP)**, an open protocol designed to enable seamless integration between large language model applicati...

11/22/2024

Vision Everywhere: Apple AIMv2 and Jina CLIP v2

**Apple** released **AIMv2**, a novel vision encoder pre-trained with autoregressive objectives that achieves **89.5% accuracy on ImageNet** and integrates join...

11/22/2024

LMSys killed Model Versioning (gpt 4o 1120, gemini exp 1121)

**AI News for 11/21/2024-11/22/2024** highlights the intense frontier lab race with **OpenAI's gpt-4o-2024-11-20** and **Google DeepMind's gemini-exp-1121** tra...

11/21/2024

DeepSeek-R1 claims to beat o1-preview AND will be open sourced

**DeepSeek** has released **DeepSeek-R1-Lite-Preview**, an open-source reasoning model achieving **o1-preview-level performance** on math benchmarks with transp...

11/20/2024

Perplexity starts Shopping for you

**Stripe** launched their Agent SDK, enabling AI-native shopping experiences like **Perplexity Shopping** for US Pro members, featuring one-click checkout and f...

11/19/2024

Pixtral Large (124B) beats Llama 3.2 90B with updated Mistral Large 24.11

**Mistral** has updated its **Pixtral Large** vision encoder to 1B parameters and released an update to the **123B parameter Mistral Large 24.11** model, though...

11/16/2024

Stripe lets Agents spend money with StripeAgentToolkit

**Stripe** has pioneered an AI SDK specifically designed for agents that handle payments, integrating with models like **gpt-4o** to enable financial transactio...

11/15/2024

Gemini (Experimental-1114) retakes #1 LLM rank with 1344 Elo

**Anthropic** released the **3.5 Sonnet** benchmark for jailbreak robustness, emphasizing adaptive defenses. **OpenAI** enhanced **GPT-4** with a new RAG techni...

11/14/2024

Common Corpus: 2T Open Tokens with Provenance

**Pleias** via **Hugging Face** released **Common Corpus**, the largest fully open multilingual dataset with over **2 trillion tokens** including detailed **prov...

11/13/2024

BitNet was a lie?

**Scaling laws for quantization** have been modified by a group led by Chris Ré, analyzing over **465 pretraining runs** and finding benefits plateau at FP6 pre...

11/12/2024

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

**Epoch AI** collaborated with over **60 leading mathematicians** to create the **FrontierMath benchmark**, a fresh set of hundreds of original math problems wi...

11/6/2024

Tencent's Hunyuan-Large claims to beat DeepSeek-V2 and Llama3-405B with LESS Data

**Tencent** released a notable >300B parameter MoE model pretrained on **7T tokens**, including **1.5T synthetic data** generated via **Evol-Instruct**. The mod...

11/5/2024

OpenAI beats Anthropic to releasing Speculative Decoding

**Prompt lookup** and **Speculative Decoding** techniques are gaining traction with implementations from **Cursor**, **Fireworks**, and teased features from **A...

11/1/2024

The AI Search Wars Have Begun — SearchGPT, Gemini Grounding, and more

**ChatGPT** launched its search functionality across all platforms using a fine-tuned version of **GPT-4o** with synthetic data generation and distillation from...

10/30/2024

Creating a LLM-as-a-Judge

**Anthropic** released details on Claude 3.5 SWEBench+SWEAgent, while **OpenAI** introduced SimpleQA and **DeepMind** launched NotebookLM. **Apple** announced n...

10/30/2024

GitHub Copilot Strikes Back

**GitHub's tenth annual Universe conference** introduced the **Multi-model Copilot** featuring **Anthropic's Claude 3.5 Sonnet**, **Google's Gemini 1.5 Pro**, a...

10/25/2024

s{imple|table|calable} Consistency Models

**Model distillation** significantly accelerates diffusion models, enabling near real-time image generation with only 1-4 sampling steps, as seen in **BlinkShot...

10/23/2024

Claude 3.5 Sonnet (New) gets Computer Use

**Anthropic** announced new Claude 3.5 models: **3.5 Sonnet** and **3.5 Haiku**, improving coding performance significantly, with Sonnet topping several coding ...

10/22/2024

DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing

**UC Berkeley's EPIC lab** introduces innovative LLM data operators with projects like **LOTUS** and **DocETL**, focusing on effective programming and computati...

10/18/2024

DeepSeek Janus and Meta SpiRit-LM: Decoupled Image and Expressive Voice Omnimodality

**DeepSeek Janus** and **Meta SpiRit-LM** are two notable multimodality AI models recently released, showcasing advances in image generation and speech synthesi...

10/17/2024

Did Nvidia's Nemotron 70B train on test?

**NVIDIA's Nemotron-70B** model has drawn scrutiny despite strong benchmark performances on **Arena Hard**, **AlpacaEval**, and **MT-Bench**, with some standard...

10/14/2024

Not much (in AI) happened this weekend

**OpenAI** introduced an "edit this area" feature for image generation, praised by **Sam Altman**. **Yann LeCun** highlighted a NYU paper improving pixel gene...

10/10/2024

State of AI 2024

**Nathan Benaich's State of AI Report** in its 7th year provides a comprehensive overview of AI research and industry trends, including highlights like **BitNet...

10/9/2024

The AI Nobel Prize

**Geoff Hinton** and **John Hopfield** won the **Nobel Prize in Physics** for their work on **Artificial Neural Networks**. The award citation spans **14 pages*...

10/5/2024

Contextual Document Embeddings: `cde-small-v1`

**Meta** announced a new text-to-video model, **Movie Gen**, claiming superior adaptation of **Llama 3** to video generation compared to OpenAI's Sora Diffusion...

10/3/2024

Canvas: OpenAI's answer to Claude Artifacts

**OpenAI** released **Canvas**, an enhanced writing and coding tool based on **GPT-4o**, featuring inline suggestions, seamless editing, and a collaborative env...

10/2/2024

Not much technical happened today

**OpenAI** announced raising **$6.6B** in new funding at a **$157B valuation**, with ChatGPT reaching *250M weekly active users*. **Poolside** raised **$500M** ...

10/2/2024

OpenAI Realtime API and other Dev Day Goodies

**OpenAI** launched the **gpt-4o-realtime-preview** Realtime API featuring text and audio token processing with pricing details and future plans including visio...

10/1/2024

Liquid Foundation Models: A New Transformers alternative + AINews Pod 2

**Liquid.ai** emerged from stealth with three subquadratic foundation models demonstrating superior efficiency compared to state space models and Apple’s on-dev...

9/25/2024

Llama 3.2: On-device 1B/3B, and Multimodal 11B/90B (with AI2 Molmo kicker)

**Meta** released **Llama 3.2** with new multimodal versions including **3B** and **20B** vision adapters on a frozen Llama 3.1, showing competitive performance...

9/25/2024

ChatGPT Advanced Voice Mode

**OpenAI** rolled out **ChatGPT Advanced Voice Mode** with 5 new voices and improved accent and language support, available widely in the US. Ahead of rumored u...

9/23/2024

a calm before the storm

**Anthropic** is raising funds at a valuation up to **$40 billion** ahead of anticipated major releases. **OpenAI** launched new reasoning models **o1** and **o...

9/18/2024

o1 destroys Lmsys Arena, Qwen 2.5, Kyutai Moshi release

**OpenAI's o1-preview** model has achieved a milestone by fully matching top daily AI news stories without human intervention, consistently outperforming other ...

9/18/2024

nothing much happened today

**OpenAI's o1 model** faces skepticism about open-source replication due to its extreme restrictions and unique training advances like RL on CoT. **ChatGPT-4o**...

9/17/2024

a quiet weekend

**OpenAI** released the new **o1** model, leveraging reinforcement learning and chain-of-thought prompting to excel in reasoning benchmarks, achieving an IQ-lik...

9/14/2024

Learnings from o1 AMA

**OpenAI** released the **o1 model series**, touted as their "most capable and aligned models yet," trained with reinforcement learning to enhance reasoning. ...

9/13/2024

o1: OpenAI's new general reasoning models

**OpenAI** has released the **o1** model family, including **o1-preview** and **o1-mini**, focusing on test-time reasoning with extended output token limits ove...

9/12/2024

Pixtral 12B: Mistral beats Llama to Multimodality

**Mistral AI** released **Pixtral 12B**, an open-weights **vision-language model** with a **Mistral Nemo 12B** text backbone and a 400M vision adapter, featurin...

9/9/2024

AIPhone 16: the Visual Intelligence Phone

**Apple** announced the new **iPhone 16** lineup featuring **Visual Intelligence**, a new AI capability integrated with Camera Control, Apple Maps, and Siri, em...

9/7/2024

Reflection 70B, by Matt from IT Department

**Reflection Tuning** technique has been used by a two-person team from **Hyperwrite** and **Glaive** to finetune **llama-3.1-70b**, showing strong performance ...

9/6/2024

Replit Agent - How did everybody beat Devin to market?

**Replit Agent** launched as a fully integrated Web IDE enabling text-to-app generation with planning and self-healing, available immediately to paid users with...

9/5/2024

$1150m for SSI, Sakana, You.com + Claude 500K context

**Safe Superintelligence** raised **$1 billion** at a **$5 billion** valuation, focusing on safety and search approaches as hinted by Ilya Sutskever. **Sakana A...

9/4/2024

Everybody shipped small things this holiday weekend

**xAI** announced the **Colossus 100k H100 cluster** capable of training an FP8 GPT-4 class model in 4 days. **Google** introduced **Structured Output** for **G...

8/30/2024

Summer of Code AI: $1.6b raised, 1 usable product

**Code + AI** is emphasized as a key modality in AI engineering, highlighting productivity and verifiability benefits. Recent major funding rounds include **Cog...

8/29/2024

Cerebras Inference: Faster, Better, AND Cheaper

**Groq** led early 2024 with superfast LLM inference speeds, achieving ~450 tokens/sec for Mixtral 8x7B and 240 tokens/sec for Llama 2 70B. **Cursor** introduce...

8/28/2024

CogVideoX: Zhipu's Open Source Sora

**Zhipu AI**, the Alibaba-backed lab and China's 3rd largest AI lab, released the open 5B video generation model **CogVideoX**, which can run without GPUs via their C...

8/23/2024

Nvidia Minitron: LLM Pruning and Distillation updated for Llama 3.1

**Nvidia** and **Meta** researchers updated their **Llama 3** results with a paper demonstrating the effectiveness of combining **weight pruning** and **knowled...

8/23/2024

super quiet day

**AI21 Labs** released **Jamba 1.5**, a scaled-up State Space Model optimized for long context windows with **94B parameters** and up to **2.5X faster inference...

8/22/2024

Ideogram 2 + Berkeley Function Calling Leaderboard V2

**Ideogram** returns with a new image generation model featuring **color palette control**, a fully controllable API, and an iOS app, reaching a milestone of **...

8/20/2024

The DSPy Roadmap

**Omar Khattab** announced joining **Databricks** before his MIT professorship and outlined the roadmap for **DSPy 2.5 and 3.0+**, focusing on improving core co...

8/15/2024

Grok 2! and ChatGPT-4o-latest confuses everybody

**OpenAI** quietly released a new **GPT-4o** model in ChatGPT, distinct from the API version, reclaiming the #1 spot on Lmsys arena benchmarks across multiple c...

8/14/2024

Gemini Live

**Google** launched **Gemini Live** on Android for **Gemini Advanced** subscribers during the Pixel 9 event, featuring integrations with Google Workspace apps a...

8/9/2024

Too Cheap To Meter: AI prices cut 50-70% in last 30 days

**Gemini 1.5 Flash** has cut prices by approximately **70%**, offering a highly competitive free tier of **1 million tokens per minute** at **$0.075/mtok**, int...

8/7/2024

GPT4o August + 100% Structured Outputs for All (GPT4o mini edition)

**Stability.ai** users are leveraging **LoRA** and **ControlNet** for enhanced line art and artistic style transformations, while facing challenges with **AMD G...

8/7/2024

GPT4o August + 100% Structured Outputs for All (GPT4o August edition)

**OpenAI** released the new **gpt-4o-2024-08-06** model with **16k max output tokens** and **33-50% lower pricing** than the previous 4o-May version, featuring a n...

8/5/2024

How Carlini Uses AI

**Groq's** shareholders' net worth rises while others fall, with **Intel's CEO** expressing concern. **Nicholas Carlini** of **DeepMind** gains recognition and ...

8/3/2024

Execuhires: Tempting The Wrath of Khan

**Character.ai's $2.5b execuhire to Google** marks a significant leadership move alongside **Adept's $429m execuhire to Amazon** and **Inflection's $650m execuh...

8/2/2024

Rombach et al: FLUX.1 [pro|dev|schnell], $31m seed for Black Forest Labs

Former **Stability AI** researcher Robin Rombach launched **FLUX.1**, a new text-to-image model with three variants: pro (API only), dev (open-weight, non-commercial), and s...

8/1/2024

Gemma 2 2B + Scope + Shield

**Gemma 2 2B**, a 2-billion-parameter model trained on **2 trillion tokens** and distilled from a larger unnamed LLM, has been released by **Google DeepMind** and...

7/30/2024

Apple Intelligence Beta + Segment Anything Model 2

**Meta** advanced its open source AI with a sequel to the **Segment Anything Model**, enhancing image segmentation with memory attention for video applications ...

7/26/2024

AlphaProof + AlphaGeometry2 reach 1 point short of IMO Gold

**Search+Verifier** highlights advances in neurosymbolic AI during the 2024 International Mathematical Olympiad (IMO). **Google DeepMind**'s combination of **AlphaProof** and **AlphaGeomet...

7/24/2024

Mistral Large 2 + RIP Mistral 7B, 8x7B, 8x22B

**Mistral Large 2** introduces **123B parameters** with **Open Weights** under a Research License, focusing on **code generation**, **math performance**, and a ...

7/24/2024

Llama 3.1: The Synthetic Data Model

**Meta AI** has released **Llama 3.1**, including a **405B parameter model** that triggers regulatory considerations like the **EU AI Act** and **SB 1047**. The...

7/23/2024

Llama 3.1 Leaks: big bumps to 8B, minor bumps to 70b, and SOTA OSS 405b model

**Llama 3.1** leaks reveal a **405B dense model** with **128k context length**, trained on **39.3M GPU hours** using H100-80GB GPUs, and fine-tuned with **over ...

7/20/2024

DataComp-LM: the best open-data 7B model/benchmark/dataset

**DataComp team** released a competitive **7B open data language model** trained on only **2.5T tokens** from the massive **DCLM-POOL dataset** of **240 trillio...

7/19/2024

Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o-mini version)

**OpenAI** launched the **GPT-4o Mini**, a cost-efficient small model priced at **$0.15 per million input tokens** and **$0.60 per million output tokens**, aimi...

7/19/2024

Mini, Nemo, Turbo, Lite - Smol models go brrr (GPT4o version)

**GPT-4o-mini** launches with a **99% price reduction** compared to text-davinci-003, at **3.5% the price of GPT-4o** and matching Opus-level benchmarks. ...

7/17/2024

Gemma 2 tops /r/LocalLlama vibe check

**Gemma 2 (9B, 27B)** is highlighted as a top-performing local LLM, praised for its speed, multilingual capabilities, and efficiency on consumer GPUs like the 2...

7/17/2024

SciCode: HumanEval gets a STEM PhD upgrade

**PhD-level benchmarks** highlight the difficulty of coding scientific problems for LLMs, with **GPT-4** and **Claude 3.5 Sonnet** scoring under 5% on the new *...

7/16/2024

Microsoft AgentInstruct + Orca 3

**Microsoft Research** released **AgentInstruct**, the third paper in its **Orca** series, introducing a generative teaching pipeline that produces **25.8 milli...

7/13/2024

We Solved Hallucinations

**Reddit's URL structure causes link errors in AI-generated summaries, especially with NSFW content affecting models like Claude and GPT-4.** The team fixed thi...

7/12/2024

FlashAttention 3, PaliGemma, OpenAI's 5 Levels to Superintelligence

**FlashAttention-3** introduces fast and accurate attention optimized for **H100 GPUs**, advancing native **FP8 training**. **PaliGemma**, a versatile **3B Visi...

7/10/2024

Test-Time Training, MobileLLM, Lilian Weng on Hallucination (Plus: Turbopuffer)

**Lilian Weng** released a comprehensive literature review on **hallucination detection** and **anti-hallucination methods** including techniques like Factualit...

7/9/2024

Problems with MMLU-Pro

**MMLU-Pro** is gaining attention as the successor to MMLU on the **Open LLM Leaderboard V2** by **HuggingFace**, despite community concerns about evaluation di...

7/6/2024

Qdrant's BM42: "Please don't trust us"

**Qdrant** attempted to replace BM25 and SPLADE with a new method called "BM42" combining transformer attention and collection-wide statistics for semantic an...

7/3/2024

GraphRAG: The Marriage of Knowledge Graphs and RAG

**Microsoft Research** open sourced **GraphRAG**, a retrieval augmented generation (RAG) technique that extracts knowledge graphs from sources and clusters them...

7/2/2024

RouteLLM: RIP Martian? (Plus: AINews Structured Summaries update)

**LMSys** introduces RouteLLM, an open-source router framework trained on **preference data** from Chatbot Arena, achieving **cost reductions over 85% on MT Ben...

6/29/2024

That GPT-4o Demo

**Romain Huet** demonstrated an unreleased version of **GPT-4o** on ChatGPT Desktop showcasing capabilities like low latency voice generation, whisper tone mode...

6/28/2024

Gemma 2: The Open Model for Everyone

**Gemma 2**, a **27B** parameter model from **Google DeepMind**, was released with innovations like 1:1 local-global attention alternation and logit soft-cappin...

6/27/2024

Mozilla's AI Second Act

**Mozilla** showcased detailed live demos of **llamafile** and announced **sqlite-vec** for vector search integration at the AIE World's Fair. **LlamaIndex** la...

6/26/2024

Shall I compare thee to a Sonnet's day?

**Claude 3.5 Sonnet** from **Anthropic** achieves top rankings in coding and hard prompt arenas, surpassing **GPT-4o** and competing with **Gemini 1.5 Pro** at ...

6/25/2024

Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary

The latest **Chrome Canary** now includes a feature flag for **Gemini Nano**, offering a prompt API and on-device optimization guide, with models Nano 1 and 2 a...

6/22/2024

Shazeer et al (2024): you are overpaying for inference >13x

**Noam Shazeer** explains how **Character.ai** serves LLM inference at a volume equal to **20% of Google Search traffic** while reducing serving costs by a factor of **33** comp...

6/21/2024

Claude Crushes Code - 92% HumanEval and Claude.ai Artifacts

**Claude 3.5 Sonnet**, released by **Anthropic**, is positioned as a Pareto improvement over Claude 3 Opus, operating at **twice the speed** and costing **one-f...

6/20/2024

There's Ilya!

**Ilya Sutskever** has co-founded **Safe Superintelligence Inc** shortly after leaving **OpenAI**, while **Jan Leike** moved to **Anthropic**. **Meta** released...

6/18/2024

Gemini launches context caching... or does it?

**Nvidia's Nemotron** ranks #1 open model on LMsys and #11 overall, surpassing **Llama-3-70b**. **Meta AI** released **Chameleon 7B/34B** models after further p...

6/18/2024

Is this... OpenQ*?

**DeepSeekCoder V2** promises GPT4T-beating performance at a fraction of the cost. **Anthropic** released new research on reward tampering. **Runway** launched ...

6/14/2024

Nemotron-4-340B: NVIDIA's new large open models, built on syndata, great for syndata

**NVIDIA** has scaled up its **Nemotron-4** model from **15B** to a massive **340B** dense model, trained on **9T tokens**, achieving performance comparable to ...

6/13/2024

Hybrid SSM/Transformers > Pure SSMs/Pure Transformers

**NVIDIA**'s Bryan Catanzaro highlights a new paper on **Mamba models**, showing that mixing Mamba and Transformer blocks outperforms either alone, with optimal...

6/12/2024

The Last Hurrah of Stable Diffusion?

**Stability AI** launched **Stable Diffusion 3 Medium** with models ranging from **450M to 8B parameters**, featuring the MMDiT architecture and T5 text encoder...

6/11/2024

Francois Chollet launches $1m ARC Prize

**François Chollet** critiques current paths to **AGI**, emphasizing the importance of benchmarks that resist saturation and focus on skill acquisition and open...

6/11/2024

Talaria: Apple's new MLOps Superweapon

**Apple Intelligence** introduces a small (~3B parameters) on-device model and a larger server model running on Apple Silicon with Private Cloud Compute, aiming...

6/7/2024

HippoRAG: First, do know(ledge) Graph

**Alibaba** released new open-source **Qwen2** models ranging from **0.5B to 72B parameters**, achieving SOTA results on benchmarks like MMLU and HumanEval. Res...

6/6/2024

Qwen 2 beats Llama 3 (and we don't know how)

**Alibaba** released **Qwen 2** models under Apache 2.0 license, claiming to outperform **Llama 3** in open models with multilingual support in **29 languages**...

6/6/2024

5 small news items

**OpenAI** announces that ChatGPT's voice mode is "coming soon." **Leopold Aschenbrenner** launched a 5-part AGI timelines series predicting a **trillion doll...

6/3/2024

Mamba-2: State Space Duality

**Mamba-2**, a new **state space model (SSM)**, outperforms previous models like Mamba and Transformer++ in **perplexity** and **wall-clock time**, featuring **...

5/31/2024

Ways to use Anthropic's Tool Use GA

**Anthropic** launched general availability of tool use/function calling with support for streaming, forced use, and vision, alongside **Amazon** and **Google**...

5/31/2024

Contextual Position Encoding (CoPE)

**Meta AI** researcher **Jason Weston** introduced **CoPE**, a novel positional encoding method for transformers that incorporates *context* to create learnable...

5/29/2024

1 TRILLION token context, real time, on device?

**Cartesia**, a startup specializing in **state space models (SSMs)**, launched a low latency voice model outperforming transformer-based models with **20% lowe...

5/29/2024

Somebody give Andrej some H100s already

**OpenAI**'s GPT-2 sparked controversy five years ago for being "too dangerous to release." Now, with **FineWeb** and **llm.c**, a tiny GPT-2 model can be tra...

5/28/2024

Life after DPO (RewardBench)

**xAI raised $6 billion at a $24 billion valuation**, positioning it among the most highly valued AI startups, with expectations to fund **GPT-5 and GPT-6 class...

5/24/2024

Ten Commandments for Deploying Fine-Tuned Models

**Gemini-in-Google-Slides** is highlighted as a useful tool for summarizing presentations. Kyle Corbitt's talk on deploying fine-tuned models in production emph...

5/23/2024

Clémentine Fourrier on LLM evals

**Clémentine Fourrier** from **Hugging Face** presented at **ICLR** about **GAIA** with **Meta** and shared insights on **LLM evaluation** methods. The blog outl...

5/23/2024

ALL of AI Engineering in One Place

The upcoming **AI Engineer World's Fair** in San Francisco from **June 25-27** will feature a significantly expanded format with booths, talks, and workshops fr...

5/21/2024

Anthropic's "LLM Genome Project": learning & clamping 34m features on Claude Sonnet

**Anthropic** released their third paper in the MechInterp series, **Scaling Monosemanticity**, scaling interpretability analysis to **34 million features** on ...

5/20/2024

Skyfall

Between 5/17 and 5/20/2024, key AI updates include **Google DeepMind's Gemini 1.5 Pro and Flash models**, featuring sparse multimodal MoE architecture with up t...

5/17/2024

Chameleon: Meta's (unreleased) GPT4o-like Omnimodal Model

**Meta AI FAIR** introduced **Chameleon**, a new multimodal model family with **7B** and **34B** parameter versions trained on **10T tokens** of interleaved tex...

5/17/2024

Cursor reaches >1000 tok/s finetuning Llama3-70b for fast file editing

**Cursor**, an AI-native IDE, announced a **speculative edits** algorithm for code editing that surpasses **GPT-4** and **GPT-4o** in accuracy and latency, achi...

5/14/2024

Google I/O in 60 seconds

**Google** announced updates to the **Gemini model family**, including **Gemini 1.5 Pro** with **2 million token support**, and the new **Gemini Flash** model o...

5/13/2024

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4T version)

**OpenAI** launched **GPT-4o**, a frontier model supporting real-time reasoning across **audio, vision, and text**, now free for all ChatGPT users with enhanced...

5/13/2024

GPT-4o: the new SOTA-EVERYTHING Frontier model (GPT4O version)

**OpenAI** has released **GPT-4o**, a new **multimodal** model capable of reasoning across text, audio, and video in real time with low latency (~300ms). It fea...

5/11/2024

Quis promptum ipso promptiet?

**Anthropic** released upgrades to their Workbench Console, introducing new prompt engineering features like chain-of-thought reasoning and prompt generators th...

5/10/2024

LMSys advances Llama 3 eval analysis

**LMSys** is enhancing LLM evaluation by categorizing performance across **8 query subcategories** and **7 prompt complexity levels**, revealing uneven strength...

5/9/2024

OpenAI's PR Campaign?

**OpenAI** faces user data deletion backlash over its new partnership with StackOverflow amid GDPR complaints and US newspaper lawsuits, while addressing electi...

5/7/2024

Kolmogorov-Arnold Networks: MLP killers or just spicy MLPs?

**Ziming Liu**, a grad student of **Max Tegmark**, published a paper on **Kolmogorov-Arnold Networks (KANs)**, claiming they outperform **MLPs** in interpretabi...

5/6/2024

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

**DeepSeek V2** introduces a new state-of-the-art MoE model with **236B parameters** and a novel Multi-Head Latent Attention mechanism, achieving faster inferen...

5/3/2024

$100k to predict LMSYS human preferences in a Kaggle contest

**Llama 3 models** are making breakthroughs with Groq's 70B model achieving record low costs per million tokens. A new **Kaggle competition** offers a $100,000 ...

5/2/2024

Evals: The Next Generation

**Scale AI** highlighted issues with data contamination in benchmarks like **MMLU** and **GSM8K**, proposing a new benchmark where **Mistral** overfits and **Ph...

5/1/2024

LLMs-as-Juries

**OpenAI** has rolled out the **memory feature** to all ChatGPT Plus users and partnered with the **Financial Times** to license content for AI training. Discus...

4/26/2024

Apple's OpenELM beats OLMo with 50% of its dataset, using DeLighT

**Apple** advances its AI presence with the release of **OpenELM**, its first relatively open large language model available in sizes from **270M to 3B** parame...

4/26/2024

Snowflake Arctic: Fully Open 10B+128x4B Dense-MoE Hybrid LLM

**Snowflake Arctic** is a notable new foundation language model released under Apache 2.0, claiming superiority over **Databricks** in data warehouse AI applica...

4/25/2024

OpenAI's Instruction Hierarchy for the LLM OS

**OpenAI** published a paper introducing the concept of privilege levels for LLMs to address prompt injection vulnerabilities, improving defenses by 20-30%. **M...

4/23/2024

Perplexity, the newest AI unicorn

**Perplexity** doubles its valuation shortly after its Series B with a Series B-1 funding round. Significant developments around **Llama 3** include context len...

4/23/2024

FineWeb: 15T Tokens, 12 years of CommonCrawl (deduped and filtered, you're welcome)

**2024** has seen a significant increase in dataset sizes for training large language models, with **Redpajama 2** offering up to **30T tokens**, **DBRX** at **...

4/20/2024

Llama-3-70b is GPT-4-level Open Model

**Meta** has released **Llama 3**, their most capable open large language model with **8B and 70B parameter versions** supporting **8K context length** and outp...

4/19/2024

Meta Llama 3 (8B, 70B)

**Meta** partially released **Llama 3** models including **8B** and **70B** variants, with a **400B** variant still in training, touted as the first GPT-4 level...

4/17/2024

Mixtral 8x22B Instruct sparks efficiency memes

**Mistral** released an instruct-tuned version of their **Mixtral 8x22B** model, notable for using only **39B active parameters** during inference, outperformin...

4/17/2024

Lilian Weng on Video Diffusion

**OpenAI** expands with a launch in **Japan**, introduces a **Batch API**, and partners with **Adobe** to bring the **Sora video model** to Premiere Pro. **Reka...

4/15/2024

Multi-modal, Multi-Aspect, Multi-Form-Factor AI

Between April 12 and 15, **Reka Core** launched as a new GPT4-class multimodal foundation model with a detailed technical report described as "full Shazeer." **Coher...

4/12/2024

Zero to GPT in 1 Year

**GPT-4 Turbo** reclaimed the top leaderboard spot with significant improvements in coding, multilingual, and English-only tasks, now rolled out in paid **ChatG...

4/11/2024

Mergestral, Meta MTIAv2, Cohere Rerank 3, Google Infini-Attention

**Meta** announced their new **MTIAv2 chips** designed for training and inference acceleration with improved architecture and integration with PyTorch 2.0. **Mi...

4/10/2024

Music's Dall-E moment

**Google's Griffin architecture** outperforms transformers with faster inference and lower memory usage on long contexts. **Command R+** climbs to 6th place on ...

4/10/2024

Gemini Pro and GPT4T Vision go GA on the same day by complete coincidence

At **Google Cloud Next**, **Gemini 1.5 Pro** was released with a **million-token context window**, available in **180+ countries**, featuring **9.5 hours of aud...

4/9/2024

Anime pfp anon eclipses $10k A::B prompting challenge

**Victor Taelin** issued a $10k challenge to GPT models, initially achieving only **10% success** with state-of-the-art models, but community efforts surpassed ...

4/5/2024

Mixture of Depths: Dynamically allocating compute in transformer-based language models

**DeepMind** introduces the Mixture-of-Depths (MoD) technique, dynamically allocating FLOPs across transformer layers to optimize compute usage, achieving over ...

4/4/2024

Cohere Command R+, Anthropic Claude Tool Use, OpenAI Finetuning

**Cohere** launched **Command R+**, a **104B dense model** with **128k context length** focusing on **RAG**, **tool-use**, and **multilingual** capabilities acr...

4/4/2024

ReALM: Reference Resolution As Language Modeling

**Apple** is advancing in AI with a new approach called **ReALM: Reference Resolution As Language Modeling**, which improves understanding of ambiguous referenc...

4/1/2024

AdamW -> AaronD?

**Aaron Defazio** is gaining attention for proposing a potential tuning-free replacement of the long-standing **Adam optimizer**, showing promising experimental...

3/29/2024

Evals-based AI Engineering

**Hamel Husain** emphasizes the importance of comprehensive evals in AI product development, highlighting evaluation, debugging, and behavior change as key iter...

3/28/2024

Jamba: Mixture of Architectures dethrones Mixtral

**AI21 Labs** released **Jamba**, a **52B parameter MoE model** with **256K context length** and open weights under Apache 2.0 license, optimized for single A10...

3/27/2024

DBRX: Best open model (just not most efficient)

**Databricks Mosaic** has released a new open-source model called **DBRX** that outperforms **Grok**, **Mixtral**, and **Llama2** on evaluations while being abo...

3/27/2024

Claude 3 is officially America's Next Top Model

**Claude 3 Opus** outperforms **GPT4T** and **Mistral Large** in blind Elo rankings, with **Claude 3 Haiku** marking a new cost-performance frontier. Fine-tunin...

3/26/2024

Andrew likes Agents

**Andrew Ng's The Batch writeup on Agents** highlighted the significant improvement in coding benchmark performance when using an iterative agent workflow, with...

3/21/2024

Welcome /r/LocalLlama!

**Sakana** released a paper on evolutionary model merging. **OpenInterpreter** launched their **O1 devkit**. Discussions highlight **Claude Haiku**'s underrated...

3/21/2024

Shipping and Dipping: Inflection + Stability edition

**Inflection AI** and **Stability AI** recently shipped major updates (**Inflection-2.5** and **Stable Diffusion 3**) but are now experiencing significant ex...

3/20/2024

World_sim.exe

**NVIDIA** announced **Project GR00T**, a foundation model for humanoid robot learning using multimodal instructions, built on their tech stack including Isaac ...

3/19/2024

Grok-1 in Bio

**Grok-1**, a **314B parameter Mixture-of-Experts (MoE) model** from **xAI**, has been released under an Apache 2.0 license, sparking discussions on its archite...

3/15/2024

MM1: Apple's first Large Multimodal Model

**Apple** announced the **MM1** multimodal LLM family with up to **30B parameters**, claiming performance comparable to **Gemini-1** and beating larger older mo...

3/14/2024

Not much happened piday

**DeepMind** announces **SIMA**, a generalist AI agent capable of following natural language instructions across diverse 3D environments and video games, advanc...

3/14/2024

DeepMind SIMA: one AI, 9 games, 600 tasks, vision+language ONLY

**DeepMind SIMA** is a generalist AI agent for 3D virtual environments evaluated on **600 tasks** across **9 games** using only screengrabs and natural language...

3/12/2024

The world's first fully autonomous AI Engineer

**Cognition Labs's Devin** is highlighted as a potentially groundbreaking AI software engineer agent capable of learning unfamiliar technologies, addressing bug...

3/12/2024

Fixing Gemma

**Google's Gemma model** was found unstable for finetuning until **Daniel Han from Unsloth AI** fixed 8 bugs, improving its implementation. **Yann LeCun** expla...

3/8/2024

FSDP+QLoRA: the Answer to 70b-scale AI for desktop class GPUs

**Jeremy Howard** and collaborators released a new tool combining **FSDP**, **QLoRA**, and **HQQ** to enable training **70b-parameter** models on affordable con...

3/8/2024

Inflection-2.5 at 94% of GPT4, and Pi at 6m MAU

**Mustafa Suleyman** announced **Inflection 2.5**, which achieves *more than 94% the average performance of GPT-4 despite using only 40% the training FLOPs*. **...

3/5/2024

Stable Diffusion 3 — Rombach & Esser did it again!

**Over 2500 new community members joined following Soumith Chintala's shoutout, highlighting growing interest in SOTA LLM-based summarization. The major highlig...

3/4/2024

Claude 3 just destroyed GPT 4 (see for yourself)

**Claude 3** from **Anthropic** launches in three sizes: Haiku (small, unreleased), Sonnet (medium, default on claude.ai, AWS, and GCP), and Opus (large, on Cla...

3/1/2024

The Era of 1-bit LLMs

**The Era of 1-bit LLMs** research, including the **BitNet b1.58** model, introduces a ternary parameter approach that matches full-precision Transformer LLMs i...

3/1/2024

Dia de las Secuelas (StarCoder, The Stack, Dune, SemiAnalysis)

**HuggingFace/BigCode** has released **StarCoder v2**, including the **StarCoder2-15B** model trained on over **600 programming languages** using **The Stac...

2/29/2024

... and welcome AI Twitter!

The AI Twitter discourse from **2/27-28/2024** covers a broad spectrum including **ethical considerations** highlighted by **Margaret Mitchell** around **Google...

2/27/2024

Welcome Interconnects and OpenRouter

An analysis of **Discord communities** covering **22 guilds**, **349 channels**, and **12885 messages** revealed active discussions on **model comparisons and optimizations**...

2/26/2024

Mistral Large disappoints

**Mistral** announced **Mistral Large**, a new language model achieving **81.2% accuracy on MMLU**, trailing **GPT-4 Turbo** by about 5 percentage points on ben...

2/24/2024

One Year of Latent Space

**Latent Space** podcast celebrated its first anniversary, reaching #1 in AI Engineering podcasts and 1 million unique readers on Substack. The **Gemini 1.5** i...

2/23/2024

Ring Attention for >1M Context

**Google Gemini Pro** has sparked renewed interest in long context capabilities. The CUDA MODE Discord is actively working on implementing the **RingAttention**...

2/22/2024

Google AI: Win some (Gemma, 1.5 Pro), Lose some (Image gen)

**Google's Gemma open models** (2-7B parameters) outperform **Llama 2** and **Mistral** in benchmarks but face criticism for an unusual license and poor image g...

2/21/2024

Karpathy emerges from stealth?

**Andrej Karpathy** released a comprehensive 2-hour tutorial on **tokenization**, detailing techniques up to **GPT-4**'s tokenizer and noting the complexity of ...

2/20/2024

Companies liable for AI hallucination is Good Actually for AI Engineers

**Air Canada** faced a legal ruling requiring it to honor refund policies communicated by its AI chatbot, setting a precedent for corporate liability in AI engi...

2/16/2024

Sora pushes SOTA

An analysis of **Discord communities** covering over **20 guilds**, **312 channels**, and **10550 messages** reveals intense discussions on AI developments. Key highlights incl...

2/15/2024

AI gets Memory

**AI Discords** analysis covered **20 guilds**, **312 channels**, and **6901 messages**. The report highlights the divergence of RAG style operations for contex...

2/13/2024

The Dissection of Smaug (72B)

**Abacus AI** launched **Smaug 72B**, a large finetune of **Qwen 1.0**, which remains unchallenged on the **Hugging Face Open LLM Leaderboard** despite skeptici...

2/9/2024

Gemini Ultra is out, to mixed reviews

**Google** released **Gemini Ultra** as a paid tier called "Gemini Advanced with Ultra 1.0" following the discontinuation of Bard. Reviews noted it is "slightly...

2/7/2024

MetaVoice & RIP Bard

**Coqui**, a TTS startup that recently shut down, inspired a new **TTS model** supporting voice cloning and longform synthesis from a small startup called **Met...

2/6/2024

Qwen 1.5 Released

**Chinese AI models Yi, DeepSeek, and Qwen** are gaining attention for strong performance, with **Qwen 1.5** offering up to **32k token context** and compatibil...

2/6/2024

Less Lazy AI

The AI Discord summaries for early 2024 cover various community discussions and developments. Highlights include **20** guilds, **308** channels, and **10449** ...

2/4/2024

The Core Skills of AI Engineering

**AI Discords for 2/2/2024** analyzed **21 guilds**, **312 channels**, and **4782 messages** saving an estimated **382 minutes** of reading time. Discussions in...

2/3/2024

AI2 releases OLMo - the 4th open-everything LLM

**AI2** is gaining attention in 2024 with its new **OLMo** models, including 1B and 7B sizes and a 65B model forthcoming, emphasizing open and reproducible rese...

2/2/2024

Trust in GPTs at all time low

**Discord communities** were analyzed with **21 guilds**, **312 channels**, and **8530 messages** reviewed, saving an estimated **628 minutes** of reading time....

1/31/2024

Miqu confirmed to be an early Mistral-medium checkpoint

**Miqu**, an open access model, scores **74 on MMLU** and **84.5 on EQ-Bench**, sparking debates about its performance compared to **Mistral Medium**. The **CEO...

1/30/2024

CodeLlama 70B beats GPT4 on HumanEval

**Meta AI** surprised the community with the release of **CodeLlama**, an open-source model now available on platforms like **Ollama** and **MLX** for local use...

1/30/2024

RWKV "Eagle" v5: Your move, Mamba

**RWKV v5 Eagle** was released with better-than-**mistral-7b** evaluation results, trading some English performance for multilingual capabilities. The mysteriou...

1/26/2024

GPT4Turbo A/B Test: gpt-4-0125-preview

**OpenAI** released a new **GPT-4 Turbo** version in January 2024, prompting natural experiments in summarization and discussions on API performance and cost tr...

1/26/2024

GPT4Turbo A/B Test: gpt-4-1106-preview

**OpenAI** released a new **GPT-4 Turbo** version, prompting a natural experiment in summarization comparing the November 2023 and January 2024 versions. The **...

1/25/2024

Adept Fuyu-Heavy: Multimodal model for Agents

**Adept** launched **Fuyu-Heavy**, a multimodal model focused on UI understanding and visual QA, outperforming **Gemini Pro** on the MMMU benchmark. The model u...

1/25/2024

Google Solves Text to Video

**Google Research** introduced **Lumiere**, a text-to-video model featuring advanced inpainting capabilities using a Space-Time diffusion process, surpassing pr...

1/24/2024

RIP Latent Diffusion, Hello Hourglass Diffusion

**Katherine Crowson** from **Stability AI** introduces a hierarchical pure transformer backbone for diffusion-based image generation that efficiently scales...

1/22/2024

Nightshade poisons AI art... kinda?

Over the weekend of **1/19-20/2024**, discussions in **TheBloke Discord** covered key topics including **Mixture of Experts (MoE)** model efficiency, GPU parall...

1/22/2024

Sama says: GPT-5 soon

**Sam Altman** at Davos highlighted that his top priority is launching the new model, likely called **GPT-5**, while expressing uncertainty about **Ilya Sutskev...

1/18/2024

1/17/2024: Help crowdsource function calling datasets

**LM Studio** updated its FAQ to clarify its **closed-source** status and that it will remain free for personal use with no data collection. The new beta release incl...

1/17/2024

1/16/2024: ArtificialAnalysis - a new model/host benchmark site

**Artificial Analysis** launched a new models and hosts comparison site, highlighted by **swyx**. **Nous Research AI** Discord discussed innovative summarizatio...

1/16/2024

1/16/2024: TIES-Merging

**TheBloke's Discord** community actively discusses **Mixture of Experts (MoE) models**, focusing on **random gate routing layers** for training and the challen...

1/16/2024

1/13-14/2024: Don't sleep on #prompt-engineering

The **OpenAI** Discord community engaged in diverse discussions including **prompt engineering** techniques like contrastive Chain of Thought and step back prom...

1/13/2024

1/12/2024: Anthropic coins Sleeper Agents

**Anthropic** released a new paper exploring the persistence of deceptive alignment and backdoors in models through stages of training including supervised fine...

1/12/2024

1/11/2024: Mixing Experts vs Merging Models

**18 guilds**, **277 channels**, and **1342 messages** were analyzed with an estimated reading time saved of **187 minutes**. The community switched to **GPT-4 ...

1/11/2024

1/10/2024: All the best papers for AI Engineers

**OpenAI** launched the **GPT Store** featuring over **3 million** custom versions of **ChatGPT** accessible to Plus, Team, and Enterprise users, with weekly hi...

1/11/2024

1/9/2024: Nous Research lands $5m for Open Source AI

**Nous Research** announced a **$5.2 million seed financing** focused on **Nous-Forge**, aiming to embed transformer architecture into chips for powerful server...

1/9/2024

1/8/2024: The Four Wars of the AI Stack

The **Nous Research AI Discord** discussions highlighted several key topics including the use of **DINO**, **CLIP**, and **CNNs** in the **Obsidian Project**. A...

1/8/2024

1/6-7/2024: LLaMA Pro - an alternative to PEFT/RAG??

New research papers introduce promising **Llama Extensions** including **TinyLlama**, a compact **1.1B** parameter model pretrained on about **1 trillion tokens...

1/5/2024

1/4/2024: Jeff Bezos backs Perplexity's $520m Series B

**Perplexity** announced their **Series B** funding round with notable investor **Jeff Bezos**, who invested in **Google** 25 years ago. **Anthropic*...

1/4/2024

1/3/2024: RIP Coqui

**Coqui**, a prominent open source text-to-speech project from the Mozilla ML group, officially shut down. Discussions in the **HuggingFace** Discord highlighte...

1/3/2024

1/2/2024: Smol tweaks to Smol Talk

**OpenAI** Discord discussions highlight a detailed comparison of AI search engines including **Perplexity**, **Copilot**, **Bard**, and **Claude 2**, with Bard...

1/3/2024

1/1/2024: How to start with Open Source AI

**OpenAI Discord** discussions revealed mixed sentiments about **Bing's AI** versus **ChatGPT** and **Perplexity AI**, and debated **Microsoft Copilot's** integ...

1/1/2024

12/31/2023: Happy New Year

**LM Studio** community discussions highlight variations and optimizations in **Dolphin** and **Mistral 7b** models, focusing on hardware-software configuration...

12/31/2023

12/30/2023: Mega List of all LLMs

**Stella Biderman**'s tracking list of **LLMs** is highlighted, with resources shared for browsing. The **Nous Research AI** Discord discussed the **Local Atten...

12/30/2023

12/29/2023: TinyLlama on the way

The **Nous/Axolotl community** is pretraining a **1.1B model on 3 trillion tokens**, showing promising results on **HellaSwag** for a small 1B model. The **LM S...

12/29/2023

12/28/2023: Smol Talk updates

**Nous Research AI** Discord discussions covered topics such as AI placement charts, the incompatibility of **ChatGPT**'s LaTeX math format with Obsidian, an...

12/29/2023

12/27/2023: NYT vs OpenAI

The LM Studio Discord community extensively discussed **model performance** comparisons, notably between **Phi2** by **Microsoft Research** and **OpenHermes 2.5...

12/26/2023

12/25/2023: Nous Hermes 2 Yi 34B for Christmas

**Teknium** released **Nous Hermes 2** on **Yi 34B**, positioning it as a top open model compared to **Mixtral**, **DeepSeek**, and **Qwen**. **Apple** introduc...

12/26/2023

12/24/2023: Dolphin Mixtral 8x7b is wild

**Mistral** models are recognized for being uncensored, and Eric Hartford's **Dolphin** series applies uncensoring fine-tunes to these models, gaining popularit...

12/24/2023

12/23/2023: NeurIPS Best Papers of 2023

The **Latent Space Pod** released a **3-hour recap** of the **best NeurIPS 2023 papers**. The **Nous Research AI Discord** community discussed **optimizing AI p...

12/23/2023

12/22/2023: Anyscale's Benchmark Criticisms

**Anyscale** launched their **LLMPerf leaderboard** to benchmark large language model inference performance, but it faced criticism for lacking detailed metrics...

12/22/2023

12/21/2023: The State of AI (according to LangChain)

**LangChain** launched their first report based on **LangSmith** stats revealing top charts for mindshare. On **OpenAI**'s Discord, users raised issues about th...

12/21/2023

12/20/2023: Project Obsidian - Multimodal Mistral 7B from Nous

**Project Obsidian** is a multimodal model being trained publicly, tracked by **Teknium** on the Nous Discord. Discussions include **4M: Massively Multimodal Ma...

12/20/2023

12/19/2023: Everybody Loves OpenRouter

**OpenRouter** offers an easy OpenAI-compatible proxy for **Mixtral-8x7b-instruct**. Discord discussions highlight **GPT-4** performance and usability issues co...

12/19/2023

12/18/2023: Gaslighting Mistral for fun and profit

**OpenAI** Discord discussions reveal comparisons among language models including **GPT-4 Turbo**, **GPT-3.5 Turbo**, **Claude 2.1**, **Claude Instant 1**, and ...

12/16/2023

12/16/2023: ByteDance suspended by OpenAI

The OpenAI Discord community discussed hardware options like **Mac racks** and the **A6000 GPU**, highlighting their value for AI workloads. They compared **Cla...

12/15/2023

12/15/2023: Mixtral-Instruct beats Gemini Pro (and matches GPT3.5)

Thanks to a **karpathy** shoutout, **lmsys** now has enough data to rank **mixtral** and **gemini pro**. The discussion highlights the impressive performance of...

12/14/2023

12/14/2023: $1e7 for Superalignment

**Jan Leike** is launching a new grant initiative inspired by **Patrick Collison's Fast Grants** to support AI research. **OpenAI** introduced a new developers ...

12/13/2023

12/13/2023: SOLAR10.7B upstages Mistral7B?

**Upstage** released the **SOLAR-10.7B** model, which uses a novel Depth Up-Scaling technique built on the **llama-2** architecture and integrates **mistral-7b*...

12/13/2023

12/12/2023: Towards LangChain 0.1

The **LangChain rearchitecture** has been completed, splitting the repo for better maintainability and scalability, while remaining backwards compatible. **Mist...

12/11/2023

12/11/2023: Mixtral beats GPT3.5 and Llama2-70B

**Mistral AI** announced the **Mixtral 8x7B** model featuring a Sparse Mixture of Experts (SMoE) architecture, sparking discussions on its potential to rival **...

12/9/2023

12/9/2023: The Mixtral Rush

**Mixtral's weights** were released without code, prompting the **Disco Research community** and **Fireworks AI** to implement it rapidly. Despite efforts, no s...

12/8/2023

12/8/2023: Mamba v Mistral v Hyena

Three new AI models are highlighted: **Mistral's 8x7B MoE model (Mixtral)**, **Mamba models** up to 3B by Together, and **StripedHyena 7B**, a competitive subqu...

12/7/2023

12/7/2023: Anthropic says "skill issue"

**Anthropic** fixed a glitch in their **Claude 2.1** model's needle-in-a-haystack test by adding a prompt. Discussions on **OpenAI's** Discord compared **Google...

12/6/2023

Is Google's Gemini... legit?

**Google's Gemini** AI model is generating significant discussion and skepticism, especially regarding its **32-shot chain of thought** MMLU claim and **32k con...