1 TRILLION token context, real time, on device?

5/29/2024

Cartesia, a startup specializing in state space models (SSMs), launched a low-latency voice model that outperforms transformer-based models, with 20% lower perplexity, 2x lower word error, and a 1-point higher NISQA quality score. The result points to the potential for on-device models that continuously process and reason over massive streams of multimodal data (text, audio, video) with a trillion-token context window. Other recent AI developments covered include Mistral's Codestral weights release, the Schedule-Free optimizers paper, and Scale AI's new Elo-style eval leaderboards. Also noted were a debate between Yann LeCun and Elon Musk over the importance of publishing AI research versus engineering achievements, and the strong performance of the Gemini 1.5 Pro/Advanced models.
