Back to Blog

Jamba: Mixture of Architectures dethrones Mixtral

AI21 labs released Jamba, a 52B parameter MoE model with 256K context length and open weights under Apache 2.0 license, optimized for single A100 GPU performance. It features a unique blocks-and-layers architecture combining transformer and MoE layers, competing with models like Mixtral. Meanwhile, Databricks introduced DBRX, a 36B active parameter MoE model trained on 12T tokens, noted as a new standard for open LLMs. In image generation, advancements include Animatediff for video-quality image generation and FastSD CPU v1.0.0 beta 28 enabling ultra-fast image generation on CPUs. Other innovations involve style-content separation using B-LoRA and improvements in high-resolution image upscaling with SUPIR.

Read original post