Diffusion Models Open a New Era of Text Generation
On June 10, 2026, Google DeepMind announced 'DiffusionGemma,' an experimental open model for text generation. Its defining feature is that it applies diffusion modeling, a technique previously used mainly in image generation AI, to text, generating output up to four times faster on GPUs than conventional language models. This is expected to broaden the use of AI in applications where real-time performance is crucial.
Technical Details: Why is it 4x Faster?
Traditional Large Language Models (LLMs) such as the GPT series are primarily 'autoregressive models' that generate words (tokens) one at a time, in sequence. This method produces high-quality text, but each token must wait for the one before it, which caps generation speed. DiffusionGemma instead starts from noise and restores it into meaningful text over multiple steps. Specifically, it generates text in parallel in blocks (for example, 256 tokens at once) and refines each block iteratively. This 'printing press'-like approach makes fuller use of the GPU's parallel compute, which is where the large speed gain comes from. The model is a 26-billion-parameter Mixture of Experts (MoE) model based on the Gemma 4 architecture, but only 3.8 billion parameters are active during inference, and when quantized it is designed to fit on consumer GPUs with 18GB of VRAM or less.
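The speed argument and the memory figure can both be checked with rough arithmetic. The following is a minimal sketch, not DiffusionGemma's actual implementation: it only counts sequential forward passes and estimates weight storage. The function names are hypothetical, and the 64 refinement steps per block and the 4-bit quantization are illustrative assumptions, since the announcement as summarized here states neither.

```python
import math


def autoregressive_passes(n_tokens: int) -> int:
    """One sequential forward pass per generated token."""
    return n_tokens


def block_diffusion_passes(n_tokens: int, block_size: int = 256,
                           refine_steps: int = 64) -> int:
    """Sequential passes when each block is denoised in parallel.

    refine_steps is an ASSUMED value chosen for illustration only.
    """
    n_blocks = math.ceil(n_tokens / block_size)
    return n_blocks * refine_steps


def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes), ignoring
    activations, caches, and quantization overhead."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 1024
    print("autoregressive passes:", autoregressive_passes(n))
    print("block diffusion passes:", block_diffusion_passes(n))
    # All 26B parameters must be stored, even though only 3.8B are active.
    print("26B weights at 4-bit (GB):", weight_memory_gb(26e9, 4))
    print("26B weights at 16-bit (GB):", weight_memory_gb(26e9, 16))
```

Under these assumptions, 1,024 tokens need 1,024 sequential passes autoregressively but 256 when decoded in 256-token blocks. Each diffusion pass processes a whole block and so costs more than a single-token pass, so the ratio of pass counts is not by itself a measured speedup. The memory estimate also shows why quantization matters: an MoE model still has to hold all of its experts in memory, so the weights take roughly 13GB at 4 bits per parameter versus about 52GB at 16 bits.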
Impact on Engineers and Future Prospects
The advent of DiffusionGemma holds significant implications for engineers in Japan. It makes it practical to build applications with a noticeably better user experience, such as chatbots with much shorter response times, real-time code completion and inline editing, and document summarization. Notably, its 'bidirectional attention' property, in which tokens within a generated block can refer to each other's context in both directions, is well suited to non-linear tasks like code completion, where the right answer depends on the text that follows as well as the text that precedes. Lower inference costs will also allow AI functionality to be integrated into more services, expanding business opportunities. Google recommends the standard Gemma 4 for cases where quality is paramount, but DiffusionGemma has the potential to become a new standard for speed-critical use cases. The model is released under the Apache 2.0 license and is available from platforms such as Hugging Face. Engineers will want to follow the development of this new architecture closely.
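The difference between causal and bidirectional attention can be made concrete with attention masks. The sketch below is illustrative only: the function names are hypothetical, and the rule that a block may also attend to all earlier blocks is an assumption borrowed from common block-diffusion designs, not a confirmed detail of DiffusionGemma.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """Autoregressive mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def block_bidirectional_mask(n: int, block_size: int) -> list[list[bool]]:
    """Block mask: position i may attend to every position in its own
    block (in both directions) and, by ASSUMPTION, to all earlier blocks."""
    return [[(j // block_size) <= (i // block_size) for j in range(n)]
            for i in range(n)]


def render(mask: list[list[bool]]) -> str:
    """Draw a mask as rows of 'x' (may attend) and '.' (blocked)."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


if __name__ == "__main__":
    print("causal:")
    print(render(causal_mask(4)))
    print("bidirectional within blocks of 2:")
    print(render(block_bidirectional_mask(4, 2)))
```

In the causal mask, token 0 cannot see token 1. In the block mask it can, because both sit in the same block, which is what lets a model fill a gap in code using the text on both sides of it.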