Google's open-source diffusion language model generates 256 tokens in parallel and self-corrects, hitting 4x speed on one GPU ...
Deploying DFlash block diffusion on NVIDIA hardware accelerates autoregressive LLMs during latency-sensitive inference.
Another day, another AI model from Google. This time, Google DeepMind has released a new member of the Gemma 4 open model ...
Every time a language model like GPT-4, Claude or Mistral generates a sentence, it does something deceptively simple: It picks one word at a time. This word-by-word approach is what gives ...
Google recently released DiffusionGemma, and it's weird in the best way.
DiffusionGemma generates text up to 4x faster than traditional models by producing entire blocks simultaneously, achieving roughly 1,479 tokens per second.
Google DeepMind released DiffusionGemma on June 10, 2026, an experimental open-weights model that writes text using discrete ...