Google DeepMind announces Gemini Diffusion, a diffusion model that generates text at explosive speeds

Google has announced Gemini Diffusion, a diffusion model that can sample 1,479 tokens per second and generates content faster than what Google calls its fastest model so far.
Gemini Diffusion - Google DeepMind

Gemini Diffusion: Google DeepMind's experimental research model
Gemini Diffusion generates text using a 'diffusion model,' a technique primarily used in image-generation AI.
According to Google, traditional autoregressive language models generate text one token (roughly one word) at a time, which is slow and can limit the quality and consistency of the output.
By contrast, a diffusion model learns to generate output by gradually refining noise rather than predicting text directly. This lets it produce output quickly and correct errors during generation. According to Google, the overhead from entering the prompt to the start of generation is only 0.84 seconds, and the sampling speed excluding overhead reaches 1,479 tokens per second.
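To put the two quoted figures in perspective, the following is a rough back-of-the-envelope sketch, not anything published by Google. It assumes response time is simply the fixed overhead plus the token count divided by the sampling speed. The function names and the sample output lengths are hypothetical; only the 0.84-second overhead and the 1,479 tokens-per-second rate come from the figures above.

```python
# Figures quoted by Google; the simple additive model below is an assumption.
OVERHEAD_SECONDS = 0.84    # delay from entering the prompt to the start of generation
TOKENS_PER_SECOND = 1479   # sampling speed excluding overhead


def estimated_response_seconds(num_tokens: int) -> float:
    """Estimate total time as fixed overhead plus tokens divided by sampling speed."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    return OVERHEAD_SECONDS + num_tokens / TOKENS_PER_SECOND


def effective_tokens_per_second(num_tokens: int) -> float:
    """Throughput with overhead included; below the headline rate for any finite output."""
    return num_tokens / estimated_response_seconds(num_tokens)


if __name__ == "__main__":
    # Hypothetical output lengths, chosen only for illustration.
    for n in (100, 1_000, 10_000):
        seconds = estimated_response_seconds(n)
        rate = effective_tokens_per_second(n)
        print(f"{n:>6} tokens: ~{seconds:.2f} s total, ~{rate:.0f} tokens/s effective")
```

Under this assumption, the fixed overhead dominates short responses: an output of 1,479 tokens would take about 1.84 seconds in total, so the effective rate with overhead included is well below the headline figure.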

Google says Gemini Diffusion performs well on tasks such as solving math problems and generating code.
An image of Gemini Diffusion processing has been released.
In benchmarks, Gemini Diffusion performed comparably to Google's lower-cost model.

Google has released a demo of Gemini Diffusion, but you have to join a waiting list to access it.
The company also plans to release a faster Gemini 2.5 Flash Lite soon.