How is imagen trained?

Instead of training Imagen using an image-text dataset, the Google team simply utilised a "off-the-shelf" text encoder, T5, to turn input text into embeddings. Imagen employs a series of diffusion models to turn the embedding into a picture.