Generative Models
Models that learn to generate new data: images, text, audio, and video. Covers GANs, VAEs, diffusion models, and large language models.
Sub-topics
Goodfellow et al. (2014) introduced adversarial training: a generator creates samples while a discriminator tries to tell them apart from real data, and each network is trained against the other. The approach produced breakthrough image synthesis results.
Kingma and Welling (2013) combined autoencoders with variational inference, creating a principled framework for learning latent representations and generating new samples.
Ho et al.'s DDPM (2020) generates data by learning to reverse a gradual noising (diffusion) process. It achieved image quality competitive with GANs while training more stably; later diffusion models surpassed GANs on standard image benchmarks.
Stability AI's open-source latent diffusion model (2022). Performs diffusion in a compressed latent space, making high-quality image generation accessible on consumer GPUs.
OpenAI's text-to-image model (January 2021) using a transformer to generate images from text descriptions. DALL-E 2 (2022) and DALL-E 3 (2023) dramatically improved quality.
Massive transformer-based models trained on internet-scale text corpora. Exhibit emergent abilities including reasoning, code generation, and instruction following.
The field of synthesizing images from noise, text, or other inputs. Progressed from VAEs and GANs to diffusion models, which reached photorealistic quality by 2022.
Generating video from text descriptions. OpenAI's Sora (2024) and similar models extend diffusion to video, where frames must stay consistent over time. One of the most challenging generative tasks.
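
To make the "noise diffusion process" from the DDPM entry above concrete, here is a minimal sketch of the forward (noising) half only, assuming a linear variance schedule from 1e-4 to 0.02 over 1000 steps. The function names (linear_beta_schedule, cumulative_alphas, add_noise) are hypothetical and chosen for illustration. No network is trained here; the learned part of a diffusion model is the reverse of this process.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta[s]) for all s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample a noised version of x0 at step t in one shot:

    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, 1)

    Returns (x_t, eps). In a DDPM-style setup, eps is the target a
    denoising network would be trained to predict from x_t and t.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Illustrative usage on a toy 4-value "image" (values are arbitrary).
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]
x_early, _ = add_noise(x0, 10, alpha_bars, rng)   # still close to x0
x_late, _ = add_noise(x0, 999, alpha_bars, rng)   # almost pure noise
```

As the step index grows, alpha_bar shrinks toward zero, so the signal term fades and the sample approaches pure Gaussian noise. Generation runs the other way: start from noise and repeatedly apply a learned denoising step.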