Transformer Architecture (2017)
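Vaswani et al.'s "Attention Is All You Need" (2017) replaced recurrence with self-attention, letting every position in a sequence attend to every other position in a single step. This made training massively parallelizable and enabled scaling to far larger models, and the architecture became the foundation of modern large language models and much of modern AI.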
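The core operation from the paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal single-head version in plain Python, assuming a tiny hand-written input and illustrative weight matrices (the names `self_attention`, `w_q`, `w_k`, `w_v` are hypothetical, and real implementations add multiple heads, masking, and learned parameters). It shows why no recurrence is needed: all attention scores come from one matrix product over the whole sequence.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m) using plain nested lists.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x (n x d)."""
    q = matmul(x, w_q)  # queries, n x d_k
    k = matmul(x, w_k)  # keys, n x d_k
    v = matmul(x, w_v)  # values, n x d_v
    d_k = len(k[0])
    # Every query is scored against every key at once: no step-by-step recurrence.
    scores = matmul(q, transpose(k))  # n x n
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v), weights


# Usage: three 2-dimensional tokens with identity projections (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(tokens, identity, identity, identity)
```

Each row of `weights` is a probability distribution over the sequence, so every output token mixes information from all positions simultaneously.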
Sub-topics
Google's BERT (Bidirectional Encoder Representations from Transformers, 2018) was pre-trained with masked language modeling and next-sentence prediction, and it set new state-of-the-art results on 11 NLP tasks.
GPT-1, OpenAI's first Generative Pre-trained Transformer (2018), demonstrated that unsupervised pre-training on unlabeled text followed by supervised fine-tuning could achieve strong performance across NLP tasks.
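GPT-3, OpenAI's 175-billion-parameter model (2020), demonstrated emergent few-shot learning: it could perform new tasks from a handful of examples given in the prompt, without any fine-tuning. It showed that scaling language models can yield qualitatively new capabilities.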
GPT-4, OpenAI's multimodal model released in March 2023, accepts text and image inputs, performs at a high level on professional exams (around the 90th percentile on a simulated bar exam, according to OpenAI), and demonstrates broad reasoning abilities.
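The Vision Transformer (ViT; Dosovitskiy et al., 2020) applied a standard transformer directly to sequences of image patches, showing that pure attention without convolutions can match or beat CNNs on image classification when pre-trained on sufficiently large datasets.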
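The key step in ViT is turning an image into a sequence of "tokens" by cutting it into non-overlapping patches and flattening each one. The sketch below shows only that step for a single-channel image stored as nested lists; the function name `image_to_patches` is hypothetical. In the actual model, each flattened patch is then linearly projected to an embedding, a learnable class token is prepended, and position embeddings are added before the sequence enters the transformer.

```python
def image_to_patches(image, patch_size):
    """Split an H x W single-channel image (list of rows) into flattened patches.

    Patches are non-overlapping and returned in row-major order, so each
    flattened patch plays the role of one token in the input sequence.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch row by row into a single vector.
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Usage: a 4 x 4 image with pixel values 0..15, split into 2 x 2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = image_to_patches(image, 2)
```

With the 16 x 16 patches of the paper's title, a 224 x 224 image becomes a sequence of (224 / 16)^2 = 196 patch tokens.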