Transformer Attention Visualizer
Visualize self-attention, multi-head attention, and positional encoding in transformers. Type text and see attention heatmaps update live.
Click a token to highlight its attention pattern
Type some text above to see the attention heatmap.
How it works
Tokenization: Text is split on whitespace and punctuation into tokens.
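A tokenizer like this can be a one-line regex. This is a minimal sketch; the function name `tokenize` and the exact regex are assumptions, not the demo's actual code.

```typescript
// Split text into word tokens and single punctuation tokens.
// \w+ matches runs of word characters; [^\s\w] matches any single
// character that is neither whitespace nor a word character.
function tokenize(text: string): string[] {
  return text.match(/\w+|[^\s\w]/g) ?? [];
}

console.log(tokenize("Hello, world!")); // ["Hello", ",", "world", "!"]
```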
Embeddings: Each token gets a deterministic pseudo-random embedding vector of size d_model. Sinusoidal positional encoding is added so the model can distinguish token positions.
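The two pieces above can be sketched as follows. The sinusoidal encoding is the standard sin/cos formulation; using a mulberry32 PRNG seeded from a hash of the token string is one way to get deterministic pseudo-random embeddings, and is an assumption about how the demo does it (as are the function names).

```typescript
// Standard sinusoidal positional encoding; assumes dModel is even.
function positionalEncoding(pos: number, dModel: number): number[] {
  const pe = new Array(dModel).fill(0);
  for (let i = 0; i < dModel; i += 2) {
    const freq = pos / Math.pow(10000, i / dModel);
    pe[i] = Math.sin(freq);     // even dimensions: sine
    pe[i + 1] = Math.cos(freq); // odd dimensions: cosine
  }
  return pe;
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Deterministic embedding (seeded by the token text) plus positional encoding.
function embed(token: string, pos: number, dModel: number): number[] {
  let seed = 0;
  for (const ch of token) seed = (seed * 31 + ch.charCodeAt(0)) | 0;
  const rand = mulberry32(seed);
  const pe = positionalEncoding(pos, dModel);
  return pe.map(p => p + (rand() * 2 - 1)); // embedding in [-1, 1) + PE
}
```

Because the embedding depends only on the token string, the same word always maps to the same vector, so heatmaps are reproducible across keystrokes.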
Self-Attention: For each head, random weight matrices W_Q, W_K, W_V project embeddings into queries, keys, and values. Attention scores are computed as softmax(Q · Kᵀ / √d_k). The temperature parameter sharpens (<1) or softens (>1) the distribution.
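The score computation, with the temperature knob, can be sketched like this (function names and the row-vector representation are assumptions):

```typescript
// Numerically stable softmax with a temperature divisor:
// temperature < 1 sharpens the distribution, > 1 softens it.
function softmax(scores: number[], temperature = 1): number[] {
  const scaled = scores.map(s => s / temperature);
  const max = Math.max(...scaled);            // subtract max for stability
  const exps = scaled.map(s => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// One row of attention weights per query token: softmax(Q · Kᵀ / √d_k).
// Q and K are arrays of row vectors (one per token).
function attentionWeights(Q: number[][], K: number[][], temperature = 1): number[][] {
  const dk = K[0].length;
  return Q.map(q =>
    softmax(
      K.map(k => q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(dk)),
      temperature
    )
  );
}
```

Each row of the result sums to 1, which is what the heatmap renders: row i shows how much token i attends to every other token.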
Multi-Head Attention: Four independent heads with separate weight matrices produce different attention patterns. In a real transformer, the head outputs are concatenated and projected through an output matrix; here each head is shown independently for educational clarity.
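A per-head sketch, assuming d_k = d_model / heads and random projections; the helper names and random initialization are illustrative, not the demo's exact code. A real transformer would additionally compute softmax(QKᵀ/√d_k)·V per head, concatenate the results, and apply an output projection.

```typescript
// Random matrix with entries in [-1, 1) — illustrative initialization.
function randMatrix(rows: number, cols: number): number[][] {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => Math.random() * 2 - 1));
}

// Plain matrix multiply: (n × m) · (m × p) → (n × p).
function matmul(A: number[][], B: number[][]): number[][] {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0)));
}

// One attention map per head: each head projects the token embeddings X
// with its own W_Q and W_K, then applies softmax(QKᵀ/√d_k) row by row.
function headAttentionMaps(X: number[][], heads: number): number[][][] {
  const dModel = X[0].length;
  const dk = dModel / heads;
  return Array.from({ length: heads }, () => {
    const Q = matmul(X, randMatrix(dModel, dk));
    const K = matmul(X, randMatrix(dModel, dk));
    return Q.map(q => {
      const scores = K.map(k =>
        q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(dk));
      const max = Math.max(...scores);        // stable softmax per row
      const exps = scores.map(s => Math.exp(s - max));
      const sum = exps.reduce((a, b) => a + b, 0);
      return exps.map(e => e / sum);
    });
  });
}
```

Because each head draws its own projections, the four heatmaps differ even for the same input, which is the point the visualization makes.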
All attention computations run from scratch in your browser — no ML libraries or external APIs. Random Q, K, V weight matrices are generated on load; click "Randomize Weights" to resample.