Transformer Attention Visualizer

Visualize self-attention, multi-head attention, and positional encoding in transformers. Type text and see attention heatmaps update live.


How it works

Tokenization: Text is split on whitespace and punctuation into tokens.
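
The split described above can be sketched in plain JavaScript (a hypothetical `tokenize` helper, not the app's actual code):

```javascript
// Split on whitespace and punctuation; the capture group keeps
// punctuation marks as tokens of their own.
function tokenize(text) {
  return text
    .split(/(\s+|[.,!?;:()"'])/)
    .map(t => t.trim())
    .filter(t => t.length > 0);
}

console.log(tokenize("Attention is all you need!"));
// → ["Attention", "is", "all", "you", "need", "!"]
```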

Embeddings: Each token gets a deterministic pseudo-random embedding vector of size d_model. Sinusoidal positional encoding is added so the model can distinguish token positions.
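
The standard sinusoidal encoding can be sketched as follows, assuming an even d_model (the helper name is hypothetical):

```javascript
// pe[2i]   = sin(pos / 10000^(2i / d_model))
// pe[2i+1] = cos(pos / 10000^(2i / d_model))
function positionalEncoding(pos, dModel) {
  const pe = new Array(dModel);
  for (let i = 0; i < dModel; i += 2) {
    const angle = pos / Math.pow(10000, i / dModel);
    pe[i] = Math.sin(angle);
    pe[i + 1] = Math.cos(angle);
  }
  return pe;
}

// The encoding is added element-wise to the token's embedding:
// embedding[d] += positionalEncoding(position, dModel)[d]
```

Because each dimension pair oscillates at a different frequency, every position gets a distinct vector, and nearby positions get similar ones.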

Self-Attention: For each head, random weight matrices W_Q, W_K, W_V project embeddings into queries, keys, and values. Attention scores are computed as softmax(Q·Kᵀ/√d_k). The scores are divided by the temperature parameter before the softmax, which sharpens (<1) or softens (>1) the distribution.
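
Under those definitions, the per-row computation looks roughly like this (illustrative names, not the app's actual code; Q and K are arrays of per-token vectors):

```javascript
// Returns one softmax-normalized attention row per query token.
function attentionWeights(Q, K, temperature = 1.0) {
  const dk = K[0].length;
  return Q.map(q => {
    // score_j = (q · k_j) / (sqrt(dk) * temperature)
    const scores = K.map(k =>
      q.reduce((s, qi, i) => s + qi * k[i], 0) / (Math.sqrt(dk) * temperature)
    );
    // Numerically stable softmax over the row.
    const max = Math.max(...scores);
    const exps = scores.map(s => Math.exp(s - max));
    const sum = exps.reduce((a, b) => a + b, 0);
    return exps.map(e => e / sum);
  });
}
```

Each output row sums to 1 and is one row of the heatmap; lowering the temperature scales every score up, so the largest one dominates after the softmax.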

Multi-Head Attention: Four independent heads with separate weight matrices produce different attention patterns. In a real transformer, the head outputs are concatenated and projected back to d_model; here each head is shown independently for educational clarity.
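
The per-head machinery can be wired together roughly like this (a self-contained sketch; the matrix shapes and helper names are assumptions, and the real app also projects values with W_V):

```javascript
// Numerically stable softmax over one row of scores.
function softmaxRow(scores) {
  const max = Math.max(...scores);
  const exps = scores.map(s => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Scaled dot-product attention weights for one head.
function headAttention(Q, K) {
  const dk = K[0].length;
  return Q.map(q =>
    softmaxRow(K.map(k =>
      q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(dk))));
}

const randMatrix = (rows, cols) =>
  Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => Math.random() * 2 - 1));

const matVec = (M, v) =>
  M.map(row => row.reduce((s, m, j) => s + m * v[j], 0));

// Each head gets its own W_Q and W_K, so each produces its own heatmap.
function multiHeadPatterns(embeddings, numHeads, dK) {
  const dModel = embeddings[0].length;
  return Array.from({ length: numHeads }, () => {
    const WQ = randMatrix(dK, dModel);
    const WK = randMatrix(dK, dModel);
    const Q = embeddings.map(e => matVec(WQ, e));
    const K = embeddings.map(e => matVec(WK, e));
    return headAttention(Q, K); // numTokens × numTokens weight matrix
  });
}
```

Because every head samples its own projection matrices, the four heatmaps differ even for the same input text.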

All attention computations run from scratch in your browser — no ML libraries or external APIs. Random Q, K, V weight matrices are generated on load; click "Randomize Weights" to resample.