Transformer Attention Visualizer
Visualize self-attention, multi-head attention, and positional encoding in transformers. Type text and see attention heatmaps update live.
Click a token to highlight its attention pattern
Type some text above to see the attention heatmap.
How it works
Tokenization: Text is split on whitespace and punctuation into tokens.
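A tokenizer like this can be a one-line regex. This is a minimal sketch; the function name `tokenize` and the exact regex are assumptions, not the demo's actual code.

```typescript
// Split text into word tokens and single punctuation tokens.
// \w+ matches runs of word characters; [^\s\w] matches any single
// character that is neither whitespace nor a word character.
function tokenize(text: string): string[] {
  return text.match(/\w+|[^\s\w]/g) ?? [];
}

console.log(tokenize("Hello, world!")); // ["Hello", ",", "world", "!"]
```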
Embeddings: Each token gets a deterministic pseudo-random embedding vector of size d_model. Sinusoidal positional encoding is added so the model can distinguish token positions.
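The two pieces above can be sketched as follows. The sinusoidal encoding is the standard sin/cos formulation; using a mulberry32 PRNG seeded from a hash of the token string is one way to get deterministic pseudo-random embeddings, and is an assumption about how the demo does it (as are the function names).

```typescript
// Standard sinusoidal positional encoding; assumes dModel is even.
function positionalEncoding(pos: number, dModel: number): number[] {
  const pe = new Array(dModel).fill(0);
  for (let i = 0; i < dModel; i += 2) {
    const freq = pos / Math.pow(10000, i / dModel);
    pe[i] = Math.sin(freq);     // even dimensions: sine
    pe[i + 1] = Math.cos(freq); // odd dimensions: cosine
  }
  return pe;
}

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  return () => {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Deterministic embedding (seeded by the token text) plus positional encoding.
function embed(token: string, pos: number, dModel: number): number[] {
  let seed = 0;
  for (const ch of token) seed = (seed * 31 + ch.charCodeAt(0)) | 0;
  const rand = mulberry32(seed);
  const pe = positionalEncoding(pos, dModel);
  return pe.map(p => p + (rand() * 2 - 1)); // embedding in [-1, 1) + PE
}
```

Because the embedding depends only on the token string, the same word always maps to the same vector, so heatmaps are reproducible across keystrokes.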
Self-Attention: For each head, random weight matrices W_Q, W_K, W_V project embeddings into queries, keys, and values. Attention scores are computed as softmax(Q · Kᵀ / √d_k). The temperature parameter sharpens (<1) or softens (>1) the distribution.
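The score computation, with the temperature knob, can be sketched like this (function names and the row-vector representation are assumptions):

```typescript
// Numerically stable softmax with a temperature divisor:
// temperature < 1 sharpens the distribution, > 1 softens it.
function softmax(scores: number[], temperature = 1): number[] {
  const scaled = scores.map(s => s / temperature);
  const max = Math.max(...scaled);            // subtract max for stability
  const exps = scaled.map(s => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// One row of attention weights per query token: softmax(Q · Kᵀ / √d_k).
// Q and K are arrays of row vectors (one per token).
function attentionWeights(Q: number[][], K: number[][], temperature = 1): number[][] {
  const dk = K[0].length;
  return Q.map(q =>
    softmax(
      K.map(k => q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(dk)),
      temperature
    )
  );
}
```

Each row of the result sums to 1, which is what the heatmap renders: row i shows how much token i attends to every other token.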
Multi-Head Attention: Four independent heads with separate weight matrices produce different attention patterns. In a real transformer, the head outputs are concatenated and projected through an output matrix; here each head is shown independently for educational clarity.
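A per-head sketch, assuming d_k = d_model / heads and random projections; the helper names and random initialization are illustrative, not the demo's exact code. A real transformer would additionally compute softmax(QKᵀ/√d_k)·V per head, concatenate the results, and apply an output projection.

```typescript
// Random matrix with entries in [-1, 1) — illustrative initialization.
function randMatrix(rows: number, cols: number): number[][] {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => Math.random() * 2 - 1));
}

// Plain matrix multiply: (n × m) · (m × p) → (n × p).
function matmul(A: number[][], B: number[][]): number[][] {
  return A.map(row =>
    B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0)));
}

// One attention map per head: each head projects the token embeddings X
// with its own W_Q and W_K, then applies softmax(QKᵀ/√d_k) row by row.
function headAttentionMaps(X: number[][], heads: number): number[][][] {
  const dModel = X[0].length;
  const dk = dModel / heads;
  return Array.from({ length: heads }, () => {
    const Q = matmul(X, randMatrix(dModel, dk));
    const K = matmul(X, randMatrix(dModel, dk));
    return Q.map(q => {
      const scores = K.map(k =>
        q.reduce((s, qi, i) => s + qi * k[i], 0) / Math.sqrt(dk));
      const max = Math.max(...scores);        // stable softmax per row
      const exps = scores.map(s => Math.exp(s - max));
      const sum = exps.reduce((a, b) => a + b, 0);
      return exps.map(e => e / sum);
    });
  });
}
```

Because each head draws its own projections, the four heatmaps differ even for the same input, which is the point the visualization makes.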
All attention computations run from scratch in your browser — no ML libraries or external APIs. Random Q, K, V weight matrices are generated on load; click "Randomize Weights" to resample.