Tokens to Vectors: How LLMs Process Language
A visual guide to the mathematics behind attention and summarisation
Stage 1 — Tokenisation to Embedding
Words don't enter the model as text. The input is first split into tokens (whole words or subword pieces), and each token is immediately converted into a vector — a list of thousands of numbers representing its position in high-dimensional space. These numbers are learned during training and encode relationships between concepts.
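A minimal sketch of that lookup, using a toy three-word vocabulary and 4-dimensional vectors (real models use vocabularies of tens of thousands of tokens and thousands of dimensions; the vocabulary and random table here are illustrative, not from any real model):

```python
import numpy as np

# Toy vocabulary and embedding table. In a trained model the table's
# values are learned; here they are random placeholders.
vocab = {"the": 0, "cat": 1, "sat": 2}
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))

def embed(text: str) -> np.ndarray:
    """Split text into tokens, then look up each token's vector."""
    token_ids = [vocab[word] for word in text.split()]
    return embedding_table[token_ids]

vectors = embed("the cat sat")
print(vectors.shape)  # → (3, 4): one 4-dimensional vector per token
```

From this point on, the model only ever manipulates these rows of numbers; the original strings are gone.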
Stage 2 — Semantic Vector Space
Similar concepts cluster together in vector space. "Cat" and "dog" are close; "cat" and "democracy" are far apart. Ambiguous words like "bank" sit between their possible meanings — until context pulls them toward one cluster or another.
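"Close" and "far" here are usually measured with cosine similarity — the angle between two vectors. A sketch with hand-picked illustrative vectors (not real embeddings), chosen so the animal words share a direction and "democracy" points elsewhere:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hand-picked toy vectors for illustration only.
cat       = np.array([0.9, 0.8, 0.1])
dog       = np.array([0.8, 0.9, 0.2])
democracy = np.array([0.1, 0.0, 0.95])

print(round(cosine(cat, dog), 2))        # high: same neighbourhood
print(round(cosine(cat, democracy), 2))  # low: far apart
```

An ambiguous word like "bank" would sit with moderate similarity to both a finance-like and a nature-like direction — which is exactly the tension the next stage resolves.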
Stage 3 — Attention Mechanism
When processing "bank," the attention mechanism computes similarity between bank's Query vector and every other token's Key vector. The similarities are passed through a softmax, so they become weights that sum to 1: high similarity means a high attention weight. "River" gets 0.66 attention; "the" gets almost none. These weights determine how much each token's Value vector contributes to bank's updated representation.
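The weight computation can be sketched in a few lines. The Key vectors below are illustrative stand-ins (a trained model produces them from the embeddings via learned projection matrices), constructed so "river" aligns with bank's Query:

```python
import numpy as np

def attention_weights(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Softmax over scaled dot products: one Query against every Key."""
    scores = keys @ query / np.sqrt(len(query))  # similarity per token
    exp = np.exp(scores - scores.max())          # subtract max for stability
    return exp / exp.sum()                       # weights sum to 1

# Illustrative Key vectors, not taken from a real model.
tokens = ["the", "river", "fish", "swam"]
keys = np.array([
    [0.0, 0.1, 0.0, 0.0],   # "the"   — nearly orthogonal to the query
    [1.2, 1.1, 1.0, 0.9],   # "river" — strongly aligned
    [0.6, 0.5, 0.4, 0.3],   # "fish"
    [0.5, 0.4, 0.3, 0.2],   # "swam"
])
bank_query = np.array([1.0, 1.0, 1.0, 1.0])

weights = attention_weights(bank_query, keys)
for token, w in zip(tokens, weights):
    print(f"{token:>6}: {w:.2f}")
```

With these toy numbers "river" dominates and "the" gets almost nothing — the same shape of distribution the 0.66 figure above describes.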
Stage 4 — Contextual Vector Shift
After attention, "bank" isn't in the same place in vector space. The weighted contributions of "river" (and "fish," "swam") have pulled the vector toward the nature cluster. No information was bolted on as a separate packet — the vector itself was warped, as if by gravitational pull from the contextually relevant tokens.
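That shift can be made concrete by comparing "bank" to a nature-like direction before and after the update. Everything below is illustrative: 2-dimensional toy vectors, and the update simplified to a pure weighted sum of Value vectors (real transformers also add this to the token's existing vector via a residual connection, among other steps):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

nature_direction = np.array([1.0, 0.0])   # stand-in for the nature cluster
bank_before      = np.array([0.5, 0.5])   # ambiguous: between both senses

# Toy Value vectors for the context tokens, and the attention weights
# from Stage 3 ("river" at 0.66, "the" near zero).
values = np.array([
    [0.1, 0.1],   # "the"
    [1.0, 0.1],   # "river"
    [0.9, 0.2],   # "fish"
    [0.8, 0.2],   # "swam"
])
weights = np.array([0.05, 0.66, 0.15, 0.14])

# Updated representation: attention-weighted sum of the Value vectors.
bank_after = weights @ values

print(round(cosine(bank_before, nature_direction), 2))  # → 0.71
print(round(cosine(bank_after,  nature_direction), 2))  # → 0.99
```

The vector has moved decisively into the nature cluster — the geometric fingerprint of "bank" having been disambiguated by its context.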