Embeddings are the workhorse of modern ML: recommenders, semantic search, and LLMs all start with them. The literature makes them sound mystical. They're not. If you can multiply two arrays of numbers element-wise and add the results, you can build, debug, and reason about embeddings.
The one-paragraph definition
An embedding is a fixed-length list of numbers that represents an object. The trick: we choose those numbers so that two objects that are similar end up with similar lists, and two objects that are different end up with different lists. That's it. The "model" is just the function that produces the list.
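To make that concrete, here's what an embedding table looks like as plain data. This is a minimal sketch: the item names and 4-dimensional vectors are invented for illustration, not learned from anything.

```python
import numpy as np

# A toy embedding table: one fixed-length row of numbers per object.
# These vectors are hand-picked to illustrate the idea, not trained.
items = ["cat", "kitten", "truck"]
vectors = np.array([
    [0.9, 0.1, 0.0, 0.3],   # cat
    [0.8, 0.2, 0.1, 0.3],   # kitten (similar object, similar numbers)
    [0.0, 0.9, 0.8, 0.1],   # truck  (different object, different numbers)
])

def embed(name: str) -> np.ndarray:
    """The 'model' is just a function from object to its row."""
    return vectors[items.index(name)]

print(embed("kitten"))  # a fixed-length list of numbers
```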
Why dot products work
When two unit-length vectors point in similar directions, their dot product is close to 1; when they're orthogonal, it's 0; when they point in opposite directions, it's close to -1. That's because the dot product of two unit vectors is the cosine of the angle between them. So the dot product is a similarity score, which is exactly what a recommender needs. No fancy math, just geometry.
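A quick sanity check of that geometry, using made-up 2D vectors:

```python
import numpy as np

def unit(v):
    # Normalise to unit length so the dot product equals the cosine.
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

a = unit([1.0, 1.0])
b = unit([1.0, 0.9])    # points almost the same way as a
c = unit([-1.0, -1.0])  # points the opposite way
d = unit([1.0, -1.0])   # orthogonal to a

print(a @ b)  # ~0.999 -> very similar
print(a @ d)  #  0.0   -> unrelated
print(a @ c)  # -1.0   -> opposite
```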
60 lines to a movie recommender
Take a dataset of (user, movie, rating) triples. Initialise random vectors for every user and movie. For each row, predict rating = dot(user, movie), compute the gradient of the squared error, and nudge both vectors down it. Repeat. After a few thousand steps, similar movies cluster together and you can recommend by nearest neighbour.
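Here's a minimal sketch of that loop in NumPy. The synthetic triples, the dimension count, and the learning rate are all placeholder choices; with random ratings the learned clusters are meaningless, so swap in a real dataset to see structure emerge.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_movies, dim = 50, 40, 8
lr, steps = 0.05, 5000

# Synthetic (user, movie, rating) triples, stand-ins for real data.
rows = rng.integers(0, n_users, size=2000)
cols = rng.integers(0, n_movies, size=2000)
ratings = rng.uniform(1.0, 5.0, size=2000)

# Random vectors for every user and movie.
U = rng.normal(scale=0.1, size=(n_users, dim))
M = rng.normal(scale=0.1, size=(n_movies, dim))

for step in range(steps):
    i = rng.integers(0, len(ratings))   # pick one (user, movie, rating) row
    u, m, r = rows[i], cols[i], ratings[i]

    pred = U[u] @ M[m]                  # predict rating = dot(user, movie)
    err = pred - r                      # d(0.5 * err**2) / d(pred)

    # Gradients w.r.t. each vector, computed before either update.
    grad_u = err * M[m]
    grad_m = err * U[u]
    U[u] -= lr * grad_u                 # nudge the vectors
    M[m] -= lr * grad_m

def similar_movies(movie_id: int, k: int = 5) -> np.ndarray:
    """Recommend by nearest neighbour: highest dot product wins."""
    scores = M @ M[movie_id]
    return np.argsort(-scores)[1 : k + 1]   # skip the movie itself

print(similar_movies(0))
```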
What embeddings are NOT
They're not magic. They're not interpretable feature by feature. They're not a substitute for clean data. They are a fast, cheap way to compress "how similar is X to Y" into a number you can compute in microseconds.
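That microsecond claim is just a matrix-vector product. A sketch, assuming the rows of E are unit-length embeddings (the shapes here are illustrative):

```python
import numpy as np

# 100k items, 64 dims: made-up shapes standing in for a trained table.
E = np.random.default_rng(1).normal(size=(100_000, 64))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-length rows

query = E[42]
scores = E @ query                 # one dot product per item, vectorised
top = np.argsort(-scores)[:10]     # the 10 nearest neighbours by cosine
```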
Where to go next
Once you understand the dot-product version, the leap to transformer embeddings is mostly engineering: bigger model, more dimensions, smarter optimisation. The intuition stays the same — distance in vector space encodes similarity in the real world.
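You can see that continuity for yourself with the sentence-transformers library, which puts a transformer encoder behind exactly this interface. A sketch, assuming the library is installed; the model name is one common pre-trained choice, not a recommendation:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(
    ["a movie about space travel", "an astronaut drama", "a cookbook"],
    normalize_embeddings=True,   # unit length, so dot product = cosine
)
print(vecs @ vecs.T)  # pairwise similarities: the first two score highest
```

Same dot product, same geometry; the only thing that changed is how the vectors were produced.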