Embeddings are the workhorse of modern ML: recommenders, semantic search, and LLMs all start with them. The literature makes them sound mystical. They're not. If you can multiply two arrays of numbers element-wise and add the results, you can build, debug, and reason about embeddings.
The one-paragraph definition
An embedding is a fixed-length list of numbers that represents an object. The trick: we choose those numbers so that two objects that are similar end up with similar lists, and two objects that are different end up with different lists. That's it. The "model" is just the function that produces the list.
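To make that concrete, here's what an embedding table looks like as plain data. This is a minimal sketch: the item names and 4-dimensional vectors are invented for illustration, not learned from anything.

```python
import numpy as np

# A toy embedding table: one fixed-length row of numbers per object.
# These vectors are hand-picked to illustrate the idea, not trained.
items = ["cat", "kitten", "truck"]
vectors = np.array([
    [0.9, 0.1, 0.0, 0.3],   # cat
    [0.8, 0.2, 0.1, 0.3],   # kitten (similar object, similar numbers)
    [0.0, 0.9, 0.8, 0.1],   # truck  (different object, different numbers)
])

def embed(name: str) -> np.ndarray:
    """The 'model' is just a function from object to its row."""
    return vectors[items.index(name)]

print(embed("kitten"))  # a fixed-length list of numbers
```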
Why dot products work
When two unit-length vectors point in similar directions, their dot product is close to 1; when they're orthogonal, it's 0; when they point in opposite directions, it's close to -1. That's because the dot product of two unit vectors is the cosine of the angle between them. So the dot product is a similarity score, which is exactly what a recommender needs. No fancy math, just geometry.
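A quick sanity check of that geometry, using made-up 2D vectors:

```python
import numpy as np

def unit(v):
    # Normalise to unit length so the dot product equals the cosine.
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

a = unit([1.0, 1.0])
b = unit([1.0, 0.9])    # points almost the same way as a
c = unit([-1.0, -1.0])  # points the opposite way
d = unit([1.0, -1.0])   # orthogonal to a

print(a @ b)  # ~0.999 -> very similar
print(a @ d)  #  0.0   -> unrelated
print(a @ c)  # -1.0   -> opposite
```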
60 lines to a movie recommender
Take a dataset of (user, movie, rating) triples. Initialise random vectors for every user and movie. For each row, predict rating = dot(user, movie), compute the gradient of the squared error, and nudge both vectors down it. Repeat. After a few thousand steps, similar movies cluster together and you can recommend by nearest neighbour.
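Here's a minimal sketch of that loop in NumPy. The synthetic triples, the dimension count, and the learning rate are all placeholder choices; with random ratings the learned clusters are meaningless, so swap in a real dataset to see structure emerge.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_movies, dim = 50, 40, 8
lr, steps = 0.05, 5000

# Synthetic (user, movie, rating) triples, stand-ins for real data.
rows = rng.integers(0, n_users, size=2000)
cols = rng.integers(0, n_movies, size=2000)
ratings = rng.uniform(1.0, 5.0, size=2000)

# Random vectors for every user and movie.
U = rng.normal(scale=0.1, size=(n_users, dim))
M = rng.normal(scale=0.1, size=(n_movies, dim))

for step in range(steps):
    i = rng.integers(0, len(ratings))   # pick one (user, movie, rating) row
    u, m, r = rows[i], cols[i], ratings[i]

    pred = U[u] @ M[m]                  # predict rating = dot(user, movie)
    err = pred - r                      # d(0.5 * err**2) / d(pred)

    # Gradients w.r.t. each vector, computed before either update.
    grad_u = err * M[m]
    grad_m = err * U[u]
    U[u] -= lr * grad_u                 # nudge the vectors
    M[m] -= lr * grad_m

def similar_movies(movie_id: int, k: int = 5) -> np.ndarray:
    """Recommend by nearest neighbour: highest dot product wins."""
    scores = M @ M[movie_id]
    return np.argsort(-scores)[1 : k + 1]   # skip the movie itself

print(similar_movies(0))
```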
What embeddings are NOT
They're not magic. They're not interpretable feature by feature. They're not a substitute for clean data. They are a fast, cheap way to compress "how similar is X to Y" into a number you can compute in microseconds.
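That microsecond claim is just a matrix-vector product. A sketch, assuming the rows of E are unit-length embeddings (the shapes here are illustrative):

```python
import numpy as np

# 100k items, 64 dims: made-up shapes standing in for a trained table.
E = np.random.default_rng(1).normal(size=(100_000, 64))
E /= np.linalg.norm(E, axis=1, keepdims=True)   # unit-length rows

query = E[42]
scores = E @ query                 # one dot product per item, vectorised
top = np.argsort(-scores)[:10]     # the 10 nearest neighbours by cosine
```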
Where to go next
Once you understand the dot-product version, the leap to transformer embeddings is mostly engineering: bigger model, more dimensions, smarter optimisation. The intuition stays the same — distance in vector space encodes similarity in the real world.
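You can see that continuity for yourself with the sentence-transformers library, which puts a transformer encoder behind exactly this interface. A sketch, assuming the library is installed; the model name is one common pre-trained choice, not a recommendation:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vecs = model.encode(
    ["a movie about space travel", "an astronaut drama", "a cookbook"],
    normalize_embeddings=True,   # unit length, so dot product = cosine
)
print(vecs @ vecs.T)  # pairwise similarities: the first two score highest
```

Same dot product, same geometry; the only thing that changed is how the vectors were produced.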