Recommendation Engine

The 'you might also like' system Netflix made famous. LLMs are changing what's possible — but the boring old approaches still win in most enterprise use cases.

The Technical Definition

A recommendation engine predicts what a user will want based on what other users wanted, what the user has chosen before, or some blend of the two. The three classical approaches: collaborative filtering (“people who bought this also bought”), content-based filtering (“you liked this product, here are similar products by attribute”), and hybrid systems that combine both with business rules. The newer approach uses LLM-generated embeddings to represent users and items as vectors in semantic space, then recommends by similarity.
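To make the collaborative-filtering idea concrete, here is a minimal sketch of "people who bought this also bought" using plain co-occurrence counts. The customer ids, SKUs, and function names are hypothetical, and production systems normally add normalization and weighting that this toy version leaves out.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_also_bought(baskets):
    """Count how often each pair of items shows up in the same customer's history."""
    co_counts = defaultdict(Counter)
    for items in baskets.values():
        # De-duplicate so repeat purchases by one customer count once per pair.
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_bought(co_counts, item, k=3):
    """Return up to k items most often bought alongside `item`."""
    return [other for other, _ in co_counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories: customer id -> SKUs bought.
baskets = {
    "c1": ["drill", "drill-bits", "safety-glasses"],
    "c2": ["drill", "drill-bits"],
    "c3": ["drill", "safety-glasses", "gloves"],
    "c4": ["gloves", "safety-glasses"],
}
co_counts = build_also_bought(baskets)
print(also_bought(co_counts, "drill", k=2))
```

With only four customers the counts are tiny, which is the data problem in miniature: the output is only as trustworthy as the number of baskets behind each pair.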

The math is well-understood. The engineering is mostly a data problem. The recommendation engine is only as good as the behavioral data feeding it.

What This Actually Means for Your Business

The Netflix-style “recommended for you” works at Netflix because they have 250 million users generating billions of interactions, a tightly bounded catalog of 15,000 titles, and clear behavioral signals — users finish a movie or they don’t. Your B2B catalog has 800 SKUs, your customers visit twice a quarter, and you have no idea whether a click meant interest or whether the user just got lost. The same recommendation algorithm produces wildly different results.

This is the gap most CEOs miss. Recommendation engines are sold as a feature. They behave as a data system. If your customer data is sparse — most enterprise data is — the engine has nothing to learn from, and the recommendations will be worse than a hand-curated “top sellers” list maintained by a merchandiser. The boring approach often outperforms the sophisticated one.

LLMs are changing the math in two specific ways. First, semantic understanding of products: instead of needing thousands of clicks to learn that “industrial grade lithium battery 18650” is similar to “rechargeable Li-ion 18650 cell,” an LLM-generated embedding knows that on day one, from the product description alone. This is a meaningful upgrade for catalogs with sparse interaction data — which is most enterprise catalogs. Second, conversational recommendations: a user can describe what they need in natural language (“something like the part we ordered last March but rated for higher temperatures”) and the system can match it without exact keyword overlap.
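The similarity step can be sketched in a few lines once embeddings exist. In this sketch the three-dimensional vectors are made-up placeholders, not output from any real model (real embeddings have hundreds of dimensions), and `cosine` and `most_similar` are illustrative names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query_sku, embeddings, k=2):
    """Rank every other product by cosine similarity to the query product."""
    query_vec = embeddings[query_sku]
    scored = [
        (cosine(query_vec, vec), sku)
        for sku, vec in embeddings.items()
        if sku != query_sku
    ]
    scored.sort(reverse=True)
    return [sku for _, sku in scored[:k]]


# Placeholder vectors standing in for embeddings of the product descriptions.
embeddings = {
    "industrial grade lithium battery 18650": [0.90, 0.10, 0.05],
    "rechargeable Li-ion 18650 cell": [0.85, 0.15, 0.10],
    "M8 stainless hex bolt": [0.05, 0.90, 0.20],
    "nitrile work gloves": [0.10, 0.20, 0.95],
}
print(most_similar("industrial grade lithium battery 18650", embeddings, k=1))
```

No click data is involved anywhere in this path, which is why it helps catalogs with sparse interaction history.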

But LLM-based recommendations also fail in places the old approaches didn’t. They don’t know your business rules: that this customer is on a contract that excludes certain SKUs, that this product is end-of-life, that your team always upsells from the basic to the pro version. They hallucinate plausible-but-wrong product associations. They cost an order of magnitude more per recommendation than a SQL query against a precomputed similarity table. For high-volume, low-margin recommendations (the homepage carousel, the email blast), the old approach still wins on cost and reliability. The new approach wins for high-stakes, low-volume conversations where understanding intent matters more than serving recommendations at sub-50ms latency.

Reality Check

What the vendor says: “Our AI-driven recommendation engine personalizes the experience for every customer.”

What that means in practice: It’s collaborative filtering with embeddings layered on top, and the personalization quality is bounded by how much behavioral data you have on each customer. For your top 100 accounts with rich purchase history, it works well. For the 80% of customers with two purchases on file, it’s effectively recommending top sellers with a thin coat of paint.

What Operators Actually Do

The teams getting real lift from recommendations start by being honest about their data. They count how many customers have meaningful behavioral history — typically more than 10 interactions — and they accept that the recommendation engine works for that segment and not for the rest. The rest get top sellers, contract-specific defaults, or merchandiser-curated lists, which is fine.
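The segmentation described above can be sketched as a simple routing rule. The threshold of 10 comes from the text; the strategy names, the `has_contract_defaults` flag, and the sample counts are assumptions for illustration.

```python
MIN_INTERACTIONS = 10  # "more than 10 interactions"; tune for your own data


def choose_strategy(interaction_count, has_contract_defaults=False):
    """Pick a recommendation source based on how much history a customer has."""
    if interaction_count > MIN_INTERACTIONS:
        return "personalized"
    if has_contract_defaults:
        return "contract_defaults"
    return "top_sellers"


def coverage(interaction_counts):
    """Share of customers with enough history for personalized recommendations."""
    if not interaction_counts:
        return 0.0
    eligible = sum(1 for n in interaction_counts.values() if n > MIN_INTERACTIONS)
    return eligible / len(interaction_counts)


# Hypothetical interaction counts per customer id.
interaction_counts = {"acme": 42, "globex": 2, "initech": 11, "umbrella": 10}
print(coverage(interaction_counts))
print({c: choose_strategy(n) for c, n in interaction_counts.items()})
```

Running the `coverage` count before buying or building anything gives a quick read on how much of the customer base an engine could plausibly serve.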

They also separate the offline math from the runtime serving. The expensive computation — training a model, generating embeddings, building similarity tables — runs nightly or weekly. The runtime lookup is a fast database query. This is how Netflix and Amazon serve recommendations at sub-50ms latency at scale, and it’s how your team should serve them too. Calling an LLM at request time for every recommendation looks elegant in a demo and falls over at 10,000 requests per second.
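A minimal sketch of that split, using Python's built-in `sqlite3` module as a stand-in for whatever database the team already runs. The table name, column names, and similarity scores are hypothetical; the point is only that the request path is a single indexed lookup.

```python
import sqlite3


def rebuild_similarity_table(conn, similar_items):
    """Offline job (nightly or weekly): replace the precomputed similarity table."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DROP TABLE IF EXISTS item_similarity")
        conn.execute(
            "CREATE TABLE item_similarity ("
            "item_id TEXT, similar_id TEXT, score REAL, "
            "PRIMARY KEY (item_id, similar_id))"
        )
        conn.executemany(
            "INSERT INTO item_similarity VALUES (?, ?, ?)",
            [
                (item, other, score)
                for item, neighbours in similar_items.items()
                for other, score in neighbours
            ],
        )


def recommend(conn, item_id, k=5):
    """Runtime path: one lookup against the precomputed table, no model call."""
    rows = conn.execute(
        "SELECT similar_id FROM item_similarity "
        "WHERE item_id = ? ORDER BY score DESC LIMIT ?",
        (item_id, k),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Scores would come from the offline model; these are made up.
rebuild_similarity_table(conn, {
    "drill": [("drill-bits", 0.9), ("gloves", 0.4)],
    "gloves": [("safety-glasses", 0.7)],
})
print(recommend(conn, "drill"))
```

Whatever produces the scores (co-occurrence counts, a trained model, embeddings) can change without touching `recommend`, which keeps the serving path cheap and predictable.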

The third pattern, increasingly common: hybrid systems where the classical engine generates candidates and an LLM re-ranks the top 20 using context the classical model can’t see (the user’s stated intent in this session, the product page they’re currently viewing, business rules about contracts and inventory). This gets you the cost profile of the old approach with the contextual intelligence of the new one. It’s also where the operational complexity goes up, so the team needs to be prepared to maintain two systems instead of one.
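The hybrid pattern can be sketched as filter, shortlist, then re-rank. Here `context_score` is a plain function standing in for the LLM re-ranker, and the SKUs, scores, and end-of-life list are hypothetical. Applying business rules as a hard filter before re-ranking is a design choice of this sketch, on the assumption that rules should never depend on model judgment.

```python
def rerank(candidates, allowed, context_score, top_n=20, k=5):
    """Filter classical candidates by business rules, then re-rank the shortlist.

    candidates: list of (sku, classical_score) pairs in any order.
    allowed: callable sku -> bool, the business-rule layer.
    context_score: callable sku -> float, a stand-in for an LLM re-ranker.
    """
    eligible = [(sku, score) for sku, score in candidates if allowed(sku)]
    shortlist = sorted(eligible, key=lambda pair: pair[1], reverse=True)[:top_n]
    reranked = sorted(shortlist, key=lambda pair: context_score(pair[0]), reverse=True)
    return [sku for sku, _ in reranked[:k]]


# Hypothetical output of the classical engine.
candidates = [("basic-pump", 0.9), ("legacy-pump", 0.85), ("pro-pump", 0.8), ("hose", 0.3)]
end_of_life = {"legacy-pump"}
# Hypothetical session-intent scores a re-ranker might assign.
session_intent = {"pro-pump": 1.0, "basic-pump": 0.6}


def allowed(sku):
    return sku not in end_of_life


def context_score(sku):
    return session_intent.get(sku, 0.0)


print(rerank(candidates, allowed, context_score, k=2))
```

Only the shortlist ever reaches the expensive scorer, which is what keeps the cost profile close to the classical approach.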

The Questions to Ask

How much behavioral data do we actually have per customer? If most of your customers have fewer than 10 interactions on file, no recommendation engine will work well. Fix the data problem first, or use simpler defaults.
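What’s the cost per recommendation, and how does that compare to the lift in conversion? A recommendation that costs $0.02 to generate and lifts conversion by 0.5 percentage points adds about $1.00 in expected revenue on a $200 order and about $0.10 on a $20 order. The first clears its cost at almost any margin. The second only breaks even at a gross margin above 20%. Run the math before scaling.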
What business rules does the engine need to respect, and how are they enforced? Contracts, inventory, end-of-life products, regulatory restrictions. These are where vendor-built engines quietly recommend the wrong thing. The rule layer is your team’s responsibility, not the vendor’s.
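The cost question above reduces to a few lines of arithmetic. This sketch assumes the 0.5% lift means 0.5 percentage points of conversion per recommendation served, and the 10% gross margin in the usage example is a hypothetical figure, not one from the text.

```python
def breakeven_margin(cost_per_rec, conversion_lift, order_value):
    """Gross margin at which a recommendation exactly pays for itself."""
    incremental_revenue = conversion_lift * order_value
    return cost_per_rec / incremental_revenue


def net_per_rec(cost_per_rec, conversion_lift, order_value, gross_margin):
    """Expected profit per recommendation served, after its generation cost."""
    return conversion_lift * order_value * gross_margin - cost_per_rec


COST = 0.02    # dollars per recommendation
LIFT = 0.005   # 0.5 percentage points of conversion
MARGIN = 0.10  # hypothetical gross margin

for order_value in (200, 20):
    print(
        order_value,
        round(breakeven_margin(COST, LIFT, order_value), 4),
        round(net_per_rec(COST, LIFT, order_value, MARGIN), 4),
    )
```

Swapping in your own cost, lift, and margin figures shows quickly which recommendation surfaces can afford an expensive engine and which cannot.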
