AI Gateway

The middleware between your applications and AI providers. Every enterprise will have one by 2027 — the only question is whether you build it on purpose or accumulate it by accident.

Deployment & Ops

The Technical Definition

An AI Gateway is a middleware layer that sits between your applications and the AI model providers they call. Instead of every application connecting directly to OpenAI, Anthropic, or a self-hosted model, calls flow through a single gateway that handles routing across providers, rate limiting, response caching, cost attribution, audit logging, prompt and PII filtering, key management, retries, and fallback when a provider is down.

Portkey, Kong AI Gateway, LiteLLM, Cloudflare AI Gateway, and Databricks AI Gateway all operate in this category. The pattern mirrors the API gateway layer that became standard infrastructure in the 2010s — same shape, different traffic.

What This Actually Means for Your Business

By the end of 2025, most enterprise organizations had a quiet problem: nobody knew how many AI calls the company was making, to which providers, at what cost, on whose behalf. Marketing was using one model through a SaaS tool. Support was using another through a vendor. Engineering had API keys in three different cloud accounts. Finance saw an OpenAI invoice grow 400 percent year over year and could not explain why.

An AI Gateway solves the visibility problem first. Every call gets attributed to a team, an application, and a use case. Every cost gets logged. Every output becomes auditable. The CFO can finally answer the question: what did we spend on AI last month, and on what?
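A minimal sketch of what that attribution could look like, assuming the gateway writes one record per call. The record fields, team names, and dollar figures below are hypothetical and do not come from any particular gateway product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical log schema: one row per model call seen by the gateway.
    team: str
    application: str
    use_case: str
    provider: str
    cost_usd: float


def spend_by(records, key):
    """Sum cost_usd grouped by one attribute of the record (e.g. "team")."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)


# Made-up sample data, for illustration only.
records = [
    CallRecord("support", "helpdesk-bot", "ticket-reply", "anthropic", 0.50),
    CallRecord("marketing", "copy-tool", "ad-draft", "openai", 0.25),
    CallRecord("support", "helpdesk-bot", "ticket-summary", "openai", 0.25),
]

by_team = spend_by(records, "team")
by_provider = spend_by(records, "provider")
```

Once every call carries these labels, "what did we spend, and on what" is a grouping over one log rather than a survey of teams.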

The second thing it solves is provider concentration. If your customer support agent runs on a single provider and that provider has an outage — which has now happened to all of the major ones — your support stops. A gateway lets a team configure fallback routing: try Anthropic, fall back to OpenAI, fall back to a self-hosted model for non-sensitive queries. Same input, multiple paths, no single point of failure.
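The fallback chain described above can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: `ProviderError`, `call_with_fallback`, and the two stub providers are hypothetical names, and a real gateway would also handle timeouts, retries, and sensitivity rules.

```python
class ProviderError(Exception):
    """Raised by a provider stub when a call fails (hypothetical)."""


def call_with_fallback(prompt, providers):
    """Try each (name, callable) in order; return the first success.

    Returns a (provider_name, response) tuple so the caller can log
    which path actually served the request.
    """
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(failures))


def primary(prompt):
    # Stub standing in for a provider that is currently down.
    raise ProviderError("simulated outage")


def secondary(prompt):
    # Stub standing in for a healthy fallback provider.
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    "hello", [("primary", primary), ("secondary", secondary)]
)
```

Returning the provider name alongside the response matters: it is how the gateway's logs show how often the fallback path is actually taken.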

The third thing it solves is policy. Enterprise security teams cannot review every prompt sent to every model from every application. They can review the gateway. PII redaction, prompt injection filtering, output classification, geo-restriction, allow-listing of models for regulated workloads — these become policies enforced once, applied everywhere.
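To make "enforced once, applied everywhere" concrete, here is a rough sketch of a single policy function that every request could pass through. The allow-list contents and model names are hypothetical, and the email regex is a deliberately simple stand-in: real PII detection needs far more than one pattern.

```python
import re

# Illustrative pattern only; it catches common email shapes, not all PII.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical allow-list: which models each workload class may call.
ALLOWED_MODELS = {
    "regulated": {"self-hosted-model"},
    "general": {"self-hosted-model", "hosted-model-a"},
}


class PolicyViolation(Exception):
    """Raised when a request breaks gateway policy (hypothetical)."""


def apply_policy(prompt, model, workload):
    """Reject disallowed models, then redact emails from the prompt."""
    if model not in ALLOWED_MODELS.get(workload, set()):
        raise PolicyViolation(f"{model!r} is not allowed for {workload!r}")
    return EMAIL_RE.sub("[REDACTED_EMAIL]", prompt)


clean = apply_policy(
    "Contact jane@example.com today", "self-hosted-model", "regulated"
)
```

Because the check lives in one function at the gateway, the security team reviews this code path once instead of auditing each application.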

Companies that don’t build a gateway on purpose will still have one by 2027, made of duct tape. The difference is whether it was designed or accumulated.

Reality Check

What the vendor says: “Just plug your applications into our gateway and you’re done.”

What that means in practice: Every application that currently calls a model directly needs to be re-routed through the gateway. Every API key needs to be migrated. Every team needs to agree on the policy framework. The gateway is the easy part. The retrofit is where the budget goes.

What Operators Actually Do

Companies deploying gateways well treat them as infrastructure, not products. They pick one — based on hosting model, provider coverage, and policy capabilities — and they make it the only sanctioned way for an internal application to call a model. Direct calls get blocked at the network layer. Shadow IT gets surfaced through cost reporting.

They also use the gateway to make model choice negotiable. Instead of locking an application to a provider, they expose a logical model name (such as “fast-summary” or “high-stakes-reasoning”) and let the gateway route to the actual provider behind it. When a better or cheaper option appears, the swap happens at the gateway, not in fifteen application repos.
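One way to picture the logical-name pattern is a routing table owned by the gateway. The sketch below reuses the two logical names from the text; the provider and model identifiers are placeholders, not real product names.

```python
# Hypothetical routing table: logical name -> (provider, concrete model).
ROUTES = {
    "fast-summary": ("provider-a", "small-model"),
    "high-stakes-reasoning": ("provider-b", "large-model"),
}


def resolve(logical_name, routes=ROUTES):
    """Map a logical model name to the provider and model behind it."""
    try:
        return routes[logical_name]
    except KeyError:
        raise ValueError(f"unknown logical model: {logical_name}") from None


before = resolve("fast-summary")

# Swapping providers is a one-line change at the gateway;
# applications keep asking for "fast-summary" and never notice.
ROUTES["fast-summary"] = ("provider-c", "cheaper-model")
after = resolve("fast-summary")
```

Applications depend only on the logical name, so the table is the single place where a provider change has to be made and reviewed.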

The other discipline that separates the well-run deployments: they treat the gateway as the source of truth for AI cost. Finance reads from it. Procurement reads from it. The CFO’s quarterly AI line item is computed from gateway logs, not from provider invoices reconciled by hand. That single change reshapes how AI spending gets governed.

The Questions to Ask

  1. Where do our AI calls go today, and who owns each path? If the answer requires a survey of teams, the gateway is overdue. The first job of the gateway is to make this question answerable in a query.

  2. What happens when our primary provider has an outage? If the answer is “the application stops,” then any gateway you have is providing routing, not resilience. Fallback paths need to be configured and tested, not theoretical.

  3. Who enforces our AI security policy today: each application on its own, or one central layer? Distributed enforcement is no enforcement. The gateway is the only layer where policy can be applied once and audited reliably.
