
Private Cloud AI

Models running in a vendor's cloud, in a tenant isolated to you. The middle ground between API convenience and on-prem control. What 'isolated' actually means — and what it doesn't.


The Technical Definition

Private cloud AI means running models inside a hyperscaler’s cloud (AWS, Azure, Google) in a tenant configured so the inference traffic, prompts, and outputs stay isolated to your account. The production patterns are Amazon Bedrock with VPC endpoints (AWS PrivateLink), Azure OpenAI with Private Link, and Google Vertex AI with Private Service Connect.
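To make “VPC endpoint” concrete for the AWS case: Bedrock’s runtime API can be reached through an interface VPC endpoint, so calls resolve to private IPs inside your VPC. The sketch below only builds the keyword arguments you would pass to boto3’s `ec2.create_vpc_endpoint`; it doesn’t call AWS. The function name is hypothetical, the VPC, subnet, and security-group IDs are placeholders, and it assumes the `com.amazonaws.<region>.bedrock-runtime` service-name format, which is worth confirming against current AWS documentation.

```python
def bedrock_runtime_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for boto3's EC2 create_vpc_endpoint (not called here).

    Assumes the interface-endpoint service name format
    com.amazonaws.<region>.bedrock-runtime. All IDs passed in are placeholders.
    """
    if not subnet_ids:
        raise ValueError("an interface endpoint needs at least one subnet")
    return {
        "VpcEndpointType": "Interface",  # PrivateLink network interfaces in your subnets
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.bedrock-runtime",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS makes the regional bedrock-runtime hostname resolve to the
        # endpoint's private IPs inside the VPC, so SDK clients need no code change.
        "PrivateDnsEnabled": True,
    }


params = bedrock_runtime_endpoint_params(
    "eu-central-1", "vpc-PLACEHOLDER", ["subnet-PLACEHOLDER"], ["sg-PLACEHOLDER"]
)
print(params["ServiceName"])  # com.amazonaws.eu-central-1.bedrock-runtime
# With AWS credentials configured, the actual call would be:
#   import boto3
#   boto3.client("ec2", region_name="eu-central-1").create_vpc_endpoint(**params)
```

The endpoint alone doesn’t stop anyone from calling Bedrock over its public endpoint from somewhere else. Closing that path takes IAM or endpoint policies, for example a condition on the `aws:SourceVpce` key.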

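The model itself runs on the vendor’s GPUs. Your data flows to and from the model over private networking that doesn’t traverse the public internet. The vendor commits, contractually, that prompts and outputs aren’t used to train its models. Commitments on logging and human review vary by service: Azure OpenAI, for example, retains prompts and outputs for abuse monitoring by default unless you are approved to opt out.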

What This Actually Means for Your Business

Private cloud AI is the middle ground most large enterprises actually deploy. In rough terms, it gets you 80% of the control of on-prem at 20% of the operational cost, and it gives you access to frontier closed models (GPT-4 via Azure OpenAI, Claude via Bedrock) that you can’t run on your own hardware.

The catch is the word “isolated.” Read the fine print on three things.

First, isolated from training does not mean isolated from the vendor. Microsoft, Amazon, and Google still operate the underlying infrastructure. They have the keys. They run the patches. A subpoena to them is a subpoena to your data. If your data needs to stay outside US jurisdiction (or inside a specific EU one), private cloud on a US hyperscaler doesn’t get you there. Sovereign AI does.

Second, “private” varies by SKU. Default Bedrock is multi-tenant: your inference shares GPUs with other customers. Provisioned Throughput dedicates capacity but not always hardware. The same is true of Azure. The thing your security team thinks they bought may not be the thing the procurement contract actually delivered. Get the SKU in writing and verify it with the vendor’s solutions architect, not the salesperson.

Third, regional residency. AWS, Azure, and Google offer specific regions where data stays within a geography. Sometimes. Some models in some regions route inference to a different region to absorb load, through options such as Bedrock’s cross-region inference profiles or Azure OpenAI’s Global and Data Zone deployment types. This is documented if you look for it. Most procurement reviews don’t.
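One place this shows up is in the identifiers themselves. On Bedrock, system-defined cross-region inference profiles carry a geography prefix in front of the model ID (for example `us.` or `eu.`), and Azure OpenAI’s `GlobalStandard` and `DataZoneStandard` deployment types may process requests outside the resource’s own region. Below is a minimal audit sketch that assumes those naming conventions; the function name is hypothetical, and the prefix and SKU lists are illustrative and incomplete, so check them against each vendor’s current documentation. A flag means “review this,” not “this violates residency.”

```python
# Illustrative, NOT exhaustive: verify both lists against current vendor docs.
BEDROCK_GEO_PREFIXES = ("us.", "eu.", "apac.", "global.")
AZURE_CROSS_REGION_SKUS = {
    "GlobalStandard",
    "DataZoneStandard",
    "GlobalProvisionedManaged",
    "DataZoneProvisionedManaged",
}


def may_leave_region(provider: str, identifier: str) -> bool:
    """Flag a deployment whose inference may run outside its home region.

    provider: "bedrock" (identifier = model ID or inference-profile ID)
              or "azure" (identifier = deployment SKU name).
    """
    if provider == "bedrock":
        return identifier.startswith(BEDROCK_GEO_PREFIXES)
    if provider == "azure":
        return identifier in AZURE_CROSS_REGION_SKUS
    raise ValueError(f"unknown provider: {provider}")


# Hypothetical deployment inventory for one workload.
inventory = [
    ("bedrock", "anthropic.claude-3-5-sonnet-20240620-v1:0"),
    ("bedrock", "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"),
    ("azure", "Standard"),
    ("azure", "DataZoneStandard"),
]
flagged = [item for item in inventory if may_leave_region(*item)]
for provider, ident in flagged:
    print(f"review residency: {provider} {ident}")
```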

Reality Check

What the vendor says: “Your data stays in your private VPC and is never used to train our models.”

What that means in practice: Your prompts and outputs aren’t used for training, and under the right terms they aren’t human-reviewed. They are still processed by the vendor’s infrastructure, in a region the vendor operates, under the vendor’s operational keys. That’s a meaningful step up from a public API. It’s not the same thing as “your data never leaves your control.”

What Operators Actually Do

The operators using private cloud well treat it as the default deployment for sensitive but not classified workloads. Customer support content, internal knowledge bases, sales call analysis, contract review — workloads where you want frontier capability and an audit trail but don’t have a regulator demanding the bits never leave your perimeter.

They also do the contract work most teams skip. Specifically: a written commitment on data residency by region, a written commitment on logging and human review, a Data Processing Addendum that names the legal basis under GDPR (or the equivalent in your jurisdiction), and a list of every subcontractor in the inference path. Bedrock alone touches multiple AWS sub-services. Each one needs to be in scope.
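The contract checklist above is easy to lose track of across several deployments. One way to keep it honest is to record, per deployment, what is actually in writing and compute the gaps. This is a hypothetical record type, not a standard or a vendor format; the field names and the service names in the example are placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InferenceContractReview:
    """Hypothetical record of what is actually in writing for one deployment."""
    residency_regions: list[str] = field(default_factory=list)
    logging_and_review_terms: bool = False
    dpa_legal_basis: str = ""  # empty if the DPA names none
    services_in_scope: list[str] = field(default_factory=list)
    services_in_inference_path: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the contract items that are still missing."""
        missing = []
        if not self.residency_regions:
            missing.append("no written data-residency commitment by region")
        if not self.logging_and_review_terms:
            missing.append("no written logging / human-review commitment")
        if not self.dpa_legal_basis:
            missing.append("DPA does not name a legal basis")
        if not self.services_in_inference_path:
            missing.append("inference path not mapped")
        # Every service that touches a request must be covered by the contract.
        for svc in self.services_in_inference_path:
            if svc not in self.services_in_scope:
                missing.append(f"not in scope: {svc}")
        return missing


review = InferenceContractReview(
    residency_regions=["eu-central-1"],
    logging_and_review_terms=True,
    dpa_legal_basis="",  # hypothetical: the DPA draft names none yet
    services_in_scope=["service-a"],  # placeholder service names
    services_in_inference_path=["service-a", "service-b"],
)
for gap in review.gaps():
    print(gap)
```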

The other pattern: they pair private cloud with an abstraction layer (LiteLLM, an internal gateway, or a vendor like Portkey) so they can switch between Bedrock, Azure, and Vertex without rewriting application code. Vendor lock-in at the deployment layer is the silent cost of private cloud — and the abstraction layer is the cheap insurance policy.
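The abstraction-layer idea fits in a few lines: application code asks for a use case, and a routing table decides which vendor serves it. This is a generic sketch of the pattern, not LiteLLM’s or Portkey’s API; the `Gateway` class is hypothetical, and the lambda providers are stand-ins for real adapters that would wrap the Bedrock, Azure OpenAI, or Vertex AI SDKs.

```python
from typing import Callable, Dict

# A provider adapter takes a prompt and returns text. Real adapters would wrap
# vendor SDKs; the lambdas below are stand-ins.
Provider = Callable[[str], str]


class Gateway:
    """Minimal internal gateway: callers name a use case, not a vendor."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}
        self._routes: Dict[str, str] = {}

    def register(self, name: str, provider: Provider) -> None:
        self._providers[name] = provider

    def route(self, use_case: str, provider_name: str) -> None:
        if provider_name not in self._providers:
            raise KeyError(f"unregistered provider: {provider_name}")
        self._routes[use_case] = provider_name

    def complete(self, use_case: str, prompt: str) -> str:
        return self._providers[self._routes[use_case]](prompt)


gw = Gateway()
gw.register("bedrock", lambda p: f"[bedrock] {p}")
gw.register("azure", lambda p: f"[azure] {p}")
gw.route("contract-review", "bedrock")
print(gw.complete("contract-review", "Summarise clause 7"))  # [bedrock] Summarise clause 7
# Switching vendors is a routing change; calling code is untouched.
gw.route("contract-review", "azure")
print(gw.complete("contract-review", "Summarise clause 7"))  # [azure] Summarise clause 7
```

The gateway only makes the API calls portable. Fine-tuned models and embeddings don’t move this way: embeddings from different models aren’t interchangeable, so switching embedding models means re-embedding your corpus.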

The Questions to Ask

  1. What SKU am I actually buying, and is the isolation what I think it is? Get the configuration in writing. “Bedrock” alone tells you nothing — Provisioned Throughput vs. on-demand vs. dedicated capacity behave differently for compliance.

  2. Where is the inference physically running, and what jurisdictions are in the data path? A US hyperscaler in an EU region is still a US-headquartered operator. If that matters to your regulator or your customers, you need to know now, not at audit time.

  3. What happens if I need to migrate off this vendor in 18 months? What’s the contract exit, what’s locked in (fine-tuned models, embeddings, vector stores), and how long does a real migration take? The honest answer is usually six to nine months. Plan accordingly.
