Private LLM hosting: dedicated, on-prem, and sovereign options, compared.
Public AI cloud is fine — until it isn't. For ITAR, CUI, classified, or contractually air-gapped work you need a private LLM. Here's how the options actually compare, and what Backplain Sovereign Compute adds.
Public AI stops at the compliance line.
Every regulated organization eventually hits the same wall: the frontier model that would answer their question is only available on shared, multi-tenant infrastructure operated by a hyperscaler that can't (or won't) sign the specific contract they need. ITAR, CUI, sealed litigation, sovereign-nation data, classified research — the list is long and growing.
Private LLM hosting solves this by moving the model to infrastructure you control. The tradeoffs are model freshness (open-weight models trail the frontier by 3–6 months) and operational overhead (you're now running an AI stack, not just consuming one). Backplain Sovereign Compute is our answer to the second problem.
Four ways to run an LLM, from public API to fully sovereign.
| Option | Model access | Data boundary | Operational burden | Best for |
|---|---|---|---|---|
| Public API (OpenAI / Anthropic / Google) | Full frontier, day-of-release | Shared multi-tenant, upstream vendor | None | Most workloads, with a redaction layer (AI Firewall) |
| Cloud enterprise tier (Azure OpenAI, Vertex, Bedrock) | Full frontier, dedicated capacity | Single-cloud tenancy · vendor BAA | Low | Regulated workloads that accept hyperscaler as processor |
| Self-hosted open weights | Llama 4, Mistral, Codestral, Pixtral | You own everything | High — GPU ops, model updates, evals | Teams with real MLOps capacity |
| Backplain Sovereign Compute | Curated open-weight lineup, kept current | Dedicated single-tenant · your VPC or ours | None — Backplain operates it | ITAR / CUI / sovereign / air-gapped work |
Backplain's approach: use public frontier models behind the AI Firewall for general work, and route regulated workloads to Sovereign Compute — same interface, same audit log, different model boundary.
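The routing idea can be sketched in a few lines. This is a minimal illustration, not Backplain's implementation: the `Router` class, the `Boundary` names, and the tag vocabulary are all hypothetical, and the sketch assumes each request arrives already labeled with its compliance tags. It shows one entry point and one audit log, with only the model boundary changing per request.

```python
from dataclasses import dataclass, field
from enum import Enum


class Boundary(Enum):
    # Hypothetical names for the two model boundaries described above.
    PUBLIC_FRONTIER = "public-frontier-behind-firewall"
    SOVEREIGN = "sovereign-compute"


# Assumed tag vocabulary, taken from the workload types this page names.
REGULATED_TAGS = frozenset({"itar", "cui", "sovereign", "air-gapped"})


@dataclass
class Router:
    # One shared audit log, regardless of which boundary serves the request.
    audit_log: list = field(default_factory=list)

    def route(self, prompt: str, tags: set[str]) -> Boundary:
        """Pick a boundary from the request's compliance tags and log the decision."""
        normalized = {t.lower() for t in tags}
        regulated = bool(normalized & REGULATED_TAGS)
        boundary = Boundary.SOVEREIGN if regulated else Boundary.PUBLIC_FRONTIER
        # Log metadata only; the prompt text itself is never stored here.
        self.audit_log.append(
            {
                "boundary": boundary.value,
                "tags": sorted(normalized),
                "prompt_chars": len(prompt),
            }
        )
        return boundary


router = Router()
general = router.route("Summarize this press release", set())
regulated = router.route("Review this export-controlled drawing", {"ITAR"})
```

In this sketch the caller never chooses a model directly. Tagging the workload is the only decision, which is what keeps the interface and the audit trail identical across both boundaries.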
Which private-hostable models actually matter.
| Model | Maker | Context | Strength | Notes |
|---|---|---|---|---|
| Llama 4 Maverick | Meta | 1M | General reasoning, multi-modal | Best all-round open-weight |
| Llama 4 Scout | Meta | 10M | Extreme context, retrieval | For deep-doc and codebase work |
| Mistral Large 2 | Mistral | 128K | Efficient reasoning | EU-hosted, suits strict GDPR requirements |
| Codestral 25.01 | Mistral | 256K | Code generation, 80+ languages | Specialized code model |
| Pixtral Large | Mistral | 128K | Multi-modal (text + image) | EU-hosted vision |
| DeepSeek V3 | DeepSeek | 128K | Strong reasoning, low cost | Popular for on-prem eval |
ITAR / CUI research
Defense and dual-use research where any data leaving the US-person boundary is a violation. Sovereign Compute deploys inside the compliant boundary from day one.
Sovereign-nation deployments
Governments and regulated national infrastructure that require in-country processing. Backplain deploys per jurisdiction with local operational staff.
Sealed litigation & M&A
Deal rooms and sealed matters where even the existence of the prompt is confidential. Single-tenant hosting removes multi-tenant metadata leakage.
On-prem healthcare & research
Academic medical centers with data-use agreements that prohibit cloud egress. On-prem Llama 4 with the AI Firewall pattern, operated by Backplain.
If public AI won't work, private will.
Sovereign Compute inquiries route directly to the founder.