Self-Hosted Code Execution Sandbox for Production AI
Most code-execution sandboxes for AI are hosted services: you call an API, the LLM's code runs on the vendor's machines, and you never touch a KVM host. For a lot of teams that's the right trade. But there's a specific buyer for whom 'the code runs on someone else's infrastructure' is a non-starter — because of where data is allowed to live, what a customer's contract forbids, or what the bill looks like at scale. This post is for that reader: why you'd run a self-hosted code execution sandbox, what it honestly costs in operational weight, and the cases where you should stay hosted anyway.
The phrase people search for is 'self-hosted E2B alternative,' and it's a good frame. E2B is a strong, hosted-first Firecracker sandbox; the reason you're looking past it is usually not a missing feature — it's the deployment model. So the real question isn't 'which sandbox is best,' it's 'should untrusted code execute on my infrastructure or someone else's,' and what the answer costs either way.
Why self-host a code execution sandbox at all
Self-hosting is more work than calling a hosted API — there's no pretending otherwise. So it only makes sense when one of a few specific forces is pushing you, and they're nearly always about control rather than features. Three reasons account for most of it.
Data residency and compliance
If you're running AI-generated code, that code frequently touches your customers' data — it reads files, queries databases, processes documents. The moment execution happens on a third party's machines, that customer data (and the model's output derived from it) has left your trust boundary. For a large class of regulated buyers — healthcare, finance, government, EU data-residency regimes, enterprise contracts with explicit 'no data leaves our VPC' clauses — that's simply not allowed, regardless of how good the vendor's security is.
Self-hosting collapses that problem. The sandbox runs inside your VPC, on hosts you control, in the region your compliance team signed off on. Customer code and the data it processes never cross a boundary you can't account for. This is the single most common reason teams move off a hosted-only provider — not because the hosted one is insecure, but because 'where does execution physically happen' is a line item in a contract or an audit, and the only answer that passes is 'our infrastructure.'
VPC isolation and the audit surface
Beyond residency, there's the network-shape argument. When the sandbox runs in your VPC, the code that needs to reach your internal Postgres, your private S3 bucket, or a service behind your firewall can do so over private networking instead of a public ingress you had to punch open for a hosted vendor. You control egress policy per sandbox, you see the traffic in your own observability stack, and the execution layer is something your security team can read and audit rather than a black box they have to take on trust. For a security review, 'here is the open-source code that runs the VM and here are its network rules' is a fundamentally easier conversation than 'we trust this SaaS.'
Cost control at scale
Hosted sandboxes bill on metered usage — some mix of CPU time, memory, creations, storage, and egress. At low and bursty volume, that metering almost always wins: you pay for what you use and operate nothing. But the curve crosses. If you're spinning up millions of sandboxes a month, or running long-lived agent environments continuously, the per-second hosted bill can dwarf the cost of the equivalent compute on hardware you own or reserve. At that point, owning the substrate — paying for the KVM hosts plus the engineering to run them — flips from liability to savings. The honest version: the crossover point is real but it's high, and you should model it against your own volume before assuming you've hit it. Most teams have not.
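One way to model that crossover against your own volume is a simple break-even calculation. The sketch below is illustrative only: the `CostModel` fields and every price in the example are hypothetical placeholders, not figures from any vendor, and it assumes both options can be reduced to a cost per sandbox-second plus a fixed monthly amount.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostModel:
    # All three figures are hypothetical inputs; substitute your own quotes.
    hosted_price_per_second: float      # metered hosted rate, USD per sandbox-second
    self_host_fixed_per_month: float    # hosts plus engineering time, USD per month
    self_host_price_per_second: float   # marginal compute cost, USD per sandbox-second


def monthly_cost_hosted(m: CostModel, sandbox_seconds: float) -> float:
    """Hosted bill: purely metered, no fixed component."""
    return m.hosted_price_per_second * sandbox_seconds


def monthly_cost_self_hosted(m: CostModel, sandbox_seconds: float) -> float:
    """Self-hosted bill: fixed monthly cost plus marginal compute."""
    return m.self_host_fixed_per_month + m.self_host_price_per_second * sandbox_seconds


def crossover_sandbox_seconds(m: CostModel) -> Optional[float]:
    """Monthly usage above which self-hosting is cheaper, or None if it never is."""
    saving_per_second = m.hosted_price_per_second - m.self_host_price_per_second
    if saving_per_second <= 0:
        return None  # hosted is cheaper at every volume
    return m.self_host_fixed_per_month / saving_per_second


# Placeholder numbers purely for illustration.
example = CostModel(
    hosted_price_per_second=0.0001,
    self_host_fixed_per_month=24_000.0,
    self_host_price_per_second=0.00002,
)
print(crossover_sandbox_seconds(example))
```

With these placeholder inputs the break-even lands at roughly 300 million sandbox-seconds a month. The useful part is less the number than the shape: the fixed monthly term, which includes the people who run the fleet, is what pushes the crossover so high.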
What it actually takes to run (the honest ops weight)
Here's the part vendor pages skip. A self-hosted code execution sandbox is real infrastructure, and you are now the one operating it. Concretely, you take on four moving parts.
- KVM hosts. Untrusted code needs hardware-virtualization isolation, which means Linux machines exposing /dev/kvm — bare metal, or cloud instances that support nested virtualization. You're responsible for provisioning, patching, and capacity-planning them. This is the load-bearing requirement: no /dev/kvm, no microVM sandbox.
- An agent fleet. Each host runs a per-host agent that manages the VM lifecycle — creating microVMs, wiring networking, restoring snapshots, reaping idle sandboxes. You deploy it, monitor it, roll it forward, and handle the case where a host dies with live sandboxes on it.
- Snapshot and template storage. The fast boot path depends on baked snapshots and rootfs images. You decide where those live (local disk, S3-compatible object storage), how they replicate to new hosts, and how you keep them current when a template changes. Multi-host means an artifact-distribution problem you now own.
- Networking and the control plane. Per-sandbox network isolation, egress rules, and a control-plane API with auth, scheduling across hosts, and a metadata database (Postgres in production). On a single host this is light; across a fleet it's a distributed system with leases, heartbeats, and scheduling to keep healthy.
None of that is exotic if you already run infrastructure — it's the same class of problem as operating any fleet of stateful compute. But it is not zero, and it doesn't go away after setup. You're signing up for ongoing operations: host patching, capacity headroom, snapshot hygiene, and being on call when a KVM host wedges. If your team's reaction to that list is 'fine, that's Tuesday,' self-hosting is viable. If it's 'we don't have anyone for that,' read the next section carefully before you commit.
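To make "leases and heartbeats" concrete, here is a minimal sketch of how a control plane might notice a dead host and find the sandboxes it orphaned. It is not PandaStack's implementation: `LeaseTable`, `HostLease`, and the 15-second TTL are hypothetical names and values chosen for illustration, and a real control plane would keep this state in its metadata database rather than in memory.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HostLease:
    host_id: str
    last_heartbeat: float                 # monotonic seconds
    sandbox_ids: set = field(default_factory=set)


class LeaseTable:
    """In-memory lease tracking; hypothetical, for illustration only."""

    def __init__(self, lease_ttl_seconds: float = 15.0):
        self.lease_ttl_seconds = lease_ttl_seconds
        self.hosts = {}

    def heartbeat(self, host_id, sandbox_ids, now=None):
        """Record that a host agent is alive and which sandboxes it runs."""
        now = time.monotonic() if now is None else now
        self.hosts[host_id] = HostLease(host_id, now, set(sandbox_ids))

    def expire(self, now=None):
        """Drop hosts whose lease lapsed; return {host_id: orphaned sandbox ids}."""
        now = time.monotonic() if now is None else now
        orphaned = {}
        for host_id, lease in list(self.hosts.items()):
            if now - lease.last_heartbeat > self.lease_ttl_seconds:
                orphaned[host_id] = lease.sandbox_ids
                del self.hosts[host_id]
        return orphaned


table = LeaseTable(lease_ttl_seconds=15.0)
table.heartbeat("host-a", {"sb-1", "sb-2"}, now=0.0)
table.heartbeat("host-b", {"sb-3"}, now=10.0)
print(table.expire(now=20.0))  # host-a missed its lease; host-b is still live
```

What to do with the orphaned sandboxes (mark them failed, reschedule them, or restore them from a snapshot elsewhere) is the policy decision you own when you run the fleet.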
When a hosted provider is the better call
Being honest about self-hosting means being honest about when it's the wrong move. Plenty of teams should stay on a hosted sandbox, and the reasons are legitimate, not consolation prizes.
- You don't have an infra team (and don't want one). If nobody on the team wants to be paged for a KVM host at 2am, a hosted provider is genuinely less work and that operational simplicity has real dollar value. Don't take on a fleet to save money you'll spend twice over in engineering time.
- Your volume is low or spiky. Metered hosted billing is built exactly for bursty, unpredictable usage. Below the cost crossover — and most workloads are below it — paying per-second beats paying for idle hardware plus the people to run it.
- You have no residency or compliance constraint. If there's no contract clause or regulation forcing execution into your VPC, the strongest reason to self-host evaporates. Convenience usually wins the tie.
- You're early and optimizing for speed. When you're still finding product-market fit, the right move is to ship on a hosted sandbox and revisit self-hosting once volume, compliance, or cost actually force the question — not before.
The clean test: self-host when something is forcing your hand, whether that's a residency rule, a compliance audit, a VPC-isolation requirement, or a cost curve you've actually modeled and crossed. If you self-host only because it feels 'more in control' in the abstract, with none of those forces pushing, you'll likely spend more on operations than you ever would have on the hosted bill.
PandaStack: the Apache-2.0 self-host path
If you've decided self-hosting is the right call, here's where PandaStack fits. The core is open-source under Apache-2.0 and built to run end-to-end on your own Linux KVM hosts — anything exposing /dev/kvm. You run two things: the control-plane API and a per-host agent. The control plane handles auth, scheduling, and metadata; the agent on each host manages the microVM lifecycle. Sandboxes execute entirely on your infrastructure. There's a hosted offering too, but self-host is a first-class, supported path — the same binaries, the same agent, with the SDK's base URL configurable so identical code points at either.
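Since /dev/kvm is the hard requirement, a preflight check on each candidate host is a cheap first step. The sketch below is a hypothetical helper, not part of PandaStack: it only checks that the device node exists and that the current user can read and write it, which is necessary but not sufficient (it does not exercise the KVM API or confirm nested virtualization performs acceptably).

```python
import os


def kvm_status(device: str = "/dev/kvm") -> str:
    """Return 'ok', 'missing', or 'no-permission' for the KVM device node."""
    if not os.path.exists(device):
        # No device node: KVM is not loaded, or virtualization is not exposed.
        return "missing"
    if not os.access(device, os.R_OK | os.W_OK):
        # Node exists, but this user cannot open it read-write.
        return "no-permission"
    return "ok"


if __name__ == "__main__":
    print(f"/dev/kvm: {kvm_status()}")
```

A "missing" result on a cloud instance usually means the instance type does not expose nested virtualization; a "no-permission" result usually means the user running the agent is not in the group that owns the device.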
On the isolation question that matters most for untrusted code: every sandbox is a Firecracker microVM with its own guest kernel (5.10, Ubuntu 24.04 guest), isolated by hardware virtualization via KVM. That is categorically different from a shared-kernel container. A container shares the host kernel, so a kernel-level escape is a host compromise. Firecracker runs under a jailer that drops privileges and exposes only a minimal virtio device model (net/block/vsock), which gives the guest a far smaller, far better-audited attack surface than the full Linux syscall interface a container is exposed to. For why that distinction is the whole game with arbitrary LLM output, see /blog/firecracker-vs-docker and /blog/why-docker-is-not-a-sandbox.
The performance shape is specific and worth knowing before you architect around it. There is no warm pool of idle VMs. Every create restores a baked Firecracker snapshot on demand — the snapshot already holds a booted kernel, a running guest agent, and an open network stack, so 'starting' a sandbox is really 'restore memory pages and resume.' That lands at 179ms p50 (p99 ~203ms). The only slow path is the first-ever spawn of a brand-new template, which cold-boots (~3s) and bakes the snapshot; every create after is on the fast restore path. The trade-off to name: vCPU and RAM are fixed at bake time, so a 4 GiB guest means a 4 GiB template — you can't resize at restore. See /docs/internals/snapshot-restore for the full boot path.
Forking is first-class and copy-on-write. A snapshot captures the full machine state (memory plus rootfs). A fork clones a running sandbox by sharing guest memory through MAP_PRIVATE — the kernel only copies pages on write — and reflinking the rootfs with XFS so disk data is shared until something writes. A same-host fork completes in about 400ms; a cross-host fork (download plus restore) runs 1.2–3.5s. The pattern this unlocks: warm one environment to a known state — dependencies installed, dataset loaded, REPL hot — then fork it N times to explore branches in parallel from the exact same memory without re-running setup. See /docs/concepts/snapshots-and-forks for the API and /blog/snapshot-and-fork-explained for how the CoW machinery works.
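The copy-on-write behavior of a private mapping is easy to see in miniature. The sketch below is not the fork implementation; it only demonstrates the underlying MAP_PRIVATE semantics using Python's `mmap.ACCESS_COPY` (which maps to MAP_PRIVATE on Unix), with a small temporary file standing in for a guest memory image. The function name `private_mapping_demo` is made up for this example.

```python
import mmap
import os
import tempfile


def private_mapping_demo():
    """Write through one private mapping; show nothing else sees the change."""
    # A two-page file of b"A" stands in for a snapshot's memory image.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"A" * mmap.PAGESIZE * 2)
        path = f.name
    try:
        with open(path, "r+b") as f:
            # Two independent private (copy-on-write) mappings of the same file.
            parent = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            fork = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
            # Writing triggers a private copy of the touched page for `fork` only.
            fork[0:1] = b"B"
            parent_byte, fork_byte = parent[0:1], fork[0:1]
            parent.close()
            fork.close()
        with open(path, "rb") as f:
            disk_byte = f.read(1)  # the backing file is never modified
        return parent_byte, fork_byte, disk_byte
    finally:
        os.unlink(path)


print(private_mapping_demo())
```

Only the page that was written gets copied; the second page stays shared with the backing file. That is why a fork of a multi-gigabyte guest can be cheap: the cost scales with the pages a branch dirties, not with the size of the image.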
Two pieces directly address the fleet-operations weight from the section above. Networking is NAT'd: each sandbox gets its own Linux network namespace plus a veth pair and tap device, with up to 16,384 /30 subnets per agent and per-sandbox egress isolation, so the VPC-isolation and egress-control story you self-host for is built into the substrate, not bolted on. And for multi-host snapshot distribution, there's optional UFFD (userfaultfd) memory streaming: instead of downloading a multi-gigabyte memory image before a new host can boot a snapshot, the agent pages vm.mem on demand from object storage (HTTP Range GET, 4 MiB chunks), with zero-page elision, a prefetch trace, and a shared per-host chunk cache so the first restore pays the network cost once and every later one is local-disk fast. That's the artifact-distribution problem made cheaper. The internals are at /docs/internals/streaming-restore.
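Both numbers in that paragraph come down to simple arithmetic, sketched below. The 16,384 figure is exactly the number of /30 subnets in a /16 pool, and a 4 MiB chunk size determines which byte range a given memory offset falls into. The pool CIDR `10.200.0.0/16` and the function names are assumptions made for illustration; the actual address pool and the agent's fetch logic may differ.

```python
import ipaddress

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks, as described in the text
ASSUMED_POOL = "10.200.0.0/16"  # hypothetical per-agent pool, not from the docs


def subnets_in_pool(pool: str = ASSUMED_POOL) -> int:
    """How many /30 subnets fit in the pool."""
    net = ipaddress.ip_network(pool)
    return 2 ** (30 - net.prefixlen)


def sandbox_subnet(index: int, pool: str = ASSUMED_POOL) -> ipaddress.IPv4Network:
    """The index-th /30 in the pool (each /30 spans 4 addresses)."""
    net = ipaddress.ip_network(pool)
    if not 0 <= index < subnets_in_pool(pool):
        raise IndexError(f"pool {pool} has no /30 at index {index}")
    base = ipaddress.ip_address(int(net.network_address) + index * 4)
    return ipaddress.ip_network(f"{base}/30")


def range_header_for_offset(offset: int, image_size: int) -> str:
    """HTTP Range value for the chunk containing a faulting byte offset."""
    if not 0 <= offset < image_size:
        raise ValueError("offset outside the memory image")
    start = (offset // CHUNK_SIZE) * CHUNK_SIZE
    end = min(start + CHUNK_SIZE, image_size) - 1  # Range ends are inclusive
    return f"bytes={start}-{end}"


print(subnets_in_pool())
print(sandbox_subnet(1))
print(range_header_for_offset(5 * 1024 * 1024, 10 * 1024 * 1024))
```

A /30 leaves two usable host addresses, which is enough for one endpoint on the host side and one inside the guest. Aligning fetches to fixed chunk boundaries is also what makes a shared per-host cache workable: two sandboxes faulting on nearby offsets request the same chunk key.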
Because it's one microVM substrate, self-hosting the sandbox also brings the rest of the platform onto your infrastructure: managed PostgreSQL 16, git-driven app hosting with scale-to-zero, serverless functions with cron schedules, and durable volumes — all on the same isolation model and the same hosts. If your reason for self-hosting is keeping the whole AI stack inside your VPC, that consolidation is the argument; if all you need is to run code, it's irrelevant and a more focused tool may suit you better.
The SDKs make the hosted-vs-self-host switch a config change rather than a rewrite. There's a Python SDK (pandastack), a TypeScript SDK (@pandastack/sdk), and a CLI (pandastack); each reads a PANDASTACK_API_KEY (prefix pds_) and talks to a configurable base URL — point it at the hosted API or at your own control plane with the same code.
import os

from pandastack import PandaStack

# Same code, hosted or self-hosted; only the base URL changes.
client = PandaStack(
    token=os.environ["PANDASTACK_API_KEY"],  # prefix pds_
    base_url=os.environ.get("PANDASTACK_API", "https://api.pandastack.ai"),
)

# Create a sandbox on your own KVM host (179ms p50 via snapshot-restore)
sandbox = client.sandboxes.create(template="code-interpreter", ttl_seconds=600)

# Run untrusted code inside the Firecracker microVM, in your VPC
result = sandbox.exec("python -c 'print(2 ** 10)'", timeout_seconds=30)
print(result.stdout)  # -> 1024

# Fork one branch from the warmed state (~400ms same-host);
# call fork() again for each additional parallel branch.
branch = sandbox.fork()
Putting it together
Self-hosting a code execution sandbox is a control decision, not a feature decision. You take it on when data residency, VPC isolation, a compliance audit, or a cost curve you've actually crossed forces execution onto your infrastructure — and you accept the real ongoing weight of KVM hosts, an agent fleet, snapshot storage, and a control plane in exchange. PandaStack is the Apache-2.0 path for that decision: open-source Firecracker microVMs you run on your own /dev/kvm hosts, with snapshot-restore on every create, first-class CoW forking, per-sandbox network isolation, and streaming snapshot distribution to make the fleet cheaper to operate. If none of those forces is pushing you, a hosted provider is the lower-effort, often cheaper answer, and choosing it is the right call. For the broader landscape of options and a comparison against E2B specifically, see /blog/e2b-alternatives and /blog/pandastack-vs-e2b.
Frequently asked questions
What is a self-hosted code execution sandbox?
It's a sandbox for running untrusted or AI-generated code that runs on infrastructure you operate rather than on a vendor's servers. With PandaStack, you deploy two open-source components under Apache-2.0 — a control-plane API and a per-host agent — onto your own Linux KVM hosts (anything exposing /dev/kvm), and every sandbox runs as a Firecracker microVM with its own guest kernel and hardware-virtualization isolation. The code, and the data it touches, never leave your VPC, which is the main reason teams choose a self-hosted execution layer over a fully managed hosted API.
Why self-host instead of using a hosted E2B-style API?
Three forces drive it: data residency and compliance (regulated or contractual rules that forbid customer code and data from leaving your VPC), VPC isolation and auditability (private networking to internal services plus an open-source execution layer your security team can read), and cost control at scale (above a high but real volume crossover, owning the substrate beats a per-second hosted bill). If none of those applies — low volume, no residency constraint, no infra team — a hosted provider is genuinely less work and usually cheaper, and self-hosting would be the wrong call.
What does it actually take to run a self-hosted sandbox in production?
Four moving parts: Linux KVM hosts exposing /dev/kvm (bare metal or nested-virt cloud instances) that you provision and patch; a per-host agent managing the microVM lifecycle that you deploy and monitor; snapshot and template storage that distributes baked images to new hosts; and a control plane with auth, scheduling, and a Postgres metadata database. On a single host it's light; across a fleet it's a distributed system with leases, heartbeats, and scheduling. The isolation primitive (Firecracker on KVM) is the easy part — the ongoing weight is operating the fleet around it. PandaStack reduces the distribution cost with UFFD memory streaming, so new hosts page snapshot memory on demand instead of downloading the whole image first.
Is PandaStack a true open-source, self-hostable E2B alternative?
Yes — the PandaStack core is open-source under Apache-2.0 and designed to run end-to-end on your own KVM hosts, with sandboxes executing entirely on your infrastructure. It's not a black box with a self-host label: you run the control-plane API and the per-host agent yourself, the same binaries that power the hosted offering, and the SDK's base URL is configurable so the same code points at either. Every sandbox is a Firecracker microVM, so a self-hosted PandaStack deployment puts the same hardware-isolation model on infrastructure you own and operate end to end. Other providers have their own licensing and deployment stories — verify any candidate's license and self-host architecture against its own repository and docs before committing.