Code Interpreter API Pricing, Compared
If you're shopping for a code interpreter API — a service that runs AI-generated or otherwise untrusted code in an isolated sandbox and hands you back stdout — the pricing pages are harder to compare than they look. Every vendor bills on a slightly different unit, the meters tick on different events, and the headline number on the marketing site rarely matches what you'll owe at the end of a busy month. This post is a map of the pricing models in the space, what actually drives the bill, and the one structural question that decides whether a hosted per-second meter or your own KVM hosts is cheaper: scale.
I'm going to be deliberate about one thing. I will not print competitor dollar figures. Pricing in this category changes monthly, the published numbers come and go, and a stale rate in a blog post is worse than no rate at all. What's durable is the shape of each pricing model — and the shape is what you should reason about first, because it's what determines how your bill scales with your usage pattern. For the actual numbers, go to the vendor's pricing page on the day you're deciding.
What you actually pay for
Almost every code interpreter API in the market is usage-based, and almost all of them meter some combination of the same three things. Understanding these three drivers is most of the work, because the marketing unit (a 'session,' a 'request,' a 'sandbox-hour') is usually just one of them wearing a costume.
- Compute time — the dominant line item. You pay for how long a sandbox is alive and consuming CPU, usually metered per second (sometimes per fraction of a second). This is the 'sandbox-seconds' number, and it's where the bulk of a real bill comes from.
- Memory and CPU size — most meters multiply time by the resources you reserved. A 4 GiB / 4 vCPU sandbox alive for ten seconds costs more than a 1 GiB / 1 vCPU one for the same ten seconds. Bigger boxes, bigger bill, even when idle.
- Egress and storage — the quiet add-ons. Network egress (especially pulling large dependencies or pushing big outputs), persistent disk, and snapshot storage are often billed separately from compute. For a code interpreter that installs packages on every run, egress is easy to underestimate.
There's a fourth thing some vendors charge for that isn't a resource at all: the seat or platform fee. A monthly minimum, a per-org plan price, or a support tier that sits on top of usage. It's worth separating from the metered cost when you compare, because a low per-second rate behind a high monthly floor is a different deal than the rate alone suggests.
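To make the drivers concrete, here is a minimal sketch of a monthly bill built from the three metered terms plus a flat platform fee. Everything in it is an assumption for illustration: the `HostedPlan` fields, the choice to size a sandbox by memory alone, and every rate are placeholders, not any vendor's numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedPlan:
    """A hypothetical rate card; every number passed in is a placeholder."""
    rate_per_gib_second: float   # compute price per reserved GiB per second
    egress_per_gb: float         # network egress price per GB
    storage_per_gb_month: float  # disk and snapshot storage price per GB-month
    platform_fee: float          # flat monthly floor, independent of usage


def monthly_bill(plan, sandbox_seconds, size_gib, egress_gb=0.0, storage_gb=0.0):
    """Compute time multiplied by size, plus add-ons, plus the flat fee."""
    compute = sandbox_seconds * size_gib * plan.rate_per_gib_second
    addons = egress_gb * plan.egress_per_gb + storage_gb * plan.storage_per_gb_month
    return compute + addons + plan.platform_fee


# Two made-up plans: a low rate behind a monthly floor, and a higher rate with none.
low_rate_high_floor = HostedPlan(0.00001, 0.10, 0.05, 200.0)
high_rate_no_floor = HostedPlan(0.00002, 0.10, 0.05, 0.0)

for seconds in (1_000_000, 50_000_000):
    a = monthly_bill(low_rate_high_floor, seconds, size_gib=1)
    b = monthly_bill(high_rate_no_floor, seconds, size_gib=1)
    print(f"{seconds:>11,} sandbox-seconds: floor plan {a:8.2f}, no-floor plan {b:8.2f}")
```

With these placeholder values the plan with the lower per-second rate is the more expensive one at low volume and only becomes cheaper past roughly 20 million sandbox-seconds a month, which is why the floor belongs in the comparison.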
The pricing models, by shape
Once you know the cost drivers, the vendors mostly fall into a handful of billing shapes. The shape matters more than the rate because it decides which of your usage patterns is cheap and which is expensive. Read the shape first; the per-unit price only tells you the slope once you know what's being metered.
Per-second (or sub-second) compute
The most common model: you're billed for the wall-clock time a sandbox is alive, multiplied by its size, metered in seconds or finer. This rewards short-lived, fast-booting sandboxes — spin up, run the code, tear down, stop the meter. It punishes the opposite: a sandbox you create and forget. If your agent leaves boxes running between turns, a per-second meter quietly bills the gaps. The defense is cheap teardown and fast restart, so you can afford to kill idle sandboxes and recreate them on demand rather than keeping them warm.
Active vs idle billing
A meaningful split. Some providers bill only for active CPU — time the sandbox is actually executing — and charge little or nothing while it sits idle waiting for the next request. Others bill for the full lifetime of the sandbox regardless of whether it's doing work. For an AI agent, where a sandbox often waits on a model call or a human between bursts of execution, this distinction can dominate the bill. Active-CPU billing favors bursty, conversational workloads; full-lifetime billing favors workloads that keep the box busy the whole time it's alive. Some vendors split the difference with a reduced 'idle' or 'standby' rate plus storage for the paused state — read carefully which one you're getting, because the word 'sandbox-hour' can mean either.
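A rough way to see how much this split matters is to price one agent session under both models. The sketch below is unit-free and assumes a single flat rate per second for each model; the function names and the 20 s active / 280 s idle session are illustrative, not measurements.

```python
def full_lifetime_cost(active_s, idle_s, rate):
    """Billed for every second the sandbox exists, busy or not."""
    return (active_s + idle_s) * rate


def active_cpu_cost(active_s, idle_s, active_rate, idle_rate=0.0):
    """Billed for executing time, with an optional reduced standby rate."""
    return active_s * active_rate + idle_s * idle_rate


def breakeven_rate_ratio(active_s, idle_s):
    """How many times higher the active-only rate can be before the two
    models cost the same (assuming idle time is free on the active model)."""
    return (active_s + idle_s) / active_s


# A hypothetical conversational session: 20 s of execution, 280 s of waiting.
print(full_lifetime_cost(20, 280, rate=1.0))       # 300.0
print(active_cpu_cost(20, 280, active_rate=2.0))   # 40.0
print(breakeven_rate_ratio(20, 280))               # 15.0
```

For this shape of session, the active-CPU meter stays cheaper until its rate is 15 times the full-lifetime rate. A workload that is busy nearly the whole time has a ratio close to 1, so the comparison comes down to the rates alone.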
Per-request / per-invocation
A few products price closer to a function-as-a-service model: you pay per invocation, sometimes with a compute-time component folded in. Simple to predict for spiky, stateless workloads. It gets expensive and awkward for long-running or stateful sessions, where you'd really rather hold one sandbox open across many calls than pay per call to a fresh one. If your agent does many small steps against a warm REPL, a per-invocation meter and your access pattern are working against each other.
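The mismatch can be sketched as a count of calls: a per-invocation meter grows with the number of steps, while a held sandbox on a per-second meter grows with how long the session lasts. The prices below are placeholders, and the helper names are hypothetical.

```python
def per_invocation_cost(calls, price_per_call):
    """Every step is a separately billed invocation."""
    return calls * price_per_call


def held_sandbox_cost(session_seconds, rate_per_second):
    """One sandbox stays open for the whole session on a per-second meter."""
    return session_seconds * rate_per_second


def breakeven_calls(session_seconds, rate_per_second, price_per_call):
    """Number of calls at which both models cost the same for one session."""
    return session_seconds * rate_per_second / price_per_call


# Hypothetical 10-minute session with placeholder prices.
session_s, rate, per_call = 600, 0.001, 0.02
print(held_sandbox_cost(session_s, rate))
print(per_invocation_cost(100, per_call))
print(breakeven_calls(session_s, rate, per_call))
```

Under these assumptions the two models meet at about 30 calls per session, so an agent making a hundred small steps against one REPL pays more than three times as much per invocation as it would holding the sandbox open.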
Free tiers and credits
Most hosted vendors offer a free tier, hobby allotment, or starting credit. These are real and useful for prototyping, but they're calibrated to get you to production, not to run production. The honest way to use one is to build your prototype, watch which meter ticks fastest, then model that meter at your expected production volume before you commit. The size of the free allotment tells you nothing about your bill at scale; only the metered model, projected to production volume, does.
What actually drives your bill
Your monthly cost on any per-second hosted code interpreter is, to a first approximation, sandbox-seconds times the size you reserved, plus egress and storage. So the levers that matter are the ones that move those terms, and almost all of them are about the sandbox lifecycle rather than the rate card. Optimize the lifecycle and you cut the bill on any vendor's meter.
- Boot and teardown speed — if creating a sandbox is slow, you keep boxes warm to hide the latency, and warm boxes burn sandbox-seconds. Fast create and cheap teardown let you run truly ephemeral sandboxes, which is the cheapest way to use a per-second meter.
- Idle time — a sandbox waiting on a model call or a user is pure cost on a full-lifetime meter and near-free on an active-CPU one. Match the billing model to how much your agents actually idle.
- Sandbox size — reserving 4 GiB when 1 GiB would do multiplies every second you're billed. Right-size per template instead of defaulting everything to the largest box.
- Dependency egress — installing the same packages on every cold run pulls bytes you pay for. Baking deps into the template image turns repeated egress into a one-time cost.
- Snapshot and fork reuse — if you can fork an already-warm state instead of cold-booting and re-installing, you cut both the boot time and the egress on every branch.
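The first two levers in the list above can be sketched by counting billed sandbox-seconds for one multi-turn agent session, once with a sandbox kept warm between turns and once with a fresh sandbox per turn. The turn counts and timings are hypothetical, and the model assumes a full-lifetime meter.

```python
def warm_sandbox_seconds(turns, exec_s, gap_s):
    """One sandbox kept alive across all turns: execution plus every gap is billed."""
    return turns * exec_s + (turns - 1) * gap_s


def ephemeral_sandbox_seconds(turns, exec_s, create_s, teardown_s=0.0):
    """A fresh sandbox per turn: create, execute, tear down, stop the meter."""
    return turns * (create_s + exec_s + teardown_s)


# Hypothetical session: 10 turns, 2 s of execution each, 30 s waiting between turns.
print(warm_sandbox_seconds(10, exec_s=2, gap_s=30))            # 290
print(ephemeral_sandbox_seconds(10, exec_s=2, create_s=0.2))   # fast create
print(ephemeral_sandbox_seconds(10, exec_s=2, create_s=3.0))   # slow create
```

In this sketch the ephemeral pattern bills about 22 seconds with a fast create and 50 with a slow one, against 290 for the warm box. The caveat is state: a fresh sandbox per turn loses whatever was in memory, which is where snapshot and fork reuse come in. The sandbox size multiplies whichever total you end up with.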
The self-host break-even
Here's the argument that the per-second pricing pages don't make for you. A hosted code interpreter bills you the cost of the underlying compute plus a margin — that's the business. At low volume, the margin is invisible and the convenience is worth everything: you'd spend more engineer-hours standing up KVM hosts than you'd ever save. At high, steady volume, the margin becomes the largest controllable line on the bill, and the math flips.
The break-even is a straightforward comparison. A hosted bill is (sandbox-seconds × size × per-second rate) + egress + platform fees — and the per-second rate already contains the vendor's markup over their raw compute. A self-hosted bill is the amortized cost of the KVM hosts (which you can buy as reserved or spot instances, or run on metal you already own), plus your operational time, plus your own egress. When your usage is high and predictable enough that the hosts run near capacity most of the time, you pay for the hardware once and skip the per-second markup on every sandbox-second after that. The flatter and more sustained your load, the better self-host looks; the spikier and lower your load, the better hosted looks.
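As a minimal sketch of that comparison, the code below prices the same monthly load both ways. Every figure is a placeholder, and the model makes loud simplifications: sandboxes are sized by memory only, the fleet is sized from average load at a target utilization (so peaks are ignored), and egress is left out on the assumption that it is similar on both sides.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600


def hosted_monthly(sandbox_seconds, size_gib, rate, platform_fee=0.0):
    """Metered compute at the vendor's per-GiB-second rate, plus any flat fee."""
    return sandbox_seconds * size_gib * rate + platform_fee


def hosts_needed(sandbox_seconds, size_gib, host_gib, utilization):
    """Hosts required to carry the average load at a target utilization."""
    avg_gib_in_use = sandbox_seconds * size_gib / SECONDS_PER_MONTH
    return max(1, math.ceil(avg_gib_in_use / (host_gib * utilization)))


def self_hosted_monthly(sandbox_seconds, size_gib, host_gib, utilization,
                        host_cost, ops_cost):
    """Amortized host cost plus a flat monthly figure for operational time."""
    n = hosts_needed(sandbox_seconds, size_gib, host_gib, utilization)
    return n * host_cost + ops_cost


# Placeholder inputs: 1 GiB sandboxes, 64 GiB hosts run at 60% utilization.
fleet = dict(size_gib=1, host_gib=64, utilization=0.6, host_cost=500.0, ops_cost=4000.0)

for seconds in (5_000_000, 500_000_000):
    hosted = hosted_monthly(seconds, size_gib=1, rate=0.00002)
    owned = self_hosted_monthly(seconds, **fleet)
    print(f"{seconds:>12,} sandbox-seconds: hosted {hosted:9.2f}, self-hosted {owned:9.2f}")
```

With these made-up inputs, hosted is far cheaper at 5 million sandbox-seconds a month because the operational cost dominates, and self-hosting pulls ahead at 500 million, where six hosts carry the load. Lowering the utilization figure to reflect spikier load pushes the crossover further out.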
The honest costs on the self-host side are real and you should not hand-wave them: you now operate KVM hosts, an agent fleet, networking, and snapshot storage; you carry the on-call; you eat the under-utilization when load dips below the capacity you provisioned. For a team without infra appetite, that operational weight can easily exceed the markup you'd save. Self-host wins on cost only past a volume threshold and only if running infrastructure is something you're set up to do well. We walk through this calculus in more depth — including where the crossover tends to land — in /blog/e2b-cost-at-scale, and the broader landscape of who runs what isolation model in /blog/e2b-alternatives.
A hosted per-second rate is raw compute plus margin. At low volume you're paying for convenience; at high steady volume you're paying the margin — and that's the line a self-hosted, open-source substrate lets you delete.
Where PandaStack fits
PandaStack's core is open source under Apache-2.0, and self-hosting is a first-class path, not an afterthought. You run the control-plane API and a per-host agent on your own Linux KVM hosts (anything with /dev/kvm); every sandbox is a Firecracker microVM with its own guest kernel (5.10, Ubuntu 24.04), isolated by hardware virtualization rather than a shared kernel. The pricing consequence is simple: when you self-host, there is no per-second markup, because there is no vendor meter in the path. Your cost is the compute you actually run. There's a hosted offering too for teams that want the convenience, but the option to drop the markup entirely is always on the table — same binaries, same agent, base URL configurable so the same SDK points at either.
The architecture is built to keep the cost drivers low on whichever path you pick. There's no warm pool: every create restores a baked Firecracker snapshot at 179 ms p50 and ~203 ms p99 (the first spawn of a new template cold-boots in ~3 s, then bakes the snapshot for everyone after). That speed is what makes truly ephemeral sandboxes practical: you can tear a box down the moment it's idle and recreate it in roughly 200 ms, so you never pay for idle time just to dodge a slow boot. Forking is first-class copy-on-write: same-host forks land in ~400 ms by mapping guest memory MAP_PRIVATE and reflinking the rootfs on XFS, so branching an already-warm state costs metadata, not a fresh install. Optional UFFD memory streaming pages the memory image in from object storage on demand (HTTP Range GETs over 4 MiB chunks) instead of downloading the whole thing up front, so an agent boots without waiting on a multi-gigabyte transfer. On a self-hosted fleet, those are the exact levers (boot speed, teardown, fork reuse, egress) that decide whether your hosts run cheap.
And the same substrate runs more than a code interpreter: managed PostgreSQL 16, git-driven app hosting with scale-to-zero, serverless functions with cron, and durable volumes all sit on one microVM platform. If your code interpreter bill is climbing and your volume is steady, the question worth pricing out is whether the per-second markup you're paying has crossed the cost of hosts you'd run yourself. If it has, self-hosting on an open-source substrate is how you stop paying it.
For the deeper mechanics behind the cost drivers above, see /docs/internals/snapshot-restore for the boot path, /docs/concepts/snapshots-and-forks and /docs/internals/fork-cow for forking, /docs/internals/streaming-restore for UFFD memory streaming, and /docs/guides/code-interpreter to wire up the code-interpreter template. To compare products by isolation model and hosting rather than price, /blog/e2b-alternatives is the honest-broker map, and /blog/e2b-cost-at-scale runs the self-host break-even in detail.
Frequently asked questions
How is code interpreter API pricing usually structured?
Most code interpreter APIs are usage-based and meter some mix of three things: compute time (how long a sandbox is alive, usually per second), the memory/CPU size you reserved (the meter multiplies time by size), and egress plus storage. Some vendors add a monthly platform or seat fee on top. The dominant line item for almost everyone is sandbox-seconds — wall-clock time times the box size. Verify the exact rates on each vendor's live pricing page, since they change frequently.
What's the difference between active and idle billing?
Active billing charges only for time the sandbox is actually executing CPU and little or nothing while it idles; full-lifetime billing charges for the whole time the sandbox exists regardless of work done. For AI agents — which often leave a sandbox waiting on a model call or a user between bursts — this distinction can dominate the bill. Bursty, conversational workloads favor active-CPU billing; workloads that keep the box busy the whole time favor full-lifetime. Some vendors offer a reduced idle or standby rate plus storage for the paused state, so read which one you're being quoted.
When does self-hosting a code interpreter beat a hosted per-second bill?
A hosted per-second rate includes the vendor's markup over raw compute. At low or spiky volume, the convenience is worth more than the markup and hosted wins. At high, steady volume where your own hosts would run near capacity, you pay for the hardware once and skip the markup on every sandbox-second after that, so self-host wins — provided you can absorb the operational weight of running KVM hosts, networking, and snapshot storage. The crossover depends on your sustained load and whether running infrastructure is a strength on your team.
Does PandaStack charge a per-second markup?
When you self-host PandaStack — the core is open source under Apache-2.0 and runs on your own Linux KVM hosts — there is no vendor meter in the path, so there is no per-second markup; your cost is the compute you actually run. A hosted offering exists for teams that want the convenience, but the self-host path is first-class and lets you drop the markup entirely. The architecture (no warm pool, ~179 ms snapshot-restore creates, copy-on-write forks, optional UFFD memory streaming) keeps the underlying compute cost low on either path.