Stop an AI Agent Touching the Host Filesystem & Network
When you let an AI agent run commands, three failure modes matter more than all the others: it reads files on the host it was never meant to see, it reaches an internal service on your network that trusts whatever calls it, or it ships data out to an endpoint it controls. None of these requires a clever exploit; they're the default behavior of a process that has ambient access to a filesystem and a network. Zero-trust for agents means assuming the agent (or the content it ingested) is hostile and building a boundary that makes those three things structurally impossible, not merely discouraged by a prompt. The cleanest way to get that boundary is to give every agent its own machine: its own guest kernel, its own filesystem, and its own network namespace. This post walks through exactly what each layer buys you, where PandaStack's per-sandbox design draws the line, and, honestly, what you still have to configure yourself, because no boundary handles egress policy and secrets for free.
What "touching the host" actually means
Three distinct things hide under that phrase, and they have different fixes. Keeping them separate is the whole game:
- Filesystem reach — the agent reads or writes files outside its task: your source tree, SSH keys, other tenants' data, host config. This is a property of which filesystem the process can see at all.
- Internal-network reach — the agent opens a connection to something on your private network: a database, an admin API, the cloud metadata endpoint at 169.254.169.254, a neighbor sandbox. Lateral movement and credential theft live here.
- Exfiltration — the agent sends data it read to an outbound destination it controls. This is the egress direction, and it's the one a perfect isolation boundary does nothing about on its own.
A container addresses the first one reasonably well and the other two only if you configure them, but it does all of this while sharing the host's Linux kernel, so the entire isolation story rests on namespaces, cgroups, capabilities, and seccomp all holding inside that one shared kernel. A reachable kernel bug defeats the lot, and the container runtime on the host is exposed too: the canonical real example is runc CVE-2019-5736, a runtime bug (not a kernel bug) where a malicious container could overwrite the host's runc binary via /proc/self/exe and escape to root on the host. That's the structural weakness: the boundary is enforced by the same kernel, and the same host-side tooling, that the untrusted code is talking to. For why a container alone isn't a sandbox here, see /blog/why-docker-is-not-a-sandbox.
Own guest kernel, own filesystem
On PandaStack every sandbox is a Firecracker microVM, not a shared-kernel container. It boots its own guest kernel (5.10, Ubuntu 24.04 userland) inside hardware virtualization via KVM. The agent's code talks to that guest kernel, not yours. To reach the host it would have to break the hypervisor boundary itself: Firecracker is a small Rust VMM running under a jailer that drops privileges, exposing a minimal virtio device model (net, block, vsock) rather than the 300-plus-syscall surface a container shares with the host. That's a deliberately tiny, heavily audited attack surface.
The filesystem consequence falls straight out of that. The guest's only disk is the VM's own rootfs — a copy-on-write clone of the template image, made with an XFS reflink so the create is O(metadata) and data stays shared until a write copies a block. There is no bind mount from the host, no shared volume, no path that resolves back to your filesystem unless you explicitly attach a durable volume. The agent can `rm -rf /` inside the guest and the only casualty is a throwaway rootfs you were going to discard. "The agent only sees the VM's filesystem" isn't a policy you switch on; it's the absence of any channel to anywhere else.
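To see what a reflink clone is at the syscall level, here is a minimal Python sketch using Linux's FICLONE ioctl, the mechanism behind `cp --reflink`. It is not PandaStack's code: the function name `clone_rootfs` and the plain-copy fallback for filesystems without reflink support are assumptions added for illustration.

```python
import errno
import fcntl
import shutil

# FICLONE is _IOW(0x94, 9, int) on Linux; some newer Pythons expose it as fcntl.FICLONE.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)

# errno values that mean "these two files can't share extents on this filesystem".
_NO_REFLINK = {errno.EOPNOTSUPP, errno.EXDEV, errno.EINVAL, errno.ENOTTY, errno.ENOSYS}


def clone_rootfs(template: str, dest: str) -> bool:
    """Make dest a copy-on-write clone of template (hypothetical helper).

    Returns True if the kernel shared the template's extents (a reflink),
    False if it fell back to copying every byte.
    """
    with open(template, "rb") as src, open(dest, "wb") as dst:
        try:
            # Metadata-only: dest now references the same blocks as template,
            # and a later write to dest copies just the block it touches.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError as exc:
            if exc.errno not in _NO_REFLINK:
                raise
        # No reflink support here (e.g. ext4 or tmpfs): do a full data copy.
        shutil.copyfileobj(src, dst)
        return False
```

On XFS with reflink enabled the call returns True and its cost does not grow with image size; on other filesystems the fallback keeps the sketch working, but without the O(metadata) create.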
Be honest about the residual risk: a microVM is a much smaller and more scrutinized boundary than a shared kernel, but it is not unbreakable. KVM has had escape CVEs, Google's kvmCTF pays up to $250k for working KVM exploits, and speculative-execution side channels can in principle cross a VM boundary. "Hardware-isolated" means the bar to escape is high and the surface is small; it does not mean zero. The right framing is that you've reduced the attack surface from the full Linux syscall interface to the VMM's minimal device model plus KVM itself, which is the best general-purpose boundary available for running someone else's code. For the full hierarchy from in-process eval up to microVM, see /blog/code-isolation-hierarchy.
Per-sandbox network namespace and controlled egress
Filesystem and kernel isolation answer "can the agent reach the host?" Networking answers "can the agent reach anything else?" — and that's where exfiltration and lateral movement actually happen. PandaStack's networking layer is called NATID, and the core idea is that every sandbox gets its own Linux network namespace rather than sharing a bridge with its neighbors.
Concretely, each sandbox is wired up as a dedicated netns containing a veth pair and a tap device that Firecracker attaches to, carved out of a pool of 16,384 per-sandbox /30 subnets per agent. (A /30 has four addresses and only two usable ones, the gateway and the one guest, so no other guest is addressable inside that subnet.) Because each sandbox lives in its own namespace, the routing table, the iptables/NAT rules, and the visible interfaces are per-sandbox. There is no flat shared network where sandbox A can ARP-scan and then connect to sandbox B. Isolation between sandboxes is a property of the namespace boundary, not a filter you bolt on afterward, and tearing the sandbox down atomically removes its whole network world with it. The full design is documented at /docs/concepts/networking-natid.
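The pool arithmetic is worth spelling out: 16,384 /30 subnets at four addresses each is 65,536 addresses, exactly one /16. A minimal Python sketch of a slot allocator under that assumption follows; the 10.200.0.0/16 range and the function name `sandbox_subnet` are hypothetical, not taken from NATID.

```python
import ipaddress

# Assumption: the pool is one /16. 16,384 /30 slots x 4 addresses = 65,536 = 2**16.
# The 10.200.0.0/16 range is illustrative; the real NATID range isn't given here.
POOL = ipaddress.ip_network("10.200.0.0/16")
SLOTS = 2 ** (30 - POOL.prefixlen)  # 16,384


def sandbox_subnet(index: int):
    """Return (subnet, gateway, guest) for sandbox slot `index` (hypothetical helper)."""
    if not 0 <= index < SLOTS:
        raise ValueError(f"slot {index} outside pool of {SLOTS}")
    # Each /30 starts 4 addresses after the previous one.
    base = POOL.network_address + 4 * index
    subnet = ipaddress.ip_network(f"{base}/30")
    # A /30 is network, gateway, guest, broadcast; hosts() yields only the middle two.
    gateway, guest = subnet.hosts()
    return subnet, gateway, guest
```

Slot 0 gets 10.200.0.0/30 (gateway .1, guest .2), and the last slot, 16,383, gets 10.200.255.252/30. Since each sandbox also sits in its own namespace, the /30 is a second, independent reason a guest has no neighbors to address.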
- ns-<id> — the sandbox's dedicated Linux network namespace.
- vh-<id> / vg-<id> — the host-side and guest-side ends of the veth pair connecting that namespace to the root namespace.
- tap0 — the TAP device inside the namespace that Firecracker drives as the guest's NIC.
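To make the list concrete, here is a minimal Python sketch that emits the iproute2 commands such a layout implies, without running them. It illustrates the shape under stated assumptions and is not NATID's implementation: the function `netns_plan`, the command order, and putting the gateway address on tap0 are hypothetical, and the routing/NAT between tap0 and the veth is omitted.

```python
# Hypothetical sketch: iproute2 commands that would build one sandbox's network
# world, returned as argv lists rather than executed.
IFNAMSIZ = 16  # Linux interface names are at most 15 characters plus a NUL


def netns_plan(sandbox_id: str, gateway_cidr: str) -> list[list[str]]:
    ns, vh, vg = f"ns-{sandbox_id}", f"vh-{sandbox_id}", f"vg-{sandbox_id}"
    for name in (vh, vg):
        if len(name) >= IFNAMSIZ:
            raise ValueError(f"interface name {name!r} exceeds {IFNAMSIZ - 1} chars")
    return [
        ["ip", "netns", "add", ns],
        # veth pair: vh stays in the root namespace, vg moves into the sandbox's.
        ["ip", "link", "add", vh, "type", "veth", "peer", "name", vg],
        ["ip", "link", "set", vg, "netns", ns],
        # TAP device Firecracker opens as the guest NIC. Interface names are
        # per-namespace, so every sandbox can call its TAP "tap0" without clashing.
        ["ip", "-n", ns, "tuntap", "add", "dev", "tap0", "mode", "tap"],
        ["ip", "-n", ns, "addr", "add", gateway_cidr, "dev", "tap0"],
        ["ip", "-n", ns, "link", "set", "tap0", "up"],
        ["ip", "-n", ns, "link", "set", vg, "up"],
        ["ip", "link", "set", vh, "up"],
    ]
```

Running each argv with `subprocess.run(cmd, check=True)` as root would build the namespace. Teardown is the mirror image: `ip netns del ns-<id>` drops the namespace, and once nothing else holds it (for example, after the Firecracker process exits) the kernel destroys tap0 and the veth pair with it.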
What this structurally prevents: one sandbox cannot see or address another sandbox's traffic, and the per-namespace rule set is where you express egress policy — so "this sandbox may reach the package registry and nothing else" is enforced at a boundary the guest can't touch, instead of relying on the agent to behave.
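As an illustration of what "this sandbox may reach the package registry and nothing else" might look like as rules, here is a minimal Python sketch that renders a default-deny allowlist as iptables commands for one namespace. The function name `egress_rules`, the use of the FORWARD chain (guest traffic enters on tap0 and is routed out the veth), and TCP-only entries are assumptions, not PandaStack's shipped policy.

```python
# Hypothetical sketch: default-deny egress for one sandbox namespace, rendered
# as iptables argv lists to run inside ns-<id> (e.g. via `ip netns exec`).
import ipaddress

METADATA = "169.254.169.254"


def egress_rules(allow: list[tuple[str, int]]) -> list[list[str]]:
    """allow: (destination IP or CIDR, TCP port) pairs the task genuinely needs."""
    rules = [
        # Anything the guest sends that no rule below accepts is dropped.
        ["iptables", "-P", "FORWARD", "DROP"],
        # Explicit metadata block first, so an overly broad allow can't reopen it.
        ["iptables", "-A", "FORWARD", "-i", "tap0", "-d", METADATA, "-j", "DROP"],
        # Let replies to already-allowed connections back through.
        ["iptables", "-A", "FORWARD", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    for dest, port in allow:
        # Validates the entry; iptables matches addresses, not hostnames.
        net = ipaddress.ip_network(dest)
        rules.append(["iptables", "-A", "FORWARD", "-i", "tap0", "-p", "tcp",
                      "-d", str(net), "--dport", str(port), "-j", "ACCEPT"])
    return rules
```

Because the rules live in the sandbox's own namespace, the guest has no way to list or flush them; the hard part that remains yours is deciding the allowlist, including how hostnames like a registry resolve to the addresses you permit.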
What the boundary does NOT do for you
Here's the honest part, because it's the part that bites people. A per-sandbox namespace with NAT'd egress gives you the enforcement point. It does not, by itself, decide your policy. The likeliest real-world leak from an agent sandbox is not an exotic hypervisor escape; it's a perfectly isolated VM that still had wide-open outbound access and a secret in its environment it should never have held. The isolation boundary contains the blast radius of an escape; egress policy and secret hygiene prevent the far more likely quiet exfiltration. You own these:
- Egress allowlist. Default-deny outbound, then allow only the destinations the task genuinely needs — the package registry, a specific API. An agent that can reach arbitrary outbound endpoints can exfiltrate anything it managed to read, and prompt injection makes "it decided to" a realistic path. The namespace is the place to enforce this; the rules are yours to write.
- Block the metadata endpoint. Cloud instance metadata (169.254.169.254) is a classic credential-theft target. It must be unreachable from inside the sandbox — verify it, don't assume.
- Secrets handling. Never inject host cloud keys, database passwords, or long-lived tokens into the guest. Pass only what the task needs, scoped and short-lived, and treat anything in the guest's environment as readable by the agent. Isolation keeps the agent off your host; it does not un-read a secret you handed it directly.
- Ephemerality and TTL. One sandbox per task with a TTL so an abandoned or runaway VM is reaped, even when your code forgets. Reusing a long-lived sandbox across tasks or tenants reintroduces exactly the cross-contamination the boundary was meant to remove.
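"Verify it, don't assume" can start as simply as attempting a connection from inside the guest. Here is a minimal Python sketch; `can_connect` is a hypothetical helper name, and a TCP connect to port 80 on one address is a first check, not a complete audit of every metadata path your provider may offer.

```python
import socket

# The cloud instance metadata endpoint named above, on plain HTTP.
METADATA = ("169.254.169.254", 80)


def can_connect(addr: tuple[str, int], timeout: float = 2.0) -> bool:
    """True if a TCP connection to addr succeeds within timeout (hypothetical helper)."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

Run it inside the guest as part of sandbox bring-up or CI, and fail loudly if `can_connect(METADATA)` ever returns True.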
Putting it together
The pattern that holds up under an autonomous agent acting on adversarial input is the three layers stacked: a hardware-virtualized guest so the agent can't reach the host kernel, a copy-on-write rootfs so it only ever sees its own throwaway filesystem, and a per-sandbox network namespace where you enforce a default-deny egress allowlist. PandaStack gives you the first two by construction and the enforcement point for the third; you supply the egress policy and keep secrets out of the guest. None of this is expensive to adopt — every create restores a baked Firecracker snapshot in about 179ms p50 (no warm pool of idle VMs), so a fresh, fully isolated environment per task is the default rather than an optimization you ration.
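Because a fresh VM per task is cheap, the per-task lifecycle can stay this simple. A minimal Python sketch, assuming a hypothetical `SandboxClient` with `create` and `destroy` methods (illustrative names, not PandaStack's actual SDK): a context manager creates one sandbox for a task and destroys it on every exit path, including exceptions, while a TTL passed at creation is the backstop if the caller dies first.

```python
# Hypothetical sketch: one sandbox per task, destroyed on every exit path.
# `SandboxClient` is an illustrative interface, not PandaStack's SDK.
from contextlib import contextmanager
from typing import Iterator, Protocol


class SandboxClient(Protocol):
    def create(self, template: str, ttl_seconds: int) -> str: ...
    def destroy(self, sandbox_id: str) -> None: ...


@contextmanager
def task_sandbox(client: SandboxClient, template: str, ttl_seconds: int = 600) -> Iterator[str]:
    # The server-side TTL reaps the VM even if this process dies before `finally`.
    sandbox_id = client.create(template, ttl_seconds=ttl_seconds)
    try:
        yield sandbox_id
    finally:
        # Never reuse across tasks: the rootfs and netns go away with the VM.
        client.destroy(sandbox_id)
```

The design choice is that reuse is impossible by construction: each `with` block gets its own sandbox ID, so nothing a previous task wrote or read can leak into the next one.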
PandaStack's core is Apache-2.0 and self-hostable on your own Linux KVM hosts, which matters specifically for this threat model: the sandboxes — and therefore the egress rules and any secrets in play — run on infrastructure you control, not a black box. A hosted offering exists too, but if your concern is exactly "what can this agent touch," running the substrate yourself is a first-class path. For the broader treatment of isolation, ephemerality, and network control for agents, see /blog/secure-code-execution-for-ai-agents; for the threat models and the full sandboxing hierarchy from scratch, start at /blog/how-to-sandbox-untrusted-code.
The model will keep getting better at writing code, and some fraction of the time it will write — or be tricked into writing — something it shouldn't. Your job is to make sure that when it does, the only filesystem it can damage is a disposable rootfs, the only network it sits on is a namespace with one address, and the only endpoints it can reach are the ones you allowlisted. That's zero-trust for agents: not a smarter prompt, but a boundary that doesn't depend on the agent behaving.
Frequently asked questions
How do I stop an AI agent from reading files on the host?
Run the agent inside a microVM that has its own guest kernel and its own filesystem rather than a shared-kernel container. On PandaStack the guest's only disk is a copy-on-write clone of the template rootfs (an XFS reflink), with no bind mount or shared volume back to the host — so there is no path from inside the guest to your host filesystem unless you explicitly attach a durable volume. The agent can delete its entire filesystem and the only casualty is a throwaway rootfs.
How does per-sandbox network isolation and egress filtering work?
PandaStack's NATID networking gives every sandbox its own Linux network namespace, veth pair, and tap device, drawn from 16,384 per-sandbox /30 subnets per agent. Because each sandbox is in its own namespace, the routing table and iptables/NAT rules are per-sandbox: one sandbox cannot see or address another's traffic, and the namespace is the enforcement point for a default-deny egress allowlist. You still define the allowlist itself — the boundary enforces your policy, it doesn't choose it.
Does microVM isolation guarantee an agent can't exfiltrate data?
No, and that's the honest caveat. Hardware isolation keeps the agent off your host kernel and out of other sandboxes' filesystems and networks, but exfiltration is an outbound-network question that a perfect isolation boundary doesn't address on its own. You must configure default-deny egress with an allowlist, block the cloud metadata endpoint (169.254.169.254), and keep secrets out of the guest. The namespace gives you a clean place to enforce all three; the policy is yours to write.
Is a microVM boundary actually unbreakable?
No. It's a much smaller and more audited boundary than a shared kernel, not an impervious one. A container shares the host's full Linux syscall interface, so a single kernel bug can defeat it, and the runtime around it is attack surface too: runc CVE-2019-5736, a runtime bug rather than a kernel bug, let a malicious container overwrite the host's runc binary and escape. A Firecracker microVM reduces that to the VMM's minimal virtio device model plus KVM itself, with the VMM run under a privilege-dropping jailer. That surface is small and heavily scrutinized, but KVM has had escape CVEs and speculative-execution side channels can cross VM boundaries, so treat it as strong defense in depth rather than an absolute guarantee.