Isolation, networked: running untrusted code as systems, not snippets

Agents write code now. They write it, and then they run it, often before anyone has reviewed it.

That is the new normal, and most sandboxes were not built for it. They were built to run a snippet. A function. A single cell of a notebook. One process, one box, no neighbors, no network it can reach on purpose. That model holds right up until the code your agent generated wants to talk to another service, stand up a worker, or behave like the small distributed system it actually is.

CreateOS Sandbox starts from a different premise. The unit of isolation is the sandbox. And a sandbox is a real computer, with its own kernel, that can be networked to other sandboxes under rules you can prove. This post is about why that combination matters, and what it lets you build.

One honesty note up front. CreateOS Sandbox is alpha. We do not hold SOC 2, HIPAA, GDPR, or ISO certifications yet. Where this post asks for enterprise trust, the interim control is self-hosting the whole thing inside your own boundary. We will name the stage every time it matters.

The blast radius problem: per-VM kernel vs shared kernel

Most container-based sandboxes share one host kernel across every tenant. The container is a namespace and a cgroup. It is a software boundary drawn inside a kernel that all the other code on that host is also using. When the boundary is a software promise, a kernel bug is a shared fate. One escape reaches the host, and the host is the neighborhood.

CreateOS Sandbox runs each sandbox as a Firecracker micro-VM with its own guest kernel. Not a shared kernel with namespaces drawn on top. A separate kernel, behind a hardware virtualization boundary. The blast radius of code inside a sandbox is that sandbox.

This is not a novel claim in the category. E2B runs Firecracker. Declaw runs micro-VMs with guardrails. Isolation by itself is table stakes now, and claiming otherwise would be dishonest. The interesting question is no longer "is it isolated." It is "what can it do while it stays isolated." The rest of this post is about that question.

Egress you can prove, enforced where the code cannot reach it

Isolation stops code from escaping the box. It does not, by itself, stop code from phoning home. Untrusted code that can reach the open internet is a data-exfiltration path wearing a sandbox costume.

CreateOS Sandbox governs egress with an allowlist, enforced with eBPF at the host, outside the VM.

The model is deliberately blunt. An empty allowlist allows all, which is the convenient default for a throwaway. The moment you add your first rule, everything else is locked down. Allow-all becomes deny-by-default with one entry. There is no ingress unless you explicitly turn it on. Each sandbox also carries a bandwidth quota, accounted in the kernel and rechargeable when a workload needs more.
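To make these rules concrete, here is a toy TypeScript model of the semantics in the paragraph above. An empty allowlist allows everything, one rule flips the sandbox to deny-by-default, ingress stays off unless it is enabled, and a quota can be drawn down and recharged. It is a sketch for reasoning and testing, not the eBPF enforcement itself. The exact-match "host:port" rule format is an assumption borrowed from the SDK example, and wildcards, CIDRs, and DNS resolution are not modeled.

```typescript
// Toy model of the documented egress rules. Not the eBPF enforcement itself.

// Assumed rule format: exact "host:port", as in the SDK example.
type EgressRule = string;

interface EgressPolicy {
  egress: EgressRule[];
  ingressEnabled: boolean;
}

// Empty allowlist: allow all. Any rule at all: deny whatever is not listed.
function egressAllowed(policy: EgressPolicy, host: string, port: number): boolean {
  if (policy.egress.length === 0) return true;
  return policy.egress.includes(`${host}:${port}`);
}

// No ingress unless it was explicitly turned on.
function ingressAllowed(policy: EgressPolicy): boolean {
  return policy.ingressEnabled;
}

// Per-sandbox bandwidth quota: traffic draws it down, a recharge tops it up.
class BandwidthQuota {
  private remainingBytes: number;

  constructor(initialBytes: number) {
    this.remainingBytes = initialBytes;
  }

  // Returns false (traffic refused) when the transfer would exceed the quota.
  consume(bytes: number): boolean {
    if (bytes > this.remainingBytes) return false;
    this.remainingBytes -= bytes;
    return true;
  }

  recharge(bytes: number): void {
    this.remainingBytes += bytes;
  }

  get remaining(): number {
    return this.remainingBytes;
  }
}

const throwaway: EgressPolicy = { egress: [], ingressEnabled: false };
const locked: EgressPolicy = {
  egress: ["api.internal.example.com:443"],
  ingressEnabled: false,
};

console.log(egressAllowed(throwaway, "example.org", 443)); // true
console.log(egressAllowed(locked, "api.internal.example.com", 443)); // true
console.log(egressAllowed(locked, "example.org", 443)); // false
console.log(ingressAllowed(locked)); // false
```

The model treats the first rule as a switch, not an addition. Adding one entry removes every destination you did not name.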

The part that earns the word "prove" is where the enforcement sits. The rules live on the host side, in eBPF, on the other side of the VM boundary. Code inside the sandbox cannot see them, edit them, or route around them. A compromised process that gains root inside its own guest kernel still cannot rewrite a rule that does not exist inside its kernel.

```typescript
import { CreateosSandboxClient } from "@nodeops-createos/sandbox";

// reads CREATEOS_SANDBOX_API_KEY + CREATEOS_SANDBOX_BASE_URL from the environment
const client = new CreateosSandboxClient();

// lock this sandbox to a single API endpoint, nothing else, and no ingress
// (top-level await: run this as an ES module)
const sandbox = await client.createSandbox({
  shape: "s-1vcpu-1gb",
  rootfs: "devbox:1",
  egress: ["api.internal.example.com:443"],
  ingress_enabled: false,
});
```

Contrast this with a proxy-based egress filter, which is where some younger entrants in the category sit today. A proxy controls traffic that politely goes through the proxy. Enforcement at the host controls traffic whether the code cooperates or not. The honest difference is "filtered if it complies" versus "filtered regardless." For untrusted code, regardless is the only version that counts.
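A toy model makes the difference visible. Each connection records its destination and whether the code chose to route through a proxy. The proxy-only filter here has no other network controls behind it, which is the case the paragraph describes. Real proxy deployments may add firewall rules that change the picture. The host filter checks every connection because, in this model, every packet leaves the VM through the host.

```typescript
// Toy comparison of two egress enforcement points.
interface OutboundConnection {
  dest: string; // "host:port"
  viaProxy: boolean; // did the code route through the configured proxy?
}

const allowed = new Set<string>(["api.internal.example.com:443"]);

// Proxy-only filtering: the filter decides only for traffic sent through it.
function proxyFilterPasses(conn: OutboundConnection): boolean {
  if (!conn.viaProxy) return true; // never reaches the proxy, never filtered
  return allowed.has(conn.dest);
}

// Host-side filtering: every packet leaves the VM through the host,
// so every connection is checked whether the code cooperates or not.
function hostFilterPasses(conn: OutboundConnection): boolean {
  return allowed.has(conn.dest);
}

const polite: OutboundConnection = { dest: "attacker.example:443", viaProxy: true };
const direct: OutboundConnection = { dest: "attacker.example:443", viaProxy: false };

console.log(proxyFilterPasses(polite), hostFilterPasses(polite)); // false false
console.log(proxyFilterPasses(direct), hostFilterPasses(direct)); // true false
```

The only input that separates the two models is `viaProxy`, and the code under test controls that input.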

Networking: the part almost nobody else does

Here is the limitation that defines "snippets, not systems." Most sandbox platforms give you an isolated box and stop. There is no supported way for one sandbox to talk to another. E2B, as strong as it is on adoption, does not offer inter-sandbox networking. Neither does Declaw. Daytona has no multi-node networking story either.

So if your agent's workload is actually three services that need to find each other, you are stuck stuffing them into one box or wiring something brittle yourself.

CreateOS Sandbox gives sandboxes a real network. Private overlay networks span hosts, and traffic between networks is denied by default. Within a network, each sandbox gets its own overlay address. Its peers reach it there, and nothing off the network can reach in. The workflow is simple: create a network, put sandboxes on it, look up a peer's address, and talk to it.

```typescript
// one private overlay network, two sandboxes joined to it
const net = await client.networks.create({ name: "team-eval" });

const api = await client.createSandbox({
  shape: "s-1vcpu-1gb",
  rootfs: "devbox:1",
  networks: [{ id: net.id }],
});
const worker = await client.createSandbox({
  shape: "s-1vcpu-1gb",
  rootfs: "devbox:1",
  networks: [{ id: net.id }],
});

// look up the worker's address on the overlay, then reach it from `api`
const { members } = await client.networks.get(net.id);
const workerIp = members?.find((m) => m.sandbox_id === worker.id)?.ip;
if (!workerIp) {
  throw new Error(`worker ${worker.id} has no address on network ${net.id}`);
}
// assumes a service inside `worker` is already listening on port 8080
await api.runCommand("curl", [`http://${workerIp}:8080/health`]);
```
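The reachability rules behind that example fit in a few lines. This is a hypothetical model, not the overlay's data plane. Two sandboxes can reach each other only if some network contains both of them, and anything that is not on any network is unreachable. The model assumes no routing between networks, which is consistent with default-deny between them.

```typescript
// Hypothetical model of overlay reachability, not the real data plane.
type SandboxId = string;
type NetworkId = string;

class OverlayModel {
  private readonly members = new Map<NetworkId, Set<SandboxId>>();

  join(network: NetworkId, sandbox: SandboxId): void {
    let set = this.members.get(network);
    if (!set) {
      set = new Set<SandboxId>();
      this.members.set(network, set);
    }
    set.add(sandbox);
  }

  // Reachable only through a shared network; no routing between networks.
  canReach(from: SandboxId, to: SandboxId): boolean {
    for (const set of this.members.values()) {
      if (set.has(from) && set.has(to)) return true;
    }
    return false;
  }
}

const overlay = new OverlayModel();
overlay.join("team-eval", "api");
overlay.join("team-eval", "worker");
overlay.join("other-team", "stranger");

console.log(overlay.canReach("api", "worker")); // true
console.log(overlay.canReach("stranger", "worker")); // false: different network
console.log(overlay.canReach("laptop", "worker")); // false: not on any network
```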

This is the wedge. Not isolation alone, which is table stakes. Not networking alone. The combination: per-VM kernel isolation by default, real networking between sandboxes, kernel-level egress governance, and self-host. No competitor matches the full set today.

Standing up a cluster, inside isolation

Once sandboxes share a private overlay network, they stop being boxes and start being nodes. You can stand up a multi-node k3s or Nomad cluster across several sandboxes, each one its own micro-VM with its own kernel, all on a default-deny private network, all under an eBPF egress allowlist.

That is a genuinely different shape of thing. It is a real distributed system that an agent can spin up, exercise, and tear down, with the isolation boundary drawn around the whole cluster and between every node in it. You get to test the system, not a snippet that pretends to be one.
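As a sketch of the first step, the helpers below build the standard k3s install commands for one server and any number of agents, using the server's overlay address in `K3S_URL`. They only construct strings. You still have to run them in each sandbox, read the join token from the server, and either open egress for the installer's downloads or preinstall k3s in the image. This sketch does not check that the guest kernel has every feature k3s expects, so treat it as a starting point. The overlay IPs and token below are hypothetical placeholders.

```typescript
// Builds standard k3s install commands for a one-server, N-agent cluster.
// Pure string construction: running these inside each sandbox is up to you.
const K3S_INSTALL = "curl -sfL https://get.k3s.io";

// Default port of the k3s API server that agents join through.
const K3S_API_PORT = 6443;

// On the server, k3s writes the join token here once it is running (root-readable).
const K3S_TOKEN_PATH = "/var/lib/rancher/k3s/server/node-token";

function k3sServerCommand(): string {
  return `${K3S_INSTALL} | sh -`;
}

function k3sReadTokenCommand(): string {
  return `cat ${K3S_TOKEN_PATH}`;
}

function k3sAgentCommand(serverOverlayIp: string, token: string): string {
  // Single quotes keep the token literal in the shell.
  return `${K3S_INSTALL} | K3S_URL=https://${serverOverlayIp}:${K3S_API_PORT} K3S_TOKEN='${token}' sh -`;
}

// Hypothetical overlay addresses and token, for illustration only.
const serverIp = "10.0.0.2";
const agentIps = ["10.0.0.3", "10.0.0.4"];
const joinToken = "<token read from the server>";

console.log(`[${serverIp}] ${k3sServerCommand()}`);
for (const ip of agentIps) {
  console.log(`[${ip}] ${k3sAgentCommand(serverIp, joinToken)}`);
}
```

Agents reach the server on port 6443 over the overlay, which stays inside the private network. Only the installer's downloads touch the egress allowlist.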

A few honest constraints so this lands as engineering and not a pitch. Shapes are fixed at create: s-1vcpu-256mb, s-1vcpu-1gb, s-2vcpu-2gb, s-4vcpu-4gb, with a disk size you set up front. There is no live resize, so size the node for the job at create time. The devbox image is Ubuntu with Node 22 LTS, Bun, Python 3.12 with uv, Go, and Rust. There is no GPU.
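Shapes are fixed at create, so it pays to choose one deliberately. The small helper below picks the smallest listed shape that meets a vCPU and memory requirement. It assumes "1gb" means 1024 MB, because the post does not say which convention the shape names use. It also assumes disk is sized separately.

```typescript
// The fixed shapes listed above, ordered smallest to largest.
// Assumption: "1gb" = 1024 MB, "2gb" = 2048 MB, and so on.
interface Shape {
  shapeName: string;
  vcpu: number;
  memoryMb: number;
}

const SHAPES: readonly Shape[] = [
  { shapeName: "s-1vcpu-256mb", vcpu: 1, memoryMb: 256 },
  { shapeName: "s-1vcpu-1gb", vcpu: 1, memoryMb: 1024 },
  { shapeName: "s-2vcpu-2gb", vcpu: 2, memoryMb: 2048 },
  { shapeName: "s-4vcpu-4gb", vcpu: 4, memoryMb: 4096 },
];

// No live resize: pick the smallest shape that fits, or fail before create.
function smallestShapeFor(vcpu: number, memoryMb: number): string {
  const fit = SHAPES.find((s) => s.vcpu >= vcpu && s.memoryMb >= memoryMb);
  if (!fit) {
    throw new Error(`no shape offers ${vcpu} vCPU / ${memoryMb} MB; the largest is s-4vcpu-4gb`);
  }
  return fit.shapeName;
}

console.log(smallestShapeFor(1, 512)); // s-1vcpu-1gb
console.log(smallestShapeFor(2, 1024)); // s-2vcpu-2gb
```

Failing at selection time, rather than discovering the limit later, matches the constraint: there is no resize to fall back on.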

Self-host, for the buyers who cannot send code elsewhere

For a regulated, security-first team, the honest blocker is rarely the isolation model. It is the sentence "our security team will not let us send code to someone else's cloud." Certifications usually answer that. We do not have them yet.

So the answer we actually have is a better one for sovereignty anyway: run the control plane and the storage yourself. Storage is BYOS3 (bring your own S3): AWS S3, Cloudflare R2, Tigris, or MinIO, with FUSE mounts, live attach and detach, and local-to-remote sync. Self-host the control plane, point it at your own buckets, and the code, the snapshots, and the data never leave your boundary. That is real data residency, not a residency label on someone else's region.

While the product is alpha, self-host is our interim trust mechanism. We are explicit that it does not substitute for the certifications and controls a regulated buyer will eventually require.

What is alpha, and what is on the roadmap

Candor is the differentiator in this category, so here is the line drawn straight.

Shipping today: per-VM guest kernel isolation, eBPF egress allowlist with no-ingress default and bandwidth quotas, overlay networks, the create / pause / resume / fork / destroy lifecycle, BYOS3 storage, and full self-host of control plane and storage.
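The lifecycle operations can be modeled as a small state machine. The post lists the operations but not their preconditions, so the transition table here is an assumption. In this model, pause and resume toggle a live sandbox, destroy is terminal, and fork copies a running or paused sandbox into a new running one that records its parent.

```typescript
// Toy lifecycle model. The transition table is an assumption for illustration.
type LifecycleState = "running" | "paused" | "destroyed";
type LifecycleOp = "pause" | "resume" | "destroy";

const TRANSITIONS: Record<LifecycleState, Partial<Record<LifecycleOp, LifecycleState>>> = {
  running: { pause: "paused", destroy: "destroyed" },
  paused: { resume: "running", destroy: "destroyed" },
  destroyed: {}, // terminal
};

interface ModelSandbox {
  id: string;
  state: LifecycleState;
  parentId?: string; // set on forks
}

let nextSandboxNumber = 1;

// create: every new sandbox starts running.
function createModelSandbox(): ModelSandbox {
  return { id: `sbx-${nextSandboxNumber++}`, state: "running" };
}

// Returns a new record; throws if the operation is not allowed from this state.
function transition(sb: ModelSandbox, op: LifecycleOp): ModelSandbox {
  const next = TRANSITIONS[sb.state][op];
  if (next === undefined) {
    throw new Error(`cannot ${op} a ${sb.state} sandbox`);
  }
  return { ...sb, state: next };
}

// Assumed: fork copies a running or paused sandbox into a new running one.
function forkModelSandbox(sb: ModelSandbox): ModelSandbox {
  if (sb.state === "destroyed") {
    throw new Error("cannot fork a destroyed sandbox");
  }
  return { id: `sbx-${nextSandboxNumber++}`, state: "running", parentId: sb.id };
}

const base = createModelSandbox(); // sbx-1, running
const pausedBase = transition(base, "pause"); // sbx-1, paused
const child = forkModelSandbox(pausedBase); // sbx-2, running, parent sbx-1
const resumedBase = transition(pausedBase, "resume"); // sbx-1, running
const gone = transition(resumedBase, "destroy"); // sbx-1, destroyed
console.log(child.parentId, gone.state); // sbx-1 destroyed
```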

Not shipping today, and we will not imply otherwise:

Audit logging and RBAC / org policies. Required before an enterprise can get security sign-off. On the roadmap, not in your hands yet.
SOC 2 / HIPAA / GDPR / ISO certifications. Roadmap. Self-host is the interim answer.
GPU passthrough and live vertical resize. Not in this release. Size shapes at create.

We are not going to publish a CreateOS Sandbox cold-start number we have not measured. The category competes on published latency, and we would rather show you a multi-node cluster nobody else can stand up than quote a figure we cannot yet back. Sandboxes start in seconds via snapshot restore. When we have measured numbers, you will get them with the methodology attached.
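To illustrate what "measured, with methodology" could look like, here is a minimal harness. It times an async start operation repeatedly and reports nearest-rank percentiles. `startOnce` is a placeholder for however you create or restore a sandbox and wait for it to be ready. A published methodology would also state the shape, the region, and whether each start was a snapshot restore. The sketch produces no figures on its own.

```typescript
// Minimal timing harness: run an async start operation repeatedly and report
// nearest-rank percentiles. Runs are sequential, not concurrent.

// Nearest-rank percentile over ascending samples: the smallest sample such
// that at least p% of samples are less than or equal to it.
function percentile(sortedMs: number[], p: number): number {
  if (sortedMs.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sortedMs.length);
  return sortedMs[Math.max(0, rank - 1)];
}

interface StartupReport {
  runs: number;
  p50Ms: number;
  p95Ms: number;
  maxMs: number;
}

// `startOnce` is a placeholder: create or restore a sandbox and resolve once
// it is ready to run commands, using whatever SDK calls you actually have.
async function measureStartup(
  startOnce: () => Promise<void>,
  runs: number,
): Promise<StartupReport> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await startOnce();
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return {
    runs,
    p50Ms: percentile(samples, 50),
    p95Ms: percentile(samples, 95),
    maxMs: samples[samples.length - 1],
  };
}
```

Reporting p50, p95, and max together shows the tail that a single "starts in N ms" figure would hide.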

Two ways in

If you are building an agent or a code-gen platform and you want to run untrusted code that behaves like a system, grab 500 free credits and spin up your first networked sandbox. Two boxes on one network, talking on the overlay, in a few minutes.

If you are a security or platform team that cannot send code to someone else's cloud, book a design-partner call. We will walk the self-host topology with you, name exactly what is alpha, and be straight about which of your controls are roadmap and which are real today.

CreateOS Sandbox is one product under the unified execution layer for AI. The premise is the same all the way down: run code you cannot trust, as systems, not snippets.