Fork your agent's state: parallel rollouts, pause when idle, resume warm

An agent run is not a straight line. It is a tree.

The agent reaches a decision point, and at that point there are several reasonable next moves. A different tool. A different query. A different plan. The honest way to find the best branch is to try more than one and compare results. The cheap way most stacks handle it is to pick one branch, hope, and pay for a full restart if you guessed wrong.

CreateOS Sandbox treats the tree as the unit of work. You can fork a running agent's full state into N branches, run them in parallel, keep the winner, and pause anything sitting idle so you stop paying for it. This post is about how those primitives map to how agents actually behave, and what they do to the cost.

One honesty note first. CreateOS Sandbox is alpha. The pricing below is cleared. Where this post talks cost, the numbers are real. Where it would normally talk speed, it stays qualitative, because we have not published measured latency yet and will not invent it.

How agents actually work: decision points and dead ends

Watch a real agent run and you see the same shape repeat. It gathers context, hits a decision point, commits to an approach, and either succeeds or backs out and tries again. The expensive part is rebuilding context after a back-out. The environment was warm. Files were written, a server was running, a dataset was loaded. Throw the box away and you pay to rebuild all of it.

Two costs fall out of this shape. The first is exploration cost: to compare three approaches you would normally run three full environments from scratch. The second is idle cost: an agent waiting on a human, a webhook, or a long external job is holding a warm machine and burning compute to do nothing.

Fork attacks the first. Pause attacks the second.

Pause and resume: stop paying for idle

A paused sandbox is not a stopped process you hope restarts cleanly. Pause snapshots the memory and the overlay to object storage and releases the compute. The machine state, the RAM, the working set, is captured. Then the compute goes away, and so does the compute bill.

// `agent` is a Sandbox handle from client.createSandbox(...)
await agent.pause();   // snapshot memory + overlay, release compute
await agent.resume();  // bring it back where it left off

Resume comes in two flavors, and the distinction is worth understanding. A warm resume restores from a snapshot still cached on the host. A cold resume pulls the bundle from object storage first. Warm is the faster path. Both put the sandbox back exactly where it paused, same memory, same working state, no rebuild.

This changes the economics of waiting. An agent blocked on a code review, a slow API, or an overnight job does not need to hold a live machine. Pause it, pay only for cheap snapshot storage, and resume when the thing it was waiting on arrives. Pausing means you stop paying for compute. That is the whole point.
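The waiting pattern can be sketched with nothing but the `pause()` and `resume()` calls shown above. The `Pausable` interface and the `waitPaused` helper are hypothetical names for illustration, not part of the SDK.

```typescript
// Hypothetical minimal shape: only the two methods used in this post.
interface Pausable {
  pause(): Promise<void>;
  resume(): Promise<void>;
}

// Pause the sandbox, wait for an external event, then resume.
// `waitFor` is whatever your harness blocks on: a review, a webhook, a job.
// It is called only after the pause completes, so the wait itself is unbilled compute.
async function waitPaused<T>(
  sandbox: Pausable,
  waitFor: () => Promise<T>,
): Promise<T> {
  await sandbox.pause();          // snapshot taken, compute released
  const result = await waitFor(); // if this rejects, the sandbox stays paused
  await sandbox.resume();         // back where it left off
  return result;
}
```

Leaving the sandbox paused when the wait fails is a deliberate choice in this sketch: nothing is billing compute while you decide whether to resume or destroy it.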

Fork: branch full state into N rollouts

Fork is the primitive that makes parallel exploration cheap instead of wasteful.

A fork is a server-side copy of a paused sandbox into a new sandbox, not a boot from zero. The forked sandbox carries the full state of its parent at the moment it was paused: memory, overlay, working files, the loaded dataset, the running context. Branching is near-instant because it is a copy of state that already exists, not a fresh environment built from nothing.

So the back-out problem inverts. Instead of running three approaches from scratch, you run the agent once to the decision point, pause, and fork that single warm state into three. Each fork starts from the exact context the agent had built, then diverges. You explore three branches for roughly the setup cost of one.

// run once to the decision point, then branch the warm state
await base.pause();
const tryA = await base.fork(); // clones the paused checkpoint into a new sandbox
const tryB = await base.fork();
const tryC = await base.fork();

// evaluate the three, keep the winner (here tryB), destroy the rest
await tryA.destroy();
await tryC.destroy();

A worked flow: create, run to a decision, fork into N, keep one

Put the pieces together into the loop an agent harness would actually run.

Create. Spin up a base sandbox via snapshot restore. It comes up in seconds with the devbox image: Ubuntu, Node 22 LTS, Bun, Python 3.12 with uv, Go, Rust.
Run to a decision point. The agent loads its context, writes its files, and reaches the branch where several next moves are plausible.
Pause and fork into N. Snapshot the warm state, then fork it into as many rollouts as you want to compare. Each one inherits the full context, so none of them pays the setup tax again.
Evaluate in parallel. Let the N branches run. Score them however your harness scores: tests passed, eval metric, a judge model, a human glance.
Keep the winner, destroy the rest. Promote the branch that won. Destroy the others, or pause them if you might revisit. You paid for one buildup and N short divergences, not N full runs.

The same primitives compose with networking from the companion post. A fork can carry a whole multi-node setup, not just a single process, because the state being copied is the sandbox's, kernel and all.

import { CreateosSandboxClient } from "@nodeops-createos/sandbox";

const client = new CreateosSandboxClient();

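// the loop above, in the TypeScript SDK
// runUntilDecision, evaluate, and pickBest are your harness's own functions
const N = 3; // number of branches to compare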
const base = await client.createSandbox({ shape: "s-2vcpu-2gb", rootfs: "devbox:1" });
await runUntilDecision(base);        // your agent does its thing
await base.pause();                  // snapshot the warm state

const forks = await Promise.all(
  Array.from({ length: N }, () => base.fork()), // each clone resumes and runs
);
const results = await Promise.all(forks.map(evaluate)); // run and score the branches

const winner = pickBest(results);
for (const f of forks) {
  if (f.id !== winner.sandbox.id) await f.destroy();
}
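The loop above leaves `evaluate` and `pickBest` to the harness. Here is one possible shape for the selection and cleanup step, assuming each evaluation returns a numeric score alongside the sandbox it ran in. The `Branch` and `Scored` types, the string `id`, the higher-is-better convention, and the `destroyLosers` helper are all assumptions for illustration. Only `id` and `destroy()` come from the snippet above.

```typescript
// Hypothetical types: only `id` and `destroy()` appear in the snippet above.
interface Branch {
  id: string; // assumption: ids are strings
  destroy(): Promise<void>;
}

interface Scored<B extends Branch> {
  sandbox: B;
  score: number; // assumption: higher is better
}

// Return the highest-scoring result; ties go to the earliest branch.
function pickBest<B extends Branch>(results: Scored<B>[]): Scored<B> {
  if (results.length === 0) throw new Error("no results to choose from");
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

// Destroy every branch except the winner, in parallel.
async function destroyLosers<B extends Branch>(
  forks: B[],
  winner: Scored<B>,
): Promise<void> {
  await Promise.all(
    forks.filter((f) => f.id !== winner.sandbox.id).map((f) => f.destroy()),
  );
}
```

Destroying in parallel rather than one at a time keeps teardown short when N is large, which matters when every branch is billed by the second.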

The cost story: $0 egress, and no compute bill while paused

The pricing is cleared, so here it is plainly. Compute is $0.0504 per vCPU-hour plus $0.0162 per GiB-RAM-hour, billed per second. That sits at parity with E2B's published rates, so the unit price is not where the story is.
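To make those rates concrete, here is a small worked calculation. It assumes the `s-2vcpu-2gb` shape means 2 vCPU and 2 GiB of RAM, which the name suggests but this post does not spell out. The function name is illustrative.

```typescript
// Rates quoted in this post, in USD per hour.
const USD_PER_VCPU_HOUR = 0.0504;
const USD_PER_GIB_HOUR = 0.0162;

// Compute cost in USD for a sandbox that runs for `seconds` seconds.
// Per-second billing: the hourly rate is simply prorated.
function computeCostUsd(vcpus: number, ramGib: number, seconds: number): number {
  const hourly = vcpus * USD_PER_VCPU_HOUR + ramGib * USD_PER_GIB_HOUR;
  return (hourly / 3600) * seconds;
}

// Assumption: "s-2vcpu-2gb" is 2 vCPU and 2 GiB of RAM.
const oneHour = computeCostUsd(2, 2, 3600);   // 2*0.0504 + 2*0.0162 = 0.1332
const tenMinutes = computeCostUsd(2, 2, 600); // 0.1332 / 6 = 0.0222
console.log(oneHour.toFixed(4), tenMinutes.toFixed(4)); // prints "0.1332 0.0222"
```

Under that assumption, the shape works out to about 13 cents an hour, or roughly $0.000037 per second.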

Two things are where the story is.

Egress is $0. Networking between sandboxes is unmetered. For a fan-out-and-compare workload, where branches generate traffic and the orchestrator pulls results, an egress meter is a tax on parallelism. There is no such meter here. Fan out as wide as the work justifies.

Pause stops the compute bill. A paused branch costs only cheap snapshot storage, not vCPU and RAM hours. The fork-and-keep-one pattern leans directly on this. The branches you are not actively running can be paused, and the branch you keep is the only one paying for compute. You scale on concurrency when you need it and stop paying the instant you do not. New accounts start with free credits.

Per-second billing matters more than it looks for this pattern, because forks are short-lived by design. You are not running three machines for an hour. You are running three divergences for as long as it takes to score them, paying by the second, and tearing down the losers.
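A rough comparison of the two approaches uses made-up durations: a 10-minute buildup, a 2-minute divergence per branch, and three branches. The durations are illustrative, not measurements. The sketch ignores snapshot storage, which this post does not price, and any time spent in pause, fork, or resume. It also assumes a 2 vCPU, 2 GiB shape.

```typescript
// Rates quoted in this post, in USD per hour.
const USD_PER_VCPU_HOUR = 0.0504;
const USD_PER_GIB_HOUR = 0.0162;

// USD per second of compute for a given shape.
function usdPerSecond(vcpus: number, ramGib: number): number {
  return (vcpus * USD_PER_VCPU_HOUR + ramGib * USD_PER_GIB_HOUR) / 3600;
}

// Billed sandbox-seconds when every branch rebuilds its context from scratch.
function fullRunSeconds(n: number, setupS: number, divergeS: number): number {
  return n * (setupS + divergeS);
}

// Billed sandbox-seconds when context is built once, then forked n ways.
function forkedSeconds(n: number, setupS: number, divergeS: number): number {
  return setupS + n * divergeS;
}

// Illustrative numbers only.
const n = 3;
const setupS = 600;   // 10-minute buildup
const divergeS = 120; // 2-minute divergence per branch
const rate = usdPerSecond(2, 2);

const full = fullRunSeconds(n, setupS, divergeS);  // 3 * 720 = 2160
const forked = forkedSeconds(n, setupS, divergeS); // 600 + 3 * 120 = 960
console.log(`full runs: ${full}s, $${(full * rate).toFixed(4)}`);
console.log(`forked:    ${forked}s, $${(forked * rate).toFixed(4)}`);
```

With these numbers the forked flow bills 960 sandbox-seconds against 2,160, or about $0.036 against about $0.08. The gap widens as the buildup gets longer relative to the divergence.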

Honest roadmap note

CreateOS Sandbox is alpha. Fork, pause, resume, and destroy are real and work as described, server-side copy and all. We are not publishing measured create, resume, or fork latency yet, so this post says "seconds" and "near-instant" and stops there rather than quoting a number we have not measured. When the benchmarks land, you will get p50 and p95 with the methodology, not a marketing figure. Audit logging, RBAC, certifications, GPU, and live resize are on the roadmap, not shipping. Shapes are fixed at create time, so size the base sandbox for the heaviest branch you expect to run.

Start branching

If you are building an agent or a code-gen platform, this is a five-minute thing to try. Grab 500 free credits, create a sandbox, run it to a decision point, and fork it into three. Watch how little it costs to explore in parallel when egress is $0 and idle branches pause.

The quickstart walks the create-run-fork-evaluate loop end to end. CreateOS Sandbox is one product under the unified execution layer for AI, and forking agent state is the part you will feel first.