VoidBox supports sub-second VM restore via snapshot/restore. Snapshots capture the full VM state (vCPU registers, memory, devices) and restore via COW mmap — the guest resumes execution without re-booting the kernel or re-running initialization.
All snapshot features are explicit opt-in only. If you never set a snapshot field, the system behaves exactly as before — cold boot, zero snapshot code runs.
| Type | When Created | Contents | Use Case |
|---|---|---|---|
| Base | After cold boot, VM stopped | Full memory dump + all KVM state | Golden image for repeated boots |
| Diff | After dirty tracking enabled, VM stopped | Only modified pages since base | Layered caching (base + delta) |
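The base + delta layering can be sketched as data: a diff snapshot stores only the pages modified since the base, and restore lays those pages over the base image. This is illustrative only (the actual restore overlays dirty pages on a COW mmap of the base file, not an in-memory copy):

```rust
// Illustrative model of diff-snapshot layering: a diff is a list of
// (page index, page bytes) pairs; restore copies each dirty page over
// the corresponding page of the base memory image.
const PAGE: usize = 4096;

fn apply_diff(base: &mut [u8], dirty: &[(usize, [u8; PAGE])]) {
    for (idx, page) in dirty {
        base[idx * PAGE..(idx + 1) * PAGE].copy_from_slice(page);
    }
}
```

Because only dirty pages are recorded, the diff file stays tiny relative to full RAM, which is what makes layered caching cheap.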
```yaml
# Applies to all boxes
sandbox:
  memory_mb: 256
  snapshot: "abc123def456"

pipeline:
  boxes:
    - name: analyst
      prompt: "analyze data"
      sandbox:
        snapshot: "def789"
    - name: coder
      prompt: "write code"
      # no snapshot = cold boot
```
```rust
use void_box::agent_box::VoidBox;

// Cold boot (default — no snapshot)
let box1 = VoidBox::new("analyst")
    .prompt("analyze data")
    .memory_mb(256)
    .build()?;

// Restore from snapshot (explicit opt-in)
let box2 = VoidBox::new("analyst")
    .prompt("analyze data")
    .snapshot("/path/to/snapshot/dir") // or hash prefix
    .build()?;
```
```bash
# Create a snapshot from a running VM
voidbox snapshot create --config-hash <hash>

# List stored snapshots
voidbox snapshot list

# Delete a snapshot
voidbox snapshot delete <hash-prefix>

# Run with a snapshot (via spec)
voidbox run --file spec.yaml   # spec has sandbox.snapshot set
```
```bash
# POST /runs with snapshot override
curl -X POST http://localhost:8080/runs \
  -H 'Content-Type: application/json' \
  -d '{"file": "workflow.yaml", "snapshot": "abc123def456"}'
```
None — the system behaves identically to before if untouched.

The following numbers were measured on Linux/KVM with 256 MB RAM, 1 vCPU, and the userspace virtio-vsock backend:
| Phase | Time | Notes |
|---|---|---|
| Cold boot | ~10 ms | |
| Base snapshot | ~420 ms | Full 256 MB memory dump |
| Base restore | ~1.3 ms | COW mmap, lazy page loading |
| Diff snapshot | ~270 ms | Only dirty pages (~1.5 MB, 0.6% of RAM) |
| Diff restore | ~3 ms | Base COW mmap + dirty page overlay |
| Base speedup | ~8x | Cold boot / base restore |
| Diff savings | 99.4% | Memory file size reduction |
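The derived rows in the table are straightforward arithmetic over the measured ones; a quick sketch using the figures above:

```rust
// Derived rows from the table above:
//   speedup = cold boot time / base restore time
//   savings = 1 - (dirty pages size / full memory dump size)
fn speedup(cold_boot_ms: f64, restore_ms: f64) -> f64 {
    cold_boot_ms / restore_ms
}

fn diff_savings_pct(dirty_mb: f64, full_mb: f64) -> f64 {
    100.0 * (1.0 - dirty_mb / full_mb)
}

// speedup(10.0, 1.3)           → ~7.7, reported as ~8x
// diff_savings_pct(1.5, 256.0) → ~99.4%
```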
```
~/.void-box/snapshots/
  <hash-prefix>/       # first 16 chars of config hash
    state.bin          # bincode: VmSnapshot (vCPU regs, irqchip, PIT, vsock, config)
    memory.mem         # full memory dump (base)
    memory.diff        # dirty pages only (diff snapshots)
```
The 7-step restore process:

1. `VmSnapshot::load(dir)`: read `state.bin` (vCPU, irqchip, PIT, vsock, config)
2. `Vm::new(memory_mb)`: create a KVM VM with matching memory size
3. `restore_memory(mem, path)`: COW `mmap(MAP_PRIVATE|MAP_FIXED)`, lazy page loading
4. `vm.restore_irqchip(state)`: restore PIC master/slave + IOAPIC
5. `VirtioVsockMmio::restore()`: restore vsock device registers (userspace backend)
6. `create_vcpu_restored(state)`: per-vCPU restore (see register restore order below)
7. vCPU threads resume: guest continues execution from the snapshot point
Memory restore uses kernel MAP_PRIVATE lazy page loading — pages are demand-faulted from the file, writes create anonymous copies. No userfaultfd required.
The restore sequence in cpu.rs is order-sensitive. Getting it wrong causes silent guest crashes (kernel panic → reboot via port 0x64).
1. MSRs: `KVM_SET_MSRS`
2. sregs: `KVM_SET_SREGS` (segment registers, CR0/CR3/CR4)
3. LAPIC: `KVM_SET_LAPIC` + periodic timer bootstrap (see below)
4. vcpu_events: `KVM_SET_VCPU_EVENTS` (exception/interrupt state)
5. XCRs (XCR0): `KVM_SET_XCRS` — MUST come before xsave
6. xsave (FPU/SSE): `KVM_SET_XSAVE` — depends on XCR0 for the feature mask
7. regs: `KVM_SET_REGS` (GP registers, RIP, RFLAGS)
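One way to keep an order-sensitive sequence like this from silently regressing is to encode it as data and assert the invariant in tests. A sketch (the names are illustrative, not the actual `cpu.rs` API):

```rust
// The restore order above, encoded as data. The critical invariant is
// that Xcrs precedes Xsave; XSAVE restore depends on XCR0.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RestoreStep {
    Msrs,
    Sregs,
    Lapic,
    VcpuEvents,
    Xcrs,
    Xsave,
    Regs,
}

const RESTORE_ORDER: [RestoreStep; 7] = [
    RestoreStep::Msrs,
    RestoreStep::Sregs,
    RestoreStep::Lapic,
    RestoreStep::VcpuEvents,
    RestoreStep::Xcrs,
    RestoreStep::Xsave,
    RestoreStep::Regs,
];

// Position of a step in the restore sequence.
fn position(step: RestoreStep) -> usize {
    RESTORE_ORDER.iter().position(|&s| s == step).unwrap()
}
```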
XCR0 restore is critical. XCR0 controls which XSAVE features (x87, SSE, AVX) are active. Without it, the guest's XRSTORS instruction triggers a #GP because the default XCR0 only enables x87, but the guest's XSAVE area references SSE/AVX features.
When the guest was idle (NO_HZ) at snapshot time, the LAPIC timer is masked with vector=0 (LVTT=0x10000). After restore, no timer interrupt ever fires, so the scheduler never runs. The restore code detects this state and bootstraps a periodic LAPIC timer (mode=periodic, vector=0xEC, TMICT=0x200000, TDCR=divide-by-1) to kick the scheduler back to life.
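The dead-timer condition and the bootstrap value can be expressed in terms of raw LVTT register bits (bit layout per the x86 LAPIC architecture; the function names here are illustrative, not the restore code's API):

```rust
// LVTT bit layout: bits 0-7 = vector, bit 16 = mask, bit 17 = periodic mode.
const LVTT_MASKED: u32 = 1 << 16;
const LVTT_PERIODIC: u32 = 1 << 17;

// Idle at snapshot time: timer masked with vector 0 (LVTT = 0x10000),
// so no timer interrupt will ever fire after restore.
fn timer_is_dead(lvtt: u32) -> bool {
    lvtt & LVTT_MASKED != 0 && lvtt & 0xff == 0
}

// Bootstrap value described above: periodic mode, vector 0xEC.
// TMICT = 0x200000 and TDCR = divide-by-1 are programmed alongside.
fn bootstrap_lvtt() -> u32 {
    LVTT_PERIODIC | 0xEC
}
```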
The userspace virtio-vsock backend must be used for VMs that will be snapshotted. The kernel vhost backend (/dev/vhost-vsock) does not expose internal vring indices, making queue state capture incomplete. The userspace backend tracks last_avail_idx/last_used_idx directly, ensuring clean snapshot/restore of the virtqueue state.
The snapshot stores the VM's actual CID (assigned at cold boot). On restore, the same CID is reused — the guest kernel caches the CID during virtio-vsock probe and silently drops packets with mismatched dst_cid.
Every layer has an optional snapshot field that defaults to None:
| Layer | Field | Type | Default |
|---|---|---|---|
| `SandboxBuilder` | `.snapshot(path)` | `Option<PathBuf>` | `None` |
| `BoxConfig` | `snapshot` | `Option<PathBuf>` | `None` |
| `SandboxSpec` (YAML) | `sandbox.snapshot` | `Option<String>` | `None` |
| `BoxSandboxOverride` | `sandbox.snapshot` | `Option<String>` | `None` |
| `CreateRunRequest` (API) | `snapshot` | `Option<String>` | `None` |
Resolution chain: per-box override → top-level spec → None (cold boot).
When a snapshot string is provided, the runtime resolves it as:
`~/.void-box/snapshots/<prefix>/` (if `state.bin` exists). No env var fallback, no auto-detection.
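A minimal sketch of that resolution, assuming a hypothetical `resolve_snapshot` helper (not the actual VoidBox API): the per-box override wins over the top-level value, and the string only resolves when `state.bin` is present under the cache directory.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper illustrating the resolution chain and path lookup
// described above.
fn resolve_snapshot(
    home: &Path,
    per_box: Option<&str>,
    top_level: Option<&str>,
) -> Option<PathBuf> {
    // Per-box override → top-level spec → None (cold boot).
    let prefix = per_box.or(top_level)?;
    let dir = home.join(".void-box").join("snapshots").join(prefix);
    // Only a directory that already contains state.bin counts.
    if dir.join("state.bin").is_file() {
        Some(dir)
    } else {
        None // no env var fallback, no auto-detection
    }
}
```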
- `evict_lru(max_bytes)` removes the oldest snapshots first
- `compute_layer_hash(base, layer, content)` for deterministic cache keys
- `list_snapshots()` / `voidbox snapshot list`
- `delete_snapshot(prefix)` / `voidbox snapshot delete <prefix>`

The snapshot cache is stored at `~/.void-box/snapshots/`.
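The deterministic keying behind `compute_layer_hash` can be sketched with the standard library's hasher (the real implementation presumably uses a stable content hash; this only illustrates the base + layer + content keying):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative only: keys a layer by (base hash, layer name, content) so
// identical inputs always map to the same cache entry. DefaultHasher is
// deterministic within a process but not across Rust versions; a real
// cache would use a cryptographic content hash.
fn layer_hash(base: &str, layer: &str, content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    base.hash(&mut h);
    layer.hash(&mut h);
    content.hash(&mut h);
    h.finish()
}
```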
Snapshot cloning shares identical VM state across restored instances, including the guest's `/dev/urandom` entropy pool. Mitigated by: fresh CID per restore, hardware RDRAND re-seeding.