Void-Box
Snapshots

Sub-Second VM Restore

VoidBox supports sub-second VM restore via snapshot/restore. Snapshots capture the full VM state (vCPU registers, memory, devices) and restore via COW mmap — the guest resumes execution without re-booting the kernel or re-running initialization.

All snapshot features are explicit opt-in only. If you never set a snapshot field, the system behaves exactly as before — cold boot, zero snapshot code runs.

Snapshot Types

| Type | When Created | Contents | Use Case |
|------|--------------|----------|----------|
| Base | After cold boot, VM stopped | Full memory dump + all KVM state | Golden image for repeated boots |
| Diff | After dirty tracking enabled, VM stopped | Only modified pages since base | Layered caching (base + delta) |
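A diff snapshot only has to persist the pages the guest touched since the base was taken. A minimal sketch of that selection step, assuming 4 KiB pages and a dirty bitmap in the shape returned by KVM's `KVM_GET_DIRTY_LOG` (one bit per page, packed into 64-bit words); the function name is illustrative, not the actual library API:

```rust
const PAGE_SIZE: usize = 4096;

/// Collect (page_index, page_bytes) pairs for every page marked dirty
/// in the bitmap. Only these pages go into memory.diff.
fn collect_dirty_pages(memory: &[u8], dirty_bitmap: &[u64]) -> Vec<(usize, Vec<u8>)> {
    let mut pages = Vec::new();
    for (word_idx, word) in dirty_bitmap.iter().enumerate() {
        for bit in 0..64 {
            if *word & (1u64 << bit) != 0 {
                let page = word_idx * 64 + bit;
                let start = page * PAGE_SIZE;
                if start < memory.len() {
                    let end = (start + PAGE_SIZE).min(memory.len());
                    pages.push((page, memory[start..end].to_vec()));
                }
            }
        }
    }
    pages
}
```

With the ~0.6% dirty ratio quoted in the benchmarks below, this is why a diff file is two orders of magnitude smaller than a full dump.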

YAML Spec

Top-level snapshot

# Applies to all boxes
sandbox:
  memory_mb: 256
  snapshot: "abc123def456"

Per-box override

pipeline:
  boxes:
    - name: analyst
      prompt: "analyze data"
      sandbox:
        snapshot: "def789"
    - name: coder
      prompt: "write code"
      # no snapshot = cold boot

Rust API

Rust
use void_box::agent_box::VoidBox;

// Cold boot (default — no snapshot)
let box1 = VoidBox::new("analyst")
    .prompt("analyze data")
    .memory_mb(256)
    .build()?;

// Restore from snapshot (explicit opt-in)
let box2 = VoidBox::new("analyst")
    .prompt("analyze data")
    .snapshot("/path/to/snapshot/dir")   // or hash prefix
    .build()?;

CLI Commands

Shell
# Create a snapshot from a running VM
voidbox snapshot create --config-hash <hash>

# List stored snapshots
voidbox snapshot list

# Delete a snapshot
voidbox snapshot delete <hash-prefix>

# Run with a snapshot (via spec)
voidbox run --file spec.yaml   # spec has sandbox.snapshot set

Daemon API

Shell
# POST /runs with snapshot override
curl -X POST http://localhost:8080/runs \
  -H 'Content-Type: application/json' \
  -d '{"file": "workflow.yaml", "snapshot": "abc123def456"}'

Design Principles

  • No snapshot field set → cold boot, zero snapshot code runs
  • No auto-detection of existing snapshots
  • No auto-creation of snapshots during normal runs
  • No auto-restore — only if the user passes an explicit path or hash
  • No env var fallback — spec or code only
  • Every new field defaults to None — the system behaves identically to before if untouched

Performance Benchmarks

Measured on Linux/KVM with 256 MB RAM, 1 vCPU, userspace virtio-vsock:

| Phase | Time | Notes |
|-------|------|-------|
| Cold boot | ~10 ms | |
| Base snapshot | ~420 ms | Full 256 MB memory dump |
| Base restore | ~1.3 ms | COW mmap, lazy page loading |
| Diff snapshot | ~270 ms | Only dirty pages (~1.5 MB, 0.6% of RAM) |
| Diff restore | ~3 ms | Base COW mmap + dirty page overlay |
| Base speedup | ~8x | Cold boot / base restore |
| Diff savings | 99.4% | Memory file size reduction |
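The last two rows are derived from the measured ones; a quick arithmetic check using the figures above:

```rust
// Recompute the derived benchmark rows from the measured figures.
fn derived_metrics() -> (f64, f64) {
    let cold_boot_ms = 10.0_f64;
    let base_restore_ms = 1.3;
    let speedup = cold_boot_ms / base_restore_ms; // ≈ 7.7, reported as ~8x

    let dirty_mb = 1.5_f64;
    let ram_mb = 256.0;
    let savings_pct = 100.0 * (1.0 - dirty_mb / ram_mb); // ≈ 99.4%
    (speedup, savings_pct)
}
```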

Storage Layout

~/.void-box/snapshots/
  <hash-prefix>/        # first 16 chars of config hash
      state.bin          # bincode: VmSnapshot (vCPU regs, irqchip, PIT, vsock, config)
      memory.mem         # full memory dump (base)
      memory.diff        # dirty pages only (diff snapshots)
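Mapping a config hash to its on-disk directory is a straightforward prefix truncation; a sketch under the layout above (the helper name and `home` parameter are illustrative):

```rust
use std::path::PathBuf;

/// Build the snapshot directory for a config hash: the directory name is
/// the first 16 characters of the hash, under ~/.void-box/snapshots/.
fn snapshot_dir(home: &str, config_hash: &str) -> PathBuf {
    let prefix: String = config_hash.chars().take(16).collect();
    PathBuf::from(home)
        .join(".void-box")
        .join("snapshots")
        .join(prefix)
}
```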

Restore Flow

The 7-step restore process:

Restore sequence
1. VmSnapshot::load(dir)           Read state.bin (vCPU, irqchip, PIT, vsock, config)
2. Vm::new(memory_mb)              Create KVM VM with matching memory size
3. restore_memory(mem, path)       COW mmap(MAP_PRIVATE|MAP_FIXED) — lazy page loading
4. vm.restore_irqchip(state)       Restore PIC master/slave + IOAPIC
5. VirtioVsockMmio::restore()      Restore vsock device registers (userspace backend)
6. create_vcpu_restored(state)     Per-vCPU restore (see register restore order below)
7. vCPU threads resume             Guest continues execution from snapshot point

Memory Restore

Memory restore relies on the kernel's MAP_PRIVATE lazy page loading: pages are demand-faulted from the snapshot file, and writes create anonymous private copies. No userfaultfd is required.
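The copy-on-write behaviour can be modelled in plain Rust. This is a conceptual sketch of MAP_PRIVATE semantics, not the actual mmap call: reads fall through to the backing file, and the first write to a page copies it into private memory:

```rust
use std::collections::HashMap;

const PAGE: usize = 4096;

/// Conceptual model of a MAP_PRIVATE mapping over memory.mem.
struct CowMemory<'a> {
    base: &'a [u8],                       // backing snapshot file contents
    private: HashMap<usize, [u8; PAGE]>,  // pages copied on first write
}

impl<'a> CowMemory<'a> {
    fn read(&self, addr: usize) -> u8 {
        let (page, off) = (addr / PAGE, addr % PAGE);
        match self.private.get(&page) {
            Some(p) => p[off],      // privately copied page
            None => self.base[addr], // demand-faulted from the file
        }
    }

    fn write(&mut self, addr: usize, val: u8) {
        let (page, off) = (addr / PAGE, addr % PAGE);
        if !self.private.contains_key(&page) {
            // First write to this page: copy it out of the backing file.
            let mut copy = [0u8; PAGE];
            copy.copy_from_slice(&self.base[page * PAGE..(page + 1) * PAGE]);
            self.private.insert(page, copy);
        }
        self.private.get_mut(&page).unwrap()[off] = val; // file never modified
    }
}
```

The kernel does the same thing per-page in hardware page tables, which is why restore cost is near-constant regardless of guest memory size.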

vCPU Register Restore Order

The restore sequence in cpu.rs is order-sensitive. Getting it wrong causes silent guest crashes (kernel panic → reboot via port 0x64).

Register restore order (7-step, order-sensitive)
1. MSRs              KVM_SET_MSRS
2. sregs             KVM_SET_SREGS (segment regs, CR0/CR3/CR4)
3. LAPIC             KVM_SET_LAPIC + periodic timer bootstrap (see below)
4. vcpu_events       KVM_SET_VCPU_EVENTS (exception/interrupt state)
5. XCRs (XCR0)       KVM_SET_XCRS — MUST come before xsave
6. xsave (FPU/SSE)   KVM_SET_XSAVE — depends on XCR0 for feature mask
7. regs              KVM_SET_REGS (GP registers, RIP, RFLAGS)

XCR0 restore is critical. XCR0 controls which XSAVE features (x87, SSE, AVX) are active. Without it, the guest's XRSTORS instruction triggers a #GP because the default XCR0 only enables x87, but the guest's XSAVE area references SSE/AVX features.

LAPIC Timer Bootstrap

When the guest was idle (NO_HZ) at snapshot time, the LAPIC timer is masked with vector=0 (LVTT=0x10000). After restore, no timer interrupt ever fires, so the scheduler never runs. The restore code detects this state and bootstraps a periodic LAPIC timer (mode=periodic, vector=0xEC, TMICT=0x200000, TDCR=divide-by-1) to kick the scheduler back to life.
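The detection described above boils down to one register compare. A hedged sketch, with constants mirroring the values in the text (the struct and function names are illustrative, not the actual restore code):

```rust
/// LVTT value for an idle NO_HZ guest: timer masked (bit 16), vector 0.
const LVTT_MASKED_NO_VECTOR: u32 = 0x10000;

struct LapicTimerBootstrap {
    lvtt: u32,  // timer mode + vector
    tmict: u32, // initial count
    tdcr: u32,  // divide configuration
}

/// If the snapshotted LVTT shows a masked timer with vector 0, return a
/// periodic-timer configuration to kick the guest scheduler after restore.
fn bootstrap_if_idle(snapshotted_lvtt: u32) -> Option<LapicTimerBootstrap> {
    if snapshotted_lvtt == LVTT_MASKED_NO_VECTOR {
        Some(LapicTimerBootstrap {
            lvtt: (1 << 17) | 0xEC, // bit 17 = periodic mode, vector 0xEC
            tmict: 0x200000,
            tdcr: 0b1011,           // APIC encoding for divide-by-1
        })
    } else {
        None // timer was armed at snapshot time; restore it unchanged
    }
}
```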

Vsock Backend for Snapshot

The userspace virtio-vsock backend must be used for VMs that will be snapshotted. The kernel vhost backend (/dev/vhost-vsock) does not expose internal vring indices, making queue state capture incomplete. The userspace backend tracks last_avail_idx/last_used_idx directly, ensuring clean snapshot/restore of the virtqueue state.
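The practical difference is that the userspace backend owns the ring indices itself, so capturing queue state is a plain struct copy. A sketch of that idea (field names are illustrative, not the actual VirtioVsockMmio layout):

```rust
/// The per-virtqueue progress a device must capture for a clean snapshot.
#[derive(Clone, PartialEq, Debug)]
struct VirtqueueState {
    last_avail_idx: u16, // next avail-ring entry the device will consume
    last_used_idx: u16,  // next used-ring entry the device will publish
}

/// Trivial in the userspace backend; impossible with vhost-vsock, which
/// keeps these indices inside the kernel with no read-out interface.
fn snapshot_queue(live: &VirtqueueState) -> VirtqueueState {
    live.clone()
}
```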

CID Preservation

The snapshot stores the VM's actual CID (assigned at cold boot). On restore, the same CID is reused — the guest kernel caches the CID during virtio-vsock probe and silently drops packets with mismatched dst_cid.

Opt-in Plumbing

Every layer has an optional snapshot field that defaults to None:

| Layer | Field | Type | Default |
|-------|-------|------|---------|
| SandboxBuilder | .snapshot(path) | Option<PathBuf> | None |
| BoxConfig | snapshot | Option<PathBuf> | None |
| SandboxSpec (YAML) | sandbox.snapshot | Option<String> | None |
| BoxSandboxOverride | sandbox.snapshot | Option<String> | None |
| CreateRunRequest (API) | snapshot | Option<String> | None |

Resolution chain: per-box override → top-level spec → None (cold boot).

Snapshot Resolution

When a snapshot string is provided, the runtime resolves it as:

  1. Hash prefix → ~/.void-box/snapshots/<prefix>/ (if state.bin exists)
  2. Literal path → treat as directory path (if state.bin exists)
  3. Neither → warning printed, cold boot

No env var fallback, no auto-detection.
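The three-step resolution above can be sketched as a single function; the directory layout matches the Storage Layout section, while the function name and parameters are illustrative:

```rust
use std::path::{Path, PathBuf};

/// Resolve a snapshot spec string to a directory, or None for cold boot.
fn resolve_snapshot(spec: &str, snapshots_root: &Path) -> Option<PathBuf> {
    // 1. Try as a hash prefix under the snapshot cache.
    let by_prefix = snapshots_root.join(spec);
    if by_prefix.join("state.bin").is_file() {
        return Some(by_prefix);
    }
    // 2. Fall back to a literal directory path.
    let literal = PathBuf::from(spec);
    if literal.join("state.bin").is_file() {
        return Some(literal);
    }
    // 3. Neither resolves: the caller warns and cold-boots.
    None
}
```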

Cache Management

  • LRU eviction: evict_lru(max_bytes) removes oldest snapshots first
  • Layer hashing: compute_layer_hash(base, layer, content) for deterministic cache keys
  • Listing: list_snapshots() / voidbox snapshot list
  • Deletion: delete_snapshot(prefix) / voidbox snapshot delete <prefix>

Snapshot cache is stored at ~/.void-box/snapshots/.
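The eviction policy can be sketched as follows, assuming "least recently used" is approximated by directory mtime. This `evict_lru` is illustrative, not the exact library signature:

```rust
/// entries: (snapshot prefix, size in bytes, mtime seconds).
/// Returns the prefixes to delete, oldest first, until total size fits.
fn evict_lru(mut entries: Vec<(String, u64, u64)>, max_bytes: u64) -> Vec<String> {
    entries.sort_by_key(|&(_, _, mtime)| mtime); // oldest first
    let mut total: u64 = entries.iter().map(|&(_, size, _)| size).sum();
    let mut evicted = Vec::new();
    for (prefix, size, _) in entries {
        if total <= max_bytes {
            break;
        }
        total -= size;
        evicted.push(prefix); // delete_snapshot(prefix) would run here
    }
    evicted
}
```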

Security Considerations

Snapshot cloning shares identical VM state across restored instances:

  • RNG entropy: Restored VMs inherit the same /dev/urandom pool. Mitigated by: fresh CID per restore, hardware RDRAND re-seeding on rdtsc.
  • ASLR: Clones share guest page table layout. Mitigated by: short-lived tasks, no direct network addressability (SLIRP NAT), command allowlist limiting attack surface.
  • Session isolation: Restored VMs reuse the snapshot's stored session secret for vsock authentication (the secret is baked into the guest's kernel cmdline in snapshot memory). Per-restore secret rotation would require guest-side support.