Performance
Cold start, per-command overhead, memory footprint, and how to measure each yourself. Numbers from CI on a 2024 Mac mini, gated as regressions on every PR.
tok0 is engineered to add overhead so small you don’t notice it. Cold start under 5 ms, per-command compression under 1 ms, ~8 MB resident. CI gates every number on this page; a regression of >10% on any of them blocks the PR.
Headline numbers
| Metric | Value | Test machine |
|---|---|---|
| Cold start (tok0 --version) | 4.2 ms | M2 Mac mini, 2024 |
| Compression latency, p50 | 0.31 ms | git diff fixture, 8 KB |
| Compression latency, p99 | 0.78 ms | docker build fixture, 50 KB |
| Stripped binary size | 8.1 MB | x86_64-unknown-linux-gnu |
| Resident memory (idle) | 6.3 MB | per-process |
| Resident memory (active compression) | 11.2 MB | with rule cache loaded |
These are the same numbers CI prints on every release. Replicate locally with the recipes below.
Why it’s fast
- No async runtime. Pure synchronous Rust. tokio cold-starts add 5–10 ms; tok0 budgets <5 ms total. We don’t have the headroom.
- Single-threaded by design. No locks, no atomics, no thread spawning. The metering writer is the only background thread.
- Regex compiled once. Every regex lives in `lazy_static!`. First use pays the cost; subsequent uses are hash lookups.
- mtime-based rule cache. Rules only re-parse when their TOML file changes. Steady-state rule loading is a handful of `stat()` calls.
- Pure pipeline functions. Each stage takes `&str` and returns `String`. No allocations beyond the necessary one per stage. Stack-friendly. (The regex and pipeline patterns are sketched below.)
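A minimal sketch of those last two bullets, with illustrative names rather than tok0's actual internals:

```rust
use lazy_static::lazy_static;
use regex::Regex;

lazy_static! {
    // Compiled exactly once, on first use; every later use is a cheap lookup.
    // The pattern here is illustrative, not one of tok0's real rules.
    static ref ANSI_ESCAPES: Regex = Regex::new(r"\x1b\[[0-9;]*m").unwrap();
}

/// A pure pipeline stage: borrows the input, returns exactly one new String.
/// No locks, no global mutable state, no I/O.
fn strip_ansi(input: &str) -> String {
    ANSI_ESCAPES.replace_all(input, "").into_owned()
}
```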
Measuring cold start
hyperfine --warmup 5 'tok0 --version'
Expected output on a modern x86_64 / arm64 machine:
Benchmark 1: tok0 --version
Time (mean ± σ): 4.4 ms ± 0.3 ms
Range (min … max): 3.9 ms … 5.7 ms 593 runs
If you see >10 ms cold start, the most common cause is a bloated ~/.config/tok0/filters/ with thousands of TOML files. Check:
ls ~/.config/tok0/filters/ | wc -l
A realistic count is 0–50 files; past 100 you should consolidate.
Measuring per-command overhead
hyperfine --warmup 3 'git diff' 'tok0 git diff'
The delta is tok0’s overhead. On a 4 KB diff fixture:
Benchmark 1: git diff
Time (mean ± σ): 12.4 ms ± 1.1 ms
Benchmark 2: tok0 git diff
Time (mean ± σ): 13.0 ms ± 1.2 ms
Summary
'git diff' ran 1.05 ± 0.12 times faster than 'tok0 git diff'
~600 µs of overhead. The wrapped command’s own runtime dominates by 20×.
Profiling individual compressors
tok0 profile run "<command>"
Output reports raw size, compressed size, savings %, total pipeline time, and per-stage time. Use this when a specific compressor feels slow.
tok0 profile # aggregate over the last 30 days
Identifies the slowest compressors in your actual workload — useful when deciding what to optimize.
Memory
tok0 uses ~6 MB resident at steady state (binary + libc + small heap). Active compression peaks at ~11 MB on the largest fixtures we test against (50 KB Docker build output).
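To spot-check resident memory on your own machine while a wrapped command is running (plain ps, nothing tok0-specific; the value is reported in kilobytes):

ps -o rss= -p "$(pgrep -n tok0)"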
If you see >50 MB resident, it’s almost certainly the SQLite meter database with millions of rows. Check:
ls -lh ~/.config/tok0/meter.db
sqlite3 ~/.config/tok0/meter.db "SELECT COUNT(*) FROM events;"
The meter retains data forever by default. To prune:
sqlite3 ~/.config/tok0/meter.db "DELETE FROM events WHERE ts < strftime('%s', 'now', '-90 days');"
sqlite3 ~/.config/tok0/meter.db "VACUUM;"
Or set [general].meter_retention_days = 90 in your config and tok0 prunes automatically on startup.
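Spelled out in the config file, that setting looks like this (file name and location depend on your install; only the key itself comes from the line above):

[general]
meter_retention_days = 90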
Binary size
$ size $(which tok0)
text data bss dec hex filename
8421376 17280 424 8439080 80c528 tok0
8.1 MB stripped. CI gates this — any commit pushing it past 9 MB fails the build. The largest contributors:
- `regex` and `regex-syntax` (~1.2 MB)
- `rusqlite` + bundled SQLite (~2.4 MB)
- 120 embedded TOML rule files (~85 KB total)
- 52 native compressors (~600 KB)
The cloud feature adds ~600 KB (HTTP client, JSON serde). Default builds don’t include it.
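If you do want it when building from source, opting in looks like this (assuming the Cargo feature is literally named cloud, matching the prose above):

cargo build --release --features cloud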
Threading guarantee
tok0 is single-threaded plus one mpsc-fed metering writer thread. There is no thread spawning anywhere else in the binary. That’s a hard contract — clippy is configured to flag any new std::thread::spawn outside the metering module.
Why it matters: deterministic execution order in tests, no lock contention, predictable cold-start cost, no surprise CPU pinning.
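A rough sketch of that shape, with made-up types; the real metering module's API differs:

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical event type; tok0's real metering record has more fields.
struct MeterEvent {
    raw_bytes: usize,
    compressed_bytes: usize,
}

fn spawn_meter_writer() -> mpsc::Sender<MeterEvent> {
    let (tx, rx) = mpsc::channel::<MeterEvent>();
    // The one sanctioned background thread: it drains the channel and persists
    // events, so the compression hot path never blocks on the database.
    thread::spawn(move || {
        for event in rx {
            // Persistence elided in this sketch (the real writer inserts into
            // the SQLite meter database described above).
            let _ = (event.raw_bytes, event.compressed_bytes);
        }
    });
    tx
}
```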
Regression gates
Every PR runs:
cargo bench --features ci-bench
The bench suite asserts:
- Cold start ≤ 5.0 ms (95th percentile across 100 runs)
- Compression latency ≤ 1.0 ms (95th percentile, all built-in fixtures)
- Binary size ≤ 9.0 MB stripped
- No memory growth >5% across 1000 sequential compressions
A >10% regression on any of these fails CI. Authors must either restore parity or document the trade-off in the PR description.
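For a concrete sense of what such a gate asserts, here is a hypothetical standalone version of the cold-start check; it is not tok0's actual ci-bench code, just the same shape:

```rust
use std::process::Command;
use std::time::Instant;

// 95th percentile of a set of samples, by sorting (fine at this sample count).
fn p95(mut samples_ms: Vec<f64>) -> f64 {
    samples_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = ((samples_ms.len() as f64 * 0.95).ceil() as usize).saturating_sub(1);
    samples_ms[idx.min(samples_ms.len() - 1)]
}

#[test]
fn cold_start_p95_under_budget() {
    let mut samples_ms = Vec::with_capacity(100);
    for _ in 0..100 {
        let start = Instant::now();
        Command::new("tok0")
            .arg("--version")
            .output()
            .expect("failed to run tok0");
        samples_ms.push(start.elapsed().as_secs_f64() * 1000.0);
    }
    assert!(p95(samples_ms) <= 5.0, "cold-start p95 regressed past 5.0 ms");
}
```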
The exact bench numbers from the latest release are published at tok0.dev/bench with the raw CSV output, so you can diff against your own machine.