Compiler And Memory Optimization Roadmap
This roadmap turns deep-research-report(5).html into implementable 0x0 work.
The report's central recommendation is a layered performance strategy: make the
default compiler incremental, cached, and parallel; make serious release builds
profile-guided and layout-aware; make allocation local, region-friendly, and
measurable; and keep expensive search, ML, and superoptimization off the hot
compile path until validation proves they are safe.
This roadmap is additive. It does not replace existing native performance,
runtime, ABI, package, or release gates. A milestone is done only when the
implementation, tests, diagnostics, release evidence, documentation, ADR/RFC
records, and public support status agree.
Research Summary
The dossier prioritizes these improvements for 0x0:
- query-based incremental compilation with red/green dependency tracking;
- content-addressed storage for query results and build artifacts;
- a multi-level typed SSA pipeline with target-bound lowering kept late;
- fast parallel linking for developer iteration;
- PGO, ThinLTO-style summary optimization, and post-link code layout for
release builds;
- copy-and-patch baseline JIT only for hosted runtime paths that need immediate
machine code;
- allocator fast paths with local ownership, remote-free batching, hugepage
hooks, and NUMA-aware optional backends;
- stack, region, and arena allocation before heap allocation;
- an explicit RC versus GC decision backed by language semantics and runtime
evidence;
- translation validation, sanitizers, differential tests, and optimization
remark reporting before aggressive optimization ships;
- ML-guided heuristics and equality saturation only as measured, optional,
validated lanes.
Non-Goals
- Do not introduce mandatory ML inference into default compilation.
- Do not put equality saturation or superoptimization into the default optimizer
without validation and performance gates.
- Do not add a JIT unless a hosted runtime surface has a real need for it.
- Do not enable hugepages, NUMA policies, pointer tagging, barriers, or tracing
GC as generic defaults on targets that cannot prove support.
- Do not treat benchmark wins as release evidence unless correctness,
reproducibility, and regression gates also pass.
Standing Rules
- Keep the default developer path resource-safe: incremental, cached, bounded,
and explainable.
- Keep expensive optimization in explicit release, profile-guided, background,
or research lanes.
- Every optimization must preserve source behavior under differential tests and
relevant validation gates.
- Every memory-management change must include leak, lifetime, stress, failure,
and RSS evidence.
- Every target-bound optimization must declare ISA, object format, OS, page
model, executable-memory policy, and fallback behavior.
- Every milestone needs ADR/RFC records and a documentation impact review.
Milestone 0: Research Baseline And Performance Ledger
Status: done.
Goal: turn the dossier into source-owned evidence and define what is already
implemented, what is planned, and what is intentionally deferred.
Submilestones:
- 0.1 Add a research ledger that maps each dossier recommendation to current
0x0 files, missing implementation, owner, target milestone, and gate.
- 0.2 Inventory current compiler performance surfaces: parser, checker, import
resolver, typed/effect IR, native IR, object emission, linker, static site
tools, package resolver, and app build paths.
- 0.3 Inventory current runtime memory surfaces: compiler arenas, direct
runtime allocations, ABI v1 layouts, host buffers, app runtime buffers, and
generated artifacts.
- 0.4 Define baseline metrics: cold compile time, no-change rebuild time,
single-file edit rebuild time, peak RSS, cache hit ratio, object size, link
time, binary size, startup time, and selected runtime counters.
- 0.5 Add a bounded make compiler-memory-optimization-baseline-check gate that
validates the ledger and baseline schema without running heavy builds.
Acceptance:
- A machine-readable research ledger exists:
perf/compiler-memory-research-ledger.tsv.
- Current and missing surfaces are classified without claiming planned work is
implemented.
- Compiler and runtime memory surfaces are inventoried in
perf/compiler-performance-surfaces.tsv and
perf/runtime-memory-surfaces.tsv.
- Baseline metrics have stable names, units, collection commands, owners, and
budget policy in perf/compiler-memory-baseline-metrics.tsv.
- The bounded make compiler-memory-optimization-baseline-check gate passes.
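As an illustration of what a bounded schema check over the baseline metrics table could look like, here is a minimal sketch. The column names, the sample rows, and the function name are assumptions made for this example; the roadmap does not define the real header of perf/compiler-memory-baseline-metrics.tsv.

```python
import csv
import io

# Hypothetical column names; the roadmap does not define the real header of
# perf/compiler-memory-baseline-metrics.tsv.
REQUIRED_COLUMNS = ("name", "unit", "command", "owner", "budget_policy")


def validate_metrics_tsv(text):
    """Return a list of schema errors; an empty list means the table is valid."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    errors = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if errors:
        return errors  # row checks are meaningless without the full header
    seen = set()
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            if not (row[column] or "").strip():
                errors.append(f"line {line_no}: empty {column}")
        if row["name"] in seen:
            errors.append(f"line {line_no}: duplicate metric {row['name']}")
        seen.add(row["name"])
    return errors


# Illustrative rows only; the command text is a placeholder, not a real target.
SAMPLE = (
    "name\tunit\tcommand\towner\tbudget_policy\n"
    "cold_compile_time\tms\t<collection command>\tcompiler\tfail-on-regression\n"
    "peak_rss\tKiB\t\tcompiler\tfail-on-regression\n"
)
print(validate_metrics_tsv(SAMPLE))
```

The check only reads the table, so it stays cheap enough for a default developer gate; running the collection commands themselves would belong to a heavier lane.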
Milestone 1: Query Engine And Incremental Compilation
Status: done.
Goal: make compiler work demand-driven, memoized, dependency-tracked, and
reusable across normal developer edits.
Submilestones:
- 1.1 Define stable query keys for source files, tokens, parsed forms, module
signatures, imports, type/effect results, HIR, MIR, native IR, object output,
diagnostics, and documentation extraction.
- 1.2 Implement red/green dependency tracking with explicit input fingerprints
and query dependency edges.
- 1.3 Persist query outputs with versioned schemas and invalidation reasons.
- 1.4 Make provider functions deterministic and side-effect-free, except for
explicitly marked always-evaluate operations.
- 1.5 Add query diagnostics for stale inputs, incompatible schema versions,
non-deterministic providers, dependency cycles, and invalidated cache roots.
- 1.6 Add incremental compile tests for no-change rebuilds, leaf edits, import
edits, signature changes, package changes, and diagnostic-only changes.
Acceptance:
- No-change and single-leaf rebuilds reuse parser, checker, IR, and object
work where dependencies are green, as recorded in
compiler-query/incremental-scenarios.tsv.
- The query graph can be inspected by the bounded
release/compiler-query-graph-report.tsv report.
- release/compiler-query-cache-miss-report.tsv explains why each recomputed
query was red.
- make compiler-query-engine-check validates query keys, schemas, provider
purity, dependency cycles, red/green consistency, cache roots, diagnostics,
and incremental scenarios.
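The red/green idea in this milestone can be sketched in a few dozen lines. This is a toy model under stated assumptions, not the 0x0 query engine: the class name, the two providers, and the use of repr-based SHA-256 fingerprints are all invented for illustration, and dependency cycles are not detected here.

```python
import hashlib


def fingerprint(value):
    """Stable fingerprint of a query value (assumes repr is deterministic)."""
    return hashlib.sha256(repr(value).encode()).hexdigest()


class QueryEngine:
    def __init__(self, providers):
        self.providers = providers  # provider name -> function(engine, arg)
        self.revision = 0
        self.inputs = {}            # input name -> (value, fingerprint)
        self.memo = {}              # query key -> memoized entry
        self._active = []           # stack of dependency lists being recorded
        self.recomputed = []        # keys that were red and re-ran

    def set_input(self, name, value):
        self.inputs[name] = (value, fingerprint(value))
        self.revision += 1

    def input(self, name):
        self._record(("input", name))
        return self.inputs[name][0]

    def query(self, provider, arg):
        key = (provider, arg)
        self._record(key)
        self._ensure(key)
        return self.memo[key]["value"]

    def _record(self, key):
        if self._active:
            self._active[-1].append(key)

    def _current_fingerprint(self, key):
        if key[0] == "input":
            return self.inputs[key[1]][1]
        self._ensure(key)
        return self.memo[key]["fingerprint"]

    def _ensure(self, key):
        entry = self.memo.get(key)
        if entry is not None:
            if entry["verified_at"] == self.revision:
                return
            # Green: every recorded dependency still has the same fingerprint.
            if all(self._current_fingerprint(d) == fp for d, fp in entry["deps"]):
                entry["verified_at"] = self.revision
                return
        # Red: re-run the provider and record the dependencies it touches.
        self._active.append([])
        value = self.providers[key[0]](self, key[1])
        deps = self._active.pop()
        self.memo[key] = {
            "value": value,
            "fingerprint": fingerprint(value),
            "deps": [(d, self._current_fingerprint(d)) for d in deps],
            "verified_at": self.revision,
        }
        self.recomputed.append(key)


# Hypothetical providers: a tokenizer and a query that depends on it.
PROVIDERS = {
    "tokens": lambda engine, name: engine.input(name).split(),
    "word_count": lambda engine, name: len(engine.query("tokens", name)),
}

engine = QueryEngine(PROVIDERS)
engine.set_input("a.src", "let x = 1")
engine.query("word_count", "a.src")
engine.recomputed.clear()
engine.set_input("a.src", "let  x  =  1")  # whitespace-only edit
engine.query("word_count", "a.src")
print(engine.recomputed)
```

After the whitespace-only edit, only the tokens query re-runs. Its output fingerprint is unchanged, so word_count is marked green and reused, which is the kind of reuse the incremental scenarios are meant to record.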
Milestone 2: Content-Addressed Storage And Build Artifact Cache
Status: open.
Goal: store compiler and toolchain outputs by content so rebuilds, package
resolution, profile reuse, and release verification share one deterministic
artifact model.
Submilestones:
- 2.1 Define the CAS object schema for query values, typed AST summaries, MIR,
native IR, object files, profile summaries, link order files, API docs, and
generated package metadata.
- 2.2 Implement local CAS read/write, garbage collection, integrity checking,
and schema migration.
- 2.3 Integrate CAS with the query engine, package resolver, doc generation,
object emission, and linker inputs.
- 2.4 Add offline and hermetic modes for CI and release verification.
- 2.5 Add cache poisoning, checksum mismatch, schema mismatch, path traversal,
and rollback tests.
Acceptance:
- Repeated builds can reuse CAS objects across compiler invocations.
- Cache keys are stable and independent of process-local pointer identity.
- CAS integrity failures are fail-closed and diagnosable.
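A minimal sketch of a local content-addressed store with fail-closed integrity checks follows. The class name, the two-character directory fan-out, and the choice of SHA-256 are assumptions for illustration; the roadmap does not fix the CAS object schema or hash function.

```python
import hashlib
import os
import tempfile


class IntegrityError(Exception):
    """Stored bytes no longer match the digest they are filed under."""


class LocalCAS:
    def __init__(self, root):
        self.root = root

    def _path(self, digest):
        # Accept only 64 lowercase hex characters, which also rules out
        # path traversal through keys such as "../...".
        if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
            raise ValueError(f"malformed digest: {digest!r}")
        return os.path.join(self.root, digest[:2], digest[2:])

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not os.path.exists(path):
            directory = os.path.dirname(path)
            os.makedirs(directory, exist_ok=True)
            # Write to a temporary file, then rename, so readers never
            # observe a partially written object.
            fd, tmp = tempfile.mkstemp(dir=directory)
            with os.fdopen(fd, "wb") as handle:
                handle.write(data)
            os.replace(tmp, path)
        return digest

    def get(self, digest):
        with open(self._path(digest), "rb") as handle:
            data = handle.read()
        if hashlib.sha256(data).hexdigest() != digest:
            # Fail closed: never return bytes that do not match their key.
            raise IntegrityError(f"checksum mismatch for {digest}")
        return data


with tempfile.TemporaryDirectory() as root:
    cas = LocalCAS(root)
    key = cas.put(b"object bytes")
    print(key == hashlib.sha256(b"object bytes").hexdigest())
```

Because the key is derived only from the bytes, it is independent of process-local pointer identity, and a second put of the same content is a no-op.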
Milestone 3: Multi-Level Typed SSA Pipeline
Status: done.
Goal: split the compiler pipeline into clear IR levels that preserve semantic
facts early and bind target details late.
Submilestones:
- 3.1 Define HIR for parsed, resolved, and typed source structure.
- 3.2 Define MIR as typed/effect SSA with ownership, escape, alias, alignment,
hotness, and source-span metadata.
- 3.3 Define LIR for ABI lowering, legal types, calls, stack slots, and runtime
calls.
- 3.4 Define MachineIR for instruction selection, scheduling, register
allocation, frame layout, branch forms, relocations, and object metadata.
- 3.5 Add IR verifier passes and stable textual/TSV/JSON dump formats for each
level.
- 3.6 Update backends to consume the appropriate IR level instead of redoing
frontend decisions.
Acceptance:
- Each IR level has a spec, verifier, fixture corpus, and diagnostic class.
- Target-specific decisions are not introduced before the documented lowering
boundary.
- Existing backends either consume the new IR or are marked with explicit
transition evidence.
- make typed-ssa-pipeline-check validates HIR, MIR, LIR, MachineIR,
lowering boundaries, dump manifests, verifier reports, backend transition
rows, and negative diagnostics.
- make native-ir-check now runs make typed-ssa-pipeline-check first.
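To make the verifier requirement in 3.5 concrete, here is a small sketch of one SSA invariant check for a single straight-line block. The instruction tuple shape and the "%name" value syntax are invented for this example and are not the 0x0 MIR format; a real verifier would also need dominance checks across blocks.

```python
def verify_block(instructions, params=()):
    """Check single-definition and define-before-use in one basic block.

    Each instruction is (result, op, operands). String operands name SSA
    values; any other operand is treated as an immediate constant.
    """
    errors = []
    defined = set(params)
    for index, (result, op, operands) in enumerate(instructions):
        for operand in operands:
            if isinstance(operand, str) and operand not in defined:
                errors.append(f"{index}: {op} uses undefined value {operand}")
        if result is not None:
            if result in defined:
                errors.append(f"{index}: {result} defined more than once")
            defined.add(result)
    return errors


GOOD = [
    ("%0", "const", (1,)),
    ("%1", "add", ("%0", "%0")),
    (None, "ret", ("%1",)),
]
BAD = [
    ("%0", "neg", ("%1",)),   # %1 is never defined
    ("%0", "const", (1,)),    # %0 is defined twice
]
print(verify_block(GOOD))
print(verify_block(BAD))
```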
Milestone 4: Fast Linker And Developer Build Loop
Status: done.
Goal: make edit-build-run fast before adding heavier optimization tiers.
Submilestones:
- 4.1 Measure current link and package-build time across compiler, libraries,
apps, and static site artifacts.
- 4.2 Add parallel object discovery, symbol table construction, relocation
planning, archive index reads, and output writing where the current linker can
support it.
- 4.3 Add incremental link planning that reuses unchanged object metadata and
link order state from CAS.
- 4.4 Add deterministic section ordering, dead stripping evidence, and duplicate
symbol diagnostics.
- 4.5 Add developer-loop benchmarks for no-change builds, one-file edits,
package edits, and app edits.
Acceptance:
- The fast developer path has bounded wall-clock and RSS budgets.
- Linker output remains byte-reproducible when inputs are unchanged.
- Linker diagnostics stay stable under parallel execution.
- make fast-linker-dev-loop-check validates workload budgets, parallel-safe
planning stages, incremental reuse, deterministic output hashes, stable
diagnostics, compatibility rows, and fixture failures.
- make native-linker-check runs make fast-linker-dev-loop-check before the
native linker toolchain evidence check.
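One way to keep output reproducible while planning in parallel is to do the per-object work concurrently and then impose a total, content-derived order before assigning offsets. The sketch below assumes a made-up object shape of (name, section, payload) and a single running offset; it is not the 0x0 linker's data model.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def summarize(obj):
    """Per-object work that is safe to run in parallel: no shared state."""
    name, section, payload = obj
    return (section, name, len(payload), hashlib.sha256(payload).hexdigest())


def plan_layout(objects, workers=4):
    """Return (layout rows, layout hash), independent of discovery order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(summarize, objects))
    # Sort by a total key so neither thread scheduling nor filesystem
    # enumeration order can leak into the output.
    summaries.sort()
    rows, offset = [], 0
    for section, name, size, _digest in summaries:
        rows.append((section, name, offset, size))
        offset += size
    digests = [summary[3] for summary in summaries]
    layout_hash = hashlib.sha256(repr((rows, digests)).encode()).hexdigest()
    return rows, layout_hash


OBJECTS = [
    ("b.o", ".text", b"\x90" * 8),
    ("a.o", ".text", b"\x90" * 4),
    ("c.o", ".data", b"\x00" * 16),
]
rows, layout_hash = plan_layout(OBJECTS)
print(rows)
```

Feeding the same objects in a different order yields the same rows and the same hash, which is the property a deterministic-output-hash gate would compare.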
Milestone 5: PGO, Thin Summary Optimization, And Profile Plumbing
Status: done.
Goal: make the serious AOT optimization path profile-guided without slowing the
default developer path.
Submilestones:
- 5.1 Define profile data formats for function counts, edge counts, branch
probabilities, indirect-call targets, hot/cold blocks, startup windows, and
temporal order.
- 5.2 Implement instrumentation profile generation and profile merge tooling.
- 5.3 Implement sample profile intake for hosted targets where external samples
are available.
- 5.4 Add ThinLTO-style summaries for call graph, imports, hotness, inlining
candidates, devirtualization candidates, and code size budgets.
- 5.5 Integrate profile-aware inlining, function splitting, indirect-call
promotion, branch layout, and hot/cold placement.
- 5.6 Add profile mismatch, stale profile, incompatible binary, and missing
source diagnostics.
Acceptance:
- Release builds can consume stable profile artifacts.
- Profile use is optional and reproducible.
- Performance wins must include correctness, wall-clock, RSS, and binary-size
evidence.
- make pgo-thin-summary-check validates profile formats, instrumentation
generation, deterministic merge, hosted sample intake, Thin summary rows,
profile-guided optimization decisions, stable diagnostics, performance
evidence, compatibility rows, and negative fixtures.
- make native-post-link-check runs make pgo-thin-summary-check before the
native post-link evidence check.
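Deterministic merge is the easiest of these properties to illustrate. The sketch below merges per-run function counts and serializes them in sorted order, so the merged artifact does not depend on the order in which runs are supplied. The profile shape (a plain mapping from function name to count) and the tab-separated output are assumptions, not the 0x0 profile format.

```python
from collections import Counter


def merge_profiles(profiles):
    """Sum function counts across runs; result keys are sorted by name."""
    total = Counter()
    for profile in profiles:
        for function, count in profile.items():
            if count < 0:
                raise ValueError(f"negative count for {function}")
            total[function] += count
    return dict(sorted(total.items()))


def serialize(merged):
    """One 'function<TAB>count' line per entry, in the merged order."""
    return "".join(f"{function}\t{count}\n" for function, count in merged.items())


RUN_1 = {"main": 10, "parse": 400}
RUN_2 = {"parse": 100, "check": 7}
print(serialize(merge_profiles([RUN_1, RUN_2])), end="")
```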
Milestone 6: Post-Link Layout Optimization
Status: done.
Goal: improve final binary layout after ordinary PGO and summary optimization.
Submilestones:
- 6.1 Decide the supported strategy per target: Propeller-like relinking,
BOLT-like binary rewriting, or a 0x0-native ordering-file path.
- 6.2 Add basic-block section or equivalent fine-grained layout metadata where
the object format supports it.
- 6.3 Generate and consume function-order and block-order files.
- 6.4 Add CPU front-end performance counters for i-cache, iTLB, branch misses,
startup page faults, and cold-start time where supported.
- 6.5 Add fallback behavior for targets that cannot expose block sections or
reliable sampled profiles.
Acceptance:
- Post-link optimization is release-only or explicit, never default dev-path
work.
- The optimizer records before/after layout, binary size, startup, and runtime
metrics.
- Unsupported targets fail closed with clear diagnostics.
- make post-link-layout-check validates target strategy selection,
block-level metadata, generated and consumed order files, before/after
metrics, fallback policy, stable diagnostics, release reports, compatibility
rows, and negative fixtures.
- make native-post-link-check runs both make pgo-thin-summary-check and
make post-link-layout-check before the native post-link evidence check.
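For the ordering-file path in 6.1 and 6.3, the simplest possible generator places hot functions first by descending count and cold functions after them, with name as the tie-breaker so the file is deterministic. This is a deliberately naive sketch with invented names; production layout tools typically cluster by call-graph affinity rather than by raw counts.

```python
def function_order(counts, all_functions):
    """Hot functions by descending count, then cold ones, ties by name."""
    hot = sorted(
        (f for f in all_functions if counts.get(f, 0) > 0),
        key=lambda f: (-counts[f], f),
    )
    cold = sorted(f for f in all_functions if counts.get(f, 0) == 0)
    return hot + cold


COUNTS = {"parse": 500, "main": 10, "check": 10}
FUNCTIONS = ["usage", "main", "parse", "check", "abort"]
print("\n".join(function_order(COUNTS, FUNCTIONS)))
```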
Milestone 7: Allocator Fast Paths And Page Backend
Status: done.
Goal: replace generic allocation behavior with a production allocator strategy
that keeps common paths local and cheap.
Submilestones:
- 7.1 Define allocator profiles: freestanding, hosted-small, hosted-server,
realtime, test-fixture, and compiler-internal.
- 7.2 Implement size classes, local free lists, local heap ownership, and
slow-path span refill.
- 7.3 Implement remote-free batching or message passing with bounded queues and
fail-closed overflow behavior.
- 7.4 Add transfer cache and central page/span management.
- 7.5 Add optional hugepage-aware backend hooks and NUMA policy hooks for hosted
server targets.
- 7.6 Add allocator stress tests for local free, remote free, cross-thread
allocation, fragmentation, large objects, exhaustion, and shutdown cleanup.
Acceptance:
- The allocator has measured fast-path instruction and branch budgets.
- Remote frees do not perturb the local fast path except at documented drain
points.
- Hugepage and NUMA behavior is opt-in and target-gated.
- make allocator-fast-path-check validates allocator profiles, size classes,
local free lists, local ownership, remote-free batching, transfer caches,
central page/span management, hugepage and NUMA hooks, stress reports,
diagnostics, compatibility rows, performance budgets, and negative fixtures.
- make native-memory-control-check runs make allocator-fast-path-check
before the native memory-control evidence check.
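The interaction between the local fast path and remote frees can be modeled without real threads. In the single-threaded sketch below, a block is a (size class, address) tuple, the size classes are illustrative values, and a plain list stands in for what would be a lock-free queue in a real allocator. All names are assumptions for this example.

```python
SIZE_CLASSES = (16, 32, 64, 128, 256)  # bytes; illustrative values only


def size_class(size):
    """Map a request size to the index of the smallest class that fits."""
    for index, limit in enumerate(SIZE_CLASSES):
        if size <= limit:
            return index
    return None  # large objects would take a separate page/span path


class LocalHeap:
    def __init__(self, remote_capacity=4):
        self.free_lists = [[] for _ in SIZE_CLASSES]
        self.remote_queue = []  # stands in for a lock-free MPSC queue
        self.remote_capacity = remote_capacity
        self.next_address = 0
        self.refills = 0

    def alloc(self, size):
        index = size_class(size)
        if index is None:
            raise ValueError("large allocation: not handled in this sketch")
        free = self.free_lists[index]
        if free:                 # fast path: local pop only
            return free.pop()
        self.drain_remote()      # documented drain point, slow path only
        if free:
            return free.pop()
        return self._refill(index)

    def free_local(self, block):
        self.free_lists[block[0]].append(block)

    def free_remote(self, block):
        """Called on behalf of another thread; never touches free_lists."""
        if len(self.remote_queue) >= self.remote_capacity:
            raise OverflowError("remote-free queue full")  # fail closed
        self.remote_queue.append(block)

    def drain_remote(self):
        for block in self.remote_queue:
            self.free_lists[block[0]].append(block)
        self.remote_queue.clear()

    def _refill(self, index):
        self.refills += 1
        block = (index, self.next_address)
        self.next_address += SIZE_CLASSES[index]
        return block


heap = LocalHeap(remote_capacity=2)
first = heap.alloc(24)
heap.free_remote(first)
print(heap.alloc(20) == first, heap.refills)
```

Remote frees only become visible when the local list is empty and the slow path drains the queue, so the fast path stays a single local pop.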
Milestone 8: Stack, Region, Arena, And Escape Promotion
Status: done.
Goal: make heap allocation the fallback by promoting short-lived and
non-escaping values to cheaper lifetime domains.
Submilestones:
- 8.1 Extend escape analysis with object lifetime, alias, ownership, closure,
actor, host-buffer, and FFI escape categories.
- 8.2 Add scalar replacement for eligible aggregates.
- 8.3 Add stack allocation for non-escaping values.
- 8.4 Add region and arena allocation for compiler IR, parser scratch, request
scope, transaction scope, and app runtime scope.
- 8.5 Add dynamic heapification only if a proven target requires optimistic
stack or region promotion.
- 8.6 Add diagnostics and reports that explain why allocation did or did not
promote.
Acceptance:
- Promotion decisions are deterministic and inspectable.
- Incorrect promotion is caught by validation, stress, and lifetime tests.
- Compiler and app workloads show allocator-traffic reductions without
behavioral drift.
- make region-arena-promotion-check validates escape categories, promotion
domains, scalar replacement, stack promotion, region and arena scopes,
dynamic heapification policy, stable diagnostics, release reports,
compatibility rows, performance budgets, and negative fixtures.
- make escape-analysis-check runs make region-arena-promotion-check before
the broader escape-analysis evidence.
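Item 8.6 asks for reports that explain each promotion decision. A decision function that always returns a reason alongside the chosen domain is one way to get that by construction. The policy below is an assumed example, not the 0x0 rules: it treats closure, actor, host-buffer, and FFI escapes as heap-forcing and everything else as region-eligible.

```python
# Escape category names follow 8.1; which ones force the heap is an
# assumption made only for this sketch.
HEAP_FORCING = ("closure", "actor", "host-buffer", "ffi")


def choose_domain(escapes, size_known=True, region=None):
    """Return (domain, reason) for one allocation site, deterministically."""
    forcing = sorted(set(escapes) & set(HEAP_FORCING))
    if forcing:
        return "heap", "escapes via " + ", ".join(forcing)
    if not escapes and size_known:
        return "stack", "does not escape and has a static size"
    if region is not None:
        return "region", f"lifetime bounded by region {region}"
    return "heap", "no enclosing region bounds the lifetime"


print(choose_domain([]))
print(choose_domain(["alias"], region="request"))
print(choose_domain(["ffi", "closure"], region="request"))
```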
Milestone 9: RC, Regions, And Optional GC Decision
Status: done.
Goal: choose and implement the long-term managed memory semantics for 0x0
instead of mixing ad hoc conventions.
Submilestones:
- 9.1 Decide whether the production semantics prefer ownership plus regions,
precise reference counting with reuse, optional tracing GC, or target-specific
profiles.
- 9.2 If RC is selected, implement precise retain/release insertion, reuse
analysis, cycle policy, diagnostics, and stress tests.
- 9.3 If GC is selected, implement a concurrent regional design with explicit
barrier, metadata, pointer, and target support policy.
- 9.4 Add compatibility boundaries for ABI v1 layouts, host buffers, closures,
actors, remote actors, Live values, and app runtime values.
- 9.5 Add pause, throughput, memory footprint, leak, and shutdown evidence.
Acceptance:
- The memory semantics are documented and enforced by compiler/runtime gates.
- Cycles, shared graphs, actors, host resources, and FFI have explicit policy.
- Unsupported target combinations fail closed.
- make memory-semantics-decision-check validates profile semantics,
retain/release insertion policy, RC reuse, region and arena release, optional
GC target policy, ABI and host boundaries, actor mailbox retention, stable
diagnostics, stress reports, performance budgets, compatibility rows, and
negative fixtures.
- make native-memory-control-check runs make memory-semantics-decision-check
before the broader native memory-control evidence.
Milestone 10: Copy-And-Patch Baseline JIT
Status: open.
Goal: add a fast hosted JIT tier only where 0x0 runtime use cases need immediate
native code generation.
Submilestones:
- 10.1 Identify hosted runtime surfaces that require JIT behavior and prove the
need with latency or throughput evidence.
- 10.2 Define stencil formats, patch slots, relocation kinds, labels,
safepoints, deopt metadata, executable-memory policy, and code-cache layout.
- 10.3 Generate stencils from the same operation definitions used by the
interpreter or VM-shaped runtime.
- 10.4 Implement W^X-safe code allocation, patching, finalization, icache flush
behavior, and code cache reclamation.
- 10.5 Add diagnostics for unsupported stencils, patch overflow, target
mismatch, executable-memory denial, and invalid branch targets.
- 10.6 Add differential tests against interpreter/VM behavior.
Acceptance:
- JIT is opt-in and target-gated.
- The baseline tier compiles quickly and records code size, compile latency,
runtime latency, and correctness evidence.
- Unsupported targets remain fully functional without JIT.
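The core of copy-and-patch is small: copy a precompiled machine-code template, then overwrite its patch slots with runtime values. The sketch below only builds the byte sequence and never executes it. The stencil table and operation names are invented for illustration; the three x86-64 encodings used (B8 for mov eax, imm32; 05 for add eax, imm32; C3 for ret) are standard.

```python
import struct

# Each stencil is (machine-code template, offset of its 32-bit patch slot).
# A slot of None means the stencil takes no operand.
STENCILS = {
    "load_const": (bytes.fromhex("b800000000"), 1),  # mov eax, imm32
    "add_const": (bytes.fromhex("0500000000"), 1),   # add eax, imm32
    "ret": (bytes.fromhex("c3"), None),              # ret
}


def emit(program):
    """Concatenate patched stencils for a list of (op, operand) pairs."""
    code = bytearray()
    for op, operand in program:
        if op not in STENCILS:
            raise KeyError(f"unsupported stencil: {op}")
        template, slot = STENCILS[op]
        chunk = bytearray(template)  # copy
        if slot is not None:
            # Patch: struct.error is raised if the value does not fit in a
            # signed 32-bit slot, which models a patch-overflow diagnostic.
            struct.pack_into("<i", chunk, slot, operand)
        elif operand is not None:
            raise ValueError(f"{op} takes no operand")
        code += chunk
    return bytes(code)


print(emit([("load_const", 40), ("add_const", 2), ("ret", None)]).hex())
```

Making these bytes executable is the part covered by 10.4 (W^X-safe allocation, finalization, and icache flush) and is intentionally left out of the sketch.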
Milestone 11: Validation, Sanitizers, And Miscompilation Defense
Status: open.
Goal: make aggressive optimization safe enough to ship.
Submilestones:
- 11.1 Add translation validation for dangerous IR-to-IR and LIR-to-LIR
rewrites.
- 11.2 Add sanitizer-backed lanes for bounds, use-after-free, leaks,
uninitialized reads, data races, undefined behavior, and aliasing
assumptions where the target supports them.
- 11.3 Add differential testing against interpreter, VM, direct ELF, native,
object/linker, and WASM-shaped paths where applicable.
- 11.4 Add optimization remarks with source spans, hotness, reason, and
expected cost impact.
- 11.5 Add rollback controls for passes that regress correctness, compile time,
memory, binary size, or runtime.
Acceptance:
- Aggressive rewrites are not enabled without validation or an explicit
release-risk exception.
- Miscompilation reports include minimized input, pass, IR level, target, and
validation result.
- Sanitizer and differential lanes are explicit release inputs for optimized
builds.
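A differential lane reduces to a small harness: run a reference and a candidate on the same seeded inputs and report the first disagreement. The example rewrite below is a classic unsound one, replacing truncating signed division by two with an arithmetic right shift, which disagrees on negative odd values. Function names and the report shape are assumptions for this sketch.

```python
import random


def differential_test(reference, candidate, generate, trials=200, seed=0):
    """Return the first mismatch as a dict, or None if all trials agree."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(trials):
        args = generate(rng)
        expected = reference(*args)
        actual = candidate(*args)
        if expected != actual:
            return {"args": args, "expected": expected, "actual": actual}
    return None


def trunc_div2(x):
    """Reference semantics: division by two that rounds toward zero."""
    return int(x / 2)


def shifted(x):
    """Candidate rewrite: arithmetic shift, which rounds toward -infinity."""
    return x >> 1


def signed(rng):
    return (rng.randint(-100, 100),)


def unsigned(rng):
    return (rng.randint(0, 100),)


print(differential_test(trunc_div2, shifted, unsigned))
print(differential_test(trunc_div2, shifted, signed) is not None)
```

The rewrite passes on non-negative inputs and fails once negative values are generated, which is why the input generator is as much a part of the release evidence as the harness itself.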
Milestone 12: Benchmarking, Profiling, And Performance Budgets
Status: open.
Goal: make performance claims reproducible and tied to workloads users care
about.
Submilestones:
- 12.1 Define benchmark suites for frontend speed, mid-end query time, codegen
speed, link time, compiler RSS, binary size, startup time, runtime throughput,
app latency, and allocator behavior.
- 12.2 Add profiler intake for compiler-side and binary-side profiles.
- 12.3 Record cache hit ratios, query colors, invalidation fan-out, profile
quality, and pass timings.
- 12.4 Add performance dashboards and TSV/JSON release reports.
- 12.5 Add regression thresholds with override policy, owner signoff, and
release-note requirements.
Acceptance:
- Every performance claim cites a workload, metric, baseline, target, date, and
command.
- Regressions fail bounded checks unless they have accepted ADR/RFC exceptions.
- Release artifacts include performance reports.
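The regression rule in the acceptance list can be expressed as a three-way outcome: pass, fail, or excepted. The sketch below assumes lower-is-better metrics, a percentage threshold, and a set of metric names that carry an accepted ADR/RFC exception; the metric names are placeholders.

```python
def check_budget(metric, baseline, current, max_regression_pct, exceptions=()):
    """Classify one lower-is-better metric as pass, fail, or excepted."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    regression_pct = (current - baseline) / baseline * 100.0
    if regression_pct <= max_regression_pct:
        return "pass", regression_pct
    if metric in exceptions:
        return "excepted", regression_pct  # still reported, not hidden
    return "fail", regression_pct


print(check_budget("link_time_ms", 200, 206, 5.0)[0])
print(check_budget("peak_rss_kib", 1000, 1200, 5.0)[0])
print(check_budget("peak_rss_kib", 1000, 1200, 5.0, exceptions={"peak_rss_kib"})[0])
```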
Milestone 13: ML-Guided Heuristics
Status: open.
Goal: use ML only for narrow, replaceable, measured decisions where hand-tuned
heuristics are weak.
Submilestones:
- 13.1 Choose first pilot decisions: inlining for size/speed, register
allocation eviction, branch probability fallback, or layout policy tuning.
- 13.2 Define feature schemas from IR, MachineIR, hotness, code size, register
pressure, loop depth, and target features.
- 13.3 Implement an optional advisor interface with deterministic fallback to
hand-written heuristics.
- 13.4 Version and sign model artifacts; keep training offline and outside the
compiler binary.
- 13.5 Log decisions, confidence, fallback reason, compile-time cost, and
benchmark outcome.
Acceptance:
- ML is optional and disabled by default unless a release explicitly enables it.
- Missing or incompatible models fall back without changing source semantics.
- A model can be rolled back independently of compiler source.
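The advisor interface in 13.3 can be sketched as a wrapper that always computes the hand-written decision first and only overrides it when a model is present, healthy, and confident. The feature names, the heuristic thresholds, and the confidence cutoff below are invented for illustration.

```python
def heuristic_inline(features):
    """Hand-written fallback: inline small callees, or medium hot ones."""
    if features["callee_size"] <= 20:
        return True
    return features["hot"] and features["callee_size"] <= 80


def decide_inline(features, advisor=None, min_confidence=0.8):
    """Return (decision, source); source records why a fallback happened."""
    fallback = heuristic_inline(features)
    if advisor is None:
        return fallback, "no-model"
    try:
        decision, confidence = advisor(features)
    except Exception:
        return fallback, "advisor-error"
    if confidence < min_confidence:
        return fallback, "low-confidence"
    return decision, "advisor"


FEATURES = {"callee_size": 50, "hot": False}
print(decide_inline(FEATURES))
print(decide_inline(FEATURES, advisor=lambda f: (True, 0.95)))
print(decide_inline(FEATURES, advisor=lambda f: (True, 0.30)))
```

Either decision only changes whether a call is inlined, not what the program computes, so a missing or incompatible model cannot alter source semantics. The returned source string is the kind of fallback reason 13.5 asks to log.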
Milestone 14: Equality Saturation And Superoptimization Research Lane
Status: open.
Goal: use expensive search only for narrow, validated optimization discovery.
Submilestones:
- 14.1 Define first target domains: arithmetic/bit-manip kernels, late LIR
peepholes, vectorization candidates, or DSL fragments.
- 14.2 Add e-graph rewrite sets with budgets, extraction cost models, and
deterministic limits.
- 14.3 Validate candidates with translation validation, differential testing,
and benchmark checks before promotion.
- 14.4 Add LLM-guided proposal intake only as offline candidate generation.
- 14.5 Promote surviving rewrites into ordinary deterministic compiler rules
with ADR/RFC evidence.
Acceptance:
- No superoptimization proposal ships without equivalence validation and
performance evidence.
- Research output is separated from default compilation.
- Failed proposals leave reproducible rejection evidence.
Milestone 15: Production Release Integration
Status: open.
Goal: make compiler and memory optimization a normal, auditable part of release
readiness.
Submilestones:
- 15.1 Add final release rows for incremental compiler, CAS, IR verifiers,
optimized AOT profile path, allocator, memory semantics, validation, and
performance budgets.
- 15.2 Update public support matrices with implemented, target-gated, deferred,
or research-only status.
- 15.3 Add release artifact manifests for cache schemas, profiles, order files,
allocator reports, validation reports, benchmark results, and rollback
controls.
- 15.4 Add operator/developer docs for when to use fast dev builds, optimized
AOT builds, profile-guided builds, post-link optimization, JIT, and research
lanes.
- 15.5 Run bounded source gates first, then explicit heavy release/performance
gates after all implementation milestones are done.
Acceptance:
- A release cannot claim compiler/memory optimization closure without release
artifacts and public support status.
- Heavy gates are explicit release operations, not default development checks.
- Remaining target-specific limitations are documented in release notes.
Completion Signal
This roadmap is complete when:
- default development builds are incremental, cached, bounded, and measurable;
- release builds can consume profile data and layout evidence;
- the compiler has verified multi-level IR boundaries;
- allocator, region, and managed memory semantics are implemented and measured;
- optional JIT, ML, and superoptimization lanes are gated, validated, and
non-default unless a release enables them;
- performance reports are reproducible and release-owned;
- public docs distinguish implemented, target-gated, deferred, and
research-only capabilities.