Compiler And Memory Optimization Roadmap

This roadmap turns deep-research-report(5).html into implementable 0x0 work.

The report's central recommendation is a layered performance strategy: make the default compiler incremental, cached, and parallel; make serious release builds profile-guided and layout-aware; make allocation local, region-friendly, and measurable; and keep expensive search, ML, and superoptimization off the hot compile path until validation proves they are safe.

This roadmap is additive. It does not replace existing native performance, runtime, ABI, package, or release gates. A milestone is done only when the implementation, tests, diagnostics, release evidence, documentation, ADR/RFC records, and public support status agree.

Research Summary

The dossier prioritizes these improvements for 0x0:

query-based incremental compilation with red/green dependency tracking;
content-addressed storage for query results and build artifacts;
a multi-level typed SSA pipeline with target-bound lowering kept late;
fast parallel linking for developer iteration;
PGO, ThinLTO-style summary optimization, and post-link code layout for release builds;
copy-and-patch baseline JIT only for hosted runtime paths that need immediate machine code;
allocator fast paths with local ownership, remote-free batching, hugepage hooks, and NUMA-aware optional backends;
stack, region, and arena allocation before heap allocation;
an explicit RC versus GC decision backed by language semantics and runtime evidence;
translation validation, sanitizers, differential tests, and optimization remark reporting before aggressive optimization ships;
ML-guided heuristics and equality saturation only as measured, optional, validated lanes.

Non-Goals

Do not introduce mandatory ML inference into default compilation.
Do not put equality saturation or superoptimization into the default optimizer without validation and performance gates.
Do not add a JIT unless a hosted runtime surface has a real need for it.
Do not enable hugepages, NUMA policies, pointer tagging, barriers, or tracing GC as generic defaults on targets that cannot prove support.
Do not treat benchmark wins as release evidence unless correctness, reproducibility, and regression gates also pass.

Standing Rules

Keep the default developer path resource-safe: incremental, cached, bounded, and explainable.
Keep expensive optimization in explicit release, profile-guided, background, or research lanes.
Every optimization must preserve source behavior under differential tests and relevant validation gates.
Every memory-management change must include leak, lifetime, stress, failure, and RSS evidence.
Every target-bound optimization must declare ISA, object format, OS, page model, executable-memory policy, and fallback behavior.
Every milestone needs ADR/RFC records and a documentation impact review.

Milestone 0: Research Baseline And Performance Ledger

Status: done.

Goal: turn the dossier into source-owned evidence and define what is already implemented, what is planned, and what is intentionally deferred.

Submilestones:

0.1 Add a research ledger that maps each dossier recommendation to current 0x0 files, missing implementation, owner, target milestone, and gate.
0.2 Inventory current compiler performance surfaces: parser, checker, import resolver, typed/effect IR, native IR, object emission, linker, static site tools, package resolver, and app build paths.
0.3 Inventory current runtime memory surfaces: compiler arenas, direct runtime allocations, ABI v1 layouts, host buffers, app runtime buffers, and generated artifacts.
0.4 Define baseline metrics: cold compile time, no-change rebuild time, single-file edit rebuild time, peak RSS, cache hit ratio, object size, link time, binary size, startup time, and selected runtime counters.
0.5 Add a bounded make compiler-memory-optimization-baseline-check gate that validates the ledger and baseline schema without running heavy builds.

Acceptance:

A machine-readable research ledger exists: perf/compiler-memory-research-ledger.tsv.
Current and missing surfaces are classified without claiming planned work is implemented.
Compiler and runtime memory surfaces are inventoried in perf/compiler-performance-surfaces.tsv and perf/runtime-memory-surfaces.tsv.
Baseline metrics have stable names, units, collection commands, owners, and budget policy in perf/compiler-memory-baseline-metrics.tsv.
The bounded make compiler-memory-optimization-baseline-check gate passes.
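
A bounded schema check of the kind described in 0.5 could look roughly like the sketch below. It only reads the TSV text and never runs a build. The column names (name, unit, command, owner, budget_policy) are assumptions derived from the acceptance wording; the real layout of perf/compiler-memory-baseline-metrics.tsv may differ, and the sample command is hypothetical.

```python
import csv
import io

# Assumed column names; the actual baseline-metrics schema may differ.
REQUIRED = ("name", "unit", "command", "owner", "budget_policy")


def check_metrics_tsv(text):
    """Return a list of schema errors; an empty list means the table is valid."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    errors = []
    seen = set()
    for line, row in enumerate(reader, start=2):  # line 1 is the header
        for col in REQUIRED:
            if not (row[col] or "").strip():
                errors.append(f"line {line}: empty {col}")
        if row["name"] in seen:
            errors.append(f"line {line}: duplicate metric {row['name']}")
        seen.add(row["name"])
    return errors


# Hypothetical sample row, for illustration only.
SAMPLE = (
    "name\tunit\tcommand\towner\tbudget_policy\n"
    "cold_compile_time\tms\tmake bench-cold\tcompiler\tfail-over-10pct\n"
)
print(check_metrics_tsv(SAMPLE))
```

Because the check is pure text validation, its cost scales with the size of the ledger rather than with the size of the compiler workload, which is what keeps it bounded.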

Milestone 1: Query Engine And Incremental Compilation

Status: done.

Goal: make compiler work demand-driven, memoized, dependency-tracked, and reusable across normal developer edits.

Submilestones:

1.1 Define stable query keys for source files, tokens, parsed forms, module signatures, imports, type/effect results, HIR, MIR, native IR, object output, diagnostics, and documentation extraction.
1.2 Implement red/green dependency tracking with explicit input fingerprints and query dependency edges.
1.3 Persist query outputs with versioned schemas and invalidation reasons.
1.4 Make provider functions deterministic and side-effect-free, except for explicitly marked always-evaluate operations.
1.5 Add query diagnostics for stale inputs, incompatible schema versions, non-deterministic providers, dependency cycles, and invalidated cache roots.
1.6 Add incremental compile tests for no-change rebuilds, leaf edits, import edits, signature changes, package changes, and diagnostic-only changes.

Acceptance:

No-change and single-leaf rebuilds reuse parser, checker, IR, and object work where dependencies are green, as recorded in compiler-query/incremental-scenarios.tsv.
The query graph can be inspected through the bounded release/compiler-query-graph-report.tsv report.
release/compiler-query-cache-miss-report.tsv explains why each recomputed query was red.
make compiler-query-engine-check validates query keys, schemas, provider purity, dependency cycles, red/green consistency, cache roots, diagnostics, and incremental scenarios.
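
The red/green idea in 1.2 can be illustrated with a small in-memory sketch. It is not the 0x0 query engine: the class names, the toy parse and check providers, and the revision counter are all assumptions made for illustration, and cycle detection and persistence are left out. A query is reused (green) when none of its recorded dependencies changed since it was last verified, and a recomputed query whose fingerprint is unchanged does not invalidate its dependents.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(value: str) -> str:
    """Content fingerprint that does not depend on object identity."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass
class Memo:
    value: str
    fp: str           # fingerprint of value
    deps: list        # query keys read while computing value
    verified_at: int  # last revision in which this memo was confirmed valid
    changed_at: int   # last revision in which fp actually changed


class QueryEngine:
    def __init__(self, providers):
        self.providers = providers  # query kind -> pure function(engine, arg)
        self.inputs = {}            # ("source", path) -> (text, changed_at)
        self.memos = {}             # derived query key -> Memo
        self.revision = 0
        self.stack = []             # one dependency list per running provider
        self.recomputed = []        # red queries, in recomputation order

    def set_input(self, path, text):
        self.revision += 1
        self.inputs[("source", path)] = (text, self.revision)

    def get(self, key):
        if self.stack:
            self.stack[-1].append(key)  # record the dependency edge
        if key[0] == "source":
            return self.inputs[key][0]
        self._refresh(key)
        return self.memos[key].value

    def _changed_at(self, key):
        if key[0] == "source":
            return self.inputs[key][1]
        self._refresh(key)
        return self.memos[key].changed_at

    def _refresh(self, key):
        memo = self.memos.get(key)
        if memo and memo.verified_at == self.revision:
            return  # already checked in this revision
        if memo and all(self._changed_at(d) <= memo.verified_at for d in memo.deps):
            memo.verified_at = self.revision  # green: reuse without recomputing
            return
        self.stack.append([])
        value = self.providers[key[0]](self, key[1])
        deps = self.stack.pop()
        fp = fingerprint(value)
        unchanged = memo is not None and memo.fp == fp
        changed_at = memo.changed_at if unchanged else self.revision
        self.memos[key] = Memo(value, fp, deps, self.revision, changed_at)
        self.recomputed.append(key)  # red: would feed a cache-miss report


def parse(engine, path):
    # Toy parser: drops comments and blank lines, so comment edits are invisible.
    text = engine.get(("source", path))
    lines = [line.split("#")[0].strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def check(engine, path):
    # Toy checker: depends only on the parsed form.
    return "ok:" + str(len(engine.get(("parse", path)).splitlines()))
```

In this sketch a comment-only edit turns the parse query red, but its fingerprint stays the same, so the check query is verified green and is not recomputed. The recomputed list plays the role that the cache-miss report plays in the acceptance criteria.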

Milestone 2: Content-Addressed Storage And Build Artifact Cache

Status: open.

Goal: store compiler and toolchain outputs by content so rebuilds, package resolution, profile reuse, and release verification share one deterministic artifact model.

Submilestones:

2.1 Define the CAS object schema for query values, typed AST summaries, MIR, native IR, object files, profile summaries, link order files, API docs, and generated package metadata.
2.2 Implement local CAS read/write, garbage collection, integrity checking, and schema migration.
2.3 Integrate CAS with the query engine, package resolver, doc generation, object emission, and linker inputs.
2.4 Add offline and hermetic modes for CI and release verification.
2.5 Add cache poisoning, checksum mismatch, schema mismatch, path traversal, and rollback tests.

Acceptance:

Repeated builds can reuse CAS objects across compiler invocations.
Cache keys are stable and independent of process-local pointer identity.
CAS integrity failures are fail-closed and diagnosable.
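
As a rough illustration of 2.2 and 2.5, the sketch below stores blobs under their SHA-256 digest and refuses to return anything whose checksum or schema tag does not match. The class name, the on-disk layout (two-character shard directories), and the schema tag format are assumptions, not the 0x0 CAS design; garbage collection and schema migration are omitted.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class IntegrityError(Exception):
    """Raised instead of returning data that fails verification."""


class LocalCas:
    def __init__(self, root):
        self.root = Path(root)

    def _path(self, digest):
        # Accept only plain lowercase hex digests; this blocks path traversal.
        if len(digest) != 64 or any(c not in "0123456789abcdef" for c in digest):
            raise ValueError(f"invalid digest: {digest!r}")
        return self.root / digest[:2] / digest[2:]

    def put(self, schema, payload):
        # The schema tag is part of the hashed bytes, so it is covered by the key.
        # Assumption: schema tags never contain a NUL byte.
        blob = schema.encode("utf-8") + b"\0" + payload
        digest = hashlib.sha256(blob).hexdigest()
        path = self._path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            fd, tmp = tempfile.mkstemp(dir=path.parent)
            with os.fdopen(fd, "wb") as handle:
                handle.write(blob)
            os.replace(tmp, path)  # publish atomically
        return digest

    def get(self, digest, schema):
        blob = self._path(digest).read_bytes()
        if hashlib.sha256(blob).hexdigest() != digest:
            raise IntegrityError(f"checksum mismatch for {digest}")
        stored_schema, _, payload = blob.partition(b"\0")
        if stored_schema.decode("utf-8") != schema:
            raise IntegrityError(f"schema mismatch for {digest}")
        return payload
```

Keys here depend only on the bytes stored, which is one way to meet the requirement that cache keys be independent of process-local pointer identity. Every failure path raises rather than returning partial data, matching the fail-closed acceptance criterion.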

Milestone 3: Multi-Level Typed SSA Pipeline

Status: done.

Goal: split the compiler pipeline into clear IR levels that preserve semantic facts early and bind target details late.

Submilestones:

3.1 Define HIR for parsed, resolved, and typed source structure.
3.2 Define MIR as typed/effect SSA with ownership, escape, alias, alignment, hotness, and source-span metadata.
3.3 Define LIR for ABI lowering, legal types, calls, stack slots, and runtime calls.
3.4 Define MachineIR for instruction selection, scheduling, register allocation, frame layout, branch forms, relocations, and object metadata.
3.5 Add IR verifier passes and stable textual/TSV/JSON dump formats for each level.
3.6 Update backends to consume the appropriate IR level instead of redoing frontend decisions.

Acceptance:

Each IR level has a spec, verifier, fixture corpus, and diagnostic class.
Target-specific decisions are not introduced before the documented lowering boundary.
Existing backends either consume the new IR or are marked with explicit transition evidence.
make typed-ssa-pipeline-check validates HIR, MIR, LIR, MachineIR, lowering boundaries, dump manifests, verifier reports, backend transition rows, and negative diagnostics.
make native-ir-check now runs make typed-ssa-pipeline-check first.
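
To make the verifier requirement in 3.5 concrete, here is a minimal sketch of three checks a typed SSA verifier would perform on a single basic block: every value is assigned once, every operand is defined before use, and arithmetic operands match the result type. The Inst record and the op names are hypothetical and much simpler than a real MIR; multi-block dominance checks are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    dest: str         # SSA value defined by this instruction
    ty: str           # result type, for example "i32"
    op: str           # "const", "add", or "mul" in this sketch
    args: tuple = ()  # names of operand values


def verify_block(params, insts):
    """Return diagnostics in instruction order; an empty list means it verifies."""
    types = dict(params)  # value name -> type, seeded with block parameters
    diags = []
    for index, inst in enumerate(insts):
        for arg in inst.args:
            if arg not in types:
                diags.append(f"{index}: use of undefined value {arg}")
        if inst.op in ("add", "mul"):
            for arg in inst.args:
                if arg in types and types[arg] != inst.ty:
                    diags.append(
                        f"{index}: {inst.op} operand {arg} has type "
                        f"{types[arg]}, expected {inst.ty}"
                    )
        if inst.dest in types:
            diags.append(f"{index}: value {inst.dest} assigned more than once")
        else:
            types[inst.dest] = inst.ty
    return diags


good = [Inst("a", "i32", "const"), Inst("b", "i32", "add", ("x", "a"))]
print(verify_block([("x", "i32")], good))
```

Emitting diagnostics in a fixed order, keyed by instruction index, is one simple way to keep verifier reports stable enough to compare against negative fixtures.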

Milestone 4: Fast Linker And Developer Build Loop

Status: done.

Goal: make edit-build-run fast before adding heavier optimization tiers.

Submilestones:

4.1 Measure current link and package-build time across compiler, libraries, apps, and static site artifacts.
4.2 Add parallel object discovery, symbol table construction, relocation planning, archive index reads, and output writing where the current linker can support it.
4.3 Add incremental link planning that reuses unchanged object metadata and link order state from CAS.
4.4 Add deterministic section ordering, dead stripping evidence, and duplicate symbol diagnostics.
4.5 Add developer-loop benchmarks for no-change builds, one-file edits, package edits, and app edits.

Acceptance:

The fast developer path has bounded wall-clock and RSS budgets.
Linker output remains byte-reproducible when inputs are unchanged.
Linker diagnostics stay stable under parallel execution.
make fast-linker-dev-loop-check validates workload budgets, parallel-safe planning stages, incremental reuse, deterministic output hashes, stable diagnostics, compatibility rows, and fixture failures.
make native-linker-check runs make fast-linker-dev-loop-check before the native linker toolchain evidence check.

Milestone 5: PGO, Thin Summary Optimization, And Profile Plumbing

Status: done.

Goal: make the serious AOT optimization path profile-guided without slowing the default developer path.

Submilestones:

5.1 Define profile data formats for function counts, edge counts, branch probabilities, indirect-call targets, hot/cold blocks, startup windows, and temporal order.
5.2 Implement instrumentation profile generation and profile merge tooling.
5.3 Implement sample profile intake for hosted targets where external samples are available.
5.4 Add ThinLTO-style summaries for call graph, imports, hotness, inlining candidates, devirtualization candidates, and code size budgets.
5.5 Integrate profile-aware inlining, function splitting, indirect-call promotion, branch layout, and hot/cold placement.
5.6 Add profile mismatch, stale profile, incompatible binary, and missing source diagnostics.

Acceptance:

Release builds can consume stable profile artifacts.
Profile use is optional and reproducible.
Performance wins must include correctness, wall-clock, RSS, and binary-size evidence.
make pgo-thin-summary-check validates profile formats, instrumentation generation, deterministic merge, hosted sample intake, Thin summary rows, profile-guided optimization decisions, stable diagnostics, performance evidence, compatibility rows, and negative fixtures.
make native-post-link-check runs make pgo-thin-summary-check before the native post-link evidence check.
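
The deterministic merge named in the gate above can be sketched as follows. The profile layout (a binary_id plus a flat counts mapping with keys such as fn:main) is an assumption for illustration, not the 0x0 profile format. The merge sums counters, rejects profiles recorded against a different binary, and emits rows in sorted key order so the output bytes do not depend on input order.

```python
from collections import Counter


class ProfileMismatch(Exception):
    """A profile was recorded against a different binary."""


def merge_profiles(profiles, binary_id):
    """Sum counters across runs; the result is independent of input order."""
    total = Counter()
    for profile in profiles:
        if profile["binary_id"] != binary_id:
            raise ProfileMismatch(
                f"profile built for {profile['binary_id']}, expected {binary_id}"
            )
        for key, count in profile["counts"].items():
            if count < 0:
                raise ValueError(f"negative count for {key}")
            total[key] += count
    return dict(sorted(total.items()))


def to_tsv(merged):
    """Serialize in key order so identical inputs give identical bytes."""
    return "".join(f"{key}\t{count}\n" for key, count in merged.items())


# Hypothetical sample runs.
run_a = {"binary_id": "b1", "counts": {"fn:main": 1, "fn:parse": 10}}
run_b = {"binary_id": "b1", "counts": {"fn:parse": 5, "edge:main->parse": 15}}
print(to_tsv(merge_profiles([run_a, run_b], "b1")), end="")
```

Integer addition is commutative, so the only ordering hazard is serialization; sorting keys before writing removes it.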

Milestone 6: Post-Link Layout Optimization

Status: done.

Goal: improve final binary layout after ordinary PGO and summary optimization.

Submilestones:

6.1 Decide the supported strategy per target: Propeller-like relinking, BOLT-like binary rewriting, or a 0x0-native ordering-file path.
6.2 Add basic-block section or equivalent fine-grained layout metadata where the object format supports it.
6.3 Generate and consume function-order and block-order files.
6.4 Add CPU front-end performance counters for i-cache, iTLB, branch misses, startup page faults, and cold-start time where supported.
6.5 Add fallback behavior for targets that cannot expose block sections or reliable sampled profiles.

Acceptance:

Post-link optimization is release-only or explicit, never default dev-path work.
The optimizer records before/after layout, binary size, startup, and runtime metrics.
Unsupported targets fail closed with clear diagnostics.
make post-link-layout-check validates target strategy selection, block-level metadata, generated and consumed order files, before/after metrics, fallback policy, stable diagnostics, release reports, compatibility rows, and negative fixtures.
make native-post-link-check runs both make pgo-thin-summary-check and make post-link-layout-check before the native post-link evidence check.
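
For the ordering-file path in 6.1 and 6.3, a deliberately simple sketch is shown below. It orders functions by descending execution count with names as a tie-break and places unexecuted functions last. This is a plain hotness sort chosen for illustration; it is not the call-graph clustering that Propeller-like or BOLT-like tools use, and the function names are hypothetical.

```python
def function_order(all_functions, counts):
    """Generate an order: hot functions by descending count, then cold by name."""
    hot = sorted(
        (f for f in all_functions if counts.get(f, 0) > 0),
        key=lambda f: (-counts[f], f),  # name breaks ties deterministically
    )
    cold = sorted(f for f in all_functions if counts.get(f, 0) <= 0)
    return hot + cold


def apply_order(layout, order):
    """Consume an order file; fail closed if it names functions not in the binary."""
    unknown = [f for f in order if f not in layout]
    if unknown:
        raise ValueError(f"order file names unknown functions: {unknown}")
    listed = set(order)
    # Functions missing from the order file keep a deterministic place at the end.
    return list(order) + sorted(f for f in layout if f not in listed)


functions = ["zeta", "main", "parse", "init", "alpha"]
counts = {"parse": 90, "main": 10, "init": 10}
print(function_order(functions, counts))
```

Rejecting an order file that names unknown functions is one way to surface a stale profile early instead of silently producing a layout that no longer matches the binary.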

Milestone 7: Allocator Fast Paths And Page Backend

Status: done.

Goal: replace generic allocation behavior with a production allocator strategy that keeps common paths local and cheap.

Submilestones:

7.1 Define allocator profiles: freestanding, hosted-small, hosted-server, realtime, test-fixture, and compiler-internal.
7.2 Implement size classes, local free lists, local heap ownership, and slow-path span refill.
7.3 Implement remote-free batching or message passing with bounded queues and fail-closed overflow behavior.
7.4 Add transfer cache and central page/span management.
7.5 Add optional hugepage-aware backend hooks and NUMA policy hooks for hosted server targets.
7.6 Add allocator stress tests for local free, remote free, cross-thread allocation, fragmentation, large objects, exhaustion, and shutdown cleanup.

Acceptance:

The allocator has measured fast-path instruction and branch budgets.
Remote frees do not perturb the local fast path except at documented drain points.
Hugepage and NUMA behavior is opt-in and target-gated.
make allocator-fast-path-check validates allocator profiles, size classes, local free lists, local ownership, remote-free batching, transfer caches, central page/span management, hugepage and NUMA hooks, stress reports, diagnostics, compatibility rows, performance budgets, and negative fixtures.
make native-memory-control-check runs make allocator-fast-path-check before the native memory-control evidence check.
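
The relationship between size classes, local free lists, span refill, and remote-free batching in 7.2 and 7.3 can be modeled in a few lines. This is a single-threaded simulation with integer addresses, intended only to show the control flow; a real allocator needs atomic queues and real memory, and the size-class table and capacities below are assumptions.

```python
# Assumed size classes in bytes; a real table would be tuned per profile.
SIZE_CLASSES = (16, 32, 48, 64, 96, 128, 192, 256)


def size_class(size):
    """Index of the smallest class that fits, or None for large objects."""
    for index, limit in enumerate(SIZE_CLASSES):
        if size <= limit:
            return index
    return None


class LocalHeap:
    def __init__(self, owner, span_blocks=4, remote_capacity=8):
        self.owner = owner
        self.span_blocks = span_blocks
        self.remote_capacity = remote_capacity
        self.free = [[] for _ in SIZE_CLASSES]  # one local free list per class
        self.remote = []    # blocks freed by other threads, not yet drained
        self.next_addr = 0  # stand-in for carving spans from a page backend
        self.refills = 0

    def alloc(self, size):
        cls = size_class(size)
        if cls is None:
            raise ValueError("large allocation: handled by the page/span backend")
        if not self.free[cls]:
            self._refill(cls)  # slow path only
        return (self.owner, cls, self.free[cls].pop())  # fast path: list pop

    def _refill(self, cls):
        self.drain_remote()  # documented drain point
        if self.free[cls]:
            return
        for _ in range(self.span_blocks):
            self.free[cls].append(self.next_addr)
            self.next_addr += SIZE_CLASSES[cls]
        self.refills += 1

    def free_local(self, block):
        owner, cls, addr = block
        if owner != self.owner:
            raise ValueError("block belongs to another heap; use free_remote")
        self.free[cls].append(addr)

    def free_remote(self, block):
        if len(self.remote) >= self.remote_capacity:
            raise OverflowError("remote-free queue full")  # fail closed
        self.remote.append(block)

    def drain_remote(self):
        batch, self.remote = self.remote, []
        for _, cls, addr in batch:
            self.free[cls].append(addr)
        return len(batch)
```

Remote frees only touch the remote queue, and the owner merges them into its free lists at the refill drain point, so the allocation fast path stays a single list pop.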

Milestone 8: Stack, Region, Arena, And Escape Promotion

Status: done.

Goal: make heap allocation the fallback by promoting short-lived and non-escaping values to cheaper lifetime domains.

Submilestones:

8.1 Extend escape analysis with object lifetime, alias, ownership, closure, actor, host-buffer, and FFI escape categories.
8.2 Add scalar replacement for eligible aggregates.
8.3 Add stack allocation for non-escaping values.
8.4 Add region and arena allocation for compiler IR, parser scratch, request scope, transaction scope, and app runtime scope.
8.5 Add dynamic heapification only if a proven target requires optimistic stack or region promotion.
8.6 Add diagnostics and reports that explain why allocation did or did not promote.

Acceptance:

Promotion decisions are deterministic and inspectable.
Incorrect promotion is caught by validation, stress, and lifetime tests.
Compiler and app workloads show allocator-traffic reductions without behavioral drift.
make region-arena-promotion-check validates escape categories, promotion domains, scalar replacement, stack promotion, region and arena scopes, dynamic heapification policy, stable diagnostics, release reports, compatibility rows, performance budgets, and negative fixtures.
make escape-analysis-check runs make region-arena-promotion-check before the broader escape-analysis evidence.

Milestone 9: RC, Regions, And Optional GC Decision

Status: done.

Goal: choose and implement the long-term managed memory semantics for 0x0 instead of mixing ad hoc conventions.

Submilestones:

9.1 Decide whether the production semantics prefer ownership plus regions, precise reference counting with reuse, optional tracing GC, or target-specific profiles.
9.2 If RC is selected, implement precise retain/release insertion, reuse analysis, cycle policy, diagnostics, and stress tests.
9.3 If GC is selected, implement a concurrent regional design with explicit barrier, metadata, pointer, and target support policy.
9.4 Add compatibility boundaries for ABI v1 layouts, host buffers, closures, actors, remote actors, Live values, and app runtime values.
9.5 Add pause, throughput, memory footprint, leak, and shutdown evidence.

Acceptance:

The memory semantics are documented and enforced by compiler/runtime gates.
Cycles, shared graphs, actors, host resources, and FFI have explicit policy.
Unsupported target combinations fail closed.
make memory-semantics-decision-check validates profile semantics, retain/release insertion policy, RC reuse, region and arena release, optional GC target policy, ABI and host boundaries, actor mailbox retention, stable diagnostics, stress reports, performance budgets, compatibility rows, and negative fixtures.
make native-memory-control-check runs make memory-semantics-decision-check before the broader native memory-control evidence.

Milestone 10: Copy-And-Patch Baseline JIT

Status: open.

Goal: add a fast hosted JIT tier only where 0x0 runtime use cases need immediate native code generation.

Submilestones:

10.1 Identify hosted runtime surfaces that require JIT behavior and prove the need with latency or throughput evidence.
10.2 Define stencil formats, patch slots, relocation kinds, labels, safepoints, deopt metadata, executable-memory policy, and code-cache layout.
10.3 Generate stencils from the same operation definitions used by the interpreter or VM-shaped runtime.
10.4 Implement W^X-safe code allocation, patching, finalization, icache flush behavior, and code cache reclamation.
10.5 Add diagnostics for unsupported stencils, patch overflow, target mismatch, executable-memory denial, and invalid branch targets.
10.6 Add differential tests against interpreter/VM behavior.

Acceptance:

JIT is opt-in and target-gated.
The baseline tier compiles quickly and records code size, compile latency, runtime latency, and correctness evidence.
Unsupported targets remain fully functional without JIT.
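
The copy-and-patch mechanism behind 10.2 and 10.5 reduces to two steps per operation: copy a precompiled byte template, then write operand values into its patch slots. The sketch below shows only that byte-level step and never executes the result. The Stencil record is hypothetical, and the three templates are hand-written x86-64 encodings (mov eax, imm32; add eax, imm32; ret) standing in for stencils that a real pipeline would generate from operation definitions.

```python
import struct
from dataclasses import dataclass


class PatchError(Exception):
    """Diagnosable failure while emitting code from stencils."""


@dataclass(frozen=True)
class Stencil:
    code: bytes       # template bytes with zero-filled holes
    holes: tuple = () # (offset, width_in_bytes, operand_name)


# Hand-written x86-64 templates for illustration only.
STENCILS = {
    "mov_imm": Stencil(b"\xb8\x00\x00\x00\x00", ((1, 4, "imm"),)),  # mov eax, imm32
    "add_imm": Stencil(b"\x05\x00\x00\x00\x00", ((1, 4, "imm"),)),  # add eax, imm32
    "ret": Stencil(b"\xc3"),                                        # ret
}

FORMATS = {4: "<i", 8: "<q"}  # little-endian signed patch slots


def emit(stencils, program):
    """Copy each stencil into the buffer, then patch its holes with operands."""
    out = bytearray()
    for name, operands in program:
        stencil = stencils.get(name)
        if stencil is None:
            raise PatchError(f"unsupported stencil: {name}")
        start = len(out)
        out += stencil.code  # copy
        for offset, width, operand in stencil.holes:  # patch
            value = operands[operand]
            try:
                struct.pack_into(FORMATS[width], out, start + offset, value)
            except struct.error as exc:
                raise PatchError(f"patch overflow in {name}.{operand}: {value}") from exc
    return bytes(out)


code = emit(STENCILS, [("mov_imm", {"imm": 40}), ("add_imm", {"imm": 2}), ("ret", {})])
print(code.hex())
```

Making the bytes runnable would additionally require the W^X-safe allocation, finalization, and icache handling listed in 10.4, which this sketch leaves out.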

Milestone 11: Validation, Sanitizers, And Miscompilation Defense

Status: open.

Goal: make aggressive optimization safe enough to ship.

Submilestones:

11.1 Add translation validation for dangerous IR-to-IR and LIR-to-LIR rewrites.
11.2 Add sanitizer-backed lanes for bounds, use-after-free, leaks, uninitialized reads, data races, undefined behavior, and aliasing assumptions where the target supports them.
11.3 Add differential testing against interpreter, VM, direct ELF, native, object/linker, and WASM-shaped paths where applicable.
11.4 Add optimization remarks with source spans, hotness, reason, and expected cost impact.
11.5 Add rollback controls for passes that regress correctness, compile time, memory, binary size, or runtime.

Acceptance:

Aggressive rewrites are not enabled without validation or an explicit release-risk exception.
Miscompilation reports include minimized input, pass, IR level, target, and validation result.
Sanitizer and differential lanes are explicit release inputs for optimized builds.
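
Differential testing as in 11.3 compares an optimized path against a reference path on the same inputs. The sketch below does this for a toy expression language with 32-bit wraparound arithmetic: a reference evaluator, a small folding pass, and a harness that reports the first disagreement. The expression encoding, the pass, and the deliberately broken buggy_fold are all illustrative assumptions, not 0x0 components.

```python
import random

MASK = 0xFFFFFFFF  # model 32-bit wraparound arithmetic


def evaluate(expr, env):
    """Reference semantics for tuples: ("const", n), ("var", name), (op, l, r)."""
    kind = expr[0]
    if kind == "const":
        return expr[1] & MASK
    if kind == "var":
        return env[expr[1]]
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    if kind == "add":
        return (left + right) & MASK
    if kind == "mul":
        return (left * right) & MASK
    raise ValueError(f"unknown op: {kind}")


def variables(expr):
    if expr[0] == "var":
        return {expr[1]}
    if expr[0] == "const":
        return set()
    return variables(expr[1]) | variables(expr[2])


def fold(expr):
    """Constant folding plus the identities x + 0 = x and x * 1 = x."""
    if expr[0] in ("const", "var"):
        return expr
    kind, left, right = expr[0], fold(expr[1]), fold(expr[2])
    if left[0] == "const" and right[0] == "const":
        return ("const", evaluate((kind, left, right), {}))
    if kind == "add" and right == ("const", 0):
        return left
    if kind == "mul" and right == ("const", 1):
        return left
    return (kind, left, right)


def buggy_fold(expr):
    # Deliberately wrong: treats x + 1 as if it were x + 0.
    if expr[0] == "add" and expr[2] == ("const", 1):
        return expr[1]
    return expr


def differential_check(expr, optimize, trials=200, seed=0):
    """Return None if all trials agree, else a report for the first mismatch."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    optimized = optimize(expr)
    names = sorted(variables(expr))
    for _ in range(trials):
        env = {name: rng.randrange(MASK + 1) for name in names}
        expected, actual = evaluate(expr, env), evaluate(optimized, env)
        if expected != actual:
            return {"env": env, "expected": expected, "actual": actual,
                    "optimized": optimized}
    return None
```

Random testing of this kind can find miscompilations but cannot prove their absence, which is why the roadmap pairs it with translation validation for the riskier rewrites.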

Milestone 12: Benchmarking, Profiling, And Performance Budgets

Status: open.

Goal: make performance claims reproducible and tied to workloads users care about.

Submilestones:

12.1 Define benchmark suites for frontend speed, mid-end query time, codegen speed, link time, compiler RSS, binary size, startup time, runtime throughput, app latency, and allocator behavior.
12.2 Add profiler intake for compiler-side and binary-side profiles.
12.3 Record cache hit ratios, query colors, invalidation fan-out, profile quality, and pass timings.
12.4 Add performance dashboards and TSV/JSON release reports.
12.5 Add regression thresholds with override policy, owner signoff, and release-note requirements.

Acceptance:

Every performance claim cites a workload, metric, baseline, target, date, and command.
Regressions fail bounded checks unless they have accepted ADR/RFC exceptions.
Release artifacts include performance reports.
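
The regression-threshold rule in 12.5 amounts to a comparison of current measurements against baseline values plus an allowed relative increase, with recorded exceptions taking precedence. The sketch below assumes every metric is lower-is-better (a hit ratio would need the opposite comparison); the metric names, thresholds, and the exception record identifier are hypothetical.

```python
def check_budgets(baseline, current, thresholds, exceptions=None):
    """Return (metric, reason) failures, sorted by metric name.

    thresholds maps metric -> maximum allowed relative increase (0.05 = 5%).
    exceptions maps metric -> identifier of an accepted ADR/RFC exception.
    """
    exceptions = exceptions or {}
    failures = []
    for metric in sorted(thresholds):
        if metric not in baseline or metric not in current:
            failures.append((metric, "missing measurement"))
            continue
        limit = baseline[metric] * (1 + thresholds[metric])
        if current[metric] > limit and metric not in exceptions:
            failures.append((metric, f"{current[metric]} exceeds budget {limit:.1f}"))
    return failures


# Hypothetical numbers for illustration.
baseline = {"cold_compile_ms": 1000, "peak_rss_mb": 500, "link_ms": 200}
current = {"cold_compile_ms": 1040, "peak_rss_mb": 600, "link_ms": 260}
thresholds = {"cold_compile_ms": 0.05, "peak_rss_mb": 0.10, "link_ms": 0.10}

for metric, reason in check_budgets(baseline, current, thresholds):
    print(metric, reason)
```

A missing measurement is treated as a failure rather than a pass, so a benchmark that silently stops running cannot hide a regression.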

Milestone 13: ML-Guided Heuristics

Status: open.

Goal: use ML only for narrow, replaceable, measured decisions where hand-tuned heuristics are weak.

Submilestones:

13.1 Choose first pilot decisions: inlining for size/speed, register allocation eviction, branch probability fallback, or layout policy tuning.
13.2 Define feature schemas from IR, MachineIR, hotness, code size, register pressure, loop depth, and target features.
13.3 Implement an optional advisor interface with deterministic fallback to hand-written heuristics.
13.4 Version and sign model artifacts; keep training offline and outside the compiler binary.
13.5 Log decisions, confidence, fallback reason, compile-time cost, and benchmark outcome.

Acceptance:

ML is optional and disabled by default unless a release explicitly enables it.
Missing or incompatible models fall back without changing source semantics.
A model can be rolled back independently of compiler source.
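
The advisor interface in 13.3 can be sketched as a wrapper that consults a model only when one is present, schema-compatible, and confident, and otherwise returns the hand-written heuristic together with a logged fallback reason. The feature names, schema tag, confidence threshold, and heuristic cutoffs below are assumptions for illustration; the model is any callable returning a decision and a confidence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    inline: bool
    source: str  # "model" or "heuristic"
    reason: str  # confidence or fallback reason, suitable for decision logs


def heuristic_inline(features):
    """Hand-written fallback: small callees always, medium callees when hot."""
    size = features["callee_size"]
    return bool(size <= 20 or (features["hot"] and size <= 80))


class InlineAdvisor:
    def __init__(self, model=None, model_schema=None,
                 expected_schema="inline-v1", min_confidence=0.8):
        self.model = model  # callable(features) -> (inline, confidence)
        self.model_schema = model_schema
        self.expected_schema = expected_schema
        self.min_confidence = min_confidence

    def _fallback(self, features, reason):
        return Decision(heuristic_inline(features), "heuristic", reason)

    def decide(self, features):
        if self.model is None:
            return self._fallback(features, "no model")
        if self.model_schema != self.expected_schema:
            return self._fallback(features, "schema mismatch")
        try:
            inline, confidence = self.model(features)
        except Exception as exc:  # a failing model must never break compilation
            return self._fallback(features, f"model error: {type(exc).__name__}")
        if confidence < self.min_confidence:
            return self._fallback(features, "low confidence")
        return Decision(bool(inline), "model", f"confidence {confidence:.2f}")
```

Either answer only changes an optimization choice, so falling back cannot alter source semantics, and removing the model restores the heuristic path without touching compiler source.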

Milestone 14: Equality Saturation And Superoptimization Research Lane

Status: open.

Goal: use expensive search only for narrow, validated optimization discovery.

Submilestones:

14.1 Define first target domains: arithmetic/bit-manip kernels, late LIR peepholes, vectorization candidates, or DSL fragments.
14.2 Add e-graph rewrite sets with budgets, extraction cost models, and deterministic limits.
14.3 Validate candidates with translation validation, differential testing, and benchmark checks before promotion.
14.4 Add LLM-guided proposal intake only as offline candidate generation.
14.5 Promote surviving rewrites into ordinary deterministic compiler rules with ADR/RFC evidence.

Acceptance:

No superoptimization proposal ships without equivalence validation and performance evidence.
Research output is separated from default compilation.
Failed proposals leave reproducible rejection evidence.
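
One cheap filter for the candidate validation in 14.3 is exhaustive checking at a small bit width. The sketch below compares both sides of a proposed two-operand rewrite on every pair of 8-bit inputs and returns the first counterexample. The candidate names and the lambda encoding are assumptions; passing at 8 bits does not prove equivalence at 32 or 64 bits, so this can reject bad proposals early but does not replace translation validation.

```python
def equivalent(lhs, rhs, bits=8):
    """Return None if lhs and rhs agree on all inputs, else the first (x, y)."""
    mask = (1 << bits) - 1
    for x in range(1 << bits):
        for y in range(1 << bits):
            if lhs(x, y, mask) != rhs(x, y, mask):
                return (x, y)
    return None


# Hypothetical candidate rewrites: (name, original form, proposed form).
CANDIDATES = [
    ("mul2-to-shl", lambda x, y, m: (x * 2) & m, lambda x, y, m: (x << 1) & m),
    ("add-to-xor-carry", lambda x, y, m: (x + y) & m,
     lambda x, y, m: ((x ^ y) + 2 * (x & y)) & m),
    ("add-to-or", lambda x, y, m: (x + y) & m, lambda x, y, m: (x | y) & m),
]


def triage(candidates):
    """Split candidates into survivors and rejections with counterexamples."""
    accepted, rejected = [], {}
    for name, lhs, rhs in candidates:
        counterexample = equivalent(lhs, rhs)
        if counterexample is None:
            accepted.append(name)
        else:
            rejected[name] = counterexample  # reproducible rejection evidence
    return accepted, rejected


print(triage(CANDIDATES))
```

The stored counterexample gives each rejected proposal a concrete input that reproduces the disagreement, which is the kind of rejection evidence the acceptance criteria ask for.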

Milestone 15: Production Release Integration

Status: open.

Goal: make compiler and memory optimization a normal, auditable part of release readiness.

Submilestones:

15.1 Add final release rows for incremental compiler, CAS, IR verifiers, optimized AOT profile path, allocator, memory semantics, validation, and performance budgets.
15.2 Update public support matrices with implemented, target-gated, deferred, or research-only status.
15.3 Add release artifact manifests for cache schemas, profiles, order files, allocator reports, validation reports, benchmark results, and rollback controls.
15.4 Add operator/developer docs for when to use fast dev builds, optimized AOT builds, profile-guided builds, post-link optimization, JIT, and research lanes.
15.5 Run bounded source gates first, then explicit heavy release/performance gates after all implementation milestones are done.

Acceptance:

A release cannot claim compiler/memory optimization closure without release artifacts and public support status.
Heavy gates are explicit release operations, not default development checks.
Remaining target-specific limitations are documented in release notes.

Completion Signal

This roadmap is complete when:

default development builds are incremental, cached, bounded, and measurable;
release builds can consume profile data and layout evidence;
the compiler has verified multi-level IR boundaries;
allocator, region, and managed memory semantics are implemented and measured;
optional JIT, ML, and superoptimization lanes are gated, validated, and non-default unless a release enables them;
performance reports are reproducible and release-owned;
public docs distinguish implemented, target-gated, deferred, and research-only capabilities.