Production Compiler Pipeline
This is the phase contract for compiler/main.0x0. It complements
docs/compiler.html, which describes the user-visible backend surface.
Phase Graph
source files
-> import resolution
-> lexing with compact spans
-> parsing
-> semantic validation
-> type/effect checks
-> backend lowering
-> measure-only layout
-> emission
-> packaging and hashing
The graph is intentionally linear for the current compiler. Later object,
archive, linker, and optimizer milestones may add subgraphs, but they must keep
the same ownership rule: a completed phase owns a compact output and releases
scratch data.
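The ownership rule can be sketched in a few lines of Python. This is only an illustration: `lex`, `count_tokens`, and `run_pipeline` are hypothetical stand-ins, not functions from compiler/main.0x0.

```python
def lex(text):
    # Scratch data: a raw split that may contain empty strings.
    scratch = text.split(" ")
    # Compact output: an immutable tuple holding only real tokens.
    return tuple(word for word in scratch if word)


def count_tokens(tokens):
    # A trivial stand-in for a later phase that consumes the compact output.
    return len(tokens)


def run_pipeline(source, phases):
    """Run phases in order; each phase hands over only its compact output."""
    value = source
    for phase in phases:
        # Scratch created inside phase() goes out of scope when it returns,
        # and rebinding `value` drops the previous phase's output.
        value = phase(value)
    return value
```

Because the driver holds a single reference at a time, nothing from an earlier phase survives unless the next phase's output explicitly contains it.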
Retained Outputs
Import resolution retains normalized module paths, module names, and exported
function names. It must not retain duplicate copies of imported source text once
the parser has produced the module forms needed by validation.
The file compiler entry uses load-file-forms-with-root: the root file is read
and parsed once, root forms are retained separately for module-name, and the
same parse contributes to the full import-expanded form list. Reintroducing a
separate (parse (read-file path)) in compile-file is a memory regression.
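As a rough Python sketch of the parse-once rule: the helper below reads and parses the root file a single time and reuses that result for both the root forms and the import-expanded list. The callback names (`read_file`, `parse`, `imports_of`) and the ordering of imported forms before the importing module's forms are assumptions made for illustration, not the behavior of load-file-forms-with-root.

```python
def load_forms_with_root(path, read_file, parse, imports_of):
    """Return (root_forms, all_forms), reading and parsing each file once."""
    root_forms = parse(read_file(path))  # the only parse of the root file
    all_forms = []
    seen = {path}

    def expand(forms):
        # Pull in each import once, then append this module's own forms.
        for imported in imports_of(forms):
            if imported not in seen:
                seen.add(imported)
                expand(parse(read_file(imported)))
        all_forms.extend(forms)

    expand(root_forms)
    return root_forms, all_forms
```

A caller that needs the module name reads it from `root_forms` instead of parsing the root file a second time.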
Lexing retains token kind, token payload, line, and column. Token kind names are
part of the interned vocabulary: token records store integer kind IDs and
payloads plus one packed numeric span, while diagnostics expand IDs and spans
through accessors. Spans are not separate general-purpose list records.
Parsing retains AST node kind, node payload, line, and column. Later backends
may read kind and payload directly, while diagnostics read the compact span
field. AST nodes also store integer kind IDs, and node-kind? compares those
IDs directly.
Validation retains function signatures, locals, inferred known types, effect
annotations, and capability decisions. It does not own backend code strings.
memory-summary-file reports the retained root form count, import-expanded
form count, import count, function count, AST node count, total symbol node
count, and unique symbol payload count for this front-end boundary.
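The difference between total symbol nodes and unique symbol payloads can be illustrated with a small Python walk. It assumes forms are nested lists with strings standing in for symbols; the key names are hypothetical and do not reflect the output format of memory-summary-file.

```python
def summarize(forms):
    """Count AST nodes, symbol nodes, and distinct symbol payloads."""
    ast_nodes = 0
    symbol_nodes = 0
    unique_payloads = set()
    stack = list(forms)
    while stack:
        node = stack.pop()
        ast_nodes += 1
        if isinstance(node, list):
            stack.extend(node)          # visit children of a list form
        elif isinstance(node, str):
            symbol_nodes += 1           # every occurrence counts here
            unique_payloads.add(node)   # but each payload is counted once here
    return {
        "ast_nodes": ast_nodes,
        "symbol_nodes": symbol_nodes,
        "unique_symbols": len(unique_payloads),
    }
```

A large gap between the two symbol counts suggests that interning payloads would save memory.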
Lowering retains the backend-specific body representation required by layout.
Per-function lowering is preferred; keeping every function's final byte text
beside every function's temporary byte text is a regression.
Layout retains sizes, offsets, labels, relocation inputs, string-pool entries,
and ABI notes. Layout must be able to run without building the final output
bytes.
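A measure-only pass can be sketched as follows, in Python for illustration. The function name, the 16-byte alignment, and the input shape (name and size pairs) are assumptions; the point is that offsets come from sizes alone.

```python
def measure_layout(functions, base=0, align=16):
    """Assign an offset to each function from its size; build no bytes."""
    offsets = {}
    cursor = base
    for name, size in functions:
        # Round the cursor up to the next alignment boundary.
        cursor = (cursor + align - 1) // align * align
        offsets[name] = cursor
        cursor += size
    return offsets, cursor  # cursor is the end of the last function
```

For example, `measure_layout([("main", 10), ("helper", 20)])` places `main` at 0 and `helper` at 16, and reports 36 as the end offset, all without allocating an output buffer.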
Emission retains only the active output buffer or chunk writer state plus
metadata needed to finish the file. Large outputs are streamed or chunked.
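One possible shape for a chunk writer is sketched below in Python. The function name and the flush threshold are illustrative assumptions; the only state held is one bounded buffer and a running byte count.

```python
def emit_chunks(chunks, out, flush_at=4096):
    """Write an iterable of byte chunks to `out`, buffering at most a little
    more than `flush_at` bytes at a time. Returns the total bytes written."""
    buffer = bytearray()
    total = 0
    for chunk in chunks:
        buffer += chunk
        if len(buffer) >= flush_at:
            out.write(buffer)
            total += len(buffer)
            buffer.clear()  # release the flushed bytes before continuing
    if buffer:
        out.write(buffer)   # flush whatever remains at the end
        total += len(buffer)
    return total
```

Passing a generator as `chunks` keeps the producer lazy as well, so the whole output never exists in memory at once.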
Packaging retains artifact paths and hashes. It must not read a release binary
into memory only to hash it when a streaming hash command is available.
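The same idea applies when hashing in-process. The Python sketch below reads fixed-size blocks so memory use stays constant regardless of artifact size. SHA-256 and the function name are assumptions; the document does not say which hash the packaging step records.

```python
import hashlib


def stream_hash(path, chunk_size=65536):
    """Hash a file block by block instead of loading it whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```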
Release Gate
The release compiler path is the first production memory gate:
make memory-check
The narrow source-level checks are:
make memory-architecture-check
make memory-summary-check
They are intended for submilestone work where running the full release compiler
chain would be too expensive.
The gate measures:
- v0.1.0 -> build/memory/zero-next
- build/memory/zero-next -> build/memory/stage2
- build/memory/stage2 -> build/memory/stage3
- a symbol/string-heavy OISA compile through the released OISA compiler.
The gate also verifies stage2 == stage3 and records hashes for the measured
outputs.
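The stage2 == stage3 comparison is a bootstrap fixed-point check: a compiler built by itself should reproduce itself byte for byte. A minimal Python sketch, assuming the two artifacts are ordinary files (the function name is hypothetical, and the real gate is driven by make):

```python
import filecmp


def is_fixed_point(stage2_path, stage3_path):
    """Return True when both stage outputs are byte-identical."""
    # shallow=False forces a content comparison rather than trusting
    # file size and modification time alone.
    return filecmp.cmp(stage2_path, stage3_path, shallow=False)
```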
Backend Notes
The C backend remains compatibility infrastructure in compiler/compat-main.0x0.
It may use host allocation, but it cannot define the normal memory story for
compiler production and is not emitted into the normal compiler artifact.
The direct ELF backend is the normal compiler production backend after v0.1.0.
Its current arena allocation for list values is part of the production path.
Future object and linker paths must use binary buffers and relocatable records
instead of routing production through generated C, generated assembler text, or
one large hex string.
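To illustrate the binary-buffer approach, the Python sketch below patches symbol addresses directly into a mutable byte buffer using relocation records. The record shape (an offset and a symbol name) and the 32-bit little-endian absolute patch are simplifying assumptions; real ELF relocations carry types and addends that this example omits.

```python
import struct


def apply_relocations(code, relocations, symbols):
    """Patch each relocation site in place with its symbol's address.

    code:        bytearray holding machine code with placeholder fields
    relocations: iterable of (offset, symbol_name) records
    symbols:     mapping from symbol name to resolved address
    """
    for offset, name in relocations:
        # Write a 32-bit little-endian value directly into the buffer.
        struct.pack_into("<I", code, offset, symbols[name])
    return code
```

No intermediate text is produced: the buffer is mutated in place, and the relocation records stay small and separate from the code bytes.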