
Production Compiler Pipeline

This is the phase contract for compiler/main.0x0. It complements
docs/compiler.html, which describes the user-visible backend surface.

Phase Graph


source files
  -> import resolution
  -> lexing with compact spans
  -> parsing
  -> semantic validation
  -> type/effect checks
  -> backend lowering
  -> measure-only layout
  -> emission
  -> packaging and hashing

The graph is intentionally linear for the current compiler. Later object,
archive, linker, and optimizer milestones may add subgraphs, but they must keep
the same ownership rule: a completed phase owns a compact output and releases
scratch data.
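
The ownership rule can be illustrated with a minimal Python sketch. It is not
the compiler's own code: PhaseResult, run_phase, run_pipeline, and the two toy
phases are hypothetical names chosen for illustration. Each phase receives the
previous compact output, works in scratch storage, and hands on only a compact
result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseResult:
    """Compact output that a completed phase hands to the next phase."""
    name: str
    output: tuple


def run_phase(name, work, upstream):
    # Scratch data exists only for the duration of this call.
    scratch = []
    output = work(upstream, scratch)
    compact = PhaseResult(name, tuple(output))
    scratch.clear()  # release scratch once the compact output exists
    return compact


def run_pipeline(source, phases):
    # Linear graph: each phase sees only the previous phase's compact output.
    current = PhaseResult("source files", (source,))
    for name, work in phases:
        current = run_phase(name, work, current.output)
    return current


# Toy phases (assumptions, far simpler than a real lexer or parser).
def lex(upstream, scratch):
    for text in upstream:
        scratch.extend(text.split())
    return list(scratch)


def parse(upstream, scratch):
    scratch.append(len(upstream))
    return [("form", token) for token in upstream]


result = run_pipeline("(f x)", [("lexing", lex), ("parsing", parse)])
```

A later subgraph would fit this shape as long as each node still returns one
compact result and drops its scratch storage before the next node runs.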

Retained Outputs

Import resolution retains normalized module paths, module names, and exported
function names. It must not retain duplicate copies of imported source text once
the parser has produced the module forms needed by validation.

The file compiler entry uses load-file-forms-with-root: the root file is read
and parsed once, root forms are retained separately for module-name, and the
same parse contributes to the full import-expanded form list. Reintroducing a
separate (parse (read-file path)) in compile-file is a memory regression.
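
The parse-once contract can be sketched in Python as follows. This is an
assumption-laden illustration, not the 0x0 implementation: the function name
load_forms_with_root, its callback parameters, and the fake reader, parser, and
import expander are all hypothetical.

```python
def load_forms_with_root(path, read_file, parse, expand_imports):
    """Read and parse the root file once, then reuse that parse twice."""
    root_forms = parse(read_file(path))     # the only read and parse of the root
    all_forms = expand_imports(root_forms)  # import expansion reuses root_forms
    return root_forms, all_forms


# Hypothetical stand-ins used only to show that the root is read once.
reads = []


def fake_read(path):
    reads.append(path)
    return "(module demo) (import util) (def main)"


def fake_parse(text):
    return [form.strip("()") for form in text.split(") (")]


def fake_expand(root_forms):
    # Pretend the import contributed one extra form ahead of the root forms.
    return ["def helper"] + root_forms


root_forms, all_forms = load_forms_with_root(
    "main.0x0", fake_read, fake_parse, fake_expand
)
```

A caller that needs the module name reads it from root_forms instead of
parsing the file again, which is the regression the paragraph above warns
against.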

Lexing retains token kind, token payload, line, and column. Token kind names are
part of the interned vocabulary: token records store integer kind IDs and
payloads plus one packed numeric span, while diagnostics expand IDs and spans
through accessors. Spans are not separate general-purpose list records.
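
A compact token record of this kind might look like the Python sketch below.
The 12-bit column width, the kind vocabulary, and every function name are
assumptions for illustration; the source does not state the actual bit layout.

```python
COLUMN_BITS = 12  # assumed split between line and column
COLUMN_MASK = (1 << COLUMN_BITS) - 1

# Hypothetical interned vocabulary: the tuple index is the integer kind ID.
KIND_NAMES = ("lparen", "rparen", "symbol", "number")
KIND_IDS = {name: index for index, name in enumerate(KIND_NAMES)}


def pack_span(line, column):
    """Pack line and column into one integer."""
    if line < 0 or not 0 <= column <= COLUMN_MASK:
        raise ValueError("line or column out of range for packed span")
    return (line << COLUMN_BITS) | column


def span_line(span):
    return span >> COLUMN_BITS


def span_column(span):
    return span & COLUMN_MASK


def make_token(kind_name, payload, line, column):
    # The record holds an integer kind ID, the payload, and one packed span.
    return (KIND_IDS[kind_name], payload, pack_span(line, column))


def token_kind_name(token):
    return KIND_NAMES[token[0]]


def describe(token):
    # Diagnostics expand the ID and span through accessors only when needed.
    span = token[2]
    return (
        f"{token_kind_name(token)} {token[1]!r} "
        f"at {span_line(span)}:{span_column(span)}"
    )


token = make_token("symbol", "foo", 42, 7)
message = describe(token)  # "symbol 'foo' at 42:7"
```

Storing one integer per span avoids allocating a separate record for every
token position, at the cost of a fixed upper bound on column values.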

Parsing retains AST node kind, node payload, line, and column. Later backends
may read kind and payload directly, while diagnostics read the compact span
field. AST nodes also store integer kind IDs, and node-kind? compares those
IDs directly.

Validation retains function signatures, locals, inferred known types, effect
annotations, and capability decisions. It does not own backend code strings.

memory-summary-file reports the retained root form count, import-expanded
form count, import count, function count, AST node count, total symbol node
count, and unique symbol payload count for this front-end boundary.

Lowering retains the backend-specific body representation required by layout.
Per-function lowering is preferred; keeping every function's final byte text
beside every function's temporary byte text is a regression.

Layout retains sizes, offsets, labels, relocation inputs, string-pool entries,
and ABI notes. Layout must be able to run without building the final output
bytes.
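
A measure-only pass can be sketched in Python as a function that assigns
offsets from sizes alone. The name measure_layout, the 16-byte alignment, and
the example labels are assumptions for illustration; no output bytes are
produced at any point.

```python
def measure_layout(functions, base=0, align=16):
    """Assign an offset to each (label, size) pair without emitting bytes.

    Returns the label-to-offset table and the total measured size.
    """
    offsets = {}
    cursor = base
    for label, size in functions:
        # Round the cursor up to the next multiple of the alignment.
        cursor = (cursor + align - 1) // align * align
        offsets[label] = cursor
        cursor += size
    return offsets, cursor


offsets, total_size = measure_layout([("main", 10), ("helper", 20)])
# offsets == {"main": 0, "helper": 16}, total_size == 36
```

Because the pass consumes only sizes, emission can later stream bytes into
positions that are already known.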

Emission retains only the active output buffer or chunk writer state plus
metadata needed to finish the file. Large outputs are streamed or chunked.
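
One way to picture chunk writer state is the Python sketch below. ChunkWriter
and its methods are hypothetical names, and the sink is assumed to be any
object with a write method that accepts bytes; the compiler's real writer may
differ.

```python
import io


class ChunkWriter:
    """Buffer at most one chunk and flush full chunks to the sink."""

    def __init__(self, sink, chunk_size=4096):
        self.sink = sink
        self.chunk_size = chunk_size
        self.buffer = bytearray()  # the only retained output bytes
        self.written = 0           # metadata needed to finish the file

    def write(self, data):
        self.buffer += data
        while len(self.buffer) >= self.chunk_size:
            self._flush(self.chunk_size)

    def _flush(self, count):
        self.sink.write(bytes(self.buffer[:count]))
        del self.buffer[:count]
        self.written += count

    def finish(self):
        # Flush the final partial chunk and report the total size.
        if self.buffer:
            self._flush(len(self.buffer))
        return self.written


sink = io.BytesIO()
writer = ChunkWriter(sink, chunk_size=4)
writer.write(b"\x7fELF\x02")
total = writer.finish()  # 5 bytes written in two flushes
```

The writer never holds more than one chunk plus the bytes of the latest write
call, so peak memory does not grow with the size of the output file.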

Packaging retains artifact paths and hashes. It must not read a release binary
into memory only to hash it when a streaming hash command is available.
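
The streaming approach can be illustrated in Python with the standard hashlib
module. The choice of SHA-256, the chunk size, and the function name
hash_file_streaming are assumptions; the source does not say which hash or
command the packaging step uses.

```python
import hashlib


def hash_file_streaming(path, chunk_size=65536):
    """Hash a file in fixed-size chunks instead of reading it whole."""
    digest = hashlib.sha256()  # assumed algorithm
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Peak memory stays near chunk_size regardless of the binary's size, which is
the property the packaging rule asks for.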

Release Gate

The release compiler path is the first production memory gate:


make memory-check

The narrow source-level checks are:


make memory-architecture-check
make memory-summary-check

They are intended for submilestone work where running the full release compiler
chain would be too expensive.

The gate measures:

The gate also verifies stage2 == stage3 and records hashes for the measured
outputs.

Backend Notes

The C backend remains compatibility infrastructure in compiler/compat-main.0x0.
It may use host allocation, but it cannot define the normal memory story for
compiler production and is not emitted into the normal compiler artifact.

The direct ELF backend is the normal compiler production backend after v0.1.0.
Its current arena allocation for list values is part of the production path.

Future object and linker paths must use binary buffers and relocatable records
instead of routing production through generated C, generated assembler text, or
one large hex string.
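
As a rough picture of a binary buffer with relocatable records, the Python
sketch below emits a call with a placeholder displacement and patches it
later. Everything here is an assumption for illustration: the Relocation
record, the function names, and the use of an x86-64 CALL rel32 instruction
(opcode 0xE8) as the example; the source does not specify a target
architecture or record format.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Relocation:
    offset: int  # buffer position of the 32-bit field to patch
    symbol: str  # label whose address is not yet known
    addend: int  # constant added during resolution


def emit_call(buffer, relocations, symbol):
    buffer.append(0xE8)  # x86-64 CALL rel32 opcode (assumed target)
    # rel32 is relative to the end of the 4-byte field, hence addend -4.
    relocations.append(Relocation(len(buffer), symbol, -4))
    buffer += b"\x00\x00\x00\x00"  # placeholder displacement


def apply_relocations(buffer, relocations, symbols):
    for reloc in relocations:
        # PC-relative value: symbol address + addend - field position.
        value = symbols[reloc.symbol] + reloc.addend - reloc.offset
        struct.pack_into("<i", buffer, reloc.offset, value)


code = bytearray()
pending = []
emit_call(code, pending, "helper")
apply_relocations(code, pending, {"helper": 0x20})
```

The bytes live in a mutable buffer from the start, and unresolved addresses are
small records next to it, so no intermediate C, assembler text, or hex string
is needed.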