Compiler And Backend Guide
compiler/main.0x0 is the canonical compiler implementation. The seed in
seed/zero.s may run this source and its emitted OISA, but it must not duplicate
the compiler pipeline.
Pipeline
The production phase ownership contract lives in docs/compiler-pipeline.html.
The production memory contract and budgets live in docs/compiler-memory.html.
This guide describes the externally visible compiler entry points and backend
behavior.
The compiler has six major responsibilities:
1. Parse .0x0 source into a small tagged AST represented with lists.
2. Validate the current semantic slice before any backend emits output.
3. Emit OISA for the self-hosting compiler path.
4. Emit C for the compatibility native path.
5. Emit GAS x86-64 assembly for linkable object files in the current integer
subset.
6. Emit ELF64/x86-64 bytes for direct native executables and the self-hosted ELF
compiler artifact.
The public OISA entry points are:
compile-file
compile-module
compile-file loads (↥ "path.0x0") imports before validation and emission.
Relative import paths are resolved from the importing file's directory.
Transitive imports share a threaded seen set, so common dependencies are loaded
once across sibling imports. Import forms may also include an alias:
(↥ "path.0x0" alias)
Aliased imports expose imported functions as alias.name. Imported modules may
declare exported functions with (↦ name another-name). Without ↦, all
functions are exported for aliased imports.
Import paths beginning with pkg: resolve through 0x0.lock local dependency
entries before loading. For example, (↥ "pkg:core-map") resolves the
dep core-map ... lockfile line and loads that path.
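The relative-path rule and the shared seen set can be sketched as follows. This is a minimal Python model, not the compiler's code: `resolve_import`, `load_imports`, and the `imports_of` mapping are hypothetical names, the file graph is invented for illustration, and `pkg:` lockfile resolution is deliberately left out because the lockfile line format is not spelled out here.

```python
import posixpath


def resolve_import(importing_file, import_path):
    """Resolve a relative import against the importing file's directory."""
    base_dir = posixpath.dirname(importing_file)
    return posixpath.normpath(posixpath.join(base_dir, import_path))


def load_imports(path, imports_of, seen=None, order=None):
    """Depth-first load that threads one `seen` set through every recursion.

    `imports_of` is a hypothetical mapping from a file path to the import
    paths that file declares; a real loader would parse the source instead.
    """
    if seen is None:
        seen = set()
    if order is None:
        order = []
    if path in seen:
        return order  # already loaded through another import chain
    seen.add(path)
    for rel in imports_of.get(path, []):
        load_imports(resolve_import(path, rel), imports_of, seen, order)
    order.append(path)  # dependencies are recorded before their importer
    return order


# Invented example graph: two sibling imports share one dependency.
graph = {
    "app/main.0x0": ["lib/a.0x0", "lib/b.0x0"],
    "app/lib/a.0x0": ["common.0x0"],
    "app/lib/b.0x0": ["common.0x0"],
    "app/lib/common.0x0": [],
}
loaded = load_imports("app/main.0x0", graph)
```

Because the same set is passed down both sibling branches, `app/lib/common.0x0` appears once in `loaded` even though two files import it.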
compile-module compiles an in-memory source string only; it does not resolve
imports because there is no source path to use as an import root.
The native compatibility entry points live in compiler/compat-main.0x0 and
are used only to build bin/zero-native for compatibility checks and seed
recovery:
compile-native-compiler-file
compile-native-program-file
The legacy linkable object assembly entry point also lives in
compiler/compat-main.0x0 until the production object-file milestone replaces
it with direct relocatable object output:
compile-object-asm-program-file
The direct ELF entry points are:
compile-elf-program-file
compile-elf-compiler-file
AST Representation
Tokens and AST nodes keep kind and value in their first two fields, with
source line and column stored after them:
(list kind value line column)
Token kinds include lp, rp, sym, str, and int. AST node kinds include
list, sym, str, and int.
This representation is intentionally simple because it must work in the
bootstrap evaluator, generated OISA, generated C runtime, and direct ELF runtime.
Emitters continue to read only kind and value; diagnostics can read the
compiler-owned line and column metadata.
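As an illustration of the four-field shape, the sketch below tokenizes a small input into `(kind, value, line, column)` tuples. It is a simplified Python model under stated assumptions: lines and columns are 1-based, strings have no escapes and stay on one line, and the `tokenize` name and the error text are hypothetical rather than taken from the compiler.

```python
def tokenize(source):
    """Return tokens as (kind, value, line, column) tuples.

    Assumptions: 1-based line and column, no string escapes,
    and no newlines inside string literals.
    """
    tokens = []
    i, line, col = 0, 1, 1
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            i, line, col = i + 1, line + 1, 1
        elif ch.isspace():
            i, col = i + 1, col + 1
        elif ch in "()":
            tokens.append(("lp" if ch == "(" else "rp", ch, line, col))
            i, col = i + 1, col + 1
        elif ch == '"':
            end = source.find('"', i + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string at {line}:{col}")
            tokens.append(("str", source[i + 1:end], line, col))
            col += end + 1 - i
            i = end + 1
        else:
            start = i
            while (i < len(source) and not source[i].isspace()
                   and source[i] not in '()"'):
                i += 1
            text = source[start:i]
            digits = text[1:] if text.startswith("-") else text
            is_int = digits.isascii() and digits.isdigit()
            value = int(text) if is_int else text
            tokens.append(("int" if is_int else "sym", value, line, col))
            col += i - start
    return tokens


example = tokenize('(add 1\n  "x")')
```

An emitter in this model would read only the first two fields of each tuple, while a diagnostic would read the last two.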
The lexer/parser now reports source locations for parser-owned failures such as
unexpected closing parentheses, missing closing parentheses, trailing tokens, and
unterminated strings.
OISA Backend
The OISA emitter preserves source expression structure while normalizing module
and function forms:
(oisa-module <name>
(func <name> (params ...) (body ...)))
Type and documentation annotations are skipped during emission. Type annotations
are checked by the semantic validation pass before emission; documentation
annotations remain source metadata.
File imports are resolved before OISA emission. Unaliased imported functions are
appended to the same function set used by validation and emission. Aliased
imports are appended with qualified alias.name function names, while the root
file's module declaration remains the emitted OISA module name.
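The naming rule for imported functions can be sketched like this. It is a hedged Python model: `visible_names` and its parameters are hypothetical, and export filtering is applied only to aliased imports because that is the only case this guide describes.

```python
def visible_names(function_names, alias=None, exports=None):
    """Names one import contributes to the shared function set.

    exports: names declared with the export form in the imported module,
    or None when the module declares none (assumption: the caller has
    already collected them).
    """
    if alias is None:
        # Unaliased import: functions keep their original names.
        return list(function_names)
    if exports is None:
        # No export declarations: every function is exported.
        exported = list(function_names)
    else:
        exported = [name for name in function_names if name in exports]
    # Aliased import: functions are appended as alias.name.
    return [f"{alias}.{name}" for name in exported]


# Invented example module with three functions.
plain = visible_names(["get", "put", "hash"])
aliased = visible_names(["get", "put", "hash"], alias="m")
restricted = visible_names(["get", "put", "hash"], alias="m",
                           exports={"get", "put"})
```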
Semantic Validation
The validation pass is shared by the OISA, C, and direct ELF entry points. It
currently enforces:
- valid module, import, export, top-level doc, and function shapes;
- exported-name checks for aliased imports;
- duplicate function and duplicate parameter rejection;
- type annotation shape and concrete annotation arity;
- local symbol resolution through parameters and sequential ≔ bindings;
- call resolution and arity for user functions;
- builtin arity for the backend-supported builtin surface;
- conservative I64/Text/Bool/Unit argument and return checks when both
sides are inferable;
- known-type if condition checks while preserving Any as the dynamic escape
hatch;
- builtin-specific known-type checks for boolean not and same-concrete-type
=/!= operands while preserving Any as the dynamic escape hatch;
- first capability checks: (cap pure) rejects calls to read-stdin, argv,
env, read-file, write-file, print, panic, and functions that are not
pure by annotation or default.
The checker intentionally returns Any for complex or not-yet-modeled type
cases instead of claiming full type inference. A future type slice should
replace that escape hatch with a complete type environment and source-span
diagnostics. The current accepted capability names are pure, io, file,
network, and process; lib/core/file.0x0 uses file for safe path-checked
file wrappers. Process metadata is available through argv and env, while
socket and subprocess runtime support remain future slices.
C Compatibility Backend
The C backend emits a complete C program as text from
compiler/compat-main.0x0. It exists for compatibility, cross-checking, seed
recovery, and native user-program execution through cc.
It is not the seed and it is not allowed to own language semantics. If source
syntax or compiler lowering changes, the authoritative production
implementation remains in compiler/main.0x0. Normal compiler production after
v0.1.0 uses the released ELF compiler builder; the C backend is optional
compatibility infrastructure and is not emitted into the normal compiler
artifact.
Linkable Object Backend
The legacy object backend in compiler/compat-main.0x0 emits GAS x86-64
assembly, which the generated native compiler can then assemble into a real .o
with cc -c. The object is linkable by the platform linker and currently uses
libc printf for integer result printing. The production object milestone will
replace this compatibility path with direct relocatable object output in
compiler/main.0x0.
The current object slice supports:
- integer literals and integer results;
- Unit, true, and false as integer-compatible values;
- integer arithmetic, comparisons, and not;
- if, do, local bindings, function calls, recursion, and up to six
arguments;
- imported functions loaded through existing (↥ "...") forms.
Unsupported dynamic/text/list/file constructs fail at object-backend compile
time. This keeps the object slice honest while the direct ELF backend remains
the larger native runtime target.
Commands:
./bin/zero-native asm examples/add.0x0 --out build/native/add.obj.s
./bin/zero-native build-obj examples/add.0x0 -o build/native/add.obj.o
cc build/native/add.obj.o -o build/native/add.obj
./build/native/add.obj
The convenience command assembles and links in one step:
./bin/zero-native build-linked examples/add.0x0 -o build/native/add.linked
Direct ELF Backend
The ELF backend emits a Linux ELF64/x86-64 executable directly as hex text. The
wrapper turns that hex into bytes and marks the result executable.
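The wrapper step can be pictured with the following Python sketch. It is not the project's wrapper: `write_executable` is a hypothetical helper, and the sample hex covers only the first 20 bytes of an ELF64 header (identification, type, machine), so the file it writes is a demonstration of the hex-to-bytes step and is not a runnable program.

```python
import os
import stat
import tempfile


def write_executable(hex_text, out_path):
    """Decode hex text to bytes, write them, and set the execute bits."""
    data = bytes.fromhex(hex_text)  # whitespace between byte pairs is skipped
    with open(out_path, "wb") as handle:
        handle.write(data)
    mode = os.stat(out_path).st_mode
    os.chmod(out_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return data


# e_ident: magic, ELFCLASS64, little-endian, version 1, System V ABI, padding;
# then e_type = ET_EXEC (2) and e_machine = EM_X86_64 (0x3e), little-endian.
HEADER_HEX = (
    "7f 45 4c 46 02 01 01 00 "
    "00 00 00 00 00 00 00 00 "
    "02 00 3e 00"
)

demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo-header")
demo_bytes = write_executable(HEADER_HEX, demo_path)
```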
The backend currently owns:
- Function layout and absolute intra-image calls.
- main program startup and compiler-artifact startup.
- Calls with up to six value/tag-preserving arguments.
- Integer arithmetic, comparisons, if, local bindings, recursion, and
sequencing.
- Inline UTF-8 strings for the compiler syntax currently used.
- Text length, equality, concatenation, indexing, slicing, classification, and
signed integer conversion for the supported slice.
- Mmap-backed cons nodes for list operations used by the compiler.
- Linux syscall stdin read, file read/write, callable stdout print, and
integer/text result printing.
Runtime values in the ELF backend are passed as:
rax = value payload
r15 = tag
The current tag convention is:
0 = nil / Unit
1 = integer or boolean payload
2 = NUL-terminated text pointer
3 = cons/list pointer
Cons nodes are 24 bytes:
offset 0 = car payload
offset 8 = car tag
offset 16 = cdr payload
This layout is deliberately small, but it is now an ABI. Changes to it must
update this guide, the source comments near the backend, and the self-host gates.
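The 24-byte node layout and the tag values can be modeled in a few lines of Python. Several details below are assumptions made for the sketch rather than facts about the backend: fields are treated as signed little-endian 64-bit words, a `bytearray` stands in for mmap-backed memory, a cdr payload of 0 is taken to mean the end of the list, and the helper names are hypothetical. The layout stores no cdr tag, so the sketch does not model one.

```python
import struct

# Tag values from the convention above.
TAG_NIL, TAG_INT, TAG_TEXT, TAG_CONS = 0, 1, 2, 3

# offset 0 = car payload, offset 8 = car tag, offset 16 = cdr payload.
CONS = struct.Struct("<qqq")


def alloc_cons(heap, car_payload, car_tag, cdr_payload):
    """Append one node to the heap and return its address (byte offset)."""
    addr = len(heap)
    heap.extend(CONS.pack(car_payload, car_tag, cdr_payload))
    return addr


def walk(heap, addr):
    """Collect (tag, payload) pairs by following cdr payloads until 0."""
    items = []
    while addr != 0:
        car_payload, car_tag, cdr_payload = CONS.unpack_from(heap, addr)
        items.append((car_tag, car_payload))
        addr = cdr_payload
    return items


heap = bytearray(8)  # reserve address 0 so it can stand for the empty list
tail = alloc_cons(heap, 2, TAG_INT, 0)
head = alloc_cons(heap, 1, TAG_INT, tail)
elements = walk(heap, head)
```

Keeping the three offsets in one `struct.Struct` definition mirrors the point that the layout is an ABI: any change to field order or width has a single place to show up.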
Self-Host Gates
The normal self-host gate is:
compiler/main.0x0 --seed--> build/stage1.oisa
build/stage1.oisa --vm----> build/stage2.oisa
build/stage2.oisa --vm----> build/stage3.oisa
cmp build/stage2.oisa build/stage3.oisa
The full self-host gate also proves that the direct ELF OISA compiler artifact
can compile the compiler to the same OISA. The artifact is emitted by the
generated native helper after the seed path proves OISA self-hosting:
compiler/main.0x0 --zero-native compiler-elf--> build/native/zero-oisa-compiler
build/native/zero-oisa-compiler compiler/main.0x0 build/native/zero-elf-stage.oisa
cmp build/stage2.oisa build/native/zero-elf-stage.oisa
The trusted release compiler builder emits the next compiler executable:
build/release/v0.1.0/bin/zero-elf-compiler compiler/main.0x0 build/zero-next
./build/zero-next compiler/main.0x0 build/stage2
./build/stage2 compiler/main.0x0 build/stage3
cmp build/stage2 build/stage3
That path is the normal succession path after v0.1.0. The seed remains only for
make bootstrap-from-seed and make verify-seed-bootstrap.
Every repository change must pass the enforced guard before commit:
make selfhost-guard
That guard runs a clean normal self-host, the full ELF compiler self-host, and
the smoke suite so language, runtime, editor, package, and documentation changes
cannot silently drift away from the self-host chain.
Memory-sensitive compiler changes must also pass:
make memory-check
That gate samples peak RSS for the released compiler chain, verifies
stage2 == stage3, and writes the memory report included in release metadata.
Backend Change Checklist
A backend change is not complete until:
- docs/language.html reflects any source-level behavior.
- This guide reflects any pipeline, ABI, runtime, or bootstrap behavior.
- compiler/main.0x0 has source comments for non-obvious invariants.
- An example or test exercises the new behavior.
- Object-backend changes include build-obj or build-linked coverage when
they touch linkable native output.
- make docs-check passes.
- make editor-check passes when editor-facing behavior changes.
- make selfhost-guard passes.
Do not rely on an untested backend promise. If the direct ELF backend cannot run
the slice, document the exact gap as a runtime maturity issue instead of hiding it
behind the C compatibility path.