# Glagol: A Manifest-First Compiler Architecture for Slovo
<p class="title-spelling glagolitic">ⰃⰎⰀⰃⰑⰎ</p>

Sanjin Gumbarevic<br>
hermeticum_lab@protonmail.com<br>
Publication release: `1.0.0-beta`<br>
Technical behavior baseline: compiler and language support through `1.0.0-beta`<br>
Date: 2026-05-21<br>
Evidence source: paired local Slovo/Glagol monorepo verification and benchmark reruns from a local checkout<br>
Maturity: beta
## Abstract
Glagol (<span class="glagolitic">ⰃⰎⰀⰃⰑⰎ</span>) is the first compiler for
Slovo. It exists to make the language support boundary inspectable: tokens,
S-expression tree, AST, typed AST, LLVM IR, hosted native executable, tests,
diagnostics, and release documents should agree.
The current publication release, `1.0.0-beta`, records the first real
general-purpose beta toolchain for Slovo. It includes the completed `u32` /
`u64` unsigned compiler and stdlib breadth scope alongside the current
nine-kernel benchmark suite. This paper records the current beta
implementation surface, the benchmark method and results, the distinction
between Glagol and Lisp-family implementations, and the compiler path from
beta to stable.
## 1. Compiler Thesis
The name Glagol is rendered in Glagolitic as
<span class="glagolitic">ⰃⰎⰀⰃⰑⰎ</span>. The publication pipeline embeds a
Glagolitic-capable font so this identity marker survives PDF rendering.
Glagol's compiler motto is:
```text
make the tree visible
```
The current pipeline is:
```text
.slo source
-> tokens
-> S-expression tree
-> AST
-> typed AST
-> LLVM IR text
-> Clang + runtime/runtime.c
-> native executable
```
The engineering point is not only native output. It is traceability. Source
structure, types, spans, diagnostics, formatter behavior, and generated code
should stay connected enough that a support claim can be audited.
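To make the traceability point concrete, here is a minimal Python sketch of the first two stages (tokens and S-expression tree) in which every token and list node keeps a 1-based line/column span. It is an illustration only, not Glagol's implementation: the `Token`, `Node`, `tokenize`, `read`, and `ReadError` names are hypothetical, and string literals and comments are deliberately not handled.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    text: str
    line: int  # 1-based source line
    col: int   # 1-based source column


@dataclass
class Node:
    # A parenthesized list; its span is the position of the opening "(".
    line: int
    col: int
    children: List["Tree"] = field(default_factory=list)


Tree = Union[Token, Node]


class ReadError(Exception):
    """Reader failure that carries a line:col position in its message."""


def tokenize(src: str) -> List[Token]:
    """Split source into parentheses and atoms, recording each token's span."""
    tokens: List[Token] = []
    line, col, i = 1, 1, 0
    while i < len(src):
        ch = src[i]
        if ch == "\n":
            line, col, i = line + 1, 1, i + 1
        elif ch.isspace():
            col, i = col + 1, i + 1
        elif ch in "()":
            tokens.append(Token(ch, line, col))
            col, i = col + 1, i + 1
        else:
            j = i
            while j < len(src) and not src[j].isspace() and src[j] not in "()":
                j += 1
            tokens.append(Token(src[i:j], line, col))
            col, i = col + (j - i), j
    return tokens


def read(tokens: List[Token]) -> List[Tree]:
    """Build the S-expression tree; every list node keeps its source span."""
    root = Node(0, 0)  # sentinel holding the top-level forms
    stack: List[Node] = [root]
    for tok in tokens:
        if tok.text == "(":
            node = Node(tok.line, tok.col)
            stack[-1].children.append(node)
            stack.append(node)
        elif tok.text == ")":
            if len(stack) == 1:
                raise ReadError(f"unexpected ')' at {tok.line}:{tok.col}")
            stack.pop()
        else:
            stack[-1].children.append(tok)
    if len(stack) > 1:
        unclosed = stack[-1]
        raise ReadError(f"unclosed '(' at {unclosed.line}:{unclosed.col}")
    return root.children


tree = read(tokenize("(fn main ()\n  (add 1 2))"))
call = tree[0].children[3]
print(call.line, call.col, call.children[0].text)  # prints: 2 3 add
```

Because spans are attached at the earliest stage, a later diagnostic about `(add 1 2)` can point at line 2, column 3 without re-scanning the source.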
## 2. Relationship To Lisp Implementations
Glagol compiles a Lisp-shaped language, but it is not a Lisp implementation in
the usual technical sense.
Common Lisp and Scheme implementations typically center a runtime evaluation
model, symbolic data, macro expansion, and language-defined execution
semantics. Clojure centers hosted execution on the JVM, namespaces, immutable
persistent data structures, dynamic vars, and runtime sequence abstractions.
Glagol instead centers:
- manifest-first language contracts
- explicit AST and typed AST stages before backend emission
- static checking before native code generation
- canonical formatting and structured diagnostics as release artifacts
- explicit `option` and `result` flow instead of exception-driven ordinary
failure
- lexical `unsafe` as the reserved low-level boundary
- hosted native executables through LLVM IR and Clang
- release gates that separate supported, compatibility, formatter-only, and
speculative examples
The parenthesized syntax is therefore a structural source format, not evidence
that Glagol is a macro-first Lisp VM or a generic list runtime.
## 3. Current Implementation Surface
At the `1.0.0-beta` technical behavior baseline, Glagol supports:
- `check`, `fmt`, `fmt --check`, `fmt --write`, `test`, `build`, and `doc`
- JSON diagnostics, textual artifact manifests, and lowering inspection
- hosted native executable generation through emitted LLVM IR, host
`clang -O2`, and `runtime/runtime.c`
- flat local module projects, explicit import/export lists, local packages,
and workspace membership
- installed `share/slovo/std` discovery and ordered `SLOVO_STD_PATH` search
- direct scalar types `i32`, `i64`, `u32`, `u64`, finite `f64`, `bool`,
immutable `string`, and internal `unit`
- functions, top-level tests, immutable locals, current mutable whole-value
locals, `if`, and `while`
- current direct enum payload families, current known struct field families,
concrete option/result families, fixed immutable arrays over direct scalars
and `string`, and concrete runtime-owned vector families over `i32`, `i64`,
`f64`, `bool`, and `string`
- compiler-known standard-runtime calls through the promoted catalog plus
staged source-authored `std/*.slo` gates
- scalar C FFI imports
- benchmark scaffolds for Slovo, C, Rust, Python, Clojure, and Common
Lisp/SBCL, with `cold-process` and `hot-loop` timing modes
The current release, `1.0.0-beta`, is the first release that may honestly use
beta maturity language for this toolchain.
## 4. Diagnostics And Support Discipline
Glagol's quality boundary is not "the parser accepted a form." The required
support path is:
1. parse the source
2. lower to AST with spans
3. type-check names and value flow
4. reject unsupported forms before backend panic
5. emit LLVM only from checked representation
6. cover behavior or diagnostics with tests
7. update release docs and fixtures together
This matters because Slovo syntax is intentionally regular. A permissive parser
can make unsupported forms look almost supported. Glagol therefore treats
backend panics, invalid LLVM from user source, and stale docs that overclaim
support as release-blocking defects.
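Steps 3 to 5 can be illustrated with a small Python sketch in which a checker collects structured diagnostics for unsupported forms, and the emitter is reached only when no diagnostics were produced. Everything here is hypothetical: the `Form` and `Diagnostic` shapes, the `unsupported-form` code, the tiny `SUPPORTED` table, and the `emit` placeholder (which echoes the checked tree rather than producing LLVM IR) are assumptions for illustration, not Glagol's diagnostics schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Form:
    head: str  # leading symbol of the list, e.g. "if"
    line: int
    col: int
    args: List["Form"] = field(default_factory=list)


@dataclass
class Diagnostic:
    code: str
    message: str
    line: int
    col: int


# Hypothetical support table; a real compiler would derive this from its contract.
SUPPORTED = {"fn", "let", "if", "while", "+", "<"}


def check(form: Form, out: List[Diagnostic]) -> None:
    """Record a diagnostic for every unsupported form instead of failing later."""
    if form.head not in SUPPORTED:
        out.append(Diagnostic("unsupported-form",
                              f"`{form.head}` is not a supported form",
                              form.line, form.col))
    for arg in form.args:
        check(arg, out)


def emit(form: Form) -> str:
    """Placeholder backend: echoes the checked tree (a real one emits LLVM IR)."""
    return "(" + " ".join([form.head] + [emit(arg) for arg in form.args]) + ")"


def compile_form(form: Form) -> Tuple[Optional[str], List[Diagnostic]]:
    diagnostics: List[Diagnostic] = []
    check(form, diagnostics)
    if diagnostics:
        return None, diagnostics  # the backend is never reached
    return emit(form), []


ok = Form("fn", 1, 1, [Form("if", 2, 3, [Form("<", 2, 7)])])
bad = Form("fn", 1, 1, [Form("defmacro", 3, 5)])
print(compile_form(ok)[0])      # prints: (fn (if (<)))
print(compile_form(bad)[1][0])  # a structured diagnostic at line 3, column 5
```

The design choice being illustrated is ordering: an unsupported form becomes a positioned diagnostic before any backend code runs, so it cannot surface as a backend panic or invalid output.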
## 5. Runtime And Standard Library Strategy
Glagol currently exposes two related but distinct library surfaces:
- compiler-known standard-runtime calls such as `std.io.print_i32`,
`std.string.len`, selected parse/format/conversion calls, host IO,
process/environment/file helpers, randomness, time, and stdin
- source-authored beta modules in `lib/std/*.slo`, loaded through explicit
imports, installed std discovery, checkout discovery, or `SLOVO_STD_PATH`
This split is deliberate. It lets library design move forward without claiming
that the final stable import, compatibility, or package story already exists.
Source-authored modules are useful now because they exercise language design,
fixtures, and examples. They are beta explicit-import APIs, not yet a frozen,
stable `1.0.0` standard library.
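The ordered search can be sketched as a list of candidate directories walked front to back, returning the first matching module file. This Python sketch rests on assumptions this paper does not state: that `SLOVO_STD_PATH` entries are separated by the platform path separator, that a module named `string` lives in `string.slo`, and that `SLOVO_STD_PATH` takes precedence over the installed and checkout locations. The `std_search_dirs` and `resolve_std` names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional


def std_search_dirs(env: Mapping[str, str],
                    installed: Optional[Path] = None,
                    checkout: Optional[Path] = None) -> List[Path]:
    """Candidate std directories in search order (precedence is an assumption)."""
    raw = env.get("SLOVO_STD_PATH", "")
    # Assumed: entries use the platform separator and are searched left to right.
    dirs = [Path(entry) for entry in raw.split(os.pathsep) if entry]
    # Assumed fallbacks, e.g. <prefix>/share/slovo/std and a checkout's lib/std.
    for fallback in (installed, checkout):
        if fallback is not None:
            dirs.append(fallback)
    return dirs


def resolve_std(module: str, dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first <dir>/<module>.slo that exists, or None."""
    for directory in dirs:
        candidate = directory / f"{module}.slo"
        if candidate.is_file():
            return candidate
    return None
```

With two `SLOVO_STD_PATH` entries that both contain `string.slo`, the earlier entry wins; a module present only in the installed directory is still found, and a missing module resolves to `None` so the caller can report a diagnostic.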
## 6. Benchmark Method
The benchmark suite measures local-machine behavior only. It is a regression
and comparison harness, not a public performance claim.
Environment:
| Field | Value |
| --- | --- |
| Host | `Linux 6.17.10-100.fc41.x86_64 x86_64 GNU/Linux` |
| Glagol | `glagol 1.0.0-beta` |
| Python | `Python 3.13.9` |
| C compiler | `clang version 19.1.7 (Fedora 19.1.7-5.fc41)` |
| Rust | `rustc 1.77.2 (25ef9e3d8 2024-04-09)` |
| Clojure | `1.11.2` |
| Common Lisp | `SBCL 2.5.9-1.fc41` |
Build and runtime paths compared:
| Implementation | Build/runtime path |
| --- | --- |
| Slovo | `glagol build <benchmark>` -> generated LLVM -> host `clang -O2` linking `runtime/runtime.c` |
| C | `clang -O2 -std=c11` on the local scaffold |
| Rust | `rustc -C opt-level=3 -C debuginfo=0` on the local scaffold |
| Python | `python3` running the local scaffold |
| Clojure | `clojure` running the local scaffold; timings include JVM and Clojure startup |
| Common Lisp | `sbcl --script` running the local scaffold; timings include SBCL startup |
Timing semantics:
- The runner builds each implementation once before timing. The reported
numbers are execution timings, not compile-time timings.
- `cold-process` launches a fresh process per sample with the base loop count
of `1000000`. It measures process startup plus one benchmark run.
- `hot-loop` also launches a fresh process per sample, but with the amplified
loop count `10000000`; the reported normalized median divides the timed total
by `10` so it is comparable with the base loop count.
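The timing rules above can be sketched in Python as follows. The sketch assumes wall-clock timing around each fresh process (so startup is inside every sample) and assumes warmup runs are executed and then discarded; the helper names are hypothetical and the actual `run.py` runners may differ in detail.

```python
import statistics
import subprocess
import sys
import time
from typing import List, Sequence

BASE_LOOPS = 1_000_000   # cold-process loop count
HOT_LOOPS = 10_000_000   # amplified hot-loop count


def time_fresh_process(cmd: Sequence[str]) -> float:
    """Wall-clock milliseconds for one fresh process, startup included."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) * 1000.0


def collect(cmd: Sequence[str], repeats: int, warmups: int) -> List[float]:
    """Run warmups (discarded), then `repeats` timed samples."""
    for _ in range(warmups):
        time_fresh_process(cmd)
    return [time_fresh_process(cmd) for _ in range(repeats)]


def cold_process_median(samples_ms: List[float]) -> float:
    return statistics.median(samples_ms)


def hot_loop_normalized_median(samples_ms: List[float]) -> float:
    # Scale the median of the amplified runs back to the base loop count.
    return statistics.median(samples_ms) / (HOT_LOOPS / BASE_LOOPS)


# A trivial command stands in for a benchmark binary here.
samples = collect([sys.executable, "-c", "pass"], repeats=3, warmups=1)
print(f"cold median: {cold_process_median(samples):.3f} ms")
print(hot_loop_normalized_median([100.0, 120.0, 110.0]))  # prints: 11.0
```

Using the median rather than the mean keeps a single slow sample (for example, one disturbed by other host activity) from moving the reported figure.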
Benchmark kernels:
- `math-loop`: scalar arithmetic accumulation
- `branch-loop`: scalar branching and accumulation
- `parse-loop`: repeated decimal parsing with checksum validation
- `array-index-loop`: checked fixed-array indexing and scalar accumulation
- `string-eq-loop`: exact string content equality reduced to an `i32` checksum
- `array-struct-field-loop`: immutable struct-field access over a fixed `i32`
array plus scalar accumulation
- `enum-struct-payload-loop`: repeated enum `match` payload extraction over an
immutable struct payload carrying a fixed `i32` array
- `vec-i32-index-loop`: runtime-owned `i32` vector indexing and scalar
accumulation
- `vec-string-eq-loop`: runtime-owned string vector indexing plus exact string
equality reduced to an `i32` checksum
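For readers unfamiliar with the kernel shapes, here is a Python sketch of two of them, `array-index-loop` and `string-eq-loop`. The integer corpus and the word list are placeholders, not the suite's actual data, and the `% 5` selection in the string kernel is an assumption; only the `% 8` pattern for the integer corpus is taken from the comparison boundaries that follow.

```python
from typing import List

# Placeholder data: NOT the benchmark suite's actual corpus or words.
INT_CORPUS: List[int] = [3, 1, 4, 1, 5, 9, 2, 6]
WORDS: List[str] = ["alpha", "beta", "gamma", "delta", "omega"]


def array_index_loop(loops: int) -> int:
    """Dynamic `% 8` index selection into a fixed array, summed into a checksum."""
    total = 0
    for i in range(loops):
        total += INT_CORPUS[i % 8]
    return total


def string_eq_loop(loops: int, target: str) -> int:
    """Exact content equality against a runtime-supplied target, as an int checksum."""
    hits = 0
    for i in range(loops):
        if WORDS[i % len(WORDS)] == target:  # `==` compares content, not identity
            hits += 1
    return hits


target = "".join(["be", "ta"])      # built at runtime rather than written as a literal
print(array_index_loop(16))         # prints: 62
print(string_eq_loop(10, target))   # prints: 2
```

Building the target at runtime mirrors the suite's "runtime-supplied target string": it keeps an optimizing compiler from folding the comparison away and ensures equality is decided by content rather than by pointer identity.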
Comparison boundaries:
- `math-loop` and `branch-loop` compare structurally similar loop bodies across
all implementations.
- `parse-loop` keeps the same input text and checksum, but not the same parser
implementation. Slovo uses `std.string.parse_i32_result`, C uses `strtol`,
Rust uses `text.parse::<i32>()`, Python uses `int`, Clojure uses
`Integer/parseInt`, and Common Lisp uses `parse-integer`.
- `array-index-loop` keeps the same eight-element integer corpus and `% 8`
dynamic index-selection pattern across all implementations. It stays on
immutable fixed-array indexing and scalar accumulation only.
- `string-eq-loop` keeps the same five-word ASCII corpus and runtime-supplied
target string across all implementations. It measures exact content equality
only. It does not compare regex engines, normalization, locale handling, or
pointer identity.
- `array-struct-field-loop` keeps the same eight-element integer corpus and
`% 8` dynamic index-selection pattern, but moves the array through one
immutable struct field. It is a narrow benchmark for the promoted
`exp-120` direct struct-field lane, not a broad claim about every struct
layout.
- `enum-struct-payload-loop` keeps the same eight-element integer corpus inside
an immutable struct payload, matches one enum value on every iteration, and
indexes the bound struct field. It is a narrow benchmark for the promoted
`exp-121` struct-payload enum lane, not a broad tagged-union or ADT claim.
- `vec-i32-index-loop` keeps the same eight-element integer corpus and `% 8`
dynamic index-selection pattern as `array-index-loop`, but routes that
access through the promoted runtime-owned `(vec i32)` lane instead of fixed
arrays.
- `vec-string-eq-loop` keeps the same five-word ASCII corpus and
runtime-supplied target as `string-eq-loop`, but routes selection through
the promoted runtime-owned `(vec string)` lane instead of fixed arrays.
- Because Rust is built with `opt-level=3` while Slovo and C are built through
`clang -O2`, the suite is a useful local regression/comparison harness, not a
strict same-flags compiler shootout.
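Because the `parse-loop` boundary depends on how each language reports bad input, here is a Python sketch of a result-style wrapper in the spirit of `std.string.parse_i32_result`. Instead of letting `int` raise, it returns a success flag and a value, rejects text outside the `i32` range, and accepts only ASCII decimal digits. The `(bool, int)` encoding, the input list, and the `-1` failure contribution are hypothetical choices, not the suite's Python scaffold or Slovo's actual result type.

```python
import re
from typing import List, Tuple

I32_MIN, I32_MAX = -(2**31), 2**31 - 1
# ASCII decimal only; int() alone would also accept "+7", " 7", and "1_0".
DECIMAL = re.compile(r"-?[0-9]+")


def parse_i32_result(text: str) -> Tuple[bool, int]:
    """Result-style parse: (True, value) on success, (False, 0) on failure; never raises."""
    if DECIMAL.fullmatch(text) is None:
        return (False, 0)
    value = int(text)
    if not I32_MIN <= value <= I32_MAX:
        return (False, 0)
    return (True, value)


# Placeholder inputs; the real suite fixes its own text and expected checksum.
INPUTS: List[str] = ["12", "-7", "300", "oops"]


def parse_loop(loops: int) -> int:
    checksum = 0
    for i in range(loops):
        ok, value = parse_i32_result(INPUTS[i % len(INPUTS)])
        checksum += value if ok else -1  # failures still count toward the checksum
    return checksum


expected = 2 * (12 - 7 + 300 - 1)
assert parse_loop(8) == expected        # checksum validation over two passes
print(parse_i32_result("2147483648"))   # prints: (False, 0)
```

Two parsers that agree on this checksum can still differ a great deal in cost, which is why the paper treats `parse-loop` as a comparison of parser and runtime choices rather than of loop code generation.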
Hot-loop commands:
```bash
python3 benchmarks/math-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/branch-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/parse-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-index-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/string-eq-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-struct-field-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/enum-struct-payload-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-i32-index-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-string-eq-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
```
Cold-process commands:
```bash
python3 benchmarks/math-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/branch-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/parse-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-index-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/string-eq-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-struct-field-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/enum-struct-payload-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-i32-index-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-string-eq-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
```
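The eighteen commands above differ only in kernel name, mode, and repeat count, so a small Python driver can generate them. This is a convenience sketch, not a script from the repository; the `benchmark_commands` helper is hypothetical, while the paths and flags are copied from the commands listed above.

```python
import shlex
from typing import List

GLAGOL = "compiler/target/debug/glagol"
KERNELS = [
    "math-loop", "branch-loop", "parse-loop", "array-index-loop",
    "string-eq-loop", "array-struct-field-loop", "enum-struct-payload-loop",
    "vec-i32-index-loop", "vec-string-eq-loop",
]
REPEATS = {"hot-loop": 5, "cold-process": 3}  # per-mode repeat counts used above


def benchmark_commands(mode: str) -> List[List[str]]:
    """One runner invocation per kernel for the given timing mode."""
    return [
        ["python3", f"benchmarks/{kernel}/run.py", "--mode", mode,
         "--repeats", str(REPEATS[mode]), "--warmups", "1", "--glagol", GLAGOL]
        for kernel in KERNELS
    ]


if __name__ == "__main__":
    for mode in REPEATS:
        for cmd in benchmark_commands(mode):
            print(shlex.join(cmd))  # replace with subprocess.run(cmd, check=True) to execute
```

Keeping the kernel list in one place makes it harder for a newly added kernel to be timed in one mode but forgotten in the other.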
## 7. Benchmark Results
The `exp-123` publication baseline widens the paired same-machine result set
from seven rows to nine by adding two owned-vector kernels:
- `vec-i32-index-loop`
- `vec-string-eq-loop`
Hot-loop normalized median time, in milliseconds per one million iterations:
| Benchmark | Slovo | C | Rust | Python | Clojure | Common Lisp/SBCL |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| math-loop | 1.121 | 1.121 | 1.138 | 111.092 | 241.220 | 1.753 |
| branch-loop | 2.014 | 2.012 | 2.032 | 114.016 | 241.469 | 4.624 |
| parse-loop | 6.456 | 16.233 | 7.169 | 134.465 | 264.669 | 20.108 |
| array-index-loop | 1.103 | 1.109 | 1.128 | 96.649 | 298.388 | 3.379 |
| string-eq-loop | 4.332 | 4.092 | 2.279 | 120.453 | 288.617 | 11.128 |
| array-struct-field-loop | 1.139 | 1.116 | 1.129 | 110.854 | 277.466 | 3.663 |
| enum-struct-payload-loop | 4.304 | 1.512 | 1.880 | 302.252 | 310.066 | 5.297 |
| vec-i32-index-loop | 1.328 | 1.103 | 1.131 | 111.153 | 272.914 | 2.231 |
| vec-string-eq-loop | 5.210 | 4.122 | 3.471 | 122.826 | 302.817 | 10.431 |
Cold-process median time, in milliseconds per benchmark run:
| Benchmark | Slovo | C | Rust | Python | Clojure | Common Lisp/SBCL |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| math-loop | 1.625 | 1.675 | 1.765 | 121.014 | 2808.812 | 9.435 |
| branch-loop | 2.563 | 2.517 | 2.682 | 130.790 | 2674.146 | 15.027 |
| parse-loop | 6.942 | 16.749 | 7.857 | 149.594 | 2835.421 | 27.750 |
| array-index-loop | 1.599 | 1.606 | 1.807 | 107.150 | 2812.157 | 17.589 |
| string-eq-loop | 4.826 | 4.756 | 2.938 | 135.748 | 2892.359 | 21.504 |
| array-struct-field-loop | 1.670 | 1.612 | 1.837 | 115.371 | 2823.026 | 13.411 |
| enum-struct-payload-loop | 4.934 | 2.000 | 1.783 | 291.850 | 2516.815 | 27.555 |
| vec-i32-index-loop | 3.047 | 2.851 | 1.776 | 112.950 | 2911.603 | 10.017 |
| vec-string-eq-loop | 5.427 | 4.575 | 4.081 | 134.914 | 2567.482 | 18.950 |
Compiler interpretation:
- The current hosted build path keeps Slovo essentially on the local native
baseline for the scalar and fixed-array kernels. In hot-loop mode,
`math-loop`, `branch-loop`, and `array-index-loop` all land very close to the
C scaffold and within a narrow distance of the Rust scaffold.
- `parse-loop` is more than a backend-loop benchmark: it compares end-to-end
parser and runtime choices. On this machine, the current Slovo decimal parse
path outperforms the C scaffold built around `strtol` (6.456 ms versus
16.233 ms in hot-loop mode) and is slightly ahead of the Rust scaffold
(7.169 ms).
- `string-eq-loop` exposes a different boundary: exact content equality is
efficient enough for native-code use, but the current Slovo runtime path
(4.332 ms in hot-loop mode) is slightly behind the C scaffold (4.092 ms) and
nearly twice the Rust scaffold (2.279 ms) on this machine.
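- `vec-i32-index-loop` shows the cost of routing the same integer corpus
through the promoted owned-vector lane instead of fixed arrays. On this
machine the Slovo lane remains practical native code, but it is visibly more
expensive than the fixed-array kernel (1.328 ms versus 1.103 ms in hot-loop
mode, roughly 20% slower), while the C and Rust scaffolds show almost no
difference between the two kernels.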
- `vec-string-eq-loop` shows the same tradeoff for owned string vectors. It
stays in the same broad range as the fixed-array string kernel, but it is a
more allocation- and indirection-heavy path than direct fixed-array access.
- `array-struct-field-loop` stays close to the direct fixed-array kernel. On
this machine, routing the same `% 8` indexing pattern through one immutable
struct field keeps Slovo, C, and Rust tightly grouped in hot-loop mode.
- `enum-struct-payload-loop` exposes a current composite-data boundary. The
Slovo lane remains practical native code, but repeated struct-payload enum
matching is still materially slower than the C and Rust scaffolds on this
machine.
- Cold-process timings cover process startup (native executable, interpreter,
or JVM) plus one benchmark run. They are not compile-time numbers and are more
sensitive to launcher and runtime initialization effects than hot-loop mode.
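- Clojure is dramatically slower in this process-per-run harness because each
sample includes JVM and Clojure runtime startup, and the benchmark bodies stay
on high-level runtime paths. Because hot-loop samples also start a fresh
process, roughly a tenth of that startup cost remains in each normalized
hot-loop figure. In hot-loop mode the gap is widest on the composite kernels
`enum-struct-payload-loop` and `vec-string-eq-loop`, with `array-index-loop`
close behind.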
- Common Lisp/SBCL remains much closer to native baselines than Clojure in the
same harness. That is why both Lisp-family comparison points are useful.
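The interpretation above can be checked with simple ratios over the hot-loop table. This Python sketch copies the Slovo, C, and Rust columns from that table and prints each Slovo time relative to C and to Rust; values below 1.0 mean Slovo was faster on this machine. It adds no new measurements, and the `relative` helper is only an illustration.

```python
from typing import Dict, Tuple

# Hot-loop normalized medians (ms per one million iterations), copied from the
# table above. Columns: (Slovo, C, Rust).
HOT_LOOP: Dict[str, Tuple[float, float, float]] = {
    "math-loop": (1.121, 1.121, 1.138),
    "branch-loop": (2.014, 2.012, 2.032),
    "parse-loop": (6.456, 16.233, 7.169),
    "array-index-loop": (1.103, 1.109, 1.128),
    "string-eq-loop": (4.332, 4.092, 2.279),
    "array-struct-field-loop": (1.139, 1.116, 1.129),
    "enum-struct-payload-loop": (4.304, 1.512, 1.880),
    "vec-i32-index-loop": (1.328, 1.103, 1.131),
    "vec-string-eq-loop": (5.210, 4.122, 3.471),
}


def relative(kernel: str) -> Tuple[float, float]:
    """Slovo time divided by C and by Rust; below 1.0 means Slovo was faster."""
    slovo, c, rust = HOT_LOOP[kernel]
    return slovo / c, slovo / rust


for kernel in HOT_LOOP:
    vs_c, vs_rust = relative(kernel)
    print(f"{kernel:26} vs C {vs_c:5.2f}   vs Rust {vs_rust:5.2f}")

# Cost of the owned-vector lane relative to fixed arrays, Slovo column only.
print(round(HOT_LOOP["vec-i32-index-loop"][0] / HOT_LOOP["array-index-loop"][0], 2))  # prints: 1.2
```

Ratios make the groupings in the interpretation easy to see: the scalar and fixed-array kernels sit near 1.0 against C, `parse-loop` is well below 1.0, and `enum-struct-payload-loop` stands out at roughly 2.85 against C.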
## 8. Current Technical Risks
The main risks in beta lie not in syntax parsing but in engineering coverage
and compatibility:
- source forms reaching backend paths without clear diagnostics
- standard-library source helpers drifting from compiler-known runtime calls
- feature claims appearing in docs before fixtures and tests exist
- collection and ADT breadth growing faster than the compatibility story
- benchmark breadth growing faster than the language contract can stabilize
- benchmark numbers being misread as public thresholds or cross-machine claims
- package behavior becoming stable before dependency, manifest, and versioning
rules are precise
## 9. Path Beyond `1.0.0-beta`
Glagol now implements the first real beta Slovo contract and passes the
required beta workflow proof plus release gate. The remaining path is from
beta to stable.
Recommended compiler sequence:
1. Complete the next blocked post-beta language-breadth slices from the Slovo
roadmap without regressing the beta baseline.
2. Broaden runtime-owned strings, collections, and composite value flow
without exposing unstable ABI details as stable contracts.
3. Refine `f32` policy, additional integer families, explicit conversion
behavior, and remaining library/runtime gaps.
4. Harden package, workspace, and standard-library import/search behavior into
a compatibility-governed stable toolchain story.
5. Strengthen diagnostics, generated docs, conformance fixtures, and release
gates as first-class compiler interfaces.
6. Keep benchmark publication local and repeatable while deferring public
performance claims until methodology is stronger.
7. Freeze formatter output, diagnostics schema, package behavior, stdlib
compatibility, migration policy, and toolchain contracts for `1.0.0`.
## 10. Conclusion
Glagol has moved Slovo from a manifesto into a working beta native compiler
track. The important result is not only that programs compile. It is that the
support boundary is visible enough to review: source contracts, diagnostics,
tests, lowering, benchmarks, and publication artifacts can be kept in sync.
The compiler is now useful enough for ordinary local tools and libraries
within the documented beta contract. The path forward remains disciplined
breadth and compatibility hardening, not unsupported feature claims.