Glagol: A Manifest-First Compiler Architecture for Slovo

ⰃⰎⰀⰃⰑⰎ

Sanjin Gumbarevic
hermeticum_lab@protonmail.com

Publication release: 1.0.0-beta.7

Technical behavior baseline: compiler and language support through 1.0.0-beta; tooling and install workflow through 1.0.0-beta.1; runtime/resource foundation through 1.0.0-beta.2; standard-library stabilization through 1.0.0-beta.3; language-usability diagnostics through 1.0.0-beta.4; package/workspace discipline through 1.0.0-beta.5; loopback networking foundation through 1.0.0-beta.6; serialization/data-interchange foundation through 1.0.0-beta.7

Date: 2026-05-22

Evidence source: paired local Slovo/Glagol monorepo verification and benchmark reruns from a local checkout; beta.7 release-gate verification from the public monorepo

Maturity: beta

Abstract

Glagol (ⰃⰎⰀⰃⰑⰎ) is the first compiler for Slovo. It exists to make the language support boundary inspectable: tokens, S-expression tree, AST, typed AST, LLVM IR, hosted native executable, tests, diagnostics, and release documents should agree.

The current publication release, 1.0.0-beta.7, keeps the first real general-purpose beta toolchain baseline from 1.0.0-beta and records seven post-beta updates on top of it: tooling/install hardening (beta.1), the runtime/resource foundation (beta.2), standard-library stabilization (beta.3), language-usability diagnostics (beta.4), local package/workspace discipline (beta.5), the loopback networking foundation (beta.6), and the serialization/data-interchange foundation (beta.7). The beta baseline includes the completed u32 / u64 unsigned compiler and stdlib breadth scope, the narrow std.net loopback TCP runtime family, the narrow std.json.quote_string runtime family, and the current ten-scaffold benchmark suite. This paper records the current beta implementation surface, the benchmark method and results, the distinction between Glagol and Lisp-family implementations, each of the beta.1 through beta.7 slices listed above, and the compiler path from beta to stable.

1. Compiler Thesis

The name Glagol is rendered in Glagolitic as ⰃⰎⰀⰃⰑⰎ. The publication pipeline embeds a Glagolitic-capable font so this identity marker survives PDF rendering.

Glagol's compiler motto is:

make the tree visible

The current pipeline is:

.slo source
-> tokens
-> S-expression tree
-> AST
-> typed AST
-> LLVM IR text
-> Clang + runtime/runtime.c
-> native executable

The engineering point is not only native output. It is traceability. Source structure, types, spans, diagnostics, formatter behavior, and generated code should stay connected enough that a support claim can be audited.

2. Relationship To Lisp Implementations

Glagol compiles a Lisp-shaped language, but it is not a Lisp implementation in the usual technical sense.

Common Lisp and Scheme implementations typically center a runtime evaluation model, symbolic data, macro expansion, and language-defined execution semantics. Clojure centers hosted execution on the JVM, namespaces, immutable persistent data structures, dynamic vars, and runtime sequence abstractions.

Glagol instead centers:

manifest-first language contracts
explicit AST and typed AST stages before backend emission
static checking before native code generation
canonical formatting and structured diagnostics as release artifacts
explicit option and result flow instead of exception-driven ordinary failure
lexical unsafe as the reserved low-level boundary
hosted native executables through LLVM IR and Clang
release gates that separate supported, compatibility, formatter-only, and speculative examples

The parenthesized syntax is therefore a structural source format, not evidence that Glagol is a macro-first Lisp VM or a generic list runtime.

3. Current Implementation Surface

At the current technical behavior beta baseline, Glagol supports:

check, fmt, fmt --check, fmt --write, test, build, and doc
run for build-and-execute workflows, clean for generated build artifacts, and new --template binary|library|workspace
JSON diagnostics, textual artifact manifests, and lowering inspection
hosted native executable generation through emitted LLVM IR, host clang -O2, and runtime/runtime.c
flat local module projects, explicit import/export lists, local packages, and workspace membership
installed share/slovo/std discovery and ordered SLOVO_STD_PATH search
direct scalar types i32, i64, u32, u64, finite f64, bool, immutable string, and internal unit
functions, top-level tests, immutable locals, current mutable whole-value locals, if, and while
current direct enum payload families, current known struct field families, concrete option/result families, fixed immutable arrays over direct scalars and string, and concrete runtime-owned vector families over i32, i64, f64, bool, and string
compiler-known standard-runtime calls through the promoted catalog plus staged source-authored std/*.slo gates
compact JSON string literal construction through std.json.quote_string and the hosted __glagol_json_quote_string runtime helper
scalar C FFI imports
benchmark scaffolds for Slovo, C, Rust, Python, Clojure, and Common Lisp/SBCL, with cold-process and hot-loop timing modes

The current release, 1.0.0-beta.7, is a serialization/data-interchange foundation update on the 1.0.0-beta line, the first release line that may honestly use beta maturity language for this toolchain.

4. Diagnostics And Support Discipline

Glagol's quality boundary is not "the parser accepted a form." The required support path is:

parse the source
lower to AST with spans
type-check names and value flow
reject unsupported forms before backend panic
emit LLVM only from checked representation
cover behavior or diagnostics with tests
update release docs and fixtures together

This matters because Slovo syntax is intentionally regular. A permissive parser can make unsupported forms look almost supported. Glagol therefore treats backend panics, invalid LLVM from user source, and stale docs that overclaim support as release-blocking defects.
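
The "reject unsupported forms before backend panic" step can be illustrated with a small checker that walks a parsed tree and returns structured diagnostics instead of letting unknown forms reach code generation. This is only a sketch under stated assumptions: the `SUPPORTED_HEADS` set, the diagnostic fields, and the path notation are hypothetical and do not reflect Glagol's real form catalog or JSON diagnostics schema. It also treats every list as a form, which a real checker would not.

```python
# Hypothetical set of accepted form heads; not Glagol's real catalog.
SUPPORTED_HEADS = {"fn", "if", "while", "let"}


def check(tree, path="0"):
    """Collect one diagnostic per list whose head is not a supported form."""
    diagnostics = []
    if isinstance(tree, list) and tree:
        head = tree[0]
        if isinstance(head, str) and head not in SUPPORTED_HEADS:
            diagnostics.append({
                "severity": "error",
                "code": "unsupported-form",  # hypothetical diagnostic code
                "form": head,
                "path": path,  # position of the form inside the tree
            })
        for index, child in enumerate(tree[1:], start=1):
            diagnostics.extend(check(child, f"{path}.{index}"))
    return diagnostics


def compile_checked(tree):
    """Refuse to reach the backend while any diagnostic is outstanding."""
    diagnostics = check(tree)
    if diagnostics:
        return {"ok": False, "diagnostics": diagnostics}
    # A real compiler would emit LLVM IR here, from the checked representation only.
    return {"ok": True, "diagnostics": []}


result = compile_checked(["fn", "main", ["if", "x", ["macro", "y"]]])
```

The design point is the ordering: the backend is never asked to handle a form the checker has not already accepted, so a permissive parser cannot turn into a backend panic.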

5. Runtime And Standard Library Strategy

Glagol currently exposes two related but distinct library surfaces:

compiler-known standard-runtime calls such as std.io.print_i32, std.string.len, selected parse/format/conversion calls, host IO, process/environment/file helpers, randomness, time, and stdin
source-authored beta modules in lib/std/*.slo, loaded through explicit imports, installed std discovery, checkout discovery, or SLOVO_STD_PATH

This split is deliberate. It lets library design move forward without claiming that the final stable import, compatibility, or package story already exists. Source-authored modules are useful now because they exercise language design, fixtures, and examples. They are beta explicit-import APIs, but not yet a frozen stable 1.0 standard library.
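
An ordered search of the kind described for SLOVO_STD_PATH and installed std discovery might look like the following sketch. Several details are assumptions rather than documented behavior: that SLOVO_STD_PATH uses the platform path separator, that its entries are tried before the installed root, and that each module maps to a single `<name>.slo` file directly under a root. The function names are hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional


def std_search_roots(env: dict, installed_root: Path) -> List[Path]:
    """Build the ordered list of directories to search for std modules.

    Assumption: SLOVO_STD_PATH entries come first, in the order given,
    followed by the installed share/slovo/std directory.
    """
    raw = env.get("SLOVO_STD_PATH", "")
    roots = [Path(entry) for entry in raw.split(os.pathsep) if entry]
    roots.append(installed_root)
    return roots


def resolve_std_module(name: str, roots: List[Path]) -> Optional[Path]:
    """Return the first `<name>.slo` found in search order, or None."""
    for root in roots:
        candidate = root / f"{name}.slo"
        if candidate.is_file():
            return candidate
    return None
```

First-match-wins keeps resolution deterministic: the same search path always yields the same file, which is what makes a local checkout able to shadow an installed module during development.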

6. Benchmark Method

The benchmark suite measures local-machine behavior only. It is a regression and comparison harness, not a public performance claim.

Environment:

Field	Value
Host	`Linux 6.17.10-100.fc41.x86_64 x86_64 GNU/Linux`
Glagol	`glagol 1.0.0-beta` benchmark baseline
Python	`Python 3.13.9`
C compiler	`clang version 19.1.7 (Fedora 19.1.7-5.fc41)`
Rust	`rustc 1.77.2 (25ef9e3d8 2024-04-09)`
Clojure	`1.11.2`
Common Lisp	`SBCL 2.5.9-1.fc41`

Build and runtime paths compared:

Implementation	Build/runtime path
Slovo	`glagol build <benchmark>` -> generated LLVM -> host `clang -O2` linking `runtime/runtime.c`
C	`clang -O2 -std=c11` on the local scaffold
Rust	`rustc -C opt-level=3 -C debuginfo=0` on the local scaffold
Python	`python3` running the local scaffold
Clojure	`clojure` running the local scaffold; timings include JVM and Clojure startup
Common Lisp	`sbcl --script` running the local scaffold; timings include SBCL startup

Timing semantics:

The runner builds each implementation once before timing. The reported numbers are execution timings, not compile-time timings.
cold-process launches a fresh process per sample with the base loop count. It measures process startup plus one benchmark run.
hot-loop also launches a fresh process per sample, but with the amplified loop count 10000000; the reported normalized median divides the timed total by 10 to compare with the base 1000000 loop count.
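
The hot-loop normalization described above is simple arithmetic and can be written out as a short sketch. The loop counts come from the text; the function name and the sample timings are made up for illustration and are not measured values.

```python
from statistics import median

BASE_LOOPS = 1_000_000   # loop count used by cold-process samples
HOT_LOOPS = 10_000_000   # amplified loop count used by hot-loop samples


def normalized_median_ms(samples_ms, loops=HOT_LOOPS, base=BASE_LOOPS):
    """Median of per-process timings, scaled back to the base loop count."""
    return median(samples_ms) / (loops / base)


# Five hypothetical hot-loop samples, in milliseconds per process.
example = normalized_median_ms([11.0, 11.3, 11.2, 11.5, 11.1])
```

With these sample values the median is 11.2 ms, so the normalized figure is about 1.12 ms per one million iterations. Each sample still includes one process startup, but amplifying the loop count tenfold spreads that fixed cost across more iterations than in cold-process mode.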

Benchmark kernels:

math-loop: scalar arithmetic accumulation
branch-loop: scalar branching and accumulation
parse-loop: repeated decimal parsing with checksum validation
array-index-loop: checked fixed-array indexing and scalar accumulation
string-eq-loop: exact string content equality reduced to an i32 checksum
array-struct-field-loop: immutable struct-field access over a fixed i32 array plus scalar accumulation
enum-struct-payload-loop: repeated enum match payload extraction over an immutable struct payload carrying a fixed i32 array
vec-i32-index-loop: runtime-owned i32 vector indexing and scalar accumulation
vec-string-eq-loop: runtime-owned string vector indexing plus exact string equality reduced to an i32 checksum
json-quote-loop: compact JSON string quoting plus quoted-length checksum accumulation
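
To make the kernel descriptions concrete, here is a rough Python sketch of the shape of array-index-loop: an eight-element integer corpus, a `% 8` dynamic index, and scalar accumulation. The corpus values and the function name are hypothetical, and the sketch does not model any fixed-width overflow policy, since the source does not state one.

```python
# Hypothetical eight-element corpus; the real scaffold's values are not shown here.
CORPUS = (3, 1, 4, 1, 5, 9, 2, 6)


def array_index_loop(loops: int) -> int:
    """Accumulate corpus elements selected by a dynamic `% 8` index."""
    total = 0
    for i in range(loops):
        total += CORPUS[i % 8]  # bounds-checked tuple indexing
    return total


checksum = array_index_loop(16)  # two full passes over the corpus
```

The other kernels in the list follow the same pattern of a small fixed corpus, a data-dependent selection, and a checksum, differing mainly in which language feature the selected value is routed through.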

Comparison boundaries:

math-loop and branch-loop compare structurally similar loop bodies across all implementations.
parse-loop keeps the same input text and checksum, but not the same parser implementation. Slovo uses std.string.parse_i32_result, C uses strtol, Rust uses text.parse::<i32>(), Python uses int, Clojure uses Integer/parseInt, and Common Lisp uses parse-integer.
array-index-loop keeps the same eight-element integer corpus and % 8 dynamic index-selection pattern across all implementations. It stays on immutable fixed-array indexing and scalar accumulation only.
string-eq-loop keeps the same five-word ASCII corpus and runtime-supplied target string across all implementations. It measures exact content equality only. It does not compare regex engines, normalization, locale handling, or pointer identity.
array-struct-field-loop keeps the same eight-element integer corpus and % 8 dynamic index-selection pattern, but moves the array through one immutable struct field. It is a narrow benchmark for the promoted exp-120 direct struct-field lane, not a broad claim about every struct layout.
enum-struct-payload-loop keeps the same eight-element integer corpus inside an immutable struct payload, matches one enum value on every iteration, and indexes the bound struct field. It is a narrow benchmark for the promoted exp-121 struct-payload enum lane, not a broad tagged-union or ADT claim.
vec-i32-index-loop keeps the same eight-element integer corpus and % 8 dynamic index-selection pattern as array-index-loop, but routes that access through the promoted runtime-owned (vec i32) lane instead of fixed arrays.
vec-string-eq-loop keeps the same five-word ASCII corpus and runtime-supplied target as string-eq-loop, but routes selection through the promoted runtime-owned (vec string) lane instead of fixed arrays.
json-quote-loop keeps one runtime-supplied ASCII string containing a quote and a backslash and measures compact JSON string quoting plus quoted-length checksum accumulation. It does not compare JSON parsing, maps, recursive JSON values, schema validation, or streaming encoders.
Because Rust is timed at opt-level=3 while Slovo and C are timed through clang -O2, the suite is a useful local regression/comparison harness, not a strict same-flags compiler shootout.
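
As an illustration of what json-quote-loop measures, the sketch below quotes one ASCII string containing a quote and a backslash and accumulates the quoted length. It uses Python's standard `json.dumps`; whether the actual Python scaffold uses that call, and whether std.json.quote_string produces byte-identical output, are assumptions not confirmed by the text. The input string is made up.

```python
import json


def json_quote_loop(text: str, loops: int) -> int:
    """Quote `text` as a compact JSON string `loops` times; sum the quoted lengths."""
    checksum = 0
    for _ in range(loops):
        quoted = json.dumps(text)  # adds surrounding quotes and escapes " and \
        checksum += len(quoted)
    return checksum


# Hypothetical input: the five characters  a " b \ c
sample = 'a"b\\c'
quoted_once = json.dumps(sample)
checksum = json_quote_loop(sample, 3)
```

For this input the quoted form is nine characters long (two surrounding quotes, two added escape backslashes, and the five original characters), so three iterations give a checksum of 27.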

Hot-loop commands:

python3 benchmarks/math-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/branch-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/parse-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-index-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/string-eq-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-struct-field-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/enum-struct-payload-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-i32-index-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-string-eq-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/json-quote-loop/run.py --mode hot-loop --repeats 5 --warmups 1 --glagol compiler/target/debug/glagol

Cold-process commands:

python3 benchmarks/math-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/branch-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/parse-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-index-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/string-eq-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/array-struct-field-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/enum-struct-payload-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-i32-index-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/vec-string-eq-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol
python3 benchmarks/json-quote-loop/run.py --mode cold-process --repeats 3 --warmups 1 --glagol compiler/target/debug/glagol

7. Benchmark Results

The benchmark rows below remain the full-suite 1.0.0-beta publication baseline. The later slices change other parts of the toolchain: 1.0.0-beta.1 changes tooling and install workflow, 1.0.0-beta.2 adds runtime/resource APIs, 1.0.0-beta.3 adds standard-library catalog and composition coverage, 1.0.0-beta.4 improves diagnostics, 1.0.0-beta.5 tightens package/workspace discipline, 1.0.0-beta.6 adds a narrow loopback networking foundation, and 1.0.0-beta.7 adds a narrow JSON construction foundation. None of these post-beta slices claims changed benchmark performance. The beta.7 json-quote-loop scaffold is present for local follow-up timing and is not part of the exp-123 nine-row result table below.

The exp-123 publication baseline widened the paired same-machine result set from seven rows to nine by adding two owned-vector kernels:

vec-i32-index-loop
vec-string-eq-loop

Hot-loop normalized median time, in milliseconds per one million iterations:

Benchmark	Slovo	C	Rust	Python	Clojure	Common Lisp/SBCL
math-loop	1.121	1.121	1.138	111.092	241.220	1.753
branch-loop	2.014	2.012	2.032	114.016	241.469	4.624
parse-loop	6.456	16.233	7.169	134.465	264.669	20.108
array-index-loop	1.103	1.109	1.128	96.649	298.388	3.379
string-eq-loop	4.332	4.092	2.279	120.453	288.617	11.128
array-struct-field-loop	1.139	1.116	1.129	110.854	277.466	3.663
enum-struct-payload-loop	4.304	1.512	1.880	302.252	310.066	5.297
vec-i32-index-loop	1.328	1.103	1.131	111.153	272.914	2.231
vec-string-eq-loop	5.210	4.122	3.471	122.826	302.817	10.431

Cold-process median time, in milliseconds per benchmark run:

Benchmark	Slovo	C	Rust	Python	Clojure	Common Lisp/SBCL
math-loop	1.625	1.675	1.765	121.014	2808.812	9.435
branch-loop	2.563	2.517	2.682	130.790	2674.146	15.027
parse-loop	6.942	16.749	7.857	149.594	2835.421	27.750
array-index-loop	1.599	1.606	1.807	107.150	2812.157	17.589
string-eq-loop	4.826	4.756	2.938	135.748	2892.359	21.504
array-struct-field-loop	1.670	1.612	1.837	115.371	2823.026	13.411
enum-struct-payload-loop	4.934	2.000	1.783	291.850	2516.815	27.555
vec-i32-index-loop	3.047	2.851	1.776	112.950	2911.603	10.017
vec-string-eq-loop	5.427	4.575	4.081	134.914	2567.482	18.950

Compiler interpretation:

The current hosted build path keeps Slovo essentially on the local native baseline for the scalar and fixed-array kernels. In hot-loop mode, math-loop, branch-loop, and array-index-loop all land very close to the C scaffold and within a narrow distance of the Rust scaffold.
parse-loop is now more than a backend-loop benchmark. It compares end-to-end parser and runtime choices. On this machine, the current Slovo decimal parse path outperforms the C scaffold built around strtol and stays close to the Rust scaffold.
string-eq-loop exposes a different boundary: exact content equality is clearly efficient enough for native-code use, but the current Slovo runtime path is still behind the Rust scaffold and slightly behind the C scaffold on this machine.
vec-i32-index-loop shows the cost of routing the same integer corpus through the promoted owned-vector lane instead of fixed arrays. On this machine the Slovo lane remains practical native code, but it is visibly more expensive than the fixed-array kernel.
vec-string-eq-loop shows the same tradeoff for owned string vectors. It stays in the same broad range as the fixed-array string kernel, but it is a more allocation- and indirection-heavy path than direct fixed-array access.
array-struct-field-loop stays close to the direct fixed-array kernel. On this machine, routing the same % 8 indexing pattern through one immutable struct field keeps Slovo, C, and Rust tightly grouped in hot-loop mode.
enum-struct-payload-loop exposes a current composite-data boundary. The Slovo lane remains practical native code, but repeated struct-payload enum matching is still materially slower than the C and Rust scaffolds on this machine.
Cold-process timings show native executable startup plus one benchmark run. They are not compile-time numbers and are more sensitive to launcher/runtime initialization effects than hot-loop mode.
Clojure is dramatically slower in this process-per-run harness because each sample includes JVM and hosted runtime startup, and the benchmark bodies stay on high-level runtime paths. The effect is still strongest in the more allocation- and dispatch-heavy composite kernels.
Common Lisp/SBCL remains much closer to native baselines than Clojure in the same harness. That is why both Lisp-family comparison points are useful.
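
The qualitative readings above can be checked against the hot-loop table by computing Slovo's time relative to the C and Rust scaffolds. The sketch below copies the Slovo, C, and Rust columns from the hot-loop table in this section; the ratio helper is an illustration, not part of the benchmark runner, and the ratios carry the same local-machine caveat as the underlying numbers.

```python
# (Slovo, C, Rust) hot-loop normalized medians in ms, copied from the table above.
HOT_LOOP_MS = {
    "math-loop": (1.121, 1.121, 1.138),
    "branch-loop": (2.014, 2.012, 2.032),
    "parse-loop": (6.456, 16.233, 7.169),
    "array-index-loop": (1.103, 1.109, 1.128),
    "string-eq-loop": (4.332, 4.092, 2.279),
    "array-struct-field-loop": (1.139, 1.116, 1.129),
    "enum-struct-payload-loop": (4.304, 1.512, 1.880),
    "vec-i32-index-loop": (1.328, 1.103, 1.131),
    "vec-string-eq-loop": (5.210, 4.122, 3.471),
}


def slovo_ratios(table):
    """Map each benchmark to (Slovo/C, Slovo/Rust), rounded to two decimals."""
    return {
        name: (round(slovo / c, 2), round(slovo / rust, 2))
        for name, (slovo, c, rust) in table.items()
    }


ratios = slovo_ratios(HOT_LOOP_MS)
for name, (vs_c, vs_rust) in ratios.items():
    print(f"{name}: {vs_c}x C, {vs_rust}x Rust")
```

A ratio below 1.0 means the Slovo scaffold was faster on this machine. For example, parse-loop comes out at roughly 0.4 times the C scaffold, while enum-struct-payload-loop comes out at roughly 2.85 times C and 2.29 times Rust, matching the interpretation above.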

8. Current Technical Risks

The main risks in beta are not syntax parsing. They are engineering coverage and compatibility:

source forms reaching backend paths without clear diagnostics
standard-library source helpers drifting from compiler-known runtime calls
feature claims appearing in docs before fixtures and tests exist
collection and ADT breadth growing faster than the compatibility story
benchmark breadth growing faster than the language contract can stabilize
benchmark numbers being misread as public thresholds or cross-machine claims
package behavior becoming stable before dependency, manifest, and versioning rules are precise

9. Path Beyond `1.0.0-beta.7`

Glagol now implements the first real beta Slovo contract and seven post-beta releases on top of it: tooling/install hardening, the runtime/resource foundation, standard-library stabilization, diagnostics usability, package/workspace discipline, the loopback networking foundation, and the serialization/data-interchange foundation. The remaining path is from beta to stable.

Recommended compiler sequence:

Complete the next blocked post-beta language-breadth slices from the Slovo roadmap without regressing the beta baseline.
Broaden runtime-owned strings, collections, and composite value flow without exposing unstable ABI details as stable contracts.
Refine f32 policy, additional integer families, explicit conversion behavior, and remaining library/runtime gaps.
Harden package, workspace, and standard-library import/search behavior into a compatibility-governed stable toolchain story.
Strengthen diagnostics, generated docs, conformance fixtures, and release gates as first-class compiler interfaces.
Keep benchmark publication local and repeatable while deferring public performance claims until methodology is stronger.
Freeze formatter output, diagnostics schema, package behavior, stdlib compatibility, migration policy, and toolchain contracts for 1.0.0.

10. Conclusion

Glagol has moved Slovo from a manifesto into a working beta native compiler track. The important result is not only that programs compile. It is that the support boundary is visible enough to review: source contracts, diagnostics, tests, lowering, benchmarks, and publication artifacts can be kept in sync.

The compiler is now useful enough for ordinary local tools and libraries within the documented beta contract. The path forward remains disciplined breadth and compatibility hardening, not unsupported feature claims.
