1 / 21
σύμβολον
magalia · prediction · the orchestration layer
Symbolon
σύμβολον: a token broken in two and later re-matched to recognise a guest-friend. Here: an inscription re-matched to the one that fits it. Ithaca (Greek), Aeneas (Latin) and magalia’s own model sit behind one grounded service that never invents readings.
669,498 bridge rows
5 evidence streams
244 ms retrieval (warm)
3 graceful tiers
2 / 21
0 · where Aeneas left off
Two specialists, one tool
The Aeneas deck ended on a handoff: Ithaca restores, places and dates Greek inscriptions; Aeneas does the same for Latin and also contextualises, retrieving the parallels a historian would reach for. Symbolon orchestrates both, plus magalia’s own joint model, behind one service. (Assael 2025)
The name. A symbolon was a clay token snapped in half; two guests proved their bond by re-matching the halves. Symbolon re-matches a broken inscription to the corpus text that completes its sense.
The honest framing (Aeneas’s own words): two specialists now exist; Symbolon is an experiment to advance, not a paradigm. This deck argues that case and names its limits.
3 / 21
0 · the questions
What Symbolon is really asking
1 · One instrument? Can two separately trained specialist models become a single cross-corpus tool, with search that crosses Greek↔Latin?
2 · Grounded agency? Can an edition be an environment an agent works in (restoring, attributing, retrieving) yet structurally unable to hallucinate?
3 · Fill, or guess? Can a restoration be told apart from a guess: attested by formula or parallels, versus the model’s probability alone?
The spine. Each question is answered across this deck, and each answer is, in part, a no. That honesty is the point.
4 / 21
0 · the argument
How the deck moves
The deck mirrors the Aeneas arc (foundations → concepts → limitations) but for Symbolon, answering Aeneas point for point.
I · Foundations. Three engines (Ithaca · Aeneas · magalia’s joint torso), joined by an identity spine that Aeneas never needs, served through three graceful tiers.
II · The finding. Aeneas contextualises by embedding cosine. Symbolon measured whether one cosine index can cross Greek↔Latin and found that it cannot. The fix: bridge by concept and identity.
III · Concepts. Honesty by construction (the §0 invariants); the restoration-support cross-check (fill vs guess); the “#” gap; and what lies beyond restoration.
IV · Limitations. What Symbolon inherits from Aeneas, what it adds, and where it sits on the 2026 frontier: an experiment, named as one.
Build status. Acts I–III (the interactive widgets) are in build; this scaffold ships Act 0 and Act IV first, so the honest frame stands before the argument fills in.
5 / 21
I · three engines
What Aeneas had one of, Symbolon has three
Aeneas is one model over one corpus. Symbolon orchestrates three engines: Ithaca for Greek, Aeneas for Latin, and magalia's own joint torso, an offline browser pilot that is honestly weak and never mistaken for the specialists. (Assael 2025)
[Interactive widget: engines (live deck only)]
Why three. The honest hierarchy is the point: two strong hosted specialists do the epigraphic work; the in-browser torso is a graceful-degradation fallback, labelled pilot wherever it appears (Act I · tiers). No engine's output is ever presented as more certain than it is.
6 / 21
I · the identity spine
The thing a single corpus never needs
A corpus has one id space. The moment you join Greek, Latin, papyri, literature and your own editions, the first problem is: which records are the same inscription? The identity spine answers it by identity, not similarity.
[Interactive widget: identitySpine (live deck only)]
The join layer. This sets up the structural answer to Research Question 1: the bridge that lets two corpora speak is built here, at the id layer, long before any vector is compared. 126 artifacts, 11 id systems, union-find merge: all resolvable with zero model inference.
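A minimal sketch of the union-find merge named above, assuming concordances arrive as pairs of identifiers known to denote the same artifact. The `IdentitySpine` class and the identifier strings are hypothetical illustrations, not the magalia registry's actual schema or data.

```python
class IdentitySpine:
    """Union-find over identifiers from different id systems (hypothetical sketch)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen ids as their own root.
        self.parent.setdefault(x, x)
        # Path halving: re-point nodes closer to the root while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        # Record that two ids denote the same artifact.
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def same_artifact(self, a, b):
        return self.find(a) == self.find(b)


spine = IdentitySpine()
# Illustrative identifiers only; real concordances would come from the registry.
spine.union("PHI:1234", "TM:98765")
spine.union("TM:98765", "magalia:ed-07")
print(spine.same_artifact("PHI:1234", "magalia:ed-07"))  # True
print(spine.same_artifact("PHI:1234", "EDCS:00000001"))  # False
```

The point of the design is that the answer to "same inscription?" comes from recorded equivalences alone, so no embedding or model call is involved.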
7 / 21
I · three graceful tiers
Self-hosted, so honest about what is up
There is no hosted Ithaca/Aeneas API: DeepMind open-sourced the weights, and you self-host. (Assael 2025) So Symbolon serves through three tiers and degrades gracefully: when the service is dark, the browser pilot answers, badged and never faked.
[Interactive widget: tiers (live deck only)]
Honesty about seams. The §0.6 invariant in the UI: a tier that is not deployed is shown dark, not stubbed with plausible output. Toggle the service to watch the served tier, and the pilot badge, change. The companion-API kit exists; hosting it is the standing user-side blocker.
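The degradation rule can be sketched as a walk down an ordered list of tiers. This is an illustration under assumptions: the `Tier` class and the `pick_tier` and `describe` helpers are hypothetical, and only the two tiers this section names (the hosted service and the browser pilot) appear in the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tier:
    name: str
    deployed: bool
    is_pilot: bool = False


def pick_tier(tiers: List[Tier]) -> Optional[Tier]:
    """Return the first deployed tier in preference order, or None if all are dark."""
    for tier in tiers:
        if tier.deployed:
            return tier
    return None


def describe(tier: Optional[Tier]) -> str:
    # A dark stack is reported as dark; no placeholder output is produced.
    if tier is None:
        return "no tier deployed"
    badge = " [pilot]" if tier.is_pilot else ""
    return f"served by {tier.name}{badge}"


# Preference order: hosted specialists first, offline pilot as fallback.
tiers = [
    Tier("service", deployed=False),
    Tier("pilot", deployed=True, is_pilot=True),
]
print(describe(pick_tier(tiers)))  # served by pilot [pilot]
```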
8 / 21
II · the cross-corpus dream
One search, two corpora
Aeneas’s central move is contextualisation by cosine: retrieve the inscriptions whose learned direction aligns most closely with the query. (Assael 2025) Symbolon’s first question: can that one search cross the Greek↔Latin boundary?
[Interactive widget: cosine (live deck only)]
The hypothesis. The hope: embeddings produced by the same training task should land in a shared geometry, with a Greek honour decree near a Latin one. The Aeneas paper shows this works within the Latin corpus. The question is whether it crosses.
9 / 21
II · the measured no
What probe_geometry.py found
We measured it. The two models' embedding spaces are near-orthogonal: mean cross-corpus cosine −0.007. A unified cosine index silently collapses to per-language retrieval; it does not cross.
[Interactive widget: orthogonal (live deck only)]
Honest negative result. This is a respectful rebuttal-by-evidence to Aeneas's central concept. Within Latin, cosine contextualisation is real and powerful. (Assael 2025) Across Greek/Latin, the geometry isn't shared, because the two models trained on disjoint data with disjoint vocabularies. And there is no shortcut: aligning the two spaces (Procrustes, VecMap) needs the same inscription on both sides, but grc is keyed by PHI and lat by EDCS/TM, so 0 identity anchors are shared (measured over 358k rows). There is nothing for a learned alignment to fit. The fix isn't more training; it's bridging at a different layer.
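The two measurements described here, a mean cross-corpus cosine and a count of shared identity anchors, can be sketched in a few lines. This is not the contents of probe_geometry.py, which the transcript does not reproduce; the function names and the toy vectors are assumptions, and the vectors are assumed to be non-zero.

```python
import math
from itertools import product


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_cross_cosine(greek_vecs, latin_vecs):
    """Average cosine over every Greek/Latin pair of embeddings."""
    pairs = list(product(greek_vecs, latin_vecs))
    return sum(cosine(g, lat) for g, lat in pairs) / len(pairs)


def shared_anchors(greek_ids, latin_ids):
    """Identifiers present on both sides; alignment methods need this to be non-empty."""
    return set(greek_ids) & set(latin_ids)


# Toy vectors in disjoint subspaces, standing in for the two models' outputs.
greek = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
latin = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(mean_cross_cosine(greek, latin))                # 0.0
print(shared_anchors({"PHI:1"}, {"EDCS:1", "TM:1"}))  # set()
```

A mean near zero means a nearest-neighbour search over the merged index will rank same-language rows above any cross-language row, which is the collapse the slide describes.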
10 / 21
II · bridge by concept & identity
The fix: don't fight the geometry
If vectors won't cross, cross at the concept and identity layer. The 5-stream bridge indexes 669,498 rows by a shared 33-concept lexicon plus the magalia identity spine, not by cosine distance.
[Interactive widget: bridge (live deck only)]
RQ 1 answered. This answers Research Question 1 honestly: no by embeddings; yes by concept/identity. The two corpora can speak to each other, but via a constructed intermediate layer, not via an alignment that doesn't exist in the learned geometry.
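A concept-keyed lookup of this kind can be sketched with an in-memory SQLite table. The table name, columns and record ids below are hypothetical and do not describe the real bridge shard; only the stream names and concept tags are taken from the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (record, concept) tag; not the real shard layout.
conn.execute("CREATE TABLE bridge (record_id TEXT, stream TEXT, concept TEXT)")
conn.executemany(
    "INSERT INTO bridge VALUES (?, ?, ?)",
    [
        ("grc-001", "grc", "honours-proxeny"),
        ("lat-001", "lat", "honours-proxeny"),
        ("lat-002", "lat", "military"),
        ("pap-001", "pap", "oath"),
    ],
)


def by_concept(concept):
    """Return (stream, record_id) pairs tagged with a concept, across all streams."""
    cur = conn.execute(
        "SELECT stream, record_id FROM bridge "
        "WHERE concept = ? ORDER BY stream, record_id",
        (concept,),
    )
    return cur.fetchall()


print(by_concept("honours-proxeny"))
# [('grc', 'grc-001'), ('lat', 'lat-001')]
```

The Greek and Latin rows meet because they share a tag, not because their vectors are close, so the lookup needs no model at query time.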
11 / 21
II · the five streams
What the bridge actually contains
The bridge is not an abstract layer: it is 669,498 concrete rows across 5 evidence types, held in a serverless sql.js shard, with warm retrieval at 244 ms.
stream | rows | source
grc | 178,551 | I.PHI inscriptions
lat | 180,205 | LED inscriptions
lit | 244,578 | literary references
pap | 66,079 | DDbDP papyri
ed | 85 | magalia editions
total | 669,498 | 33 concepts · 244 ms
allegiance · alliance · colony-law · decree-formula · honours-proxeny · imperial-ideology · military · oath · provincial-admin · tribute-finance · + 23 more
The middle layer. The 33-concept lexicon is the bridge index vocabulary (source: symbolon-bridge.sqlite + symbolon-connect.sample). Ten confirmed tags are shown above. Identity linking uses the magalia registry (126 artifacts, 11 ID systems). Neither requires model inference at query time.
12 / 21
III · honesty by construction
Why it structurally cannot hallucinate
Aeneas insists on ranked hypotheses plus saliency and warns of “history from square brackets”. (Assael 2025) Symbolon turns that ethic into code through the §0 invariants: report tool output, cite every claim, preserve uncertainty, abstain when ungrounded.
[Interactive widget: honesty (live deck only)]
RQ 2. This is the heart of Research Question 2: an edition an agent can work in yet cannot invent within. The orchestrating model never authors a reading: it routes, cites, and when the streams are silent it says so. Refusal is a feature, not a failure.
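One way to picture the four invariants is as a gate that can only assemble answers from tool output. The sketch below is an assumption about shape, not Symbolon's code: `ToolResult` and `grounded_answer` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolResult:
    reading: str        # text exactly as the tool returned it
    source: str         # which engine or stream produced it
    probability: float  # the tool's own score, passed through unchanged


def grounded_answer(results: List[ToolResult]) -> dict:
    """Assemble an answer only from tool output; abstain when there is none."""
    if not results:
        # Abstain: the orchestrator never authors a reading of its own.
        return {"status": "abstain", "hypotheses": []}
    ranked = sorted(results, key=lambda r: r.probability, reverse=True)
    return {
        "status": "ok",
        # Ranked hypotheses, each carrying its citation and its uncertainty.
        "hypotheses": [
            {"reading": r.reading, "cite": r.source, "p": r.probability}
            for r in ranked
        ],
    }


print(grounded_answer([])["status"])  # abstain
```

Because the function has no code path that creates a reading string, the only way text reaches the answer is by being returned from a tool and cited.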
13 / 21
III · is it a fill, or a guess?
The verdict Aeneas does not give
A restoration probability tells you the model's confidence, not why. The H5 support cross-check classifies every candidate as formula-attested, contextual-only (pilot), or LM-probability-only, which answers Research Question 3.
[Interactive widget: support (live deck only)]
RQ 3. Uncertainty preserved and made legible: the scholar sees not just a ranked list but what kind of evidence stands behind each rank. A high-probability candidate with zero corpus support is exactly the one to distrust, and the cross-check says so out loud.
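The three-way label can be sketched as a function of how much corpus evidence a candidate has. How the real H5 cross-check counts formula or parallel hits is not stated in the deck, so the two hit counts below are assumed inputs and the candidates are placeholder values.

```python
def classify_support(formula_hits: int, parallel_hits: int) -> str:
    """Label the kind of evidence behind a restoration candidate."""
    if formula_hits > 0:
        return "formula-attested"
    if parallel_hits > 0:
        return "contextual-only"
    return "probability-only"


candidates = [
    # (reading, model probability, formula hits, parallel hits): illustrative values
    ("reading A", 0.81, 0, 0),
    ("reading B", 0.12, 3, 5),
]
for reading, prob, f_hits, p_hits in candidates:
    print(reading, prob, classify_support(f_hits, p_hits))
# reading A 0.81 probability-only
# reading B 0.12 formula-attested
```

Note that the label is independent of the probability: reading A ranks first by score yet carries the weakest kind of support.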
14 / 21
III · the unknown-length gap
Aeneas' headline move, two honest ways
Aeneas' advance is unknown-length restoration: # marks a gap of unknown size, - marks one missing character (a gap of known length). (Assael 2025) Symbolon serves the true Aeneas head via the service tier; on the offline pilot it runs an honest length-sweep heuristic, labelled as such.
[Interactive widget: restore (live deck only)]
The honest heuristic. The seam named honestly (§0.6): the pilot's length-sweep is not a learned length model; it tries plausible spans and ranks them, and says so. Greek (Ithaca) supports only known-length gaps; sending # to it should flag, not silently degrade.
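A length sweep of the kind described can be sketched as follows, assuming a scoring callable is available. `length_sweep`, `check_gap_support` and the toy scorer are hypothetical; the real pilot's scoring and span range are not given in the deck.

```python
def length_sweep(text, score, min_len=1, max_len=10):
    """Try each span length for a '#' gap and rank the variants by score.

    `score` is a hypothetical callable standing in for the pilot model: it takes
    a text with known-length '-' gaps and returns a number (higher is better).
    """
    if text.count("#") != 1:
        raise ValueError("this sketch handles exactly one '#' gap")
    variants = []
    for n in range(min_len, max_len + 1):
        candidate = text.replace("#", "-" * n)
        variants.append((score(candidate), n, candidate))
    # Best score first; the caller should label the result as heuristic.
    return sorted(variants, reverse=True)


def check_gap_support(language, text):
    """Flag unknown-length gaps sent to an engine that only supports known lengths."""
    if language == "grc" and "#" in text:
        return "flag: '#' is not supported for Greek; supply '-' per missing character"
    return "ok"


# Toy scorer that prefers a three-character gap, for demonstration only.
toy_score = lambda s: -abs(s.count("-") - 3)
best = length_sweep("abc#def", toy_score, max_len=5)[0]
print(best)  # (0, 3, 'abc---def')
print(check_gap_support("grc", "abc#def"))
```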
15 / 21
III · beyond restoration
Restoration is the floor, not the prize
Aeneas stops at restoration plus contextualisation. But restoration is now commoditised: a fine-tuned 8B LLM matches Ithaca (Cullhed, Act IV). (Cullhed 2026) So the value moves up the stack, to the analytical graph over the identity spine.
[Interactive widget: analyticalLayer (live deck only)]
The frontier. This is the open frontier (R3): knowledge graph · prosopography · formula networks · intertextuality. Each is already seeded by a verified magalia asset (graph 9,517 edges · formula network 6,413 edges · the persons-list), and each ships only with an evidence chain, never an asserted intent.
16 / 21
IV · honest limits
What Symbolon inherits
Symbolon sits downstream of Ithaca and Aeneas, so it inherits their limits wholesale; it is not free of them. (Assael 2025)
Survival bias. The corpora are what happened to survive; the LED and I.PHI over-represent durable, public, urban text.
Uneven vision. Only ~5% of inscriptions carry an image, so Aeneas’s multimodal edge, which Symbolon passes through, helps unevenly.
History from square brackets. Editorial restorations retained in the text can be re-learned as fact: the discipline’s oldest trap, inherited intact.
Carried, not cured. Symbolon’s answer is not to remove these but to surface them: ranked hypotheses, not verdicts; provenance on every claim.
17 / 21
IV · its own limits
And what it adds
The orthogonality ceiling. The two models’ embedding spaces are nearly orthogonal (mean cos −0.007), so they cannot be unified by one cosine index. This is the central finding, and a hard limit on “one search to rule them all.” (Act II.)
A weak in-browser pilot. magalia’s own joint torso (3.3M params, 8.4 MB f16) runs offline but is far below the hosted specialists: letters-only restoration ~0.59 top-1, dating@50yr ~0.46, region ~0.30. It is labelled pilot at runtime, never passed off as the real model.
The service tier is dark. True Ithaca/Aeneas inference needs a hosted companion API that isn’t yet deployed, so on the public site the live restore/attribute paths fall back to the pilot. The deploy kit exists; the server does not.
An experiment, not a paradigm. Symbolon composes existing parts on a shared identity spine; it does not claim a breakthrough. Its value is the arrangement and its honesty, not a new model.
18 / 21
IV · where this sits
The field, mid-2026
A frontier scan (June 2026) sharpens what Symbolon can honestly claim.
Restoration is commoditising. A fine-tuned commodity LLM now matches Ithaca on Greek restoration (Cullhed, DSH 41.1, 2026). So restoration is the floor, not the prize.
The agentic edition is still open. No grounded, tool-using agentic epigraphic edition existed as of mid-2026; this is the one frontier Symbolon’s parts are pre-assembled for.
The join-layer launched. FAIR vocabularies, the rebuilt EDCS (Zurich), and WikiProject Epigraphy went live, so Symbolon’s identity spine now aligns to a standard that exists.
The one claim. Symbolon’s defensible claim is narrow and real: the grounded agentic edition, with restoration as one cited tool among many, never the free-generating oracle.
19 / 21
IV · Cullhed 2026
What the paper shows, precisely
Instruction-tuning Llama 3.1 8B on I.PHI + DDbDP beats Ithaca on Greek text restoration, but underperforms on attribution. (Cullhed 2026) This is the evidence behind "restoration is commoditising."
↓ lower is better · ↑ higher is better
model | restoration CER% ↓ | restoration top-1% ↑ | restoration top-20% ↑ | region top-1% ↑ | region top-3% ↑ | date avg (yr) ↓
shared test set (264 samples, PHI IDs ending in 3)
Llama 3.1 8B (Cullhed) | 22.6 | 61.4 | 77.0 | 65.0 | 78.4 | 55.7
Ithaca (Assael 2022) | 27.0 | 60.9 | 71.6 | 69.0 | 80.2 | 48.3
new inscriptions (4,111 samples, not in PHI; generalization test)
Llama 3.1 8B | 30.8 | 48.9 | 72.9 | not measured | not measured | not measured
Ithaca | 31.1 | 46.5 | 68.3 | | |
papyri (Papy_1, DDbDP; entirely new resource for deep learning)
Llama 3.1 8B (papyrus) | 14.9 | 73.5 | 85.9 | 66.4 | n/a | 21.7
Restoration: Llama wins. CER 22.6% vs 27.0% on the shared test set; 30.8% vs 31.1% on new inscriptions never seen by either model. Top-20 on the shared test set: 77.0% vs 71.6%. A generic 8B model, instruction-tuned, beats the bespoke architecture.
Attribution: Ithaca still leads. Geographic top-1: 65.0% vs 69.0% (Llama −4pp). Dating avg: 55.7 yr vs 48.3 yr (Llama 7.4 yr worse). The specialist architecture's dedicated region and date heads hold up where general-purpose instruction tuning falls short.
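For readers unfamiliar with the two restoration metrics, here is a sketch of how they are commonly computed. It assumes the usual definitions (character error rate as edit distance over reference length, top-k as membership in the first k ranked predictions); the paper's exact evaluation code is not reproduced in this deck, and the strings are toy values.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling one-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_percent(prediction: str, reference: str) -> float:
    """Character error rate as a percentage of the reference length."""
    return 100.0 * edit_distance(prediction, reference) / len(reference)


def top_k_hit(ranked_predictions, reference, k):
    """True if the reference appears among the first k ranked predictions."""
    return reference in ranked_predictions[:k]


print(cer_percent("abcx", "abcd"))              # 25.0
print(top_k_hit(["abce", "abcd"], "abcd", 1))   # False
print(top_k_hit(["abce", "abcd"], "abcd", 20))  # True
```

This is why the two can diverge: CER rewards a top guess that is nearly right, while top-20 rewards having the exact answer somewhere in the list.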
Method detail (instruction template · training setup · TIES merging)
Model: Meta Llama 3.1 8B Instruct. Fine-tuned on A100 GPUs (40/80 GB), batch 20, Torchtune. Data: I.PHI Greek inscriptions (Sommerschield 2021) + DDbDP papyri (papyri.info), 95%/5% train/test split; augmented with GPT-4o-mini to generate 10 paraphrased variants per inscription.
Instruction template: 3 system prompts: "Date this…", "Assign this… to an exact place", "Reconstruct the missing letters in this…". The placeholder format is "[6 letters missing]"; unlike Ithaca, the model does not need to know the exact number of missing characters, only the approximate count. This handles scriptio continua naturally.
Training stages: (1) calibration round: 3–4 epochs, 3 configurations to find best approach; (2) continued training to plateau; (3) TIES-merging (Yadav et al. 2023) with base Llama, which improved restoration top-20 by +3pp with negligible CER change; (4) comparison re-training on 80/10/10 split with truncation to 50–750 chars to match Ithaca's input window.
Masks: Sequences of 3–20 characters randomly selected; no more than 50% of any intact sequence; longer spans favoured. Individual missing characters = single "-"; longer stretches = ten consecutive hyphens. Spaces NOT treated as characters (unlike Ithaca), which better reflects real lacunae where word boundaries are unknown.
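The masking rules can be illustrated with a short sketch. It is one reading of the description, not the paper's code: `mask_span` is a hypothetical helper, "intact sequence" is taken to mean the whole input text, the length weighting is an assumed way to favour longer spans, and the English sample sentence is a stand-in for an inscription.

```python
import random


def mask_span(text, rng, min_len=3, max_len=20):
    """Replace one random run of letters with a '[n letters missing]' placeholder.

    Spaces are not counted as characters, and the run never exceeds half of
    the letters in `text`. Returns (masked_text, hidden_letters).
    """
    letter_positions = [i for i, ch in enumerate(text) if not ch.isspace()]
    cap = min(max_len, len(letter_positions) // 2)
    if cap < min_len:
        raise ValueError("text too short to mask under the 50% rule")
    lengths = list(range(min_len, cap + 1))
    # Weight by length so that longer spans are favoured (assumed weighting).
    n = rng.choices(lengths, weights=lengths, k=1)[0]
    start = rng.randrange(len(letter_positions) - n + 1)
    first = letter_positions[start]
    last = letter_positions[start + n - 1]
    hidden = "".join(ch for ch in text[first:last + 1] if not ch.isspace())
    masked = f"{text[:first]}[{n} letters missing]{text[last + 1:]}"
    return masked, hidden


rng = random.Random(0)
masked, hidden = mask_span("the quick brown fox jumps", rng)
print(masked)
print(hidden, len(hidden))
```

Because spaces inside the span are dropped from the count, the placeholder reports letters only, which matches the case where word boundaries in a lacuna are unknown.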
Implication for Symbolon. The paper's verdict: "Although more specialized models may eventually outperform this… instruction-tuned pretrained causal language models can serve as an efficient and easily implementable baseline for future work." (Cullhed 2026) The word baseline is the key: restoration is now the floor. Symbolon's defensible frontier is attribution plus the agentic edition, not restoration alone.
20 / 21
V · the three answers
What the deck argued, in one view
Three questions opened the deck; each is answered, and each answer is, in part, an honest no.
1 · One instrument? No by embeddings: the spaces are near-orthogonal (cos −0.007). Yes by concept & identity: the 5-stream bridge over 669,498 rows.
2 · Grounded agency? Yes, by construction, through the §0 invariants: report, cite, preserve uncertainty, abstain. It cannot author a reading it cannot ground.
3 · Fill, or guess? Told apart: the H5 support cross-check tags each candidate formula-attested / contextual-only / probability-only.
The throughline. And one claim beyond the three: restoration is now the floor (commoditised); Symbolon's defensible frontier is the grounded analytical edition, the graph over the spine with every edge carrying its evidence. An experiment, named as one.
21 / 21
The token, re-matched
Where Aeneas rebuilt the web of connections around one corpus, Symbolon asks whether two corpora can be joined, and answers honestly: not by their vectors, but by their shared identity.
Acts 0–V complete · five interactive widgets · all figures verified against the symbolon project record and Cullhed DSH 41.1
© 2026 Wu Ching-Yuan 吴靖远 · magalia.wiki (籬廬). Generated transcript 2026-06-13 from symbolon.html · text CC BY 4.0. Papers © their authors (DeepMind, Nature).