1 / 19
The two long roads
A companion to the Ithaca & Aeneas decks
Two long roads met in 2022
Ithaca and Aeneas sit where six decades of machine learning meet six centuries of epigraphy. This deck walks both roads, with every milestone glossed, dated, and cited.
2 / 19
Two roads
A machine, and something to learn from
Any learning model is two things welded together: a method that can learn patterns, and data clean enough to learn from. Ithaca and Aeneas needed both to mature.
The road to the machine: from 1990s neural nets, through word vectors and the Transformer, to BigBird and T5.
The road to the data: from a Renaissance traveller copying stones, through Mommsen’s "queryable database", to I.PHI.
Why it matters: follow either road first; they converge at the end on Ithaca (2022) and Aeneas (2025).
3 / 19
I · ML timeline
I · the road to the machine
Six decades, click any milestone
Each idea solved the previous one’s bottleneck. Click a dot to expand it: blurb, analogy, why it mattered, and a citation.
▶ interactive: timeline (open the live deck to use it)
4 / 19
I · RNN & LSTM
I · sequence memory
What is an LSTM? (and RNN, and NLP)
A neural net that reads one step at a time, carrying memory, with gates to remember across long gaps. Toggle RNN/LSTM and step. (Hochreiter 1997)
▶ interactive: rnn (open the live deck to use it)
5 / 19
I · king − man + woman
I · meaning as geometry
Why king − man + woman ≈ queen
How can a word be a point, so that similar words sit close and relations become arithmetic? Press the button. (Mikolov 2013)
▶ interactive: wordmath (open the live deck to use it)
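The arithmetic on this slide can be sketched with a handful of hand-made vectors. The three-dimensional values below are invented for illustration and are not taken from word2vec; real embeddings are learned and have hundreds of dimensions. The function name `analogy` is likewise a hypothetical helper.

```python
import math

# Toy "embeddings": invented numbers, NOT real word2vec vectors.
# The axes loosely stand for (royalty, maleness, femaleness).
VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "stone": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

print(analogy("king", "man", "woman"))  # prints "queen" for these toy vectors
```

Excluding the three input words from the candidates is the usual convention for this kind of query; without it the nearest neighbour is often one of the inputs.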
6 / 19
I · seq2seq
I · encoder → decoder
What an auto-regressive seq2seq RNN does
Encoder reads the input; decoder writes the answer one token at a time, feeding its own output back. Restoring εποιη--. (Sutskever 2014)
▶ interactive: seq2seq (open the live deck to use it)
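The feedback loop described on this slide can be sketched without a neural network. In the toy below, a four-word lexicon stands in for a trained decoder; the lexicon, `next_char`, and `restore` are all assumptions made for illustration, and a real seq2seq model would score characters with learned weights instead of counting.

```python
from collections import Counter

# Hypothetical mini-lexicon standing in for a trained model.
LEXICON = ["εποιησε", "εποιησεν", "εποιησαν", "εποιει"]

def next_char(prefix):
    """Pick the character that most often follows `prefix` in the lexicon."""
    counts = Counter(
        word[len(prefix)]
        for word in LEXICON
        if word.startswith(prefix) and len(word) > len(prefix)
    )
    # "-" marks a position the toy model cannot fill.
    return counts.most_common(1)[0][0] if counts else "-"

def restore(prefix, n_missing):
    """Greedy auto-regressive decoding: each guess is appended and fed back in."""
    text = prefix
    for _ in range(n_missing):
        text += next_char(text)  # the model's own output becomes its next input
    return text

print(restore("εποιη", 2))  # prints "εποιησε" with this lexicon
```

Because each step depends on the previous guess, the loop cannot be run in parallel, which is the bottleneck the later Transformer slides address.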
7 / 19
I · subword units
I · subwords
Between letters and whole words
Splitting into frequent fragments lets a small vocabulary spell any word, even rare or damaged ones. Ithaca instead works per-character. (Sennrich 2016)
▶ interactive: subword (open the live deck to use it)
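A minimal sketch of the byte-pair-encoding idea behind subword units: start from single characters and repeatedly merge the most frequent adjacent pair. The tiny English corpus and the alphabetical tie-break rule are assumptions chosen to keep the example deterministic; production tokenizers differ in details such as end-of-word markers.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    corpus = Counter(tuple(w) for w in words)  # each word starts as single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken alphabetically (an assumption).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in corpus.items():
            merged[merge_pair(symbols, best)] += freq
        corpus = merged
    return merges

def segment(word, merges):
    """Split a word, seen or unseen, by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)                    # [('o', 'w'), ('l', 'ow')]
print(segment("lowly", merges))  # ['low', 'l', 'y']
```

The word "lowly" never appears in the toy corpus, yet it is still spelled from known fragments, which is the property the slide describes.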
8 / 19
I · ResNet
I · depth
ResNet: F(x) + x, the skip that enables depth
The shortcut that let networks go very deep; it sits inside every Transformer block and in Aeneas’s vision backbone. Toggle it off. (He 2015)
▶ interactive: resnet (open the live deck to use it)
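The effect of the shortcut can be sketched with a deliberately weak toy layer. The layer below (scale, then ReLU) and the weight 0.01 are assumptions for illustration; the sketch shows only the forward signal, whereas the original motivation also concerns gradients during training.

```python
def layer(x, weight):
    """Toy F(x): scale each value, then apply ReLU. Not a real trained layer."""
    return [max(0.0, weight * v) for v in x]

def plain_stack(x, depth, weight):
    """Stack layers with no shortcut: x -> F(x) -> F(F(x)) -> ..."""
    for _ in range(depth):
        x = layer(x, weight)
    return x

def residual_stack(x, depth, weight):
    """Stack layers with the skip connection: x -> F(x) + x at every step."""
    for _ in range(depth):
        x = [f + v for f, v in zip(layer(x, weight), x)]
    return x

signal = [1.0]
print(plain_stack(signal, depth=20, weight=0.01))     # the signal all but vanishes
print(residual_stack(signal, depth=20, weight=0.01))  # the signal survives
```

With the skip in place, a layer that contributes almost nothing leaves the input intact instead of erasing it, so adding depth does no harm in this toy setting.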
9 / 19
I · read in parallel
I · the engine
Self-attention + multi-head + positional → parallel
Why the Transformer replaced the RNN: it reads the whole sequence at once. Run both and compare. (Vaswani 2017)
▶ interactive: transformerpar (open the live deck to use it)
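A minimal single-head sketch of scaled dot-product self-attention in plain Python. It assumes Q = K = V = x, with no learned projections, no multiple heads, and no positional encoding, so it shows only the core weighting step named on this slide.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (a simplifying assumption)."""
    d = len(x[0])
    out = []
    # Each position's output depends only on x, never on another position's
    # output, so the iterations of this loop could all run at the same time.
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(v, 3) for v in row])
```

Contrast this with an RNN, where step t cannot start until step t − 1 has finished.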
10 / 19
I · T5
I · one model, many tasks
T5: every task as text → text
One encoder–decoder for restoration, dating, geography, translation. Why couldn’t earlier tech unify them? (Raffel 2020)
▶ interactive: t5 (open the live deck to use it)
11 / 19
I · meaning as geometry
I · the idea that started it
When meaning became geometry
The breakthrough under everything, word2vec (Mikolov 2013), gave every word a position in space, so relationships become arithmetic.
▶ interactive: wv (open the live deck to use it)
12 / 19
I · RNN → Transformer
I · the engine matures
From reading word-by-word to all-at-once
RNN / LSTM read one step at a time. The Transformer (Vaswani 2017) replaced the notepad with attention, reading the whole text at once. BigBird (Zaheer 2020) then made it cheap enough for long inscriptions.
▶ interactive: attnCompute (open the live deck to use it)
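A back-of-the-envelope count of why sparse attention is cheaper on long texts. Full attention compares every token with every other; a BigBird-style pattern lets each token look at a local window plus a few global and random positions. The window, global, and random sizes below are arbitrary assumptions, not BigBird's published settings, and the count ignores the extra work done by the global tokens themselves.

```python
def full_attention_pairs(n):
    """Every token attends to every token: n * n comparisons."""
    return n * n

def sparse_attention_pairs(n, window=3, n_global=2, n_random=3):
    """Each token attends to a fixed number of positions, so cost grows linearly in n."""
    per_token = min(n, window + n_global + n_random)
    return n * per_token

for n in (100, 1000, 10000):
    print(n, full_attention_pairs(n), sparse_attention_pairs(n))
```

For a text of 1,000 tokens this toy count gives 1,000,000 comparisons for full attention against 8,000 for the sparse pattern.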
13 / 19
I · fill-in-the-blank
I · the trick that became restoration
Fill-in-the-blank, at planetary scale
▶ interactive: bert (open the live deck to use it)
BERT (Devlin 2019) learned language by masking words and guessing them from both sides, and that masked objective is essentially textual restoration.
LatinBERT (Bamman 2020) showed that a BERT-style model can restore a historical language.
Pythia (Assael 2019) first applied neural restoration to ancient Greek; it is Ithaca’s direct ancestor.
14 / 19
II · epigraphy timeline
II · the road to the data
Six centuries, click any milestone
From a Renaissance traveller to the machine-readable text-dump that trains the models. Click a dot to expand each step.
▶ interactive: timeline (open the live deck to use it)
15 / 19
II · a database in 1847
II · the charter
Someone specified a database, in 1847
Mommsen’s memorandum for the Corpus Inscriptionum Latinarum (after 1847) reads like a database spec: collect all, excise forgeries, edit from autopsy, and provide "precise indices".
Quote: "‘Precise indices’ committed the corpus to discoverability — that is, to what we would now call a queryable database." The whole digital pipeline fulfils an 1847 promise. (after 1847)
16 / 19
II · the brackets
II · the shared grammar
The brackets a machine can read
The Leiden Convention (1931; Dow 1952) gave editors one shared set of sigla, so a bracket means the same to everyone and, later, to a parser. The bridge to EpiDoc.
▶ interactive: sigla (open the live deck to use it)
Why it matters: but a bracket is a claim. "History from square brackets" (Badian 1989) warns that a restored letter is a hypothesis, the caution Ithaca/Aeneas build into ranked output.
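A small sketch of what "readable by a parser" can mean in practice: splitting a Leiden-style line into preserved and restored segments. Only square brackets are handled; the function name `split_leiden` and the status labels are assumptions for illustration, and real pipelines work from much richer encodings such as EpiDoc.

```python
import re

def split_leiden(text):
    """Split a Leiden-style line into (segment, status) pairs.

    Only square brackets are interpreted: [abc] marks letters restored by
    an editor. All other sigla are passed through as preserved text.
    """
    parts = []
    for match in re.finditer(r"\[([^\]]*)\]|([^\[\]]+)", text):
        restored, preserved = match.groups()
        if restored is not None:
            parts.append((restored, "restored"))
        else:
            parts.append((preserved, "preserved"))
    return parts

print(split_leiden("εποιη[σε]"))
# [('εποιη', 'preserved'), ('σε', 'restored')]
```

Keeping the status next to each segment preserves the editor's claim as a claim, so downstream code can treat restored letters as hypotheses and not as evidence.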
17 / 19
II · one stone, eight steps
II · the whole chain in one object
From a Sicilian stone to a training row
The entire road is visible in one inscription, ISic000470, traced across eight transcriptions. Each step gains reach and loses something. (SDAM 2021)
▶ interactive: steps (open the live deck to use it)
18 / 19
III · the roads meet
2022 / 2025
Where the roads meet
A BigBird Transformer (the machine) trained on I.PHI (the data) became Ithaca. A T5 with vision trained on the Latin Epigraphic Dataset (LED) became Aeneas.
19 / 19
Sources & reading
Sources
Every claim, traceable
▍ local source · ▍ web source.
▶ interactive: sources (open the live deck to use it)
© 2026 Wu Ching-Yuan 吴靖远 · magalia.wiki (籬廬). Generated transcript 2026-06-13 from lineage.html · text CC BY 4.0. Papers © their authors (DeepMind, Nature).