magalia · §VIII ML/AI · neural epigraphy · joint cross-corpus torso

One model, three corpora: Greek inscriptions · Latin inscriptions · Greek papyri

A single neural torso, trained jointly on all three corpora, shares one embedding space, so retrieval of "parallels" can cross the boundary between inscriptions and papyri, and between Greek and Latin. It runs entirely in your browser, offline. Type a text, mark lost letters with ?, choose the corpus, and the model restores the missing letters, dates and locates the text, and finds cross-corpus parallels.
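To make the shared-space idea concrete, here is a minimal sketch of cross-corpus retrieval by cosine similarity. It is an illustration, not the page's actual code: the function names, the `TOY_INDEX` entries, and the 3-dimensional vectors are all hypothetical, and the real index holds embeddings produced by the model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def top_parallels(query_vec, index, k=3):
    """Return the k most similar entries as (score, corpus, text_id).

    Every entry is scored regardless of its corpus, which is what lets
    a Greek inscription retrieve a Latin inscription or a papyrus.
    """
    scored = [
        (cosine(query_vec, vec), corpus, text_id)
        for text_id, corpus, vec in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Hypothetical toy index: (text_id, corpus, embedding).
TOY_INDEX = [
    ("gr-inscr-1", "greek_inscriptions", [1.0, 0.0, 0.0]),
    ("lat-inscr-1", "latin_inscriptions", [0.9, 0.1, 0.0]),
    ("gr-pap-1", "greek_papyri", [0.0, 1.0, 0.0]),
]

if __name__ == "__main__":
    for score, corpus, text_id in top_parallels([1.0, 0.0, 0.0], TOY_INDEX, k=2):
        print(f"{text_id} ({corpus}): {score:.3f}")
```

Because nothing in the ranking filters by corpus, the nearest neighbours of a query can come from any of the three collections. Restricting results to a single corpus would only need a filter on the `corpus` field.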

When

Where

Cross-corpus parallels · most similar across all three corpora

Honestly. This is a pilot joint model: a 3,308,986-parameter character transformer (4 layers, 256-dim) with a corpus embedding, trained on CPU across the three corpora, with a 58,000-text cross-corpus index (Greek inscriptions from I.PHI + InsAph/IRCyr, Latin from I.Sicily + IRT + EDH (CC BY-SA 4.0), Greek papyri from the DDbDP, via EpiDoc→JSON). Measured letters-only on the model's own validation set: restoration 59% top-1 / 81% top-3; date within 50 yr 46%; region (of 225) 30%. The point isn't to beat DeepMind's specialists. It's a working pilot: one small model reads across three corpora at once, with calibrated confidence. A rigorous leakage-free re-evaluation (top-1 0.55, ECE 0.06) and the production architecture are on the validation page. In that architecture, cross-language linking goes through the identity / concept layer, not one shared vector space.
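For readers unfamiliar with ECE (expected calibration error), the sketch below shows one common way to compute it, using equal-width confidence bins. It is an assumption that the validation page uses this binning. The function name, the bin count of 10, and the sample inputs are illustrative only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: predicted probability of the top-1 choice, in [0, 1].
    correct:     booleans, True where the top-1 choice was right.
    Returns the average of |accuracy - mean confidence| over the bins,
    weighted by the number of predictions in each bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    hit_sums = [0] * n_bins
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        counts[idx] += 1
        conf_sums[idx] += conf
        hit_sums[idx] += 1 if ok else 0

    ece = 0.0
    for count, conf_sum, hits in zip(counts, conf_sums, hit_sums):
        if count == 0:
            continue
        gap = abs(hits / count - conf_sum / count)
        ece += (count / total) * gap
    return ece

if __name__ == "__main__":
    # Illustrative inputs only, not measurements from the model.
    confs = [0.8] * 10
    hits = [True] * 8 + [False] * 2
    print(expected_calibration_error(confs, hits))
```

A lower value means the stated confidence tracks the observed accuracy more closely. A model that says 80% and is right 8 times out of 10 contributes a gap of about zero for that bin.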