EpiDoc WorkshopEpiDoc 编码教学工坊
What this workshop is for本工坊所为何来
A promise about scope. EpiDoc inherits 363 elements from TEI. You will not learn all of them — you will master the ~12 core elements that carry roughly 95% of real encoding, and learn where to find the rest. 关于范围的承诺。EpiDoc 自 TEI 继承 363 个元素。你不必尽学,只须精通承担约九成五实际编码的 约 12 个核心元素, 并知道其余到何处去查。
What is EpiDoc?什么是 EpiDoc?
EpiDoc is a community customization of the TEI Guidelines for encoding ancient inscriptions in XML. One source file generates many views: a Leiden-style printed edition, a clickable website, a queryable database, and the TEI-conformant XML itself. Below: the same ISic000298 inscription rendered all four ways at once. Switch inscription in the embed via the picker — every pane updates in lock-step. EpiDoc 是 TEI 指南针对古代铭文 XML 编码的社区定制。一份源文件可生成多种视图: 莱顿印刷版、可点击网页版、可查询数据库、以及 TEI 规范的 XML 本身。下方: 同一份 ISic000298 铭文同时以四种方式呈现。在嵌入的查看器中切换铭文, 四个面板同步刷新。
The problem TEI was created to solveTEI 所欲解决的问题
Humanities computing's first three decades (1949–1985) produced extraordinary individual achievements — but no common ground. Father Roberto Busa's Index Thomisticus, begun in 1949 with IBM under his personal supervision, used one encoding scheme. Wilhelm Ott's TUSTEP at Tübingen used another. The Oxford Concordance Programme (OCP) used a third. The Brown Corpus (1964) had its own.
By the mid-1980s the field counted roughly thirty active text-encoding schemes. None could read another. A scholar who wanted to combine, say, Busa's Aquinas with a Tübingen Plato could not — they would have to re-encode one from scratch. Texts were trapped inside projects.
计算人文学最初三十年 (1949–1985) 涌现了非凡的个人成就,却无共同基础。罗伯托·布萨神父 (Roberto Busa) 1949 年起与 IBM 合作的《托马斯索引》(Index Thomisticus) 自有一套编码; 图宾根大学 Wilhelm Ott 的 TUSTEP 另成一系; 牛津的 Oxford Concordance Programme 与布朗语料库 (1964) 又各自不同。
至 1980 年代中期, 学界约有三十种各行其是的文本编码方案, 彼此无法互读。要把布萨的阿奎那与图宾根的柏拉图合并研究, 只能从头重编一份。文本被困在各自的项目之中。
Images: from Marco Passarotti, "The Index Thomisticus as a Digital Humanities Big Data Project," Umanistica Digitale 3 (2019). DOI 10.6092/issn.2532-8816/8575. Originals in the Busa Archive at Università Cattolica del Sacro Cuore (CIRCSE), Milan. Article licensed CC-BY 3.0. 图片来源: Marco Passarotti, 《作为数字人文“大数据”项目的“托马斯索引”》, Umanistica Digitale 第 3 卷 (2019 年); DOI 10.6092/issn.2532-8816/8575。原件藏于米兰圣心天主教大学 CIRCSE 之布萨档案 (Busa Archive)。文章依 CC-BY 3.0 授权。
November 1987 · Poughkeepsie1987 年 11 月 · 波启普西
At Vassar College in Poughkeepsie, New York, on 11–13 November 1987, thirty-two participants convened to address the problem. The meeting was hosted by the Association for Computers and the Humanities (ACH) and co-sponsored by the Association for Computational Linguistics (ACL) and the (UK/European) Association for Literary and Linguistic Computing (ALLC).
The meeting produced the one-page Poughkeepsie Principles: a commitment by the three associations to develop guidelines for the encoding and interchange of machine-readable humanities texts. C. M. Sperberg-McQueen (University of Illinois at Chicago) and Lou Burnard (Oxford University Computing Services) became the founding editors. Funding came from the US National Endowment for the Humanities (NEH), the European Commission's LRE programme, and the Andrew W. Mellon Foundation.
1987 年 11 月 11–13 日, 三十二位与会者集于纽约州波启普西的 Vassar College, 共商上述问题。会议由 计算与人文学协会 (ACH) 主办, 计算语言学协会 (ACL) 与英欧 文学与语言计算协会 (ALLC) 共同发起。
会议通过一页纸的 波启普西原则 (Poughkeepsie Principles): 三大学会承诺共同编纂机器可读人文文本的编码与互换标准。创始主编为 C. M. Sperberg-McQueen (伊利诺伊大学芝加哥分校) 与 Lou Burnard (牛津大学计算服务中心)。经费来自美国国家人文基金会 (NEH)、欧盟委员会 LRE 计划与梅隆基金会。
«I have a distinct memory of him emerging very slowly from a rather small automobile which had been dispatched to collect me and other stranded Europeans from the train station in snowy Poughkeepsie, prior to the foundational Workshop of the TEI. … For the first of the three decades separating those meetings we worked very closely together on the design and construction of the TEI, and its ‘Guidelines’, at Michael's home in Oak Park, Illinois, or mine in North Oxford, at various conferences and at numerous hotels in North America and Europe. … We also spent a lot of time arguing, as well as drafting, redrafting, and re-redrafting the impeccable prose between the tags.»
«我仍清晰记得他从一辆小车里缓缓钻出来,那辆车专程派去雪后的波启普西火车站, 接我和几位滞留的欧洲同行, 准备出席 TEI 的奠基工作坊。… 在两次会面之间的三十年中的头十年, 我们紧密协作, 在迈克尔位于伊利诺伊州奥克帕克的家中、或我在牛津北郊的家中、在各处会议、在北美与欧洲无数旅馆里, 共同设计与构建 TEI 及其《指南》。… 我们也花了大量时间争论, 反复起草、修订、再修订标签之间那些字字斟酌的散文。»
Photos & quotation: Lou Burnard, from Remembering CM Sperberg-McQueen, ed. B. Tommie Usdin & Debbie A. Lapeyre (Mulberry Technologies / Balisage, August 2024). Source: balisage.net/MSM/MSMMemories-49.html. The full memorial collection is at balisage.net/RememberingMSM.html; the TEI's own In Memoriam is at tei-c.org. 图片与引文: Lou Burnard, 收录于《缅怀 CM Sperberg-McQueen》, B. Tommie Usdin 与 Debbie A. Lapeyre 主编 (Mulberry Technologies / Balisage, 2024 年 8 月)。来源: balisage.net/MSM/MSMMemories-49.html。完整悼念文集见 balisage.net/RememberingMSM.html; TEI 协会的官方悼词见 tei-c.org。
Important detail: TEI was internationally tri-sponsored from day one. Not a British project, not an American project — a transatlantic humanities consensus. 关键细节: TEI 自创立之初即由三方国际共同主办。它既非英国独占, 亦非美国独占, 而是跨大西洋的人文学共识。
From SGML to XML · P1 → P5从 SGML 到 XML
Descriptive markup, an example. When the phrase “One thousand pounds in debt” is wrapped in <measure unit="pounds" value="-1000">, the attributes encode meaning for the machine while the surface text stays readable to a human.
描述性标记示例。“One thousand pounds in debt”一句加上 <measure unit="pounds" value="-1000"> 标签后, 属性向机器编码意义, 表面文字对人类读者仍然可读。
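The point about descriptive markup can be tried out in a few lines. The sketch below is only an illustration: the `<measure>` tag and its two attributes come from the slide, while the enclosing `<p>` and the helper name `read_measure` are assumptions added to make a parseable fragment.

```python
import xml.etree.ElementTree as ET

# Only <measure> and its attributes come from the slide; the <p> wrapper
# is an assumption so that the fragment parses on its own.
SNIPPET = (
    '<p><measure unit="pounds" value="-1000">'
    "One thousand pounds in debt</measure></p>"
)

def read_measure(xml_text):
    """Return (surface text, numeric value, unit) of the first <measure>."""
    measure = ET.fromstring(xml_text).find("measure")
    return measure.text, int(measure.get("value")), measure.get("unit")

surface, value, unit = read_measure(SNIPPET)
print(surface)       # the words a human reads
print(value, unit)   # the quantity a program can compute with
```

The human-readable wording and the machine-readable quantity live in the same element, so neither has to be derived from the other.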
«A few years after the publication of the second proposal (P2) in 1994, the World Wide Web Consortium issued a final Recommendation for Extensible Markup Language (XML) in 1998; consequently, the Guidelines needed to be re-expressed in this new formalism. … The proposal has undergone several revisions since its inception and is currently on its fifth version (P5).»
«P2 (1994 年) 发布数年之后, 万维网联盟于 1998 年正式发布可扩展标记语言 (XML) 之建议; 因此, 《指南》需以此新形式重新表述。…该建议自诞生以来历经多次修订, 现已是第五版 (P5)。»
Figures 2 & 3 reproduced from Dr Patricia O'Connor, A History of Encoding from XML Markup to the TEI and EpiDoc Guidelines, OG(H)AM blog (University of Glasgow / Maynooth University, AHRC & Irish Research Council), 25 July 2024. Source: ogham.glasgow.ac.uk · A History of Encoding. 图 2 与图 3 取自 Patricia O'Connor 博士《从 XML 标记到 TEI 与 EpiDoc 指南: 编码史述》, OG(H)AM 项目博客 (格拉斯哥大学与梅努斯大学联合, 由英国 AHRC 与爱尔兰研究理事会资助), 2024 年 7 月 25 日发表。来源: ogham.glasgow.ac.uk · 编码史述。
From project to consortium · 2000–2001由项目到联盟
After thirteen years of grant-funded development, the TEI moved in 2000–2001 from project-status to a membership-based consortium. The founding host institutions:
- Brown University (USA)
- University of Bergen (Norway)
- University of Oxford (UK)
- University of Virginia (USA)
Today the consortium has roughly 140 institutional and individual members worldwide. Governance: an elected Board of Directors, a Technical Council that maintains the Guidelines, and special-topic working groups (manuscripts, linguistics, music, epigraphy, …).
经过十三年靠基金支持的发展, 2000–2001 年间 TEI 由项目转型为会员制 联盟。创始机构成员: 布朗大学 (美) · 卑尔根大学 (挪威) · 牛津大学 (英) · 弗吉尼亚大学 (美)。
至今联盟约有 140 个机构与个人成员遍布全球。治理机制: 由会员选举的董事会、维护《指南》的技术委员会, 以及各专题工作组 (写本、语言学、音乐、铭文学等)。
«The Text Encoding Initiative was born into quite a different world from that of today. In 1987, there was no such thing as the World Wide Web… In academic life, it was still (just about) possible to finance an undergraduate degree without bankrupting one's parents. The TEI's twenty-five-year transition from grant-funded research project to dues-supported, member-governed research infrastructure mirrors the maturing of digital humanities itself.»
«TEI 诞生时的世界与今日已大相径庭。1987 年, 万维网尚未存在…。在学术生活中, 一个本科学位的学费仍 (勉强) 不至于让父母破产。TEI 历经二十五年, 由基金资助之研究项目, 转型为靠会员制治理、由会费支撑的研究基础设施,此一转型恰反映数字人文学整体之走向成熟。»
Figures 1 & 2 reproduced from Lou Burnard, “The Evolution of the Text Encoding Initiative: From Research Project to Research Infrastructure” (April 2013), Journal of the Text Encoding Initiative, Issue 5 · Source: journals.openedition.org/jtei/811 · DOI 10.4000/jtei.811 · Article licensed CC-BY 3.0. 图一与图二取自 Lou Burnard《TEI 之演进: 由研究项目至研究基础设施》 (2013 年 4 月), 载《Journal of the Text Encoding Initiative》第 5 期 · 来源: journals.openedition.org/jtei/811 · DOI 10.4000/jtei.811 · 文章依 CC-BY 3.0 授权使用。
Why epigraphy needed its own customization铭文学何以须有自家定制
TEI P4/P5 was designed primarily for born-textual works: literary editions, manuscripts, linguistic corpora. Inscriptions raised four problems that TEI permitted but did not standardise:
- The physical artefact. A stone's material, dimensions, layout, lettering, and damage history are part of the textual evidence, and they needed richer <msDesc> conventions than ordinary manuscript description.
- The Leiden conventions (codified 1931, refined since). A 150-year-old print tradition for editorial markup needed an authoritative XML mapping.
- Dense apparatus. Inscriptions routinely carry competing readings by editors a century apart; the apparatus is heavier than in typical philological cases.
- Authority-list integration. Places (Pleiades), persons (PIR, LGPN, Trismegistos People), dates (godot.date), and object types (EAGLE vocabularies) demand a tighter linked-data discipline than literary corpora required.
TEI P4/P5 当初主要是为“天然以文本形式存在”的对象设计的,文学版本、写本、语言学语料库。可是铭文不一样, 它给 TEI 带来了四类问题。这些问题 TEI 在技术上都允许处理, 却没有给出统一的处理方式:
- 实物本身就是文本证据。石头是什么材质、有多大、字怎么排、字形长什么样、后来又损了哪里, 这些都关乎释读, 不只是装裱说明。一般写本描述用的 <msDesc> 规约不够细, 需要更多。
- 莱顿规约(1931 年成文, 约莫等于我国民国二十年; 之后历经修订)是百五十年来的印刷传统,谁该补的字、谁该删的字、哪里残了、哪里疑了, 编者怎样标记, 都有定法。把这套传统对应到 XML, 需要一个权威的映射方案。
- 校勘异常密集。一块石头常常承载着相距百年的不同学者读法, 你读出的字与上一位学者读出的字未必相同,一般古文献的校勘记往往三两条, 铭文的校勘记可以厚到吓人。
- 对权威表的依赖远高于一般文学语料。地名要对到 Pleiades, 人名要对到 PIR、LGPN 或 Trismegistos People, 日期要对到 godot.date, 器物要对到 EAGLE 词表,这种贴紧链接数据的工作量, 是普通文学语料库不必担负的。
The community that gave EpiDoc its form为 EpiDoc 定形的社群
The community customisation began ~2000 with Tom Elliott (Ancient World Mapping Center, UNC; later NYU/ISAW) and Anne Mahoney (Stoa Consortium). Hugh Cayless (Duke) and Gabriel Bodard (then King's College London, now Roehampton) joined to formalise the schema. The first major published corpus in EpiDoc was the Inscriptions of Aphrodisias (KCL, edited by Charlotte Roueché, online 2004). The EpiDoc Guidelines reached their first formal release around 2006.这件事大约在公元 2000 年前后才真正开始, 由 Tom Elliott(当时在北卡罗来纳大学的古代世界地图中心, 后来到纽约大学的 ISAW)和 Anne Mahoney(Stoa Consortium)着手。后来 Hugh Cayless(杜克大学)和 Gabriel Bodard(那时在伦敦国王学院, 现在已迁至 Roehampton 大学)也加入进来, 一起把这套规范固定下来。第一部正式以 EpiDoc 出版的大型铭文集, 是 《阿弗洛狄西亚铭文集》(KCL, Charlotte Roueché 主编, 2004 年上线); EpiDoc《指南》的第一个正式版本, 大约在 2006 年面世。说是社群定制, 其实就是几位学者一边在自己的项目里编铭文, 一边把彼此踩出来的路, 慢慢梳理成一套大家都能照着走的规则。
Photographs: Inscriptions of Aphrodisias 2007, ed. J. Reynolds, C. Roueché, G. Bodard. King's College London, Centre for Computing in the Humanities (now King's Digital Lab). © Aphrodisias Excavations / King's College London. Online at insaph.kcl.ac.uk/iaph2007; earlier 2004 corpus Aphrodisias in Late Antiquity.图源:《阿弗洛狄西亚铭文集 2007》, J. Reynolds、C. Roueché、G. Bodard 主编。伦敦国王学院人文计算中心 (今 King's Digital Lab) 制作。© 阿弗洛狄西亚发掘队 / 伦敦国王学院。在线版本见 insaph.kcl.ac.uk/iaph2007; 2004 早期语料库 《晚期古代之阿弗洛狄西亚》。
Where EpiDoc lives todayEpiDoc 之当代版图
Cross-corpus discovery happens through Trismegistos (Leuven), which indexes over a million ancient texts using EpiDoc-compatible authority data for persons, places, and dates.想跨着不同语料库去检索某个人、某个地点、某一组年代, 通常都要走 Trismegistos(在比利时鲁汶大学)这一站; 它以 EpiDoc 兼容的人名、地名、日期权威数据, 已索引超过一百万件古代文本, 是各家语料库之间的"中央换乘站"。
I.Sicily · IAph (Aphrodisias) · IRT (Tripolitania) · IRCyr (Cyrenaica) · Inscriptions of Roman Macedonia · US Epigraphy · EAGLE (European Federation aggregating EDR/EDH/EDCS/Hispania Epigraphica) · Vindolanda tablets · Pompeii graffiti corpus.希腊—拉丁铭文学的主要语料库, 现在几乎都以 EpiDoc 编码; 欧盟 EAGLE 联盟则把意大利的 EDR、海德堡的 EDH、克劳斯—斯莱比的 EDCS、西班牙的 Hispania Epigraphica 等多国数据库汇到一处, 让人不必挨个网站翻。
Papyri.info — the aggregator over the Duke Databank of Documentary Papyri, the Heidelberger Gesamtverzeichnis (HGV), and APIS — migrated to EpiDoc around 2008. Today some 60,000+ papyri are encoded in EpiDoc and editable collaboratively through the SoSOL platform.纸草学方面, Papyri.info 把杜克的 DDbDP、海德堡的 HGV 与 APIS 三个原本各自为政的库整合到了一起, 大约 2008 年前后全面转用 EpiDoc; 现在已有约六万件纸草以 EpiDoc 编码, 学者们通过 SoSOL 平台协作编辑,谁加了什么字、删了什么字, 都留下版本记录。
Coin legends entered the EpiDoc ecosystem via Nomisma and OCRE (Online Coins of the Roman Empire, ANS/Oxford).钱币学稍后才加入进来。Nomisma 与 OCRE(Online Coins of the Roman Empire, 美国钱币学会与牛津大学合作)把币面上的铭文也带进了 EpiDoc 生态, 一枚枚硬币上的简短铭文从此可与石头上的长篇铭文统一查询。
Medieval donor & dedicatory inscriptions (CARE · Corpus Architecturae Religiosae Europeae) · Demotic and Coptic at Brown · Vesuvius-region ostraca · ancient South Arabian (DASI) · Mesoamerican Maya projects experimenting with EpiDoc-derived schemas.EpiDoc 的影响也正在向相邻的传统扩散。中世纪的奉献铭(CARE·欧洲宗教建筑语料库)、布朗大学所做的古埃及世俗体与科普特文、维苏威地区的陶片(ostraca)、古南阿拉伯文(DASI), 甚至中美洲的玛雅文字项目, 都在试验从 EpiDoc 派生出来的编码模式。古代希腊罗马以外的世界, 也正在通过这一套语法找到自己的入口。
Where the texts come from语料来源
EpiDoc data is increasingly published as open source alongside the customary website front-end. The same XML files that drive the published edition are mirrored on GitHub (and Zenodo/Heidata for citable snapshots), under CC-BY licences, so anyone can fork, correct, re-encode, or build a new platform from them. The four corpora below are the workshop's primary sources — each has a live site for reading and a repository for editing.EpiDoc 数据这几年越来越多地走开源路线, 不只是给你一个网站去读, 还把驱动那个网站的 XML 文件原样放在 GitHub 上(并通过 Zenodo / Heidata 等仓库提供可引用的版本快照), 多数采用 CC-BY 许可。换句话说, 你可以把整个语料库 fork 下来, 自己改、自己重新编码、甚至基于它另建一个新平台, 完全合法。下面四部语料库就是本工坊使用的主要来源, 每一部都既有给人读的网站, 又有给人改的仓库。
4,787 inscriptions from Sicily, encoded against EpiDoc 9+. Latin, Greek, Hebrew, Punic, Oscan, Sikel.西西里岛出土铭文共 4,787 件, 全部按 EpiDoc 9+ 版本编码, 语言极杂:拉丁文、希腊文、希伯来文、布匿文(腓尼基系)、奥斯坎文(意大利古族语)、西库尔文(本地土著语)都有, 像一份地中海各民族在一座岛上留下的密度极高的文字层叠。
1,618 inscriptions from Roman Libya. Predominantly Latin, with Neo-Punic and Greek.罗马时代利比亚地区(古称 Tripolitania)的铭文共 1,618 件, 以拉丁文为主, 但也有用新布匿文和希腊文写就的,这一带正好是罗马、迦太基与希腊文化交叠之处, 一块石头上可能同时承载着征服者的官方语言与本地人的母语。
1,489 inscriptions from Caria. Mostly Greek, late-antique aristocratic culture.小亚细亚卡里亚地区出土的铭文 1,489 件, 几乎全是希腊文, 时段集中在古代晚期(大约公元三世纪到六世纪, 约莫等于我国的魏晋南北朝)。读这一批, 能看见一座希腊化城市的贵族阶层如何在罗马—基督教世界中继续以希腊文表达自己。
2,360 inscriptions from Cyrenaica. Greek and Latin from a Greek-speaking province under Rome.昔兰尼加(今天利比亚东部)的铭文 2,360 件, 希腊文与拉丁文并存。罗马虽是统治者, 但当地是说希腊语的, 所以铭文里常常出现这样的格局:法律、军政事务用拉丁, 私人生活、墓志和宗教献辞用希腊,一个行省同时维持着两种文字生活。
All four are open data (CC-BY). The workshop draws ten texts from them, ranging from a three-line Latin epitaph to a four-line Greek elegiac quatrain. The same XML pipeline that publishes the corpus on its website is what the editor in the next pane edits — meaning every correction a student makes is a draft of a real edition.四部语料库都是开放数据(CC-BY)。本工坊从中精选了十铭, 短到三行拉丁文墓志, 长到四行的希腊文哀歌四行体, 各种类型都有一点。值得指出的是:你在右侧编辑器里改动的 XML, 与驱动这四个网站的 XML 走的是同一条管道。换句话说, 学生的每一次更正, 都不是练习作业, 而是一份真实版本的草稿。
EpiDoc reference: Leiden, P5 differences, elementsEpiDoc 速查 · 莱顿符号 · 与 P5 之别 · 元素总览
The Leiden Convention (codified 1931 by W. Peremans, V. Ehrenberg, and others; refined in Krummrey & Panciera 1980, Tituli 2) is the 150-year-old print tradition every modern epigraphic edition obeys. EpiDoc is essentially the XML serialisation of these brackets — but it tightens each one with attributes (@reason, @quantity, @unit) so what was visual print convention becomes machine-checkable data.莱顿规约是 1931 年由 W. Peremans、V. Ehrenberg 等学者商定的(约莫等于我国民国二十年), 1980 年又经 Krummrey 与 Panciera 修订, 收入《铭文研究》第 2 辑。一百五十多年来, 现代铭文版本基本上都遵循这套规约。EpiDoc 做的事, 说穿了就是把这套符号写成 XML,不过它不只是照搬, 而是给每一条规则都加上属性(@reason、@quantity、@unit), 让原本只供眼睛去看的印刷惯例, 变成程序可以检验的数据。
14 rules in 7 groups — every one of them appears in your workshop XML. Hover the rows to see the editorial intent encoded in each.七组、十四则规则:工坊的每一份 XML 都会用到。把光标移到行上, 可以看到每一则规则要表达的编辑意图。
Supply · 补字
What the editor reconstructs that is not on the stone — by category of reason. Three sub-rules distinguish what damage caused the absence (physical loss), what the cutter forgot (haplography), and what a native reader would have heard but not seen (subaudition).编者所补的字, 石头上原本是没有的; 但为什么会没有, 又有讲究。这一组下面三条细则, 分别对应三种不同的"没有":物理损坏造成的佚失、刻工自己漏掉的疏忽、本族读者一看就懂、原本就不必刻出来的省略。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
[abc] | Lost — editor restores letters lost to physical damage of the support.佚失,编者补回因物理损坏而失之字。 Use when surface is broken, chipped, or worn through. Length must be justified in apparatus.用于残破、磕损或磨穿处。补字长度须有校勘根据。 | <supplied reason="lost">abc</supplied> | [Imp.] → <supplied reason="lost">Imp.</supplied> |
<abc> | Omitted — editor supplies what the cutter mistakenly skipped.漏刻,编者补回刻工误漏之字。 Stone is intact; the cutter slipped. Common in numeric repetitions, name-formulas.石面完好, 乃刻工之误。多见于重复数字、人名套语。 | <supplied reason="omitted">abc</supplied> | <o> → <supplied reason="omitted">o</supplied> |
<abc> | Subaudible — implicit to the native reader, never written.默会,本族读者心知而原本不刻者。 Same angle-brackets in print; the EpiDoc @reason distinguishes the editorial intent.印刷同形角括号; EpiDoc 以 @reason 区分编者意图。 | <supplied reason="subaudible">abc</supplied> | the public <works> → the public <supplied reason="subaudible">works</supplied> |
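The three rows above can be turned around: given the XML, print the Leiden form. This is a minimal sketch using the bracket pairs exactly as the table lists them; the function name `supplied_to_leiden` is hypothetical and the fragments carry no TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Bracket pairs as listed in the table. "omitted" and "subaudible" print
# identically, which is why the XML keeps them apart with @reason.
BRACKETS = {
    "lost": ("[", "]"),
    "omitted": ("<", ">"),
    "subaudible": ("<", ">"),
}

def supplied_to_leiden(xml_text):
    """Render one <supplied> element in its Leiden print form."""
    elem = ET.fromstring(xml_text)
    opening, closing = BRACKETS[elem.get("reason")]
    return f"{opening}{elem.text}{closing}"

print(supplied_to_leiden('<supplied reason="lost">Imp.</supplied>'))        # [Imp.]
print(supplied_to_leiden('<supplied reason="omitted">o</supplied>'))        # <o>
print(supplied_to_leiden('<supplied reason="subaudible">works</supplied>')) # <works>
```

Going from XML to print loses information (two reasons collapse into one bracket shape); going from print to XML would require an editor to decide which reason applies.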
Deletion · 删衍
Letters present on the stone that the editor judges should not be read as text. EpiDoc distinguishes scribal redundancy (cutter's error) from ancient erasure (deliberate post-cut act, e.g. damnatio memoriae).这一组讲的是相反的情况:石头上明明有字, 但编者认为不该读进去。EpiDoc 在这里特意把两种情况分开:一种是刻工自己出错(写多了、写重了), 另一种是后来的人故意把它抹掉(刻好之后的有意行为, 比如damnatio memoriae,对某位失势者的名誉抹消)。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
{abc} | Surplus — scribal redundancy, ignore when reading.衍文,刻工重出, 读时略去。 Most often a dittography. Stone is preserved; reader skips.多为重字。石面完好, 但读时跳过。 | <surplus>abc</surplus> | filiu{u}s → filiu<surplus>u</surplus>s |
⟦abc⟧ | Erasure — deliberately removed in antiquity.抹除,古人有意删去。 Both the original text and the act of erasure are preserved. @rend="erasure" records the physical method.原文与抹除行为皆保留。@rend="erasure" 记录抹除方式。 | <del rend="erasure">abc</del> | ⟦Domitiani⟧ |
Expansion · 释缩
Cutter's abbreviation made explicit by the editor. The bracket contains the resolution; the surface text is the abbreviation itself. EpiDoc nests the two pieces so each is independently queryable.古代刻工最常省字, 把 Imperator 刻成 Imp.、把 Caesar 刻成 Caes.; 编者读出来的时候, 习惯把省略的部分用括号补回。括号外的, 是石头上真正刻着的; 括号内的, 是编者补的。EpiDoc 把这两部分嵌套着写, 这样你想查"石头上刻的", 或想查"编者补的", 两者都各自能查得到。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
abc(def) | Resolution of an abbreviation.缩写展开。 <ex> wraps editorial letters; without it the resolution is ambiguous.<ex> 包裹编者补字; 否则补释含混。 | <expan>abc<ex>def</ex></expan> | Imp(erator) → <expan>Imp<ex>erator</ex></expan> |
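Because the surface letters and the editor's resolution are nested rather than merged, each can be pulled out separately. A small sketch, assuming the plain form shown in the table (surface letters directly inside `<expan>`, no `<abbr>` child) and a hypothetical helper name `abbreviation_views`:

```python
import xml.etree.ElementTree as ET

def abbreviation_views(xml_text):
    """Return (letters on the stone, resolved word, Leiden print form)."""
    expan = ET.fromstring(xml_text)
    surface = expan.text or ""
    leiden = expan.text or ""
    for child in expan:
        if child.tag == "ex":
            # Editorial letters appear in parentheses in print only.
            leiden += f"({child.text})"
        tail = child.tail or ""
        surface += tail
        leiden += tail
    resolved = "".join(expan.itertext())
    return surface, resolved, leiden

print(abbreviation_views("<expan>Imp<ex>erator</ex></expan>"))
# ('Imp', 'Imperator', 'Imp(erator)')
```

A search for what was cut on the stone would use the first value; a search for the word the editor reads would use the second.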
Doubt · 存疑
Letters present but doubtful, damaged, or only partially visible. Classical Leiden uses an underdot; EpiDoc encodes the same uncertainty with <unclear> and a controlled @reason.石头上的字还在, 但是模糊、残损, 或者只看见一部分。古典莱顿用法是在字母底下加一个小点, 表示"我读了出来, 但不太敢确定"。EpiDoc 把同样的意思编为 <unclear> 元素, 同时要求用 @reason 写明为什么不敢确定(损坏、笔画不规则、墨迹脱落, 等等)。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
ạḅc̣ | Doubtful: visible but uncertain. 存疑,可见但不可确认。 @reason takes one of: damage, eccentric_ductus, ink_loss, etc. @reason 取值含: damage、eccentric_ductus、ink_loss 等。 | <unclear reason="damage">abc</unclear> | Ḳaisar → <unclear reason="damage">K</unclear>aisar |
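The underdot of print can be regenerated from `<unclear>` by appending the Unicode combining dot below (U+0323) to each doubtful letter. The following is a sketch only: the `<w>` wrapper and the function name `render_unclear` are assumptions, and real editions may style doubtful letters differently.

```python
import unicodedata
import xml.etree.ElementTree as ET

COMBINING_DOT_BELOW = "\u0323"

def render_unclear(xml_text):
    """Print-style rendering: every letter inside <unclear> gets an underdot."""
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        text = "".join(child.itertext())
        if child.tag == "unclear":
            text = "".join(ch + COMBINING_DOT_BELOW for ch in text)
        parts.append(text)
        parts.append(child.tail or "")
    # NFC folds letter + combining dot into a precomposed character where one exists.
    return unicodedata.normalize("NFC", "".join(parts))

print(render_unclear('<w><unclear reason="damage">K</unclear>aisar</w>'))  # Ḳaisar
```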
Gap · 阙文
Lacunae, illegible stretches, and intentionally unwritten space. Four sub-rules distinguish what kind of absence is being marked: lost-known, lost-unknown, illegible-but-present, and deliberately blank."没字"有许多种, 不能笼统一概以蔽之。这一组下面四条细则, 把"没字"再分得更细:有的字确实丢了, 而且大概能数出丢了几个; 有的字也丢了, 但数不清几个; 有的字其实没丢, 只是磨得看不清楚了; 还有一种, 是刻工本来就有意空在那里,属于版式的一部分, 不是缺漏。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
[+++] (n dots) | Lost, count known — each dot = one missing character.佚失·字数已知,每点一字。 Quantity drives the apparatus; @unit can be character, line, or page.字数关乎校勘; @unit 可为 character、line 或 page。 | <gap reason="lost" quantity="3" unit="character"/> | [+++] |
[---] | Lost, count unknown.佚失·字数不明。 Used when the lacuna is bigger than can be confidently counted.当残缺过大、无法准确计数时使用。 | <gap reason="lost" extent="unknown" unit="character"/> | [---] |
+++ (no brkts) | Illegible — surface preserved but worn.漫漶,石面尚存而字难辨。 Distinct from lost: the surface is intact, just unreadable. Critical for restoration logic.与佚失有别:石面完好, 仅是难辨。关乎补字推断。 | <gap reason="illegible" quantity="3" unit="character"/> | filius +++ vixit |
vacat / v. | Intentionally blank space — encoded as <space>, not <gap>.刻工有意留白,用 <space> 而非 <gap>。Carries semantic weight: signals layout intention, hierarchy, or formal articulation.有语义价值:示版式意图、层级或结构分节。 | <space extent="unknown" unit="line"/> | Caesar vacat Augustus |
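The four kinds of absence differ only in element name and attributes, so a program can tell them apart without looking at any print sigla. A minimal sketch using the four XML forms from the table; the function name `describe_absence` and the wording of the returned labels are hypothetical.

```python
import xml.etree.ElementTree as ET

def describe_absence(xml_text):
    """Classify a <gap> or <space> element by what kind of absence it marks."""
    elem = ET.fromstring(xml_text)
    if elem.tag == "space":
        # A vacat is layout, not loss.
        return "intentional blank"
    reason = elem.get("reason")
    if elem.get("extent") == "unknown":
        return f"{reason}, extent unknown"
    return f"{reason}, {elem.get('quantity')} {elem.get('unit')}(s)"

for fragment in (
    '<gap reason="lost" quantity="3" unit="character"/>',
    '<gap reason="lost" extent="unknown" unit="character"/>',
    '<gap reason="illegible" quantity="3" unit="character"/>',
    '<space extent="unknown" unit="line"/>',
):
    print(describe_absence(fragment))
```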
Layout · 版式
How the physical layout of the inscription is represented. EpiDoc records line and column boundaries as empty elements with @n numbering, so layout is queryable independent of text content.铭文是刻在物体上的文字, 因此它的版式,几行、分几栏、哪里换行、哪里换栏,也是要记录的。EpiDoc 把行、栏的边界写成空元素, 加上 @n 编号, 这样即便不读文字内容, 也能单独查询版式信息(例如:这一块石头一共有几栏?)。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
| | Line break. 换行。 @n is the line number. @break="no" marks a line break that falls inside a word. @n 为行号。@break="no" 标记词内断行。 | <lb n="2"/> | Caesar | Augustus → Caesar <lb n="2"/>Augustus |
‖ | Column break.换栏。 Used for stones with two or more vertical columns (e.g. ISic000470's central groove).用于双栏或多栏石面 (如 ISic000470 之中沟)。 | <cb n="2"/> | col.I … ‖ col.II … |
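Since `<lb>` is an empty element carrying @n, the line structure can be read off without interpreting the text. A small sketch, assuming a hypothetical `<ab>` wrapper whose lines contain plain text only (no nested markup after each `<lb>`); the function name `lines_of` is an illustration.

```python
import xml.etree.ElementTree as ET

def lines_of(xml_text):
    """Map each line number (@n) to the plain text that follows its <lb>."""
    root = ET.fromstring(xml_text)
    return {lb.get("n"): (lb.tail or "").strip() for lb in root.iter("lb")}

text = '<ab><lb n="1"/>Caesar <lb n="2"/>Augustus</ab>'
lines = lines_of(text)
print(len(lines))  # 2
print(lines)       # {'1': 'Caesar', '2': 'Augustus'}
```

Counting `<cb>` elements in the same way would answer the question of how many columns a stone carries.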
Spelling · 拼写
Archaic, dialectal, or scribal-error spellings are preserved alongside the regularized form. The <choice> wrapper lets the same XML serve both diplomatic and edited views.古老的、方言的、或者刻错的拼写, 编者既不删除、也不擅自改成标准形式, 而是把"原写"和"正字"两种形式并排放在一起。<choice> 这个包装器就是用来做这件事的:同一份 XML, 既能输出"如石所刻的原貌", 也能输出"经过整理的可读版"。
| Leiden莱顿 | Meaning含义 | EpiDoc XMLEpiDoc XML | Example示例 |
|---|---|---|---|
heic (in print) | Regularized vs original — both preserved.正字与原写,二者皆存。 Heavy in archaic Latin inscriptions (ISic000470: heic / aidibus sacreis / qum).古拉丁铭文常见 (ISic000470: heic / aidibus sacreis / qum)。 | <choice><reg>hic</reg><orig>heic</orig></choice> | heic → <choice><reg>hic</reg><orig>heic</orig></choice> |
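The two views that `<choice>` makes possible can be sketched as one function with a switch. The `<choice>` content is the slide's own example; the `<ab>` wrapper, the trailing words "situs est", and the function name `render` are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

def render(xml_text, view):
    """Render 'diplomatic' (as cut) or 'edited' (regularised) text."""
    keep = "orig" if view == "diplomatic" else "reg"
    root = ET.fromstring(xml_text)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "choice":
            parts.append(child.findtext(keep) or "")
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)

# "situs est" is a hypothetical continuation, not taken from ISic000470.
LINE = "<ab><choice><reg>hic</reg><orig>heic</orig></choice> situs est</ab>"
print(render(LINE, "diplomatic"))  # heic situs est
print(render(LINE, "edited"))      # hic situs est
```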
Sources: EpiDoc Guidelines 9.8 (epidoc.stoa.org/gl/latest) · Krummrey & Panciera 1980 · Modeling Epigraphy with an Ontology (Epigraphy.info WG, 2026).参考:EpiDoc《指南》 9.8 (epidoc.stoa.org/gl/latest) · Krummrey & Panciera 1980 · 《以本体为铭文学建模》 (Epigraphy.info 工作组, 2026)。
EpiDoc is a customisation of TEI P5, not a separate schema. It works by including most of P5, dropping what doesn't fit epigraphy or papyrology, and tightening a handful of elements so editorial conventions can be checked mechanically. The compiled ODD becomes tei-epidoc.rng (785 KB), the RNG schema every workshop file validates against.EpiDoc 不是另立一套新的模式, 而是对 TEI P5 的一种定制。它的做法很务实:大部分 P5 直接拿来用; 与铭文学或纸草学场景不合的部分, 舍去; 少数几个元素稍微收严一下, 让原本只在文章里写明的编辑惯例, 现在程序也能自动校验。把整套 ODD 文件编译出来, 就是 tei-epidoc.rng(约 785 KB), 工坊的每一份文件都要通过这个 RNG 模式的校验。
Module-by-module treatment逐模块处理
Each row is one P5 module EpiDoc imports. Drops = how many P5 elements EpiDoc excludes from that module via except="…". Modules with zero exclusions are pulled in wholesale.下面每一行, 都是 EpiDoc 从 P5 引入的一个模块。"舍数"是 EpiDoc 在该模块中通过 except="…" 排除掉的元素数,也就是它觉得不适合铭文场景的元素。舍数为零的模块, 整个模块都沿用。颜色从绿到红, 表示 EpiDoc 对这个模块"动刀"的深浅。
| Module模块 | Drops舍数 | What it covers涵盖 |
|---|---|---|
core | 0 | Paragraphs, headings, names, lists, bibliographic references, lines, basic editorial markup. If your text is XML at all, you use elements from core.段落、标题、人名、列表、参考文献、行、基本编辑标记。XML 文档无不用核心模块之元素。 |
tei | 0 | The meta-level module that defines the top-level <TEI> document structure — the root holding teiHeader + text.元层模块, 定义顶层 <TEI> 文档结构,含 teiHeader 与 text 之根容器。 |
header | 1 | The <teiHeader> and its contents — metadata about source, publication, editorial responsibility, languages, classification, revisions.<teiHeader> 及其内容,出处、出版、编辑责任、语言、分类、修订之元数据。 |
textstructure | 18 | High-level structure of the text body: <text>, <body>, <front>, <back>, <div>. EpiDoc drops monograph-oriented elements (argument, byline, opener, closer) that don't fit inscriptions.文本主体之高层结构:<text>、<body>、<front>、<back>、<div>。EpiDoc 舍去专著式元素 (argument、byline、opener、closer) 等。 |
transcr | 2 | Transcription of primary sources — the heart of EpiDoc. <supplied>, <gap>, <unclear>, <add>, <del>, <space>. Where Leiden meets XML.原始资料之转录,EpiDoc 之核心。<supplied>、<gap>、<unclear>、<add>、<del>、<space>。莱顿在此化为 XML。 |
verse | 3 | Verse markup: line-groups <lg>, lines <l>, metrical notation <metDecl>. Used for metrical epitaphs and dedicatory poems.韵文标记:行组 <lg>、行 <l>、格律 <metDecl>。用于格律墓志与献诗。 |
analysis | 3 | Sub-sentence linguistic analysis: word tokens <w>, sentences <s>, phrases <phr>. Used by projects that lemmatize inscriptions.句以下的语言学分析:词元 <w>、句 <s>、短语 <phr>。用于词形标注铭文之项目。 |
certainty | 1 | Editor's certainty and responsibility for each judgement: <certainty>, @cert, @resp. Lets one file carry "certain" vs "conjectural" distinctions.每一判断的编者确信度与责任:<certainty>、@cert、@resp。使一文兼载"确"与"猜"之别。 |
gaiji | 3 | Non-Unicode characters: <charDecl>, <glyph>, <g>. For paleographic symbols not yet in Unicode.非 Unicode 字符:<charDecl>、<glyph>、<g>。用于尚未编入 Unicode 之古字符。 |
linking | 5 | Segmentation, alignment, cross-references: <link>, <ref>, <ptr>, <anchor>. Lets an inscription point at its image, bibliography, or commentary.分段、对齐、互引:<link>、<ref>、<ptr>、<anchor>。使一铭可指向其图像、参考文献或注解。 |
msdescription | 0 | The physical object: <msDesc>, <msIdentifier>, <support>, <material>, <objectType>, <layout>, <handDesc>, <provenance>. In EpiDoc, describes the inscribed stone itself.实物对象:<msDesc>、<msIdentifier>、<support>、<material>、<objectType>、<layout>、<handDesc>、<provenance>。在 EpiDoc 中, 描述刻文之石本身。 |
namesdates | 18 | Personal names <persName>, place names <placeName>, dates <date>, with rich sub-elements. EpiDoc drops modern-biography elements (education, faith, nationality, occupation) that don't map to ancient prosopography.人名 <persName>、地名 <placeName>、日期 <date>, 带丰富子元素。EpiDoc 舍去与古代人名学不合之现代传记元素 (education、faith、nationality、occupation)。 |
textcrit | 7 | Textual-criticism apparatus: <app>, <lem>, <rdg>, <note>. Records variant readings across editors and prior editions.校勘记:<app>、<lem>、<rdg>、<note>。记录历代编者的异读。 |
figures | 1 | Figures, tables, formulas: <figure>, <graphic>, <table>, <row>, <cell>. For embedded illustrations or tabular data.图、表、公式:<figure>、<graphic>、<table>、<row>、<cell>。用于版本内嵌之图示或表格。 |
spoken | 12 | Transcription of speech: <u> (utterance), <pause>, <kinesic>. EpiDoc drops nearly all — inscriptions are written, not spoken.口语转录:<u> (话轮)、<pause>、<kinesic>。EpiDoc 几乎全部舍去,铭文乃书写非口说。 |
corpus | 12 | Corpus-level description rather than individual text: language usage statistics, demographics, setting. EpiDoc drops modern-sociolinguistics elements (activity, channel, factuality).语料库整体描述, 非个别文本:语言使用统计、人口、场景。EpiDoc 舍去现代社会语言学元素 (activity、channel、factuality)。 |
dictionaries | 30 | Lexicographic structure: <entry>, <sense>, <etym>, <gramGrp>. EpiDoc drops virtually the entire module — inscription editions are not dictionaries.词典结构:<entry>、<sense>、<etym>、<gramGrp>。EpiDoc 几乎全部舍去,铭文版本非词典。 |
Worked examples — tightening in ISic000470实例 · ISic000470 中之收严
Three small fragments from ISic000470.xml demonstrate the practical effect. P5 would accept the left column; EpiDoc rejects it and demands the right.下面三个片段都来自 ISic000470.xml, 实际工坊文件。P5 是允许左栏写法的, EpiDoc 不允许,它会要求你写成右栏那样, 明确标注出"为什么"、"多长"、"什么单位"。
| Element元素 | P5 would acceptP5 可受 | EpiDoc requiresEpiDoc 须用 |
|---|---|---|
| <supplied> | <supplied>Imp.</supplied> | <supplied reason="lost">Imp.</supplied> |
| <gap> | <gap extent="unknown"/> | <gap reason="lost" extent="unknown" unit="character"/> |
| <origDate> | plain prose | <origDate datingMethod="#julian" notBefore-custom="0001" notAfter-custom="0200" evidence="lettering archaic-spelling">Augustan or Julio-Claudian</origDate> |
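The effect of the tightening can be imitated with a few attribute checks. This is a rough sketch of the rules as the slides describe them (`<supplied>` and `<gap>` need @reason; `<gap>` also needs either @quantity with @unit, or @extent), not the real schema: actual validation runs against tei-epidoc.rng, and the function name `problems` is hypothetical.

```python
import xml.etree.ElementTree as ET

def problems(xml_text):
    """List the attribute requirements (as stated in the slides) that an element fails."""
    elem = ET.fromstring(xml_text)
    found = []
    if elem.tag in ("supplied", "gap") and elem.get("reason") is None:
        found.append("missing @reason")
    if elem.tag == "gap":
        has_quantity = elem.get("quantity") is not None and elem.get("unit") is not None
        has_extent = elem.get("extent") is not None
        if not (has_quantity or has_extent):
            found.append("needs @quantity + @unit, or @extent")
    return found

print(problems("<supplied>Imp.</supplied>"))                               # ['missing @reason']
print(problems('<gap extent="unknown"/>'))                                 # ['missing @reason']
print(problems('<gap reason="lost" extent="unknown" unit="character"/>'))  # []
```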
The 12 tightened elements十二受严元素
Each carries a mode="change" directive in the EpiDoc ODD — typically converting an optional attribute to required, or replacing an open value list with a closed vocabulary drawn from editorial conventions.下面这十二个元素, 在 EpiDoc 的 ODD 文件里都带着一个 mode="change" 指令,通常做的事情是:把 P5 里本来"可有可无"的属性, 改成"必须有"; 或者把 P5 里"取值随你写"的属性, 改成"必须从这套封闭的词表里选"。这些限制都是从编辑惯例里来的。
- <supplied>: editor-supplied text 编者补字
- <gap>: missing/illegible 阙文/漫漶
- <origDate>: original date 原刻年代
- <expan>: abbreviation expansion 缩写展开
- <unclear>: doubtful letters 存疑字母
- <lb>: line break 换行
- <space>: blank space (vacat) 刻工留白
- <div>: edition/translation block 版本/译文段
- <add>: ancient addition 古人增刻
- <biblScope>: citation range 引文范围
- <hi>: typographic highlight 字形高亮
- <ex>: supplied within expansion 展开补字
Why a customisation, not a separate schema? 何以为定制, 不另立模式?
"Using a TEI schema maximizes the compatibility of EpiDoc encoded inscriptions with other text projects in the humanities generally. The EpiDoc Guidelines, therefore, rather than being an entirely new system, may be considered as a local guide to practice within the larger TEI guidelines.""使用 TEI 模式, 可以最大限度保证以 EpiDoc 编码的铭文, 与人文学界其他文本项目相兼容。因此《EpiDoc 指南》不应被看作另起炉灶的新系统, 而应被看作是在 TEI 大指南之内的一份地方实践指南。"
— Bodard 2010
The term inscription is multivalent: a sequence of signifiers on a support, or a distinct text on that support. EpiDoc respects the distinction architecturally: the material dimension (object, support, layout, hand, history) lives in <msDesc>; the textual dimension (edition, apparatus, translation, commentary) lives in <text>. A single XML file can hold a stele bearing two unrelated texts cut by different hands, each as its own <div type="edition">, without forcing them into one representation. "铭文"这个词本身是有歧义的。它可以指载体上那一串符号本身, 也可以指那串符号所写出的那段具体文本,二者并不相同。EpiDoc 在结构上把这种区分体现出来:物质这一面(石头本身、载体、版式、刻手、流传史)放在 <msDesc> 里; 文本这一面(版本、校勘、译文、注解)放在 <text> 里。这样一来, 一块碑上即使刻着两段彼此无关、出自不同刻手的文字, 也可以各以一个 <div type="edition"> 来表达, 不必勉强当成一段。
— Morlock & Santin 2014
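The split between object and text can be pictured as a file skeleton. What follows is a bare illustration, not a schema-valid EpiDoc file: namespaces and required header content are omitted, and the @n values, the placeholder text, and the function name `editions` are hypothetical.

```python
import xml.etree.ElementTree as ET

# One object description, two independent texts on the same stone.
SKELETON = """<TEI>
  <teiHeader><fileDesc><sourceDesc>
    <msDesc/>  <!-- material side: object, support, layout, hand, history -->
  </sourceDesc></fileDesc></teiHeader>
  <text><body>
    <div type="edition" n="a"><ab>first text</ab></div>
    <div type="edition" n="b"><ab>second text</ab></div>
  </body></text>
</TEI>"""

def editions(xml_text):
    """Return the @n of every <div type="edition"> in document order."""
    root = ET.fromstring(xml_text)
    return [d.get("n") for d in root.iter("div") if d.get("type") == "edition"]

print(editions(SKELETON))                            # ['a', 'b']
print(len(list(ET.fromstring(SKELETON).iter("msDesc"))))  # 1
```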
Sources: compiled ODD tei-epidoc.xml v9.8 · Bodard 2010 · Morlock & Santin 2014. Module commentary from the EpiDoc Reference Guide build 2026-05-11.参考:编译后 ODD tei-epidoc.xml v9.8 · Bodard 2010 · Morlock & Santin 2014。模块解说取自 EpiDoc 参考指南 2026-05-11 编译版。
EpiDoc inherits 363 elements from 17 TEI P5 modules. The core transcription set below — the elements you will use in ~95% of workshop files — is grouped by editorial purpose. Each entry shows the canonical TEI gloss, a one-line role, and an EpiDoc-specific usage note (attribute, constraint, or convention).EpiDoc 从 TEI P5 的 17 个模块里继承了 363 个元素。我们不必每个都记,下面这一组是核心的转录集, 工坊文件里大约 95% 的场合都靠它们处理。每一条都给三层信息:TEI 自己的标准释义、一行角色说明, 以及 EpiDoc 在使用时的具体讲究(要哪个属性、有什么约束、习惯怎么写)。
Where the editor inserts judgement on top of the surface text. The Leiden brackets of every modern edition correspond to elements here.这一组, 都是编者要在原文之上加上自己的判断的地方。现代版本里所有的莱顿括号, 对应的就是这里的元素。
<supplied> editor-supplied text编者补字@reason ∈ {lost · omitted · subaudible · explanation · undefined}.EpiDoc 须用 @reason, 取值 {lost · omitted · subaudible · explanation · undefined}。<gap> gap阙文@reason; pair with @quantity + @unit, or @extent="unknown".EpiDoc 须用 @reason; 搭配 @quantity + @unit 或 @extent="unknown"。<unclear> unclear存疑@reason = damage · eccentric_ductus · etc.取代莱顿之字底点。@reason = damage、eccentric_ductus 等。<surplus> surplus衍文<del>: the cutter erred, not the post-cut hand.用于刻工之重字。与 <del> 不同:此为刻工之误, 非刻后之手。<del> deletion删除@rend for the physical method: erasure, expunged, strikethrough.用 @rend 记录物理方式:erasure、expunged、strikethrough。<add> addition增刻@place records position: above, below, margin, interlinear.@place 记录位置:above、below、margin、interlinear。Where the cutter compressed or spelled idiosyncratically and the editor expands or normalises. <choice> is the key wrapper that lets both forms coexist.这一组处理两类问题:刻工把字写得简短(缩写), 或者把字写得不合规范(古体、方言、刻误)。前者由编者展开, 后者由编者并陈正字。<choice> 是关键的包装器, 它让原形和正字两种写法可以共存于同一份 XML 中。
- <expan> · expansion 展开 · Wraps <abbr> (or surface letters) and <ex> (supplied letters). 包裹 <abbr> (或原刻字母) 与 <ex> (编者补字)。
- <abbr> · abbreviation 缩写 · May be omitted inside <expan> when the abbreviated form is obvious from context. 若缩写形式从上下文可明, 可于 <expan> 内省略。
- <ex> · editor-supplied within expansion 展开中编者补字
- <choice> · choice between alternatives 异形并陈 · Contains <reg> + <orig>, or <corr> + <sic>. 内含 <reg> + <orig>, 或 <corr> + <sic>。
- <reg> · regularised 正字 · The "modern spelling" member of <choice>. <choice> 中之"现代拼写"成员。
- <orig> · original form 原写 · The "as on the stone" member of <choice>. <choice> 中之"石上原形"成员。
- <corr> · correction 校正 · Unlike <reg>: this asserts an error in the source. 与 <reg> 有别:此明示原文为误。
- <sic> · sic (thus) 存原 · The "as cut" counterpart of <corr>. <corr> 之"如刻"对位。
Where the inscription's physical form leaves a mark in the encoding. Most layout elements are empty (self-closing): they record boundaries between visible bits. 铭文是刻出来的, 因此它的物理形态也要在编码中留下痕迹。这一组元素大多是"空元素"(自闭合, 没有内文),它们的作用是在可见的内容之间, 划出边界:这里换行, 这里换栏, 这里有刻意的留白。
- <lb> · line beginning 行首 · @n = line number. @break="no" records a line break that falls inside a word. @n 为行号。@break="no" 记录词内断行。
- <cb> · column beginning 栏首 · @n = column number. @n 为栏号。
- <space> · space (vacat) 留白 · EpiDoc-specific: use this element for a vacat, not <gap>. EpiDoc 专设:vacat 用此元素而非 <gap>。
- <hi> · highlighted 字形高亮 · @rend = capitals, ligature, larger-than, etc. @rend = capitals、ligature、larger-than 等。
- <handShift> · change of hand 换手 · Empty element; @new = ID of the new hand declared in <handDesc>. 空元素;@new = <handDesc> 中声明之新刻手 ID。
The framing that turns a transcribed stretch of letters into a citable edition: divisions, the description of the object, dates, names, places. 把一段转录文字变成一份可被引用的版本, 需要一层外部框架:文本怎么分段、它是刻在什么对象上、那对象出自哪里、年代有多老、里面提到了哪些人和哪些地方,这一组元素提供的就是这层框架。
- <div> · text division 文本分段 · EpiDoc convention: @type = edition · translation · commentary · apparatus. EpiDoc 惯例:@type = edition、translation、commentary、apparatus。
- <msDesc> · object description 对象描述
- <support> · support 载体 · Sits inside <supportDesc>; holds material, dimensions, condition. 置于 <supportDesc> 内; 含材质、尺寸、状况。
- <material> · material 材质 · Link to the EAGLE vocabulary with @ref. 以 @ref 链至 EAGLE 词表。
- <origDate> · origin date 原刻年代 · EpiDoc extends @evidence to 11 values (lettering, content, archaeology, etc.). EpiDoc 扩展 @evidence 取值为 11 类 (lettering、content、archaeology 等)。
- <origPlace> · origin place 原出地 · Link to Pleiades or a Trismegistos GeoID with @ref. 以 @ref 链至 Pleiades 或 Trismegistos GeoID。
- <persName> · personal name 人名 · Link to Trismegistos People / LGPN with @ref; nest <persName> children for sub-fields. 以 @ref 链至 Trismegistos People / LGPN; 嵌套 <persName> 子节点表达分项。
- <placeName> · place name 地名 · Link to Pleiades with @ref; @type = ancient · modern. 以 @ref 链 Pleiades;@type = ancient、modern。
Element glosses verbatim from tei-epidoc.compiled.xml v9.8 (363 elements across 17 modules). Full reference: epidoc.stoa.org/gl/latest; per-element pages at tei-c.org/release/doc/tei-p5-doc. 元素释义直引自 tei-epidoc.compiled.xml v9.8 (17 模块共 363 元素)。完整文档:epidoc.stoa.org/gl/latest;每元素之专页见 tei-c.org/release/doc/tei-p5-doc。
Every EpiDoc file has the same skeleton每份 EpiDoc 文件都有相同的骨架
Learn this path by heart. Every inscription in every project has this same shape; the only thing that varies is what lives inside the <div> elements.这条路径背下来即可。每个项目、每份铭文都长成这同一副样子;真正会因铭文而变化的, 只是 <div> 元素内部的内容。
<TEI>
  <teiHeader>                      ← metadata · who, where, when, what stone
    <fileDesc>                     · bibliography of the digital file
    <encodingDesc>                 · the EpiDoc version and any project conventions
    <profileDesc>                  · languages, calendar, text classification
    <revisionDesc>                 · who edited the file and when
  </teiHeader>
  <facsimile>                      ← images of the stone
  <text>
    <body>
      <div type="edition">         ← the editorial transcription
      <div type="apparatus">       ← competing readings
      <div type="translation">     ← one or more modern translations
      <div type="commentary">      ← scholarly notes
      <div type="bibliography">    ← who has published this text
    </body>
  </text>
</TEI>
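A minimal sketch of how the skeleton can be checked mechanically, written in Python with only the standard library. It assumes the standard TEI namespace `http://www.tei-c.org/ns/1.0`; the helper name `missing_parts`, the choice of required paths, and the sample file are illustrative assumptions, not part of any EpiDoc tool. A real project would validate against the EpiDoc schema instead.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Paths and <div> types taken from the skeleton above (hypothetical checklist).
REQUIRED_PATHS = ["tei:teiHeader/tei:fileDesc", "tei:text/tei:body"]
EXPECTED_DIV_TYPES = ["edition", "apparatus", "translation", "commentary", "bibliography"]


def missing_parts(xml_text):
    """List the skeleton parts that are absent from an EpiDoc file."""
    root = ET.fromstring(xml_text)
    missing = [p for p in REQUIRED_PATHS if root.find(p, NS) is None]
    body = root.find("tei:text/tei:body", NS)
    present = set()
    if body is not None:
        present = {d.get("type") for d in body.findall("tei:div", NS)}
    missing += [f'div[@type="{t}"]' for t in EXPECTED_DIV_TYPES if t not in present]
    return missing


# Invented stub with only two of the five <div> blocks.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc/></teiHeader>
  <text><body>
    <div type="edition"/>
    <div type="translation"/>
  </body></text>
</TEI>"""

print(missing_parts(SAMPLE))
```

Run against a stub early on, a check of this kind shows at a glance which parts of the skeleton still have to be written.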
② Four <div>s in the body, four kinds of editorial labour ② <body> 中的四个 <div>, 即四类编者工作
Inside <body>, four canonical <div type="…"> blocks hold four distinct kinds of work: edition = what the editor reads on the stone (with brackets, supplied letters, abbreviations expanded); apparatus = competing readings from earlier editors and other witnesses; translation = one or more modern renderings; commentary = scholarly notes on history, prosopography, palaeography. Printed editions collapse all four onto one page; EpiDoc keeps them as four independently citable, queryable, renderable resources. <body> 里, 四个 <div type="…"> 各负担一类编者工作:edition(校录)= 编者在石面读出的文(带括号、补字、缩写展开等);apparatus(校勘)= 前辈编者与诸证的异读;translation(译文)= 一种或多种现代翻译;commentary(注释)= 关于历史、人名、古字学的学术注解。印本把这四类压在一页;EpiDoc 把它们摊为四份各自可引、可检、可渲染的资源。
③ <div type="edition">: three layers in one block ③ <div type="edition"> 内部,一文三层
What reads like one continuous transcription actually carries three superimposed semantic layers: what the stone has (text inside <abbr>, broken by <lb>); what the editor added (expansion in <ex>, supplied letters in <supplied>, corrections in <corr>); and what it means (names, places, offices, dates, numbers, each in its own semantic wrapper with @ref pointing to an authority list). Read only the first layer and you have the diplomatic text; read only the third and you can build a database. EpiDoc keeps all three readable at once. 看似一气贯下的“释读”块, 其实层层叠合着三组语义:石上所刻(<abbr> 之内, 由 <lb> 分行);编者所补(展开装 <ex>、补字装 <supplied>、校改装 <corr>);语义所指(人名、地名、官职、日期、数字, 各装入相应元素, 并以 @ref 挂上权威表)。只读第一层, 即得“实录”;只读第三层, 便可建一份数据库。EpiDoc 让三层同时可读。
Visual ≠ Diplomatic ≠ Editorial ≠ Semantic. A printed edition collapses these four onto one page using brackets and italics; EpiDoc separates them into distinct elements so each can be queried, validated, and rendered independently. The four views in the next slide are exactly four different projections of the same XML file, each privileging one of these layers.视觉 ≠ 实录 ≠ 编者干预 ≠ 语义。印本以方括号与斜体把这四层压成一页;EpiDoc 则把它们各自摊到不同元素里, 使每一层都可以被独立检索、校验、渲染。下一页将看到的“四视图”, 正是同一份 XML 在这四种维度上的四种投影。
Your route through the workshop工坊路线图
- <expan> + <g> + <hi> + <num> · 缩写展开 · 字符标记 · 字形高亮 · 数字
- <am> for plural-marker abbreviations · 复数标记式缩写 (<am>)
- D M S + <date dur> + <gap> · 神祇呼告 · ISO 8601 寿数 · 量化阙文
- <supplied> + godot.date · 希腊文 L 符 · 编者补字 · godot.date
- <lg>/<l> verse + ethnic · 诗行/诗节 · 民族名
- <textpart> code-switching · 语段切换 · 双语编码
- <supplied> + cert="low" · 大量补字 · 低确信度
Tier 1 = warmup · Tier 2 = standard · Tier 3 = capstone. Click any button to jump straight to that exercise; press Esc for the full slide overview, or use the sidebar tab at the right edge. 第 1 级为入门, 第 2 级为标准, 第 3 级为毕业作品。点击任一按钮即可直跳那一节; 按 Esc 打开整套幻灯片总览; 也可使用右侧的侧栏标签。
Two tools, two purposes两件工具, 两种用途
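One inscription, four views side by side on one screen: the Leiden typesetting you see when you open a printed book, the web view published on a corpus website, the relational rows and columns stored in a database, and the raw EpiDoc XML that drives the other three. It is projected in class so that you can see at a glance what an EpiDoc file actually does. 同一篇铭文, 四种视图同屏并陈:翻开书本时看到的莱顿排版、语料库网站发布的网页视图、数据库里存储的关系型行列、以及驱动前面三种视图的原始 EpiDoc XML。课堂上用投影展示, 让你直观看见 EpiDoc 这份文件究竟"在做什么"。
The working environment for the ten exercises. Four panels: a Leiden convention cheat sheet, a small playground that emits XML as you type, a document editor that can handle a complete inscription, and a multilingual editor for bilingual and multilingual inscriptions. All the hands-on work happens here. 十道练习的实际操作环境。四个面板:莱顿规约速查表、随打随出 XML 的小型试验场、能编全篇铭文的文档编辑器, 以及处理双语 / 多语铭文的多语编辑器。真正动手的工作, 都在这里完成。
When you finish an exercise, you have several options: download the XML, generate a URL for peer review, post to the workshop's GitHub issue tracker, or email it straight to the instructor. Each submission bundles the XML, your free-text notes, and the self-assessment checklist into a single base64 token, so nothing is lost. 做完一道, 几种方式可选:把 XML 下载下来、生成一个互评用的 URL、发到工坊的 GitHub issue, 或直接邮给指导老师。每次提交把 XML、自由文字、自评清单一并打成一个 base64 令牌, 不丢任何一份。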
The next two slides walk through each tool live. After that, the ten exercises follow — each one ships with a "Launch playground" button that pre-loads its stub XML.
下面两页分别现场演示这两件工具。再往后就进入十道练习, 每一道都备有“进入工坊”按钮, 会把当题的骨架 XML 预先载入。
leiden-playground.html — where you actually encodeleiden-playground.html,真正动手编码的地方
Four panels covering the full encoding workflow.四个面板, 涵盖从查表到编辑的全过程:
- Convention reference — the Leiden ⇄ EpiDoc table, always one click away.规约速查:莱顿 ⇄ EpiDoc 对照表, 随时可调出。
- Playground — type Leiden on the left, watch the XML build on the right; correct your typing as you learn.小试验场:左侧打莱顿, 右侧实时生成 XML; 边打边改, 立刻见效。
- Document editor — the full canvas, where you build a complete EpiDoc inscription from the stub.文档编辑器:完整画布, 由骨架出发, 编出整篇 EpiDoc 文件。
- Multilingual editor — for texts that switch languages mid-line (Latino-Punic, Latin + Greek, etc.).多语编辑器:给那些在同一行里切换语言的铭文用(拉丁 + 布匿、拉丁 + 希腊等)。
From each exercise's title slide, a "Launch playground" button opens this tool with that exercise's stub XML pre-loaded. Open in a new tab ↗每道练习的封面页都有“进入工坊”按钮, 会把当题的骨架 XML 直接载入。 在新窗口中打开 ↗
epidoc-resolver.html — from a description to EpiDocepidoc-resolver.html,由描述生成 EpiDoc
Paste a description of an inscription — prose, a Leiden transcription, or both — and let an LLM resolve it into a complete EpiDoc TEI document.粘贴一份铭文描述,散文叙述、莱顿转写, 或两者兼有,由大语言模型解析为一份完整的 EpiDoc TEI 文档。
- Paste & resolve — feed it free prose or a Leiden transcription; it returns a full TEI document.粘贴并解析:输入散文或莱顿转写, 即返回一份完整 TEI 文档。
- Qwen or DeepSeek — choose the model under ⚙ Settings; the API key is sent as a Bearer token.通义千问或深度求索:于 ⚙ 设置中择一, API 密钥作为 Bearer 令牌发送。
- Validate & preview — the resolved XML is checked and rendered before you commit anything.校验与预览:解析所得 XML 先经校验与呈现, 然后再行提交。
- Upload to the repository — gated by the shared upload password, exactly like the playground.上传至库:经共享上传密码把关, 与试验场完全相同。
A fast first draft when you start from a catalogue description rather than a blank editor. Open in a new tab ↗当你从著录描述而非空白编辑器起步时, 可借此快速得到初稿。在新标签页打开 ↗
C.101 · The Edicts of AugustusC.101 · 奥古斯都之昔兰尼诏令
高两米的大理石碑 · 六篇帝令 · 昔兰尼, 公元前 7—4 年 · IRCyr 2020 主编 Joyce M. Reynolds
This is the kind of inscription EpiDoc was made for. The next 19 slides walk through how the encoding handles a text of this complexity — the stone, the six documents, the structure, the editorial interventions, the authorities, the words, the dates, the languages, the apparatus, the bibliography, the revision history, the validation — and how that encoding makes the monument more accessible, more searchable, and more honest about its own editorial history than any printed edition could. Pair this section with lexicon.html for the full vocabulary.这正是 EpiDoc 之所以存在的那一类铭文。以下十九页, 将逐一展示编码如何处理一篇这等复杂的文本,石本身、六篇文书、结构层级、编者干预、权威挂载、词形归并、日期编码、语言标注、校勘异读、参考文献、修订史、验证机制,以及, 这种编码如何使整篇碑铭较任何印本都更易读、更可检索、更对自家编辑史诚实。本节宜与 lexicon.html 并阅, 以备完整术语之查。
One stele · six documents · two dating campaigns一石 · 六文 · 两次断代
The six documents on C.101C.101 上的六篇
I · 41 ll. · ll. 1–41 · Mixed Greek-Roman courts in the senatorial province. 于元老院行省设立希腊-罗马混合法庭。 · 7–6 BCE
In capital cases between Cyrenaeans, where the defendant is a non-Roman Greek, the jury shall consist of equal numbers of Greek and Roman jurors — drawn from the wealthiest residents of the province (those holding property of at least 7,500 denarii). The defendant may, however, request a jury entirely of Greeks.凡居伦人之间的死罪案件, 被告若为非罗马籍希腊人, 陪审团由希腊人和罗马人各占一半组成, 从行省内财产最丰之住民中选任(资产须达 7,500 第纳里)。被告也可申请由全部希腊人组成的陪审团。
List of 215 Romans eligible appended at the head of the edict诏首附录 215 位合资格罗马公民名单
C.101 ll. 1–47 · ed. Lewis & Reinhold 1990, no. 36
II · 16 ll. · ll. 42–55 · Judgment on three citizens (the P. Sextius Scaeva case). 对三位公民的判决(P. Sextius Scaeva 一案)。 · 7–6 BCE
Three Cyrenaeans — Aulus Stlaccius Maximus son of Lucius, Lucius Stlaccius Macedo son of Lucius, and Publius Lacutanius Phileros — informed the Roman authorities of matters which they said concerned my safety and that of the state. Investigation found their charges baseless. I release them from the surety imposed on them by P. Sextius Scaeva.三位居伦人(Aulus Stlaccius Maximus、Lucius Stlaccius Macedo、Publius Lacutanius Phileros)曾报称事关我之安危; 经查并无实据, 我解除 P. Sextius Scaeva 加于他们的保释。
Whether Stlaccius Maximus also acted improperly in removing a statue inscribed with my name from a public place, I reserve to decide later, after hearing him.至于 Stlaccius Maximus 是否擅自取下刻有我名之公共雕像, 容俟其陈词后再裁。
C.101 ll. 48–63 · ed. Lewis & Reinhold 1990, no. 36
III · 8 ll. · ll. 55–62 · No fiscal immunity for new Roman citizens unless granted. 新获罗马公民权者, 除非授令明载, 否则不予免税。 · 7–6 BCE
No one who has received Roman citizenship from me is thereby relieved of the compulsory services he owes to his original community — unless I have expressly granted that immunity in the same decree.凡从我处获得罗马公民权者, 并不因此免除其原属社群应尽之公共义务,除非授令本身明文豁免。
Citizenship is a personal honour, not a transfer of fiscal obligation.授予公民权是对个人的荣誉, 不是把财政义务一并转走。
C.101 ll. 63–71 · ed. Lewis & Reinhold 1990, no. 36
IV · 12 ll. · ll. 62–71 · Court procedure for civil disputes between Greek residents. 希腊居民间民事争讼之审判程序。 · 7–6 BCE
In all non-capital disputes between Greek residents of Cyrenaica, the jurors shall be Greek — except where the defendant prefers a Roman jury. No juror may be drawn from the same city as either party to the dispute.凡居伦尼希腊居民之间非死罪之争讼, 陪审皆用希腊人,唯有被告改请罗马陪审者除外; 陪审员不得来自争讼任一方所属之城。
C.101 ll. 72–82 · ed. Lewis & Reinhold 1990, no. 36
V · 13 ll. · ll. 72–82 · New procedure for trying provincial governors. 审判行省总督之新程序。 · 4 BCE
I have judged it proper that the senatus consultum passed in the consulship of C. Calvisius Sabinus and L. Passienus Rufus be sent into the provinces and posted in the most public place possible, so that those whom we govern may know it. 我以为元老院于 C. Calvisius Sabinus 与 L. Passienus Rufus 同任执政之年所通过之决议, 当布告各省, 置于民众最易见之处, 使我所治理之人皆知。
C.101 ll. 83–95 · ed. Lewis & Reinhold 1990, no. 36
VI · 62 ll. · ll. 83–144 · Senatus Consultum implementing V, under cons. Sabinus & Rufus. 元老院决议, 落实第 V 诏令, 执政官 Sabinus 与 Rufus 同列。 · 4 BCE
Senatus consultum, 4 BCE, in the consulship of C. Calvisius Sabinus and L. Passienus Rufus. 元老院决议, 公元前 4 年, C. Calvisius Sabinus 与 L. Passienus Rufus 同任执政之年。
A new procedure is established for the recovery of monies extorted by Roman magistrates from provincials. The plaintiff approaches a Roman magistrate in Rome; the magistrate convenes a panel of five senatorial recuperatores drawn by lot. Verdicts pass by majority, and restitution is to be executed within thirty days. Cases that touch capital charges proceed under ordinary Roman criminal procedure.为追讨罗马官吏对行省的勒索, 设立新程序: 申诉者向罗马治权者提告, 治权者抽签召集五位元老级recuperatores(评判员)审理; 决议以多数表决, 三十日内执行赔付; 凡涉死罪者, 依罗马常规刑律审理。
Witnesses may not be compelled to remain in Rome beyond what is reasonable; envoys from the prosecuting community shall be reimbursed travel and subsistence.证人不得被强留于罗马以致逾时; 提诉社群之使者, 其往返与食宿应受补偿。
C.101 ll. 96–157 · ed. Lewis & Reinhold 1990, no. 36 · ref. SEG IX 8
How EpiDoc carries six documents on one stele EpiDoc 如何在一碑之内承载六篇
<div type="edition" xml:lang="grc">
  <div type="textpart" n="I">            <!-- 7–6 BCE, mixed courts -->
    <ab><lb n="1"/><persName type="emperor" key="augustus">…</persName> …</ab>
  </div>
  <div type="textpart" n="II">…</div>    <!-- 7–6 BCE, P. Sextius Scaeva -->
  <div type="textpart" n="III">…</div>   <!-- 7–6 BCE, citizenship vs. tax -->
  <div type="textpart" n="IV">…</div>    <!-- 7–6 BCE, court procedure -->
  <div type="textpart" n="V">…</div>     <!-- 4 BCE, covering edict -->
  <div type="textpart" n="VI">…</div>    <!-- 4 BCE, Senatus Consultum -->
</div>
A printed edition would run the six edicts as paragraphs separated by Roman numerals in bold. The XML preserves the same hierarchy, but each <div type="textpart"> becomes an independently addressable object — citable as C.101.III, downloadable, queryable. EpiDoc's honesty about structure rather than appearance here matters: each edict is a distinct legal act with its own date, even though they share one piece of stone.印本会用粗体罗马数字把六诏令编为相连的段落; XML 保留同一层级, 但每一个 <div type="textpart"> 都成为可独立寻址的对象,可独立引用为 C.101.III、独立下载、独立检索。EpiDoc 重结构而不重外观之原则, 在此尤显其用:六篇诏令虽共此一石, 却各为独立的法律文件, 各有其年代。
What this enables. A Roman-law scholar can pull "every imperial edict on provincial court procedure between 7 BCE and 14 CE" by querying textpart elements whose surrounding origDate/@notBefore falls in range — and C.101.I, IV, and V all surface as separate results, each with its own date.我们可以做什么。研究罗马法的学者, 只须以 origDate/@notBefore 落于公元前 7 至公元 14 年之间为条件查询 textpart 元素, 即可一并取出"该期内所有关于行省审判程序的帝令",C.101.I、IV、V 即各以独立条目浮现, 各带其年。
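The date-range query described above can be sketched in Python with the standard library. This is an illustration under stated assumptions, not the IRCyr query interface: the sample is namespace-free and reduced to the attributes shown on these slides, the link between an <origDate> and its textparts is assumed to be the @n range ("I-IV", "V-VI"), and `textparts_in_range` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}

# Reduced, namespace-free stand-in for the C.101 file (assumption).
SAMPLE = """<doc>
  <origin>
    <origDate notBefore="-0007" notAfter="-0006" n="I-IV"/>
    <origDate notBefore="-0004" notAfter="-0004" n="V-VI"/>
  </origin>
  <div type="edition">
    <div type="textpart" n="I"/><div type="textpart" n="II"/>
    <div type="textpart" n="III"/><div type="textpart" n="IV"/>
    <div type="textpart" n="V"/><div type="textpart" n="VI"/>
  </div>
</doc>"""


def textparts_in_range(xml_text, earliest, latest):
    """Return the @n of every textpart whose origDate window lies in [earliest, latest]."""
    root = ET.fromstring(xml_text)
    hits = []
    for orig_date in root.iter("origDate"):
        start = int(orig_date.get("notBefore"))   # "-0007" parses as -7
        end = int(orig_date.get("notAfter"))
        if earliest <= start and end <= latest:
            first, last = (ROMAN[x] for x in orig_date.get("n").split("-"))
            for div in root.iter("div"):
                if div.get("type") == "textpart" and first <= ROMAN[div.get("n")] <= last:
                    hits.append(div.get("n"))
    return hits


print(textparts_in_range(SAMPLE, -7, 14))   # whole Augustan window
print(textparts_in_range(SAMPLE, -5, 14))   # only the later campaign
```

Because each textpart is its own element, the second call returns only the two documents of the later campaign, even though all six share one stone.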
What the cutter wrote vs. what Reynolds restored刻工所刻 vs. Reynolds 所补
Reynolds restored 14 truly lost letters with <supplied reason="lost"> and silently supplied 38 omitted iota subscripts with <supplied reason="omitted"> — the distinction matters: lost means the stone is damaged, omitted means the cutter (or Greek spelling convention) skipped a letter the editor judged ought to be present. The @reason attribute lets a reader filter for one but not the other.Reynolds 用 <supplied reason="lost"> 补回了 14 处真正失刻的字母, 又用 <supplied reason="omitted"> 默补了 38 处省略的下标 ι(iota subscript),二者之分要紧:lost(失)指石面受损, omitted(漏)指刻工(或希腊正字习惯)未刻而编者判定该有。@reason 属性让读者能选其一过滤之。
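Filtering on @reason can be sketched as follows, in Python with the standard library. The Greek fragment is invented for illustration and is not a line of C.101; the helper names `supplied_by_reason` and `text_without` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented fragment: one lost ending, two omitted iotas.
SAMPLE = """<ab>
  <lb n="1"/>Αὐτοκρά<supplied reason="lost">τωρ</supplied>
  τῶ<supplied reason="omitted">ι</supplied> δήμω<supplied reason="omitted">ι</supplied>
</ab>"""


def supplied_by_reason(xml_text):
    """Count <supplied> elements per @reason value."""
    root = ET.fromstring(xml_text)
    return Counter(s.get("reason") for s in root.iter("supplied"))


def text_without(xml_text, reason):
    """Flatten to plain text, dropping <supplied> content that has the given @reason."""
    def walk(el):
        skip = el.tag == "supplied" and el.get("reason") == reason
        inner = "" if skip else (el.text or "") + "".join(walk(c) for c in el)
        return inner + (el.tail or "")   # the tail always survives: it is outside the element
    return " ".join(walk(ET.fromstring(xml_text)).split())


print(supplied_by_reason(SAMPLE))
print(text_without(SAMPLE, "lost"))      # restorations of lost letters removed
print(text_without(SAMPLE, "omitted"))   # editorially added iotas removed
```

The same markup therefore yields a "stone only" text or a "normalised spelling" text, depending on which reason the reader filters out.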
Visible to anyone, machine-checkable too. The <div type="apparatus"> records 20 places where the lapis differs from Reynolds's text: for example, line 5 has ΗΔΙ on the stone, and line 142 reads ΒΟΥΑΗΙ (where Βουλῇ, Boulēi, is expected). Each variant is a tiny <app loc="N"><rdg>ΗΔΙ</rdg></app>: citable, comparable, never lost. 人可读, 机亦可校。 <div type="apparatus"> 记录了 20 处刻面与 Reynolds 校本之间的差异,第 5 行 ΗΔΙ、第 142 行 ΒΟΥΑΗΙ(本应为 Βουλῇ) 等。每一处异读都装在小小一只 <app loc="N"><rdg>ΗΔΙ</rdg></app> 中,可引、可比、永不丢失。
100% of places, every imperial date — linked out百分百地名、所有帝王纪年, 皆出链
Every <placeName> carries @ref每个 <placeName> 皆带 @ref
Cyrene, Apollonia, Knossos, every village and territory mentioned in the edicts — all 45 place references in the file resolve to the Society for Libyan Studies gazetteer (slsgazetteer.org), GeoNames, or Pleiades. A historical geographer can pull "every Cyrenaican settlement mentioned in Roman-period imperial edicts" in one query.昔兰尼、阿波罗尼亚、克诺索斯, 凡诏令中提到的城邑与领地,共 45 处地名引用, 无一例外皆链至利比亚研究学会地名表(slsgazetteer.org)、GeoNames 或 Pleiades。研究历史地理者, 一次查询即可取出"罗马帝制时期帝令中所有提及之昔兰尼加聚落"。
Every imperial titulature → godot.date每一帝王纪年 → godot.date
"In his 17th year of tribunician power, his 11th as consul" — a phrase that meant 7–6 BCE to a contemporary, but takes a Roman historian a full lookup to verify. Here it's wrapped in <date ref="godot.date/id/..."> — clickable, globally stable, joinable to every other text on godot.date that cites the same regnal year."任保民官第十七年, 任执政官第十一次",此话当时即指公元前 7—6 年, 但今日罗马史家须查表方可确认。此处包入 <date ref="godot.date/id/...">,可点、稳定、可与 godot.date 上所有引用此同一年的其他文本互通。
What this enables. A doctoral student researching the spread of Roman citizenship in the eastern provinces under Augustus can write one SPARQL query that joins (a) every imperial edict tagged with the same titulature URI, (b) every <placeName> with a Pleiades ID in the relevant region, and (c) the prosopography of senators named via <persName> — and get a fully-formed corpus of evidence in seconds, not months.我们可以做什么。一位研究奥古斯都治下东部行省罗马公民权扩散的博士生, 可写一条 SPARQL 查询, 同时连接(a) 所有挂同一帝王纪年 URI 之诏令、(b) 所有 <placeName> 带 Pleiades ID 之地名(限相关地区), 以及(c) <persName> 内挂载的元老人物志,几秒之内即得完整证据库, 而非数月。
Lemmatisation: 1,346 tokens, every one searchable词形标注: 1,346 个词标记, 皆可检索
<w lemma="ἀρχιερεύς">ἀρχιερεὺς</w> <w lemma="δημαρχικός">δημαρχικῆς</w> <w lemma="ἑπτακαιδέκατος">ἑπτακαιδέκατον</w> <w lemma="ἐξουσία">ἐξουσίας</w>
Every Greek word in C.101's 152 lines — all 1,346 of them — sits inside a <w> wrapper carrying its dictionary form on @lemma. A naïve search for "δημαρχικός" against the raw text would miss δημαρχικῆς and δημαρχικὴν; with lemmatisation, all three forms surface together. The lemmatisation pass is timestamped 2020-11-24 by editor "Irene" — even the digital scholarly labour is preserved.C.101 在 152 行中的 1,346 个希腊词, 无一例外, 都包在一个带 @lemma 属性的 <w> 中, 上挂词典原形。若以原文 "δημαρχικός" 搜索, 必漏掉 δημαρχικῆς、δημαρχικὴν 等屈折形; 一加词形标注, 三形齐显。本次标注的时间戳为 2020-11-24, 经手者为编者 "Irene",连数字学术之劳作, 亦被妥为存档。
The killer corpus query. Across all of IRCyr2020 (~2,300 inscriptions), find every attestation of ἐξουσία in any case form and rank by date. Without lemmatisation, this would mean grep + a Greek-paradigm checklist + manual disambiguation, taking days. With it, one query, two seconds.一招制胜的语料库查询。在整个 IRCyr2020(约 2,300 件铭文)中, 找出 ἐξουσία 之一切格变化的出现, 并按年代排序。若无词形标注, 须 grep + 希腊词形对照表 + 人工消岐, 旷日费时; 一旦标注到位, 一条查询, 两秒之间。
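A lemma lookup of the kind described above can be sketched in Python with the standard library. The sample reuses the four <w> elements quoted on this slide; `forms_for_lemma` is a hypothetical helper, and the NFC normalisation step is an added precaution (an assumption about how the Greek was typed), since precomposed and decomposed accents would otherwise fail to match.

```python
import unicodedata
import xml.etree.ElementTree as ET

SAMPLE = """<ab>
<w lemma="ἀρχιερεύς">ἀρχιερεὺς</w> <w lemma="δημαρχικός">δημαρχικῆς</w>
<w lemma="ἑπτακαιδέκατος">ἑπτακαιδέκατον</w> <w lemma="ἐξουσία">ἐξουσίας</w>
</ab>"""


def nfc(text):
    """Normalise to NFC so that visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


def forms_for_lemma(xml_text, lemma):
    """Return every surface form whose @lemma matches, whatever its inflection."""
    root = ET.fromstring(xml_text)
    wanted = nfc(lemma)
    return [w.text for w in root.iter("w") if nfc(w.get("lemma", "")) == wanted]


raw_text = nfc("".join(ET.fromstring(SAMPLE).itertext()))
print(nfc("δημαρχικός") in raw_text)            # the dictionary form is absent from the raw text
print(forms_for_lemma(SAMPLE, "δημαρχικός"))    # the lemma lookup finds the inflected form
```

Ranking by date, as in the corpus query above, would then only require joining each hit to the <origDate> of its file.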
Editions, translations, and the file's own biography版本、译文, 与文件本身之传记
Bibliography entries参考文献
From Oliverio's editio princeps (Notiziario Archeologico 4, 1927, pp. 13–68) through AE 1927.166, SEG 9.8, FIRA (Riccobono 1941, with Oliverio's Latin), Johnson 1961, De Visscher, 17 SEG volumes 1955–2006, PHI 324432, Marengo 1988, Berthelot 2018, Kenrick 2013. Every entry is a <ptr target="..."> resolvable against the IRCyr master bibliography.自 Oliverio 之 editio princeps (《Notiziario Archeologico》第 4 卷, 1927 年, 13—68 页)以下:AE 1927.166、SEG 9.8、FIRA(Riccobono 1941, 附 Oliverio 之拉丁译)、Johnson 1961、De Visscher、1955—2006 间 SEG 共 17 卷, PHI 324432、Marengo 1988、Berthelot 2018、Kenrick 2013。每条均为可解析至 IRCyr 主参考库的 <ptr target="...">。
Translations side-by-side译文并陈
Each <div type="translation"> carries its own @xml:lang and @source: Johnson 1961 English of Edicts I+II+IV+V+VI; Berthelot 2018 English of Edict III; an alternative Johnson rendering of IV; and a Latin translation from Riccobono 1941. Multiple translations under one file — the reader can compare interpretations side by side.每一 <div type="translation"> 各带 @xml:lang 与 @source:Johnson 1961 译 I+II+IV+V+VI 五诏(英); Berthelot 2018 译 III(英); 另有 Johnson 对 IV 之异译; 还有 Riccobono 1941 之拉丁文译。多份译文同入一文, 读者可并列比对。
Timestamped revisions, 2009–2020修订时间戳, 2009—2020
The <revisionDesc> records nine timestamped passes, among them: file created 2009-05-11 (GB, from template), P4→P5 conversion 2010-08-04, Greek pasted in 2011-09-29 (CMR), Leiden tagging 2011-10-03 (PaolaPiliego), Greek place-names lemmatised 2012-07-27 (GB), full lemmatisation + Unicode normalisation 2020-11-24 (Irene). The file is a living scholarly object, with its own biography. <revisionDesc> 记录九次带时间戳之修订, 其中包括:2009-05-11 模板生成(GB)、2010-08-04 由 P4 升级至 P5、2011-09-29 录入希腊文(CMR)、2011-10-03 莱顿标注(PaolaPiliego)、2012-07-27 希腊地名词形化(GB)、2020-11-24 全文词形化与 Unicode 规范(Irene)。整份文件乃活态学术对象, 自有其传记。
Why this matters for accessibility. A reader who can't read Greek can still grasp every edict via four translations. A reader who can read Greek can compare Reynolds's 2020 text against Oliverio 1927's editio princeps via the apparatus. A reader who wants to cite this monument has 28 prior editions at hand. And a reader who wants to know how the digital edition itself was built can read the file's own change log. One file, four readerships, one hundred years of scholarship.这何以关乎可读性。不通希腊文者, 仍可借四份译文通晓六诏全意; 通希腊文者, 可借校勘记将 Reynolds 2020 之校本与 Oliverio 1927 之祖本逐字比对; 欲引此碑者, 28 部前人版本一掌可握; 欲考此电子版之造法者, 文件自附沿革日志可读。一份文件, 四类读者, 百年学问。
Before any text: the metadata machine未及一字, 先有元数据机制
<msIdentifier><repository ref="institution.xml#db933">Cyrene Museum</repository></msIdentifier>
<physDesc>
  <objectDesc>
    <supportDesc><support>
      <p><material>Marble</material> <objectType>stele</objectType>, tapering toward the top
      (<dimensions><width>0.61-0.54</width><height>2.045</height><depth>0.36</depth></dimensions>).</p>
    </support></supportDesc>
    <layoutDesc><layout><rs type="execution" key="scalpro">Inscribed</rs> on one face.</layout></layoutDesc>
  </objectDesc>
  <handDesc><handNote>Augustan; ave. <height>0.01</height>.</handNote></handDesc>
</physDesc>
Every descriptive sentence a museum catalogue might write is here split into typed elements with measurable values. <material> isn't the word "marble" but a typed reference; <dimensions> carries three child elements with measurements in metres; <rs type="execution" key="scalpro"> means the inscription was chisel-cut (the key is from a controlled vocabulary of inscribing techniques).凡博物馆目录可能写出的描述, 此处皆被拆为带可测量值的有类型元素。<material> 不是"大理石"这个词, 而是一个类型引用; <dimensions> 内含三个子元素, 各以米为单位; <rs type="execution" key="scalpro"> 表示此铭为凿刻而成(键名取自一份刻法受控词表)。
Caveat. Don't treat the teiHeader as "the page before the text". It IS the text, for any search that asks "show me all marble stelae over 1.5 m" or "all inscriptions in Augustan letterforms". The encoding is what makes those queries possible.编者所诫。不可将 teiHeader 视作"正文前的扉页"。对任何"列出所有高过 1.5 米的大理石碑"或"所有奥古斯都体铭文"的检索而言, teiHeader 即是正文。其编码, 正是此类查询所赖。
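The "all marble stelae over 1.5 m" search can be sketched in Python with the standard library. The sample is a namespace-free copy of the description above; `is_tall_marble_stele` is a hypothetical helper, and reading the height from `dimensions/height` is a deliberate choice for this sketch, because <handNote> also contains a <height> (the letter height, 0.01).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<physDesc>
  <objectDesc><supportDesc><support>
    <p><material>Marble</material> <objectType>stele</objectType>, tapering toward the top
    (<dimensions><width>0.61-0.54</width><height>2.045</height><depth>0.36</depth></dimensions>).</p>
  </support></supportDesc></objectDesc>
  <handDesc><handNote>Augustan; ave. <height>0.01</height>.</handNote></handDesc>
</physDesc>"""


def is_tall_marble_stele(xml_text, min_height=1.5):
    """True if the support is a marble stele taller than min_height (metres)."""
    root = ET.fromstring(xml_text)
    material = root.findtext(".//material", "").strip().lower()
    object_type = root.findtext(".//objectType", "").strip().lower()
    # Restrict to <dimensions>: a bare ".//height" could match the letter height.
    height = root.findtext(".//dimensions/height")
    return (
        material == "marble"
        and object_type == "stele"
        and height is not None
        and float(height) > min_height
    )


print(is_tall_marble_stele(SAMPLE))
```

Matching on the element text is the weakest form of this query; where <material> carries a @ref to a controlled vocabulary, comparing that URI would be more robust.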
Seven photographs, each linked to its lines七张照片, 各对应所摄之行
<facsimile>
  <graphic url="Photos_0025" decls="#photograph">
    <desc>Lines 1-47 (Department of Antiquities, D 44)</desc></graphic>
  <graphic url="Photos_0026" decls="#photograph">
    <desc>Lines 83-134 (Department of Antiquities, D 46)</desc></graphic>
  <graphic url="Photos_0024" decls="#photograph">
    <desc>Lines 112-144 (Department of Antiquities, D 47)</desc></graphic>
  <graphic url="DSC04081" decls="#photograph"><desc>Face (2008, H.Walda)</desc></graphic>
  <!-- + DSC04082, DSC04083, DSC04084 -->
</facsimile>
Each <graphic> is a typed pointer to an image file. The @url resolves to a real photograph; @decls="#photograph" says "this is a photograph, not a drawing or squeeze" — by pointing back to the <classDecl> in the teiHeader, where xml:id="photograph" is defined. The <desc> tells the reader (and any indexer) which lines that image covers and where it was taken.每一 <graphic> 是指向图像文件的有类型指针。@url 解析至真实照片; @decls="#photograph" 言"此为照片, 非线描亦非拓本",经由 teiHeader 中 <classDecl> 内 xml:id="photograph" 之定义回连。<desc> 则告知读者(与索引器)该图覆盖第几行、何人何时所摄。
Caveat. The same <facsimile> pattern accepts <surface> elements whose <zone> children pin exact pixel coordinates to specific letters on the image, a step EpiDoc supports but Reynolds did not take here. The framework scales up if you need region-level annotation of the photograph later. 编者所诫。同一 <facsimile> 模式可加 <surface> 元素, 其 <zone> 子元素可把照片上特定像素坐标钉至具体字母,此步 EpiDoc 支持, Reynolds 此处未做。日后若需对照片做区域级标注, 框架仍可扩展。
Dating: stone-level and edict-level, both encoded断代:碑级与文级, 皆有编码
<origin>
  <origPlace>Unknown.</origPlace>
  <origDate notBefore="-0007" notAfter="-0006"
            evidence="titulature" n="I-IV">I-IV: 7-6 BCE; </origDate>
  <origDate notBefore="-0004" notAfter="-0004"
            evidence="titulature" n="V-VI">V-VI: 4 BCE</origDate>
</origin>
<!-- inside each textpart edition, the imperial titulature carries a godot.date URI: -->
<date ref="https://godot.date/id/k2sqW4JtApudigweWEWKNb">
  <persName type="emperor" key="augustus">…</persName>
  …δημαρχικῆς ἐξουσίας ἑπτακαιδέκατον…</date>
Two <origDate> elements record the two dating campaigns on the stone; @evidence="titulature" says "we know the year from the emperor's tribunician titulature, not from the stratigraphy or letterforms". The @n attribute spells out which textparts each range covers. Inside the edition, the actual phrase "in his 17th year of tribunician power" is wrapped in <date ref="godot.date/id/…">…</date>, whose @ref is a globally stable URI for that single regnal year. 两个 <origDate> 元素记录碑面上的两期断代,@evidence="titulature" 言"我们由帝王任职年表得年, 而非由地层或字体"。@n 属性载明各范围所覆盖之文段。编辑层内, "任保民官第十七年"一语被裹入 <date ref="godot.date/id/…">…</date>,其 @ref 所载 URI 对该单一在位年, 在全球范围内皆稳定。
Caveat. Stone-level <origDate> uses ISO-8601 BCE syntax (-0007 = 7 BCE; the leading hyphen and four-digit padding are mandatory). godot.date URIs are opaque — never invent one; always look up the exact ID at godot.date.编者所诫。碑级 <origDate> 用 ISO-8601 公元前语法(-0007 即公元前 7 年; 前导短横与四位补零均强制)。godot.date URI 是不透明的,切勿臆造, 务必至 godot.date 查得精确 ID 后使用。
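The signed-year syntax can be checked with a few lines of Python. This sketch follows the convention stated above (-0007 = 7 BCE) and enforces the four-digit padding; `year_label` is a hypothetical helper and handles year-only values, not full dates.

```python
import re


def year_label(value):
    """Turn a signed four-digit year such as "-0007" into a readable label."""
    match = re.fullmatch(r"(-?)(\d{4})", value)
    if match is None:
        raise ValueError(f"expected an optional hyphen plus four digits, got {value!r}")
    year = int(match.group(2))
    if year == 0:
        raise ValueError("year 0000 is not used in this convention")
    return f"{year} BCE" if match.group(1) else f"{year} CE"


for raw in ("-0007", "-0004", "0014"):
    print(raw, "=", year_label(raw))
```

Rejecting values like "-7" outright, rather than guessing, keeps unpadded years from slipping into a file where they would sort incorrectly as strings.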
Where does xml:lang go, and why three times?xml:lang 标在哪里, 为何标三次?
<TEI xmlns="…" xml:id="C08600" xml:lang="en">      <!-- ① file metadata is in English -->
  <teiHeader>
    <profileDesc>
      <langUsage>                                   <!-- ② declare every language used -->
        <language ident="ar">Arabic</language>
        <language ident="grc">Ancient Greek</language>
        <language ident="la">Latin</language>
        …
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text><body>
    <div type="edition" xml:lang="grc">…</div>      <!-- ③ inscription is in Ancient Greek -->
    <div type="translation" source="johnson1961" xml:lang="en">…</div>
    <div type="translation" source="riccobono1941" xml:lang="la">…</div>
  </body></text>
</TEI>
Three distinct levels: ① the file's metadata language (English), ② the <langUsage> declaration listing every language anything in the file uses (here: ar, en, fr, de, grc, grc-Latn, el, he, it, la), and ③ the language of each editorial block. The same XML file legitimately contains Greek inscribed text, Latin translation, and English commentary — and a renderer can pick the right script and direction for each.三层各异:① 文件元数据所用之语(英语); ② <langUsage> 声明本件涉及之全部语言(此处含阿拉伯文、英文、法文、德文、古希腊文、拉丁化希腊文、现代希腊文、希伯来文、意大利文、拉丁文); ③ 每一编辑块各自之语种。同一 XML 文件合法地兼载希腊文铭文、拉丁文译文、英文注释,渲染器可据此为各部分选定正确字体与方向。
Caveat. Always use BCP 47 language tags, which take the shortest ISO 639 code available (la = Latin and en = English come from ISO 639-1; grc = Ancient Greek comes from ISO 639-2/639-3). Tags like grc-Latn add a script subtag, useful for transliterated Greek. The xml:lang attribute cascades: any element without its own value inherits its parent's. So you only declare a switch. 编者所诫。务用 BCP 47 语言标签, 即取最短可用之 ISO 639 代码(la 为拉丁文、en 为英文, 出自 ISO 639-1; grc 为古希腊文, 出自 ISO 639-2/639-3)。grc-Latn 之类加书写脚本子标签,对拉丁化希腊文颇为有用。xml:lang 具继承性:任何未显式标注之元素, 自动继承其父语言。故只须于换语处声明一次。
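The inheritance rule can be made concrete with a short Python sketch using the standard library. ElementTree keeps no parent pointers, so the inherited language is carried down during a recursive walk. The sample is a reduced, namespace-free stand-in for the file above, and `effective_div_langs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<TEI xml:lang="en"><text><body>
  <div type="edition" xml:lang="grc"><div type="textpart" n="I"/></div>
  <div type="translation" xml:lang="la"/>
  <div type="commentary"/>
</body></text></TEI>"""


def effective_div_langs(xml_text):
    """Return (div type, effective language) pairs, resolving inherited xml:lang."""
    results = []

    def walk(element, inherited):
        lang = element.get(XML_LANG, inherited)   # own value wins, else the parent's
        if element.tag == "div":
            results.append((element.get("type"), lang))
        for child in element:
            walk(child, lang)

    walk(ET.fromstring(xml_text), None)
    return results


for div_type, lang in effective_div_langs(SAMPLE):
    print(div_type, lang)
```

The textpart has no xml:lang of its own yet resolves to grc, and the commentary falls back to the file-level en, which is why only the switches need declaring.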
Naming the named: emperor, defendant, consul, freedman命名所命:元首、被告、执政官、释奴
<!-- ① the emperor, by controlled key (no lookup needed) -->
<persName type="emperor" key="augustus">
  <w lemma="Αὐτοκράτωρ">Αὐτοκράτωρ</w>
  <name nymRef="Καῖσαρ">Καῖσαρ</name>
  <name nymRef="Σεβαστός">Σεβαστὸς</name>
</persName>
<!-- ② a defendant attested by LGPN prosopography (line 42) -->
<persName type="attested" key="lgpn:V1-65612">
  <name nymRef="Πόπλιος">Ποπλίωι</name>
  <name nymRef="Σέξστιος">Σεξστίωι</name>
  <name nymRef="Σκευᾶς">Σκεύαι</name>
</persName>
<!-- ③ a consul named by Roman tria nomina components (line 84) -->
<persName type="attested">
  <name nymRef="Γάϊος">Γάϊος</name>
  <name nymRef="Καλουίσιος">Καλουίσιος</name>
  <name type="cognomen" nymRef="Σαβεῖνος">Σαβεῖνος</name>
</persName>
<!-- ④ a freedman, identified through his patron (lines 43–44) -->
<persName type="attested" key="lgpn:V1-65907">
  <name nymRef="Πόπλιος">Πόπλιον</name>
  <name nymRef="Λακουτάνιος">Λακουτάνιον</name>
  <persName type="attested"><name nymRef="Πόπλιος">Ποπλίου</name></persName>
  <w lemma="ἀπελεύθερος">ἀπελεύθερον</w>
  <name nymRef="Φιλέρως">Φιλέρωτα</name>
</persName>
One element family — <persName>/<name> — encodes four very different naming acts. @type separates the imperial protocol from a private defendant; @key with an LGPN URI links the defendant to the Lexicon of Greek Personal Names; @type="cognomen" on a child <name> marks the Roman tripartite name (praenomen-gentilicium-cognomen); a nested <persName> for the patron carries the freedman's legal genealogy.同一族元素,<persName>/<name>,承载四种全然不同的命名行为。@type 区分帝王官式与平民被告; @key 配 LGPN URI, 将被告挂接《希腊人名词典》; 子元素 <name> 之 @type="cognomen" 标罗马三段式名(praenomen-gentilicium-cognomen); 内嵌之 <persName> 则装入施主之名, 以承载释奴之法律出身。
Caveat. The Romans on this stone wear Greek inflectional clothing: Σέξστιος Σκεύας = Sextius Scaeva. The @nymRef on each <name> gives the canonical (nominative) form, so a search for "Sextius" works even if the inscription has the dative Σεξστίωι. Do not collapse the layers — Reynolds's decision to mark both the carved inflected form and the canonical nymRef is the whole point.编者所诫。本碑上的罗马人皆着希腊屈折之衣:Σέξστιος Σκεύας 即 Sextius Scaeva。每个 <name> 上的 @nymRef 给出标准(主格)形式, 故搜索 "Sextius" 即便铭文用与格 Σεξστίωι 亦可命中。切勿合并此二层,Reynolds 同时标注所刻屈折形与标准 nymRef, 正是要点。
A place can be six different kinds of thing"地"可以是六种不同之物
<!-- ancient findspot (the ancient place where the stone was found) -->
<placeName type="ancientFindspot" ref="https://www.slsgazetteer.org/909">Cyrene</placeName>
<!-- monuList: a building inside the findspot -->
<placeName type="monuList" ref="https://www.slsgazetteer.org/1327">Assembly Building</placeName>
<!-- ancientRegion: a province or geographic region -->
<geogName type="ancientRegion" key="Cyrene">Cyrenaica</geogName>
<!-- ethnic name (NOT a place per se: a demonym) -->
<placeName type="ethnic" nymRef="#Ῥωμαῖος" ref="#roma">Ῥωμαίους</placeName>
<!-- modernCountry / modernFindspot: for the present-day location -->
<geogName type="modernCountry" key="LY">Libya</geogName>
<placeName type="modernFindspot" ref="http://sws.geonames.org/82972">Shahat</placeName>
Six semantic roles, all expressed with a placeName/geogName element discriminated by @type. Cyrene as the ancient findspot resolves to the SLS gazetteer; Cyrenaica as the ancient region uses a controlled @key; Romans as an ethnic resolves to an internal #roma anchor in the file's prosopography list; Libya uses ISO 3166-1 alpha-2; Shahat is the modern town built over Cyrene, pinned to GeoNames.六种语义角色, 皆由 placeName/geogName 元素表达, 由 @type 区分。Cyrene 作古代出土地, 解析至 SLS 地名表; Cyrenaica 作古代区域, 用一受控的 @key; Romans 作民族名, 解析至本件人物志表中之内部锚点 #roma; Libya 用 ISO 3166-1 alpha-2; Shahat 是覆于昔兰尼故址之上的今镇, 钉至 GeoNames。
Caveat. Internal anchors (ref="#roma") and external URIs (ref="https://www.slsgazetteer.org/909") both use the same @ref attribute. The hash-only form points to an entry inside this XML file's prosopography; the full URL points outward. Don't confuse them — both are valid, but they resolve in very different places.编者所诫。内部锚点(ref="#roma")与外部 URI(ref="https://www.slsgazetteer.org/909")共用同一个 @ref 属性。仅含井号者指向本 XML 文件内部人物表之条目; 完整 URL 则外引。切勿混淆,二者皆合法, 但解析至全然不同的地方。
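The two kinds of @ref value can be told apart mechanically. The following Python sketch uses only the standard library; `classify_ref` and its three labels are hypothetical, chosen for this illustration.

```python
from urllib.parse import urlparse


def classify_ref(ref):
    """Label a @ref value as an in-file anchor, an outward URL, or neither."""
    if ref.startswith("#"):
        return "internal"            # resolves to an xml:id inside this file
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "external"            # resolves on the web
    return "unknown"                 # e.g. a shorthand with no scheme


for ref in ("#roma", "https://www.slsgazetteer.org/909", "godot.date/id/..."):
    print(ref, "->", classify_ref(ref))
```

The third value is the slide shorthand for a godot.date URI; it is reported as "unknown" because it lacks the https:// scheme that a real @ref in the file carries.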
When letters mean numbers: ιθ = 19字母作数:ιθ = 19
<!-- line 73 of C.101: "the 19th tribunician year" -->
<lb n="73"/><w lemma="δημαρχικός">δημαρχικῆς</w> <w lemma="ἐξουσία">ἐξουσίας</w>
<space quantity="3" unit="character"/>
<num value="19"><hi rend="supraline">ιθ</hi></num>
Two Greek letters with a horizontal stroke drawn over them. <hi rend="supraline"> records the stroke — a marker the ancient cutter used to say "this isn't a word, it's a number".两个希腊字母上方各加一道横线。<hi rend="supraline"> 记录此横线,古刻工以之示"这非词, 而是数"。
<num value="19"> gives the value a computer can sort and filter. ι (iota) = 10, θ (theta) = 9 → 19 (Augustus's 19th tribunician year, 4 BCE).<num value="19"> 给出计算机可排序、可过滤之值。ι(iota)= 10, θ(theta)= 9 → 19(奥古斯都任保民官第十九年, 公元前 4 年)。
Caveat. <num>, <hi>, <g> (special glyph), and <am> (abbreviation mark) overlap in scope but each says something different. num: this is a number, and its value is N. hi: the cutter rendered this part of the line in some special way (supraline, ligature, tall letter). g: this is a non-textual glyph (an interpunct, a Christogram). am: this stretch of letters is a mark of abbreviation (the doubled dd for dominorum). Each is independently filterable.编者所诫。<num>、<hi>、<g>(特殊字形)、<am>(缩写符号)四者范围有重叠, 但各言其意。num:此为数, 值为 N。hi:刻工以特殊方式表现此处(加横线、连写、高字)。g:此为非文字之字形(间隔点、十字章)。am:此段字母为缩写之标记(dominorum 之双写 dd)。四者皆可独立过滤。
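The arithmetic ι = 10, θ = 9 → 19 can be sketched in Python as a consistency check between the carved letters and the declared @value. The table below covers only the ordinary letters of the Greek alphabetic numeral system (the archaic signs for 6, 90 and 900 and final sigma are left out), and the function names are illustrative, not part of any EpiDoc tool.

```python
import xml.etree.ElementTree as ET

# Greek alphabetic numerals, ordinary letters only (assumption: no archaic
# signs such as stigma, koppa or sampi, and no thousands marker).
GREEK_VALUES = {
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "ζ": 7, "η": 8, "θ": 9,
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70, "π": 80,
    "ρ": 100, "σ": 200, "τ": 300, "υ": 400, "φ": 500, "χ": 600, "ψ": 700, "ω": 800,
}


def greek_numeral_value(letters):
    """Add up the values of the letters (the system is purely additive)."""
    return sum(GREEK_VALUES[ch] for ch in letters)


def check_num(xml_text):
    """Return (carved letters, declared @value, value computed from the letters)."""
    num = ET.fromstring(xml_text)
    face = "".join(num.itertext()).strip()
    return face, int(num.get("value")), greek_numeral_value(face)


face, declared, computed = check_num(
    '<num value="19"><hi rend="supraline">ιθ</hi></num>'
)
print(face, declared, computed)
```

When the declared and computed values disagree, either the @value is mistyped or the numeral uses a sign outside this simplified table.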
Intentional blank space: four encodings有意留白:四种编码
<!-- ① character-extent blank between phrases (line 3 of C.101) --> <lb n="3"/> <space quantity="12" unit="character"/> <w lemma="λέγω">λέγει</w> <space quantity="15" unit="character"/> <!-- ② full-line blank (line 71a of C.101) --> <lb n="71a"/> <space quantity="1" unit="line"/> <!-- ③ uncertain extent (after Edict II ends) --> <space quantity="8" unit="character" precision="low"/> <!-- ④ unmeasured trailing space --> <space precision="low"/>
EpiDoc draws a sharp line between <gap> ("the stone is damaged here, text is lost") and <space> ("the cutter intentionally left this much blank"). C.101 uses <space> constantly to mark the careful blanks the cutter placed between phrases, after each λέγει ("proclaims:") to introduce the edict's text proper, and to flag the gap between one edict and the next. @quantity + @unit measure the extent (characters or lines); @precision="low" says "approximately, not measured exactly".EpiDoc 严辨 <gap>("此处石面残, 文已失")与 <space>("刻工于此有意留白")。C.101 频用 <space> 标记刻工于句与句之间精心留出的空隙, 用以引出每篇诏令之 λέγει("乃下令曰:"), 或标示一诏令与下一诏令间的过渡。@quantity 与 @unit 计量长度(以字母或行计); @precision="low" 言"大致如此, 未精确测量"。
Caveat. A blank line is encoded twice: as <lb n="71a"/> (giving the blank line a number — note the "a" suffix to preserve the running line count) AND as <space quantity="1" unit="line"/>. The two together say "the stone has a blank 71st line; it counts toward the line-numbering but carries no text". Omitting either would corrupt the line index.编者所诫。一行空白须双编:既作 <lb n="71a"/>(给空白行一个行号,注后缀 "a", 以保行数连贯), 又作 <space quantity="1" unit="line"/>。二者并陈, 始言"碑上有一空白之第 71 行; 计入行号, 不载文字"。漏其一则行号失次。
Five distinct ways to say "this isn't a clean reading"五种不同方式言"此非洁本"
<supplied reason="lost">已失而补κληρο<supplied reason="lost">ύ</supplied>σθω.石上原刻、今已物理失落之字母, 编者据上下文补出。第 26 行:κληρο<supplied reason="lost">ύ</supplied>σθω。<supplied reason="omitted">未刻而补τεσ<supplied reason="omitted">σ</supplied>ερασκαιδέκατον.刻工根本未刻之字母(多为希腊正字习省, 如下标 iota)。第 2 行:τεσ<supplied reason="omitted">σ</supplied>ερασκαιδέκατον。<unclear>字迹存疑Ῥωμαῖ<unclear>ον</unclear> — looks like "ον" but the editor isn't sure.石上字母可见, 但识读不定。第 16 行:Ῥωμαῖ<unclear>ον</unclear>,似为 "ον", 但编者未敢断定。<surplus>多刻当删αὐτοῖ<surplus>ι</surplus>ς — extra ι slipped in.刻工实有所刻、但编者判定为衍文(刻误、重复)之字母。第 7 行:αὐτοῖ<surplus>ι</surplus>ς,衍出一 ι。<del rend="erasure"><orig>…</orig></del>物理擦除<del rend="erasure"><orig>ξε</orig> <orig>τη</orig></del>.石上经物理擦除之段落。编者保留其尚可识读之字。第 91 行:<del rend="erasure"><orig>ξε</orig> <orig>τη</orig></del>。Caveat. A reader who skips over the @reason attribute loses half the meaning. Lost is a statement about the stone's preservation. Omitted is a statement about the cutter's practice. Unclear is the editor confessing doubt. Surplus is the editor exercising judgment. Erasure is a historical event — someone, sometime, deliberately scraped letters off this stone. Five different kinds of footnote, all machine-readable.编者所诫。忽视 @reason 属性者, 已失其义之半。Lost(失)是关于碑面保存状态的陈述。Omitted(漏)是关于刻工习惯的陈述。Unclear(疑)是编者之自承不定。Surplus(衍)是编者之裁断。Erasure(擦)是一历史事件,曾有人, 于某时, 刻意将此石上数字刮去。五种不同的脚注, 皆机器可读。
<lb>: the most important single element on the stone<lb>:石面之第一要素
<!-- ① a normal line break with line number --> <lb n="1"/>Αὐτοκράτωρ Καῖσαρ Σεβαστὸς… <!-- ② mid-word continuation (word broken across line end) --> ἐπιβαρού<lb n="9" break="no"/>σας <!-- "ἐπιβαρούσας" runs from end of line 8 onto line 9 without a hyphen --> <!-- ③ a numbered blank line for line-count continuity --> <lb n="71a"/> <space quantity="1" unit="line"/> <lb n="73a"/> <space quantity="1" unit="line"/> <!-- ④ a textpart boundary that shares a physical line (line 55 starts II and III) --> <div type="textpart" n="II"> …<lb n="55"/> …</div> <div type="textpart" n="III"><lb n="55"/> … <!-- same physical line, new edict -->
<lb/> is empty (self-closing) and self-anchoring. Its @n is the line number on the stone — not the position in the file. @break="no" says "this line break occurred mid-word, with no hyphen on the stone". Line numbers can repeat (line 55 carries the end of Edict II and the start of Edict III, because they share a single physical line), and the suffix "a" (71a, 73a) preserves count for fully blank lines.<lb/> 是空元素(自封闭), 自带定位之能。其 @n 是石面之行号,不是文件中的位置。@break="no" 言"此换行发生于词中, 石上未刻连字号"。行号可以重复(第 55 行兼载第 II 诏令之尾与第 III 诏令之首, 因二者同居一行); 后缀 "a"(71a、73a)则为全空之行保持行号连贯。
Caveat. <lb/> is the anchor for the whole apparatus criticus: <app loc="5"> means "compare with line 5". Get the line numbers wrong, and every apparatus entry, every cross-reference, every cited passage breaks. The @n attribute is not just a label — it's a citation handle that the entire scholarly apparatus depends on.编者所诫。<lb/> 是整套校勘记之锚点:<app loc="5"> 即"对照第 5 行"。行号一错, 所有 apparatus 条目、所有交叉引用、所有引文皆崩。@n 不仅是标签,是整套学术装置所赖之引用句柄。
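How @n and @break="no" drive a reader's view can be sketched in Python. This is a simplified illustration: it assumes flat text directly after each `<lb/>` (no `<w>` or other wrappers), no namespace, and placeholder words ("alpha", "beta") around the one word fragment taken from the slide.

```python
import xml.etree.ElementTree as ET

# "alpha" and "beta" are filler words, not text from C.101.
SAMPLE = '<ab><lb n="8"/>alpha ἐπιβαρού<lb n="9" break="no"/>σας beta</ab>'


def lines_of(ab):
    """Return (line number, text on that line, starts mid-word?) per <lb/>."""
    return [
        (lb.get("n"), (lb.tail or "").strip(), lb.get("break") == "no")
        for lb in ab.iter("lb")
    ]


def running_text(lines):
    """Join the lines, omitting the space where a word continues across a break."""
    out = ""
    for _n, text, mid_word in lines:
        if out and not mid_word:
            out += " "
        out += text
    return out


lines = lines_of(ET.fromstring(SAMPLE))
text = running_text(lines)
print(lines)
print(text)
```

The line numbers come straight from @n, so a citation such as "line 9" resolves to the stone's own numbering rather than to a position in the file.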
What to verify in your own EpiDoc file你自家的 EpiDoc 文件, 应自检何处
@break="no" for words crossing line ends; numbered lb n="Na" for blank lines, paired with <space unit="line"/>.每行 <lb n="N"/>; 跨行单词加 @break="no"; 空白行编为 lb n="Na", 并配 <space unit="line"/>。<?xml-model?> processing instructions at the top: tei-epidoc.rng for RNG structure, ircyr-checking.sch (or your project's) for Schematron rules.<revisionDesc> 内含 <change when="YYYY-MM-DD" who="…">; 文首两条 <?xml-model?> 处理指令:tei-epidoc.rng 验 RNG 结构, ircyr-checking.sch(或你项目专用之)验 Schematron 规则。The unifying principle. Every encoding choice in C.101 reflects the same discipline: separate what the cutter wrote from what the editor restored, separate visual rendering from semantic content, and link every name and date to an external authority. Get those three habits right, and the rest of EpiDoc follows. The full vocabulary lives in lexicon.html — keep it open while you work.统贯之道。C.101 中每一处编码选择, 皆反映同一规矩:分刻工所刻与编者所补; 分视觉表现与语义内容; 凡名凡日皆挂权威表。三习既正, 其余皆从。完整词汇见 lexicon.html,工作时常开备查。
iAph 8 · The Archive Wall at AphrodisiasiAph 8 · 阿芙罗狄西亚斯之档案墙
Where C.101 packs six documents into one stele (monument-as-corpus), Aphrodisias does the reverse: a curated corpus-as-monument, twenty-one imperial documents inscribed across three centuries on the north parodos wall. The next twelve slides concentrate on what this paradigm forces an encoder to do that a single stele never does.C.101 把六篇文书纳入一块石头(碑刻即语料);阿芙罗狄西亚斯走的是反方向,它是一座有意编排的语料即碑刻, 二十一件帝王文书横跨近三世纪, 一一刻在剧场北侧 parodos 墙上。接下来的十二张, 专门讨论这种范式逼迫编者去做、而单石上的编者从来不必做的那些事。
Monument-as-corpus vs. corpus-as-monument碑刻即语料 vs. 语料即碑刻
C.101 Cyrene · monument-as-corpusC.101 昔兰尼 · 碑刻即语料
One <TEI> root, one stone, six <div type="textpart"> children. Reading order is physical reading order. The encoder’s job is to split a single artefact into its semantic units.一棵 <TEI> 树、一块石头、六个 <div type="textpart"> 子节。诵读次序就是铭刻次序。编者要做的, 是把一件器物切分为多个语义单元。
Aphrodisias archive wall · corpus-as-monument阿芙罗狄西亚斯档案墙 · 语料即碑刻
Twenty-one <TEI> roots, twenty-one stones, no nesting. The reading order is the archive, not the file. The encoder’s job is to join twenty-one separate artefacts into one dossier — through cross-reference, shared @ref targets, and corpus-level metadata.二十一棵 <TEI> 树、二十一块石头, 互不嵌套。诵读次序属于档案, 不属于任何一份文件。编者要做的, 是把二十一件器物联结为一份册档,靠互引、靠共享的 @ref 目标、靠语料库层级的元数据。
Why both matter. A corpus that only knows how to encode monument-as-corpus (the Cyrene pattern) cannot represent dossiers whose unit of meaning is the wall. A corpus that only encodes corpus-as-monument (the Aphrodisias pattern) loses the textpart hierarchy that lets a reader cite C.101.III as a distinct legal act. EpiDoc supports both — the encoder picks whichever pattern matches the ancient editorial reality, not whichever is technically easier.两者都重要。语料库若只会做碑刻即语料(C.101 那一种), 就没办法呈现以墙为意义单位的册档; 若只会做语料即碑刻(阿芙罗狄西亚斯那一种), textpart 的层级又会消失, 读者也就引不到 C.101.III 这样一个独立的法律文书。EpiDoc 两路都走,编者该选哪一种, 跟着古代的编辑实况走, 不要跟着编码上的方便走。
When 40% of the text is editor当四成文本出自编者之手
The SC de Aphrodisiensibus survives as fragments scattered through the city walls, the theatre, and the Aphrodisias museum. In Reynolds’ edition, 2,671 letters out of c. 6,820 sit inside <supplied> — about 40% of the readable text is, formally, the editor’s restitution, distributed across 584 separate <supplied> elements. Compare C.101 (Cyrene): 52 supplied letters out of c. 5,000 — about 1%. The Aphrodisias file demonstrates the convention’s response to this density: every reconstructed segment carries @reason (lost vs. omitted), every gap carries @reason + @extent + @unit="character", and 95 ambiguous letters wear <unclear>. The format is honest about what the editor can and cannot see.《SC de Aphrodisiensibus》(《关于阿芙罗狄西亚斯人的元老院决议》) 的残片散布在城墙、剧场、阿芙罗狄西亚斯博物馆三处。Reynolds 校本印出来的约 6,820 个字母里, 有 2,671 个套在 <supplied> 之内:也就是说, 文本里约 40% 是编者的补订, 分散在 584 个 <supplied> 标记里。再看 C.101 (昔兰尼) 就不一样了:全文约 5,000 字, 只有 52 个补字, 约占 1%。阿芙罗狄西亚斯这份文件, 正好展示出规约如何回应这种残损密度:所有补字都挂 @reason (lost"失刻"或 omitted"漏刻")、所有阙文都挂 @reason + @extent + @unit="character"、95 处可疑字母全用 <unclear> 包起来。这套写法自陈所见之限, 不掩疑处。
What this enables. A reader writing a critical commentary on a single line can run a filter — //supplied[@reason="lost"] in line N — and see exactly which letters the editor recovered from parallel passages, which from grammar alone, and which the cutter never carved. The SC Aphrodisiensibus becomes two readable texts at once: the stone as found, and the stone as reconstructed. This separation is invisible in a printed edition and unrecoverable from a transcription that omits the brackets.这给了我们什么。想给某一行做精校评议, 一句 XPath 就够了,//supplied[@reason="lost"], 限于第 N 行,哪个字得自平行段落、哪个字凭文法所推、哪个字根本没刻过, 一目了然。《SC Aphrodisiensibus》于是双文同陈:出土的石头、与重构的石头。这层分别, 在印本中看不见, 在不带括号的转录里也回不来。
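The proportion of editorial restoration can be computed along these lines. The Python sketch below uses the two `<supplied>` examples quoted earlier in the deck as a tiny stand-in for a real file; it assumes no namespace and no nested `<supplied>` elements, and counts only alphabetic characters as letters.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Two words quoted earlier in the deck, used here as a stand-in for a full file.
SAMPLE = (
    '<ab>κληρο<supplied reason="lost">ύ</supplied>σθω '
    'τεσ<supplied reason="omitted">σ</supplied>ερασκαιδέκατον</ab>'
)


def count_letters(text):
    """Count alphabetic characters, ignoring spaces and punctuation."""
    return sum(1 for ch in text if ch.isalpha())


def supplied_share(xml_text):
    """Return (total letters, letters inside <supplied> grouped by @reason)."""
    root = ET.fromstring(xml_text)
    total = count_letters("".join(root.itertext()))
    by_reason = Counter()
    for el in root.iter("supplied"):
        by_reason[el.get("reason")] += count_letters("".join(el.itertext()))
    return total, dict(by_reason)


total, by_reason = supplied_share(SAMPLE)
print(total, by_reason)
```

Run over a whole file, the same two numbers yield the kind of ratio quoted above for the SC de Aphrodisiensibus and for C.101.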
<lb type="worddiv"/><lb type="worddiv"/>Line-break vs. word-break · encode both换行与分词 · 二者并记
<lb n="1"/>ΥΠΑΤΟΥ ΓΑΙΟΥ ΚΑΛΟΥΙΣΙΟΥ <lb n="2"/>ΓΑΙΟΥ ΥΙΟΥ ΚΑΙ ΛΕΥΚΙΟΥ ΜΑΡΚΙΟΥ <lb type="worddiv" n="3"/>ΛΕΥΚΙΟΥ ΥΙΟΥ ΕΚ ΤΩΝ ΕΙΣΦΕΡΟΜΕΝΩΝ <lb n="4"/>ΕΙΣ ΤΗΝ ΣΥΝΚΛΗΤΟΝ ΔΟΓΜΑΤΩΝ <lb type="worddiv" n="5"/>ΘΕΜΑ ΠΡΩΤΟΝ…
The Aphrodisias cutter sometimes ended a line on a word boundary and sometimes split a word across the break. The standard <lb n="N"/> records the break itself; <lb type="worddiv" n="N"/> additionally records that the cutter chose to end at a word boundary — a deliberate layout decision, not a scribal accident. C.101 contains no such markup: the Cyrene cutter broke lines wherever space ran out, with no preserved care for word boundaries. This single attribute distinguishes a workshop that respected the eye of the reader from one that did not, and EpiDoc captures the distinction with one extra token.阿芙罗狄西亚斯的刻工有时把行尾停在词的尾巴上, 有时则让单词跨过换行。<lb n="N"/> 只记下换行本身; <lb type="worddiv" n="N"/> 还多记一点:这一处的换行恰好是词尾,也就是刻工有意做的版面决定, 不是手滑。C.101 里并没有这一标记:昔兰尼的刻工凭空间所限就断行, 顾不上词界。就这一个属性, 已经足以分别一位顾及读者眼睛的刻工和不顾的刻工, EpiDoc 一个 token 就把它记下来了。
What this enables. A palaeographer studying inscriptional layout can run count(//lb[@type="worddiv"]) div count(//lb) across the IAph corpus and produce, in one query, a quantitative metric of word-boundary discipline by workshop and by period: exactly the kind of question that a printed edition with its brackets and dashes makes nearly unanswerable.这给了我们什么。研究铭文版面的古文字学者, 在 IAph 中跑一句 count(//lb[@type="worddiv"]) div count(//lb), 即可得分作坊、分时期的"行尾守词"量化指标,这正是带括号、带连字符的印本几乎答不了的题。
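The same ratio can be computed outside an XPath engine. Here is a minimal Python sketch over the five-line sample from this slide, assuming a namespace-free `<ab>` wrapper; the function name `worddiv_ratio` is illustrative.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ab>
<lb n="1"/>ΥΠΑΤΟΥ ΓΑΙΟΥ ΚΑΛΟΥΙΣΙΟΥ
<lb n="2"/>ΓΑΙΟΥ ΥΙΟΥ ΚΑΙ ΛΕΥΚΙΟΥ ΜΑΡΚΙΟΥ
<lb type="worddiv" n="3"/>ΛΕΥΚΙΟΥ ΥΙΟΥ ΕΚ ΤΩΝ ΕΙΣΦΕΡΟΜΕΝΩΝ
<lb n="4"/>ΕΙΣ ΤΗΝ ΣΥΝΚΛΗΤΟΝ ΔΟΓΜΑΤΩΝ
<lb type="worddiv" n="5"/>ΘΕΜΑ ΠΡΩΤΟΝ
</ab>"""


def worddiv_ratio(xml_text):
    """Return (marked breaks, all breaks, marked / all) for one document."""
    root = ET.fromstring(xml_text)
    breaks = root.findall(".//lb")
    marked = root.findall(".//lb[@type='worddiv']")
    ratio = len(marked) / len(breaks) if breaks else 0.0
    return len(marked), len(breaks), ratio


marked, total, ratio = worddiv_ratio(SAMPLE)
print(marked, total, ratio)
```

Grouping by workshop or period would need extra metadata from each file's header; the sketch covers only the per-document count.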
<history>逐石之 <history> 追溯Stones travel before they speak石头先漂泊, 后说话
All of the above is recorded in <div type="history"> blocks inside the file. The history div is not just metadata: it is the audit trail that lets any future editor know which fragment a given letter rests on, and therefore which reading is more secure than which.以上这些都写在文件里的 <div type="history"> 各节里。这一节不只是元数据, 它是审计链,让日后的编者能知道某个字母出自哪块残石, 由此判断哪一处的读法比哪一处更可信。
Compare C.101. The Cyrene stone was found in one spot, re-used in the floor of a room next to the Assembly Building, and recorded by one excavator. Block-level provenance never had to be encoded. The Aphrodisias file shows what <history> can do when an inscription is dispersed: it makes the geography of fragmentation part of the citable record.C.101 是反过来的。昔兰尼那块石头只出土于一处,改用作议事堂旁某室的铺地石,由一位发掘者一并记录, 所以不必再分块编码出土史。阿芙罗狄西亚斯这份文件展示的是:当铭文散布在多处时, <history> 这一节能做什么,它把"碎片的地理学"纳进可引用的记录里。
Bibliography as stemma · 1705 → 2007书目即传抄谱系 · 1705 → 2007
Each badge below shows what the editor was looking at when they edited: a manuscript copy, someone else’s photograph, or the stone itself.下方各色标签, 显示每位编者校订时看的是什么:是别人的手稿、别人的照片, 还是自己亲眼看过的石头。
All seventeen witnesses become <bibl n="…"/> entries inside the file. The chain is now machine-readable: which letter in line 12 first appeared in Sherard? In Reinach? In MAMA? Each is answerable by XPath.十七份证据全部化为文件里的 <bibl n="…"/> 条目。这一条传抄链, 至此机器可读:第 12 行的某个字, 是先出现在 Sherard? 在 Reinach? 还是在 MAMA? 我们都可以用一句 XPath 答出来。
The wall as web: one document completes another视墙为网:一文以补一文
The <commentary> of iAph080027 opens with a remarkable note:
"The text can be further supplemented from an area of the wall which survives intact, where extracts from related documents had been separately inscribed (8.28), providing the substance of ll. 32–35 and 77–82."iAph080027 的 <commentary> 开头有一段话, 特别值得留意:
"本文可以进一步靠档案墙上另一处仍然完好的区域来补字, 也就是另刻于那边的相关文书的节略 (8.28), 正好可以补出本文第 32—35 行、第 77—82 行的实质内容。"
A sibling file restores this one姊妹文件补这一文
Document 8.28 — a separately encoded XML in the same IAph corpus — preserves the very passages this file lost. The corpus is internally redundant by design: the ancient archive included extracts so that the missing text could be recovered from elsewhere on the wall. EpiDoc lets the modern editor wire that redundancy into @corresp links between files.文书 8.28,同一 IAph 语料里的另一份独立 XML,恰好保存了本文所失的那几段。这一套语料库, 设计本身就有冗余:古代档案以节略的形式, 把本文的节录另刻在一处, 就是为了让所失的字日后可以从墙上别处找回来。EpiDoc 让今天的编者可以把这种冗余, 以 @corresp 这个属性, 化为文件之间的链接。
The principle, in one attribute一属性概其原则
Each <supplied> in lines 32–35 and 77–82 of this file can be tagged @source="#iAph080065" or @corresp="#iAph080028" — pointing at where the restoration came from. A reader can then trace any reconstructed letter to its physical witness on the same wall.本文第 32—35 与 77—82 行里, 每一处 <supplied> 都可以挂 @source="#iAph080065" 或 @corresp="#iAph080028",指向补字的来处。读者于是可以从任一处补字, 顺着追到同一面墙上的物证。
Why this is the “next level”. C.101 demonstrated EpiDoc inside one file. The archive wall demonstrates EpiDoc between files — a corpus whose unit of meaning is the relation, not the inscription. The single most important skill an Aphrodisias encoder learns that a Cyrene encoder does not is the discipline of citing one’s sister files: @corresp, @source, and a bibliography in which every entry is also a digital sibling.这就是“次阶”的所在。C.101 展示的是 EpiDoc 在一份文件之内能做什么; 档案墙展示的, 是 EpiDoc 在多份文件之间能做什么,一座语料库, 它的意义单位是关系, 不是铭文。阿芙罗狄西亚斯的编者比昔兰尼的编者多学到的, 是引用姊妹文件的一种纪律:@corresp、@source, 还有一份每条目都同时是"数字姊妹"的参考书目。
One column · five files · one argument一柱 · 五文 · 一论
Extracts from the SC de Aphrodisiensibus (parent text: iAph080027), the foundational legal text excerpted onto the wall.节录《SC de Aphrodisiensibus》(母本就是 iAph080027),根本法律文本的节略, 刻在这墙上。
39/38 BCE
A late-antique anthology of early documents晚期古典对早期文书之结集
<origDate notBefore="0101" notAfter="0300" evidence="lettering context">
second to third centuries A.D. <!-- when the stone was carved -->
</origDate>
<ab>
<lb n="1"/><date when="-0038" evidence="content">
αὐτοκράτωρ Καῖσαρ θεοῦ Ἰουλίου υἱὸς Αὔγουστος
Σαμίοις ὑπὸ τὸ ἀξίωμα ὑπέγραψεν…
</date> <!-- when the text was composed -->
</ab>
Each Aphrodisias XML file carries two encoded dates: an <origDate> in the <teiHeader> for when the stone was carved, and a separate <date> (or the imperial titulature) in the body for when the document was composed. The gap can be more than two centuries. In four of these five files, the gap is ~150–300 years — the Antonine or Severan archive workshop transcribing Caesarian, Augustan and Trajanic letters onto the parodos wall, almost certainly from a roll-form civic archive that no longer survives.阿芙罗狄西亚斯的每一份 XML 都挂两组年代:<teiHeader> 里的 <origDate> 记刻石的年代, 文本中的 <date>(或借帝王名号) 记原书的年代。两者的差距, 有时超过两百年。这五件里的四件, 差距大约 150 到 300 年,安东尼或塞维鲁时代的档案作坊, 把凯撒、奥古斯都、图拉真几朝的信件, 重新刻在剧场 parodos 墙上;底本想必当时还以纸卷形式存放在城里的档库, 现在已经看不到了。
What this enables. A historian writing on civic archival practice in the Roman East can query //origDate[@notBefore > -100]/following::date[@when < -0030] across the whole IAph corpus and surface every document whose memorialisation post-dates its composition by more than a generation. The wall stops being “an inscription” and becomes “a piece of evidence for how cities remembered their own pasts.”这给了我们什么。研究罗马东方诸城档案实践的史家, 在整个 IAph 里跑一句 //origDate[@notBefore > -100]/following::date[@when < -0030], 就能取出所有"追刻晚于原书一世代以上"的文书。墙到了这一步, 已经不再只是"一件铭文", 而成了"诸城追忆自家过往的物证"。
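The gap between composition and carving can also be computed directly. The Python sketch below uses the attribute values from the sample on this slide in a namespace-free wrapper. It assumes the common epigraphic reading in which "-0038" means 38 BCE, so one year is subtracted when a span crosses the BCE/CE boundary (there is no year zero); a project that follows strict ISO 8601 year numbering would drop that correction.

```python
import xml.etree.ElementTree as ET

# Attribute values taken from the slide; the wrapper is simplified.
SAMPLE = """<TEI>
  <origDate notBefore="0101" notAfter="0300"/>
  <ab><date when="-0038"/></ab>
</TEI>"""


def year(value):
    """Parse the year part of a TEI date such as '0101' or '-0038'."""
    return int(value[:5] if value.startswith("-") else value[:4])


def span(earlier, later):
    """Years elapsed between two years, skipping the non-existent year zero."""
    gap = later - earlier
    if earlier < 0 < later:
        gap -= 1
    return gap


def carving_lag(xml_text):
    """Return (minimum, maximum) years between composition and carving."""
    root = ET.fromstring(xml_text)
    orig = root.find(".//origDate")
    composed = year(root.find(".//date").get("when"))
    return (
        span(composed, year(orig.get("notBefore"))),
        span(composed, year(orig.get("notAfter"))),
    )


lag_min, lag_max = carving_lag(SAMPLE)
print(lag_min, lag_max)
```

Reporting the lag as a range rather than a single figure keeps the uncertainty of the lettering-based carving date visible.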
Ancient erasure · when the editor was a contemporary古代抹刻 · 编者亦曾同时之人
<ab>
<lb n="6"/> τῷ δήμῷ τῷ Πλαρασέων καὶ
<del>
<unclear reason="damage">Ἀφροδεισιέων</unclear>
</del>
ἐπιτάσσεσθαι
</ab>
In 8.28, every occurrence of “Ἀφροδεισιέων” ("of the Aphrodisians") has been deliberately erased on stone — six erasures across fifteen lines, all of the second half of the double polity-name Plarasa-Aphrodisias, leaving only “Πλαρασέων καὶ — —”. Each erasure is encoded as <del> wrapping <unclear reason="damage">…</unclear>: this passage was once readable, the ancient editor removed it, the modern editor can still make it out under the chisel-marks.在 8.28 里, “Ἀφροδεισιέων”(“阿芙罗狄西亚斯人的”) 每一处出现都被当时人有意凿去,十五行间共六处, 凿去的都是“Plarasa-Aphrodisias”这一双名城邦的后半, 只剩“Πλαρασέων καὶ,——”。每一处都用 <del> 套 <unclear reason="damage">…</unclear> 的结构编码:这一处原本可读, 当时的编者凿去了, 今天的编者还能在凿痕底下辨出来。
Cyrene: editor restores昔兰尼:编者所补
Modern editor (Reynolds) reads through stone damage and writes back the letters she believes were once carved — <supplied reason="lost">. The cutter did the work; time eroded it.今天的编者 (Reynolds) 透过石面的损伤读出, 把她相信原本刻过的字写回来,<supplied reason="lost">。刻工已经做过, 是时间把它磨掉的。
Aphrodisias: ancients erased阿芙罗狄西亚斯:古人所凿
An ancient editor read what the cutter had written and chose to remove it — <del>. The modern editor records the removal without restoring it: the erasure is part of the document’s biography. (Why was the doubled name shortened to Aphrodisias alone? An open question on civic identity that the encoding refuses to flatten.)当时有一位编者, 读到刻工写下的字, 决定削去,<del>。今天的编者把这一处凿去如实记下来而不恢复:这一处抹刻, 是这一份文书生平的一段。(双名为什么后来缩成阿芙罗狄西亚斯的单名?编码不愿勉强答这个问题, 把它留给城邦认同史。)
Four titulatures, 280 years, one wall四帝号、二百八十年、一面墙
- 8.29 (BCE): 1 element | 1 元
- 8.32 (BCE): 4 elements | 4 元
- 8.33 (AD): 3 elements | 3 元
- 8.102 (AD): 9 elements | 9 元
<num value="2">β</num> ὕπατος πατὴρ πατρίδος<num value="2">β</num> — “tribunician power for the second time”.帝国晚期的累叠:共九个衔。希腊字母 β 以 <num value="2">β</num> 编码, 标“任保民官第二年”。Same wall, 280 years, four protocols — the meter bars are the literal dating tool: each emperor carries a stable URI via <persName ref="…">, so a SPARQL query asking “which Roman emperors does the Aphrodisias archive cite, and in what proportions?” comes back in milliseconds.同一墙, 280 年, 四套帝号,上方仪表条之长度, 即断代之实测器:每一帝王以 <persName ref="…"> 挂稳定 URI, 故 SPARQL 一问 ——“阿芙罗狄西亚斯档案征引哪些罗马帝王, 各占几何?”—— 顷刻可答。
Near the wall, not on it近墙而不在墙上
<relatedItem><relatedItem> 桥之<relatedItem> + <listRelation> let IAph assert that 8.100 belongs to the dossier. The wall is a concept.档案的归属取决于含义, 不取决于石头。<relatedItem> 加上 <listRelation>, 让 IAph 可以宣布 8.100 也属于这份册档。墙在这里, 是一种概念。One letter · four palaeographic specimens一函四题 · 古字学样本
What the two case studies teach两案所教
- <textpart> hierarchy: six edicts under one root | <textpart> 层级,六诏令同根
- Editor restores damaged stone with <supplied> | 编者以 <supplied> 补受损之石
- Multi-level dating: stone-level + edict-level via godot.date | 两层断代:石面与诏令各挂 godot.date
- persName/placeName + @ref to authority files | persName/placeName 挂 @ref 接权威表
- 1,346 tokens, each lemmatised: corpus-grade encoding | 1,346 词, 一一标 lemma,语料级编码
- Photograph manifest tied to line ranges | 照片清单, 一一对应行段
- Heavy reconstruction: 584 <supplied> + 95 <unclear> | 大半补字,584 处 <supplied> + 95 处 <unclear>
- <lb type="worddiv"/>: layout decisions the cutter made | <lb type="worddiv"/>,刻工版面之有意
- Block-level <history>: stones travel before they speak | 逐石 <history>,石头先漂泊, 后说话
- 17-witness transmission stemma, 1705 → 2007 | 十七证之传抄链, 自 1705 至 2007
- @corresp / <relatedItem>: the wall as web | @corresp / <relatedItem>,视墙为网
- Ancient <del>, titulature evolution, palaeographic specimens | 古代 <del>、帝号变迁、古字学样本
What you learn from one, you cannot learn from the other. Cyrene shows EpiDoc inside a single file — nested hierarchy, dense lemmatisation, multi-level dating in one root. Aphrodisias shows EpiDoc between files — a corpus whose unit of meaning is the relation, where archive-membership is content not stone, and where the cutter’s ligatures, apex marks, and erasures are independent objects of inquiry. Both are needed; both are EpiDoc.一案所教, 另一案教不出来。昔兰尼展示 EpiDoc 在一份文件之内,层级嵌套、词汇标注密集、断代分层于同一棵根。阿芙罗狄西亚斯展示 EpiDoc 在多份文件之间,一座语料库, 它的意义单位在于关系, 档案归属看的是内容, 不是石头;刻工的连字、长音点和抹刻, 也都能独立考察。两者都要紧, 也都是 EpiDoc。
Questions the encoding makes askable编码所开之问
<del> marks cluster across IAph + IRCyr + I.Sicily? Which names get struck, when, and why?IAph、IRCyr、I.Sicily 跨语料中, <del> 聚于何处?何名遭凿、何时遭凿、何因遭凿?<gap> can be supplied via cross-document parallels in the same corpus?对极残之铭文, <gap> 中可由同语料库他文补出者, 占几何?Three domains, six questions, one infrastructure. Material questions ask about the wall; content questions about its text; method questions about how to query at scale. None of them could be put to a printed edition. Each lives where an encoding feature meets a research curiosity — which is, finally, what EpiDoc is for.三个领域、六个问题、一套基础设施。材料的问题问墙;内容的问题问墙上的文字;方法的问题问规模上的可查询性。这些问题, 一个都没法向印本发问。每一个问题都生于"编码特征"跟"研究好奇心"的相遇,这, 正是 EpiDoc 存在的理由。
What still needs to be built尚待建之事
<del> inventoried with what was struck, when (where datable), and why.每处 <del> 都登录:凿掉了什么字、什么时候凿(能断代的就断)、为什么凿。<relatedItem> / @corresp explicitly typed: “extracts from”, “responds to”, “located near”, “cites”.每一处 <relatedItem> 与 @corresp 都要显式分类:“节录自”、“回应于”、“近置于”、“引用”。<graphic> becomes deep-zoomable, citable, machine-fetchable visual evidence.每张照片都标上底片 ID、扫描分辨率、IIIF 端点,锚在行上的 <graphic>, 由此成为可深缩、可引、机器可取的视觉物证。The takeaway. Encoding alone is not enough. The questions on the previous slide only become answerable at scale once the supporting infrastructure exists. Three of seven tasks are wide open — an entire generation of thesis projects waiting for their first sentences.要点。光有编码不够。上一页那些问题, 要做到规模上能回答, 得先把支撑的基础设施建起来。七项里头, 三项现在全开,整整一代博士选题, 都还在等第一句话写下来。
01 · Zethus · Zethus 之墓志
Latin epitaph · three lines · 1st–3rd century CE拉丁文墓志 · 三行 · 公元 1–3 世纪
isicily.classics.ox.ac.uk/inscription/ISic000001.html
A small marble plaque (17.5 × 29.3 cm), used as the cover of a child's sarcophagus. Now in the Museo Archeologico Regionale "Antonino Salinas" in Palermo, inv. 3501. Found at Caltanissetta in 1782. Three lines of Latin, separated by interpuncts. Complete, no damage.
大理石板 (17.5 × 29.3 cm), 原作儿童石棺盖。今藏巴勒莫 Antonino Salinas 考古博物馆 (入藏号 3501)。1782 年发现于卡尔塔尼塞塔。三行拉丁文, 词间以间隔点分开。完好无损。
DIS · MAN
· ZETHI
VIX · A · VI
Translation. «To the shades of the underworld. (Memorial) of Zethus. He lived 6 years.»
译文。“献给冥神。Zethus 之 (墓)。他活了六岁。”
The simplest possible EpiDoc edition. Use it to learn the basic shape of <expan>, <g>, <hi>, <persName>, <num>.
最简 EpiDoc 编辑示例。用以掌握 <expan>、<g>、<hi>、<persName>、<num> 的基本形态。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A three-line Latin epitaph. The first thing to read is not the inscription, but how four different views derive from one XML file.一篇三行拉丁文墓志。先不读铭文, 先读"同一份 XML 如何在四个视图里展现自己":
- In Leiden, the abbreviation marks and brackets jump out — they are typography.在 Leiden 视图里, 缩写标点与方括号首先映入眼帘,那是印刷传统的字体写法。
- In Web, the same brackets become hover tooltips: typography turns into interaction.在 Web 视图里, 同样的方括号变成了悬停提示,字体写法变成了交互。
- In Database, the name "Zethus" appears as a row with a Trismegistos ID.在 Database 视图里, "Zethus" 这个名字变成了一行带 Trismegistos ID 的数据。
- In XML, you see the <expan>, <persName>, <num> wrappers that drove the three views above.在 XML 视图里, 你看见了 <expan>、<persName>、<num>,正是它们驱动了上面三种视图。
The <expan> wrapper缩写展开标签
The stone carries the abbreviated word man. The full Latin form is manibus ("to the shades", dative plural). The editor's intervention separates what's on the stone from what's supplied:
石上所刻为缩写 man, 拉丁文完整形作 manibus(“致亡灵”, 与格复数)。编者的工作, 是把"石头上确有"与"为读者补出"两层清楚分开:
<w><expan><abbr>man</abbr><ex>ibus</ex></expan></w>
<expan> is the container announcing "an abbreviation lives here." <abbr> holds the carved letters; <ex> holds the editor's expansion. Separating them lets a renderer choose between man(ibus) (with parentheses) or man (purely diplomatic).
<expan> 是“这里有缩写”的外包装。<abbr> 装石上所刻字母, <ex> 装编者补出的字母。两层分开放, 渲染器可选择显示 man(ibus) (带括号) 或 man (实录式)。
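The renderer's choice between man(ibus) and man can be sketched in a few lines of Python. This is an illustrative toy, not the EpiDoc stylesheets: it assumes a namespace-free fragment and handles only `<expan>`, `<abbr>` and `<ex>`.

```python
import xml.etree.ElementTree as ET


def render(el, mode="interpretive"):
    """Render an element as plain text.

    'interpretive' shows editorial expansions in parentheses;
    'diplomatic' shows only what is carved on the stone.
    """
    out = el.text or ""
    for child in el:
        if child.tag == "ex":
            # <ex> holds letters supplied by the editor, never the cutter.
            if mode == "interpretive":
                out += "(" + render(child, mode) + ")"
        else:
            out += render(child, mode)
        out += child.tail or ""
    return out


word = ET.fromstring("<w><expan><abbr>man</abbr><ex>ibus</ex></expan></w>")
interpretive = render(word)
diplomatic = render(word, mode="diplomatic")
print(interpretive, diplomatic)
```

Because the carved and supplied letters live in separate elements, switching views is a rendering decision and never requires touching the encoding.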
The interpunct as a glyph间隔点作为符号
Roman engravers used a raised dot to separate words. It is not a letter. EpiDoc encodes it via <g>:
罗马刻工常用一颗居中略高的圆点来分隔单词。它不是字母, 而是分词符号。EpiDoc 以 <g> 元素编码:
Dis <g ref="#interpunct">·</g> man
@ref="#interpunct" points to an entry in the project's character declaration. Every interpunct across thousands of inscriptions can then be retrieved with a single query.
@ref="#interpunct" 指向项目字符声明中的一条目。语料库中所有间隔点 (跨数千件铭文) 都可一次性检索。
<g> separates signs (interpuncts, leaves, crosses, ivy-leaves) from letters. A search for «inscriptions with cross-shaped glyphs» becomes trivial.<g> 把符号 (间隔点、叶饰、十字、藤叶) 与字母分开。“带十字符号的铭文”一类查询变得轻而易举。
Visual vs. content: <hi>视觉与内容: <hi>
On the stone, H and I in Zethi are joined into a single carved character. The TEXT is still two letters; the SHAPE is one. EpiDoc records appearance through @rend:
石上, Zethi 中的 H 与 I 被刻工合为一字。文字内容仍是两字母, 形态上是一字。EpiDoc 通过 @rend 属性记录这种形态:
<w>Zet<hi rend="ligature">hi</hi></w>
@rend="ligature" records something about the visual appearance of the letters on the stone, not about the text content. The same logic powers @rend="tall" (a taller-than-normal letter), @rend="supraline" (a horizontal line above), and many more.
@rend="ligature" 记录字母在石上的视觉外观, 而非文本内容。同样的逻辑也适用于 @rend="tall" (高字母)、@rend="supraline" (字母上方横线) 等。
Line 1 of Zethus also carries <w>D<hi rend="tall">i</hi>s</w> — the i in Dis is taller. Look at the photograph and you'll see why.
Zethus 第 1 行还有 <w>D<hi rend="tall">i</hi>s</w>,Dis 中的 i 字高于左右二字。看照片即明。
<num> has both face and value<num> 兼具字形与数值
<num value="6">VI</num>
The text content VI is what's on the stone. The @value="6" is the computed integer. Two layers: a database can sort, filter, sum by value; a renderer displays the original face.
文本内容 VI 是石上所刻; @value="6" 是计算后的整数。两层分开: 数据库可按数值排序、筛选、求和; 渲染器显示原始字形。
Finding every Roman child who died at the age of six takes a single SQL query.
“找出所有六岁夭折的罗马儿童”只需一条 SQL 查询。
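Filtering and sorting by @value can be illustrated in Python without a database. In the sketch below only the ISic000001 numeral comes from this workshop; the two EXAMPLE records are hypothetical, and the code assumes each fragment holds exactly one `<num>` that records an age.

```python
import xml.etree.ElementTree as ET

RECORDS = {
    "ISic000001": '<ab>vixit annis <num value="6">VI</num></ab>',
    "EXAMPLE-A": '<ab>vixit annis <num value="23">XXIII</num></ab>',  # hypothetical
    "EXAMPLE-B": '<ab>vixit annis <num value="6">VI</num></ab>',      # hypothetical
}


def ages(records):
    """Map each record id to (carved numeral, integer value)."""
    out = {}
    for rec_id, xml_text in records.items():
        num = ET.fromstring(xml_text).find(".//num")
        out[rec_id] = (num.text, int(num.get("value")))
    return out


table = ages(RECORDS)
# Filter on the machine-readable value, not on the carved letters.
died_at_six = sorted(rec_id for rec_id, (_, value) in table.items() if value == 6)
# Sort by value; the carved face stays available for display.
by_age = sorted(table, key=lambda rec_id: table[rec_id][1])
print(died_at_six, by_age)
```

A string comparison on the carved numerals would sort "XXIII" after "VI" only by accident of spelling; the integer in @value is what makes the ordering reliable.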
Three layers for one name一人之名, 三层包装
<persName type="attested">
<name>
<w>Zet<hi rend="ligature">hi</hi></w>
</name>
</persName>
Why three layers for what looks like one name?
为什么一个看起来一气呵成的名字, 要分成三层来包装?
- <persName> = the person. Attributes can link to a prosopographical authority (PIR, LGPN, individual project IDs).
- <name> = the name itself. For Roman tria nomina (Q. Pomponius Rufus), this layer hosts three siblings: <name type="praenomen">, <name type="gentilicium">, <name type="cognomen">.
- <w> = the word. The orthographic unit that the renderer treats as a token for highlighting, search indexing, lemmatization.
<persName> = 人物本身; <name> = 名字 (罗马三段名时此层下分 praenomen/gentilicium/cognomen 三个子元素); <w> = 词汇单元 (用于高亮、检索索引、词形归并)。
Everything assembled完整编码
<div type="edition" subtype="primary" xml:space="preserve" xml:lang="la">
<ab>
<lb n="1"/><w>D<hi rend="tall">i</hi>s</w> <g ref="#interpunct">·</g>
<w><expan><abbr>man</abbr><ex>ibus</ex></expan></w>
<lb n="2"/><g ref="#interpunct">·</g>
<persName type="attested"><name><w>Zet<hi rend="ligature">hi</hi></w></name></persName>
<lb n="3"/><w><expan><abbr>vix</abbr><ex>it</ex></expan></w>
<g ref="#interpunct">·</g>
<w><expan><abbr>a</abbr><ex>nnis</ex></expan></w>
<g ref="#interpunct">·</g>
<num value="6">VI</num>
</ab>
</div>
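Three lines on the stone become fourteen elements in EpiDoc. Each layer does its own job: text, visual form, expansion, glyph, personal name, number.
石上三行 → EpiDoc 中十四个元素。每一层都各司其职: 文本、视觉、展开、符号、人名、数字。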
Now build it — practice in the playground动手练习 · 在工坊里编码 Zethus
Zethus is the only inscription with a pre-loaded sample in the playground's Document editor. Open it there and you can manipulate the full form-XML-edition triplet.Zethus 是唯一一篇在工坊“文档编辑器”中预载的样本。打开它, 你可以在“表单 / XML / 排印版”三栏中同步操作。
- 1. Open the Document editor tab, pick "ISic000001" from the sample dropdown.打开文档编辑器选项卡, 从样例下拉框中选 "ISic000001"。
- 2. In the form column, change the date and watch the XML and the edition update.在表单栏里改一改日期, 看 XML 与排印版同时随之更新。
- 3. Change VI to VIII in the edition, click "Parse XML", and see the form re-populate.把排印版里的 VI 改成 VIII, 点 "Parse XML", 看表单栏被反向回填。
- 4. Try the Playground tab: type [Imp(erator)] and see <supplied> + <expan> form on the right.试试试验场选项卡:输入 [Imp(erator)], 看 <supplied> 与 <expan> 在右栏同步出现。
02 · IRT0102 · 残片文件
A fragment file with only a teiHeader and a one-sentence commentary只有 teiHeader 与一句注释的残片文件
This exercise has no inscribed text at all. The lesson is how to read an EpiDoc file, not an inscription.
此题完全没有刻文。我们要学的是怎样读 EpiDoc 文件本身, 而不是读铭文。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
This file has no inscription text at all — only a teiHeader and a single commentary note. The teaching point is what the file shows when the body is empty.这份文件没有铭文内容,只有 teiHeader 与一句简短注释。要看的, 是当 body 为空时, 一份 EpiDoc 文件到底还说了些什么。
- In Leiden, the edition pane is empty. Notice what is not there.在 Leiden 视图里, 释读栏空白,留意那"什么都没有"本身。
- In Web, the commentary still publishes — proof that metadata is not the inscription.在 Web 视图里, 评注仍然展示,元数据本身就是出版物的一部分, 并非附庸于刻文。
- In Database, the inscription ID still has rows — for editor, for project, for bibliography.在 Database 视图里, 这条铭文 ID 仍有数据,编者、项目、参考文献。
- In XML, study the full <teiHeader> structure: this is the part you will write for every inscription.在 XML 视图里, 细看完整的 <teiHeader> 结构,每篇铭文都要写这一层。
What an empty-edition file looks like无刻文文件的样貌
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="IRT0102" xml:lang="en">
<teiHeader>
<fileDesc>
<titleStmt><title>Fragment</title></titleStmt>
<publicationStmt>
<publisher>Society for Libyan Studies</publisher>
<distributor>King's College London</distributor>
<idno type="filename">IRT0102</idno>
<availability><p>CC-BY UK 2.0</p></availability>
</publicationStmt>
<sourceDesc><msDesc><msIdentifier><msName>IRT0102</msName></msIdentifier></msDesc></sourceDesc>
</fileDesc>
<revisionDesc><change when="2021-06-24" who="CMR">Merged</change></revisionDesc>
</teiHeader>
<text><body>
<div type="commentary"><p>See now <ref type="inscription" n="IRT0101">101</ref>, of which this is part.</p></div>
</body></text>
</TEI>
The seven-layer path七层路径
From the root to the actual sentence "See now 101, of which this is part":
自根元素往下走, 直到那句真正的内容“请参看 101 号, 此残片为其一部”:
TEI → text → body → div[@type="commentary"] → p → ref[@type="inscription"]
Six levels of nesting (the seventh layer is the text content itself). Memorize this path — it works on every EpiDoc file in every project.
六层嵌套 (第七层为文本本身)。这条路径背下来:它适用于每一个项目的每一份 EpiDoc 文件。
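The path can be walked programmatically. The Python sketch below runs against a trimmed copy of the IRT0102 stub shown above (header omitted); it assumes the standard library's ElementTree, which needs the TEI namespace bound to a prefix, here the arbitrary prefix `tei`.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed copy of the IRT0102 stub; the teiHeader is left out for brevity.
STUB = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="IRT0102" xml:lang="en">
  <text><body>
    <div type="commentary"><p>See now <ref type="inscription" n="IRT0101">101</ref>, of which this is part.</p></div>
  </body></text>
</TEI>"""

root = ET.fromstring(STUB)  # root is the <TEI> element, so the path starts at <text>

P_PATH = "tei:text/tei:body/tei:div[@type='commentary']/tei:p"
paragraph = root.find(P_PATH, TEI_NS)
ref = root.find(P_PATH + "/tei:ref[@type='inscription']", TEI_NS)

sentence = "".join(paragraph.itertext())  # the seventh layer: the text itself
target = ref.get("n")                     # where the stub redirects to
print(sentence)
print(target)
```

Forgetting the namespace is the usual reason such a path returns nothing: an unprefixed `text/body/div` would look for elements in no namespace and find none.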
Why the file exists at all为什么保留这份文件
IRT0101 is a larger inscription. IRT0102 turned out to be a smaller fragment of the same monument. Once the join was made, the editors had two choices:
IRT0101 本是一篇较大的铭文。IRT0102 后来证实是同一座纪念物上的小残片。"两者本为一体"这一判断作出后, 编者面前有两条路:
- (a) Delete IRT0102 entirely. Tidy, but breaks every existing citation to "IRT0102".
- (b) Keep IRT0102 as a stub redirecting to IRT0101.
They chose (b). This is the same principle as a web HTTP 301 redirect: the old reference still resolves, it just points to the new place.
编者发现 IRT0102 实为 IRT0101 之残片。删除会破坏所有现存引用; 保留为重定向残件则保住既有学术参照。同 HTTP 301 重定向之原理: 旧地址仍可访问, 只是指向新地址。
A stable point of reference matters more than a tidy corpus.
参照点的稳定性比语料库的整洁性更重要。
Reading the EpiDoc shell读懂 EpiDoc 外壳
An EpiDoc file is more than its edition. The teiHeader records:
一份 EpiDoc 文件远不止“释读”这一部分。teiHeader 还要记录:
- Who edited the file and when (<revisionDesc>)
- The licence under which it's shared (<availability>)
- The physical stone: material, dimensions, layout, lettering, condition (<msDesc>)
- The findspot and provenance history (<history>)
- The languages and the encoding conventions used (<encodingDesc>, <profileDesc>)
EpiDoc 文件不仅是刻文本身。teiHeader 中还记载: 编辑者与日期、共享许可、实物石头 (材料/尺寸/布局/字体/状况)、出土地与流传史、所用语言与编码规约。
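As a rough illustration of reading the shell, the sketch below pulls a few of these facts out of the IRT0102 header with Python's standard library. The function name header_summary and the dictionary keys are made up for this example; the stub has no <history>, so that entry comes back empty.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The IRT0102 header, lightly trimmed.
HEADER = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
<fileDesc>
<titleStmt><title>Fragment</title></titleStmt>
<publicationStmt>
<publisher>Society for Libyan Studies</publisher>
<idno type="filename">IRT0102</idno>
<availability><p>CC-BY UK 2.0</p></availability>
</publicationStmt>
<sourceDesc><msDesc><msIdentifier><msName>IRT0102</msName></msIdentifier></msDesc></sourceDesc>
</fileDesc>
<revisionDesc><change when="2021-06-24" who="CMR">Merged</change></revisionDesc>
</teiHeader>"""


def header_summary(header):
    """Collect a few header facts into a dict; absent elements become None."""
    def text_of(path):
        node = header.find(path, TEI_NS)
        return "".join(node.itertext()).strip() if node is not None else None

    change = header.find("tei:revisionDesc/tei:change", TEI_NS)
    return {
        "title": text_of("tei:fileDesc/tei:titleStmt/tei:title"),
        "licence": text_of("tei:fileDesc/tei:publicationStmt/tei:availability"),
        "last_change": (change.get("when"), change.get("who")) if change is not None else None,
        "findspot": text_of(".//tei:history"),  # not present in this stub
    }


summary = header_summary(ET.fromstring(HEADER))
print(summary)
```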
Now build it — practice in the playground动手练习 · 在工坊里编码 IRT0102
Don't worry about typing inscription text — this exercise is about reading the file shell. Use the playground's Document editor with the "blank" template to inspect how the teiHeader is built.本题不必动手写释读,重点是读懂 EpiDoc 文件外壳。在工坊“文档编辑器”中选 "blank" 空模板, 观察 teiHeader 是怎么搭起来的。
1. Open Document editor → pick "Blank template".打开文档编辑器, 选 "Blank template"。
2. Notice the form fields under "Identification" and "Source description": they map 1-to-1 onto IRT0102's teiHeader.注意“Identification”与“Source description”两节的表单字段,它们与 IRT0102 的 teiHeader 一一对应。
3. Fill in just two fields (Title, Date). Click "Build XML" and watch the teiHeader appear.只填两个字段 (Title、Date), 点 "Build XML", 看 teiHeader 随之生成。
4. Compare with IRT0102's file in four-views: same shape, different content.与 four-views 中 IRT0102 的 XML 视图比对,同样的形状, 不同的内容。
03Genius of Catania卡塔尼亚守护神
Latin dedication · seven lines · 4th century CE拉丁文奉献铭 · 七行 · 公元 4 世纪
The densest abbreviation drill in the workshop. Introduces the <am> element — abbreviation marks that are part of the abbreviation device, not letters of the spelled word.
本工坊缩写训练最密的一题。引入 <am> 元素,那些本身就是缩写装置一部分、并非被缩写词字母的标记。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A seven-line dedication packed with abbreviations. Every line shows the difference between letters (in <abbr>) and abbreviation marks (in <am>).一篇七行献铭, 缩写极密。每一行都在告诉你:一个缩写里, 字母(在 <abbr> 内)与缩写标记(在 <am> 内)是两回事。
- In Leiden, look at the small overline marks — those are the cutter's abbreviation device.在 Leiden 视图里, 留意字母上方的小横线,那是刻工用来示意"此处省略"的标记。
- In XML, the same marks live in <am> elements, distinct from the letters of the spelled word.在 XML 视图里, 这些横线被装在 <am> 元素里, 与被缩写的字母严格区分。
- Database column "abbreviation" lists the resolved forms: every <expan> becomes one row.Database 视图的 "abbreviation" 栏列出展开后的完整词,每个 <expan> 对应一行。
- Compare Leiden line 1 with the XML carefully: three abbreviations in one line, three different shapes.逐行对照 Leiden 与 XML 的第一行,三种缩写形态, 三种处理方式, 都装在同一行里。
A tablet from the theatre of Catania卡塔尼亚剧场出土的牌匾
A tablet of Proconnesian marble from the Marmara quarries, originally fixed to the base of a statue of the Genius (protective spirit) of the city of Catania. The stone was found on 18 May 1770, during the excavations of the ancient theatre conducted by the Principe di Biscari, and now lives in the Museo Civico di Catania.一块普罗孔尼苏斯大理石板(出自马尔马拉海的古采石场), 原本钉在卡塔尼亚城守护神(Genius)雕像的底座上。出土于 1770 年 5 月 18 日, 当时比斯卡里王子正主持发掘卡塔尼亚的古剧场。如今藏于卡塔尼亚市立博物馆。
The dedicator was Facundus Porfyrius Mynatidius, a vir clarissimus and consularis — the highest-ranking governor of late-Roman Sicily.奉献者是 Facundus Porfyrius Mynatidius, 身份为vir clarissimus(显贵元老)兼consularis(执政官级总督),晚期罗马西西里岛的最高行政长官。
VERNANTIBVS
SAECVLIS DDD NNN
GENIO SPLENDIDAE VR-
BIS CATINAE
FACVNDVS PORFYRIVS
MYNATIDIVS V C ·
CONS · EIVSDEM
What does ddd · nnn mean?ddd · nnn 是什么?
On the stone you see d followed by two more small suspended dd, then n followed by two more nn. The intended Latin is dominorum nostrorum — "of our lords."
石上所刻是一个 d, 后跟两个略小、位置稍高的 dd, 再写一个 n, 后跟两个略小的 nn。所代表的拉丁文为 dominorum nostrorum,“我们诸位君主之”。
But why three d's and three n's? The word dominorum doesn't start with three d's. The repetition is a plural marker. Late-Roman convention: doubling/tripling the initial letter signals plurality of the noun. Three lords = three reigning emperors (a tetrarchic-or-later imperial configuration).
石上所见: d 后接两个小 dd, 然后 n 后接两个小 nn。所欲表达的拉丁文为 dominorum nostrorum“我等主上”。但 dominorum 并不以三个 d 开头。这种重复是复数标记: 晚期罗马惯例中, 首字母的双重或三重表示该名词为复数。三重 d = 三位在位元首 (四帝共治或更后期的多帝并立体制)。
Dating logic: three Ds and three Ns means three Augusti reigning simultaneously. That narrows the inscription to one of six joint-reign periods in the 4th–5th centuries: 337–340, 367–378, 379–383, 388–395, 402–408, or 421. The consensus settles on a 4th-century date.断代由此而出: 三个 D 加三个 N, 表示同时有三位奥古斯都在位。这把本铭的年代收紧到四、五世纪的六个并立期之一:337—340、367—378、379—383、388—395、402—408、或 421。学界共识是公元四世纪某一期。
<am> elementAbbreviation mark, not abbreviation letters缩写标记, 而非缩写字母
If you encoded ddd naïvely as three letters inside <abbr>:
若把 ddd 老实当作 <abbr> 内的三个字母去编码:
<expan><abbr>ddd</abbr><ex>ominorum</ex></expan>
…you'd be claiming the stone carries three letters of the word dominorum. That's false. The first d belongs to the word; the second and third are plural-marker decoration. EpiDoc's <am> ("abbreviation mark") wraps the decoration:
……便等于宣称石上确刻了 dominorum 一词的三个字母,而这是错的。只有第一个 d 属于该词; 第二、第三个 d 不过是复数标记之装饰。EpiDoc 以 <am>(“缩写标记”)包住这层装饰:
<expan><abbr>d<am>dd</am></abbr><ex>ominorum</ex></expan>
Now the structure says: stone has d; that d is followed by a plural-marker device dd; the editor expands d to dominorum; the plurality is already attested by the <am>, so the expansion doesn't need to add a marker of its own.
现在的结构含义是: 石上有 d; 其后跟着复数标记装置 dd; 编者将 d 展开为 dominorum; 复数已由 <am> 承载, 展开中无需再重复表达。
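The difference between the two encodings can be checked in a few lines. This is a sketch only: the fragments are parsed without the TEI namespace, and stone_and_word is a hypothetical helper, not part of any EpiDoc tool. It reads "what is cut on the stone" as everything inside <abbr>, and "the word" as the <abbr> letters outside <am> plus the <ex>.

```python
import xml.etree.ElementTree as ET

GOOD = "<expan><abbr>d<am>dd</am></abbr><ex>ominorum</ex></expan>"
NAIVE = "<expan><abbr>ddd</abbr><ex>ominorum</ex></expan>"


def stone_and_word(expan_xml):
    """Return (letters cut on the stone, word the editor reads) for one <expan>."""
    stone, word = [], []

    def keep(text, in_abbr, in_am, in_ex):
        if in_abbr:
            stone.append(text)          # <am> content counts as cut on the stone
        if (in_abbr and not in_am) or in_ex:
            word.append(text)           # <am> content is not part of the word

    def walk(node, in_abbr=False, in_am=False):
        in_abbr = in_abbr or node.tag == "abbr"
        in_am = in_am or node.tag == "am"
        in_ex = node.tag == "ex"
        if node.text:
            keep(node.text, in_abbr, in_am, in_ex)
        for child in node:
            walk(child, in_abbr, in_am)
            if child.tail:              # tail text belongs to the parent's context
                keep(child.tail, in_abbr, in_am, in_ex)

    walk(ET.fromstring(expan_xml))
    return "".join(stone), "".join(word)


print(stone_and_word(GOOD))
print(stone_and_word(NAIVE))
```

Both encodings report ddd on the stone, but only the <am> version yields dominorum as the word; the naive one yields dddominorum.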
The cutter's hand: Prag's lettering note observes that "A appears with both broken and straight bar; E appears both as a tall narrow standard E and as a vertical stroke with a single horizontal bar across the middle; V appears both as V and as Y. Interpuncts are mostly absent, except after VC and CONS in the final two lines." Letter sizes go 60–80 mm (line 1), then 40–60 mm (lines 2–4), then 50–70 mm (lines 5–7) — the cutter shrank, then grew the script again as space ran out.关于刻工之手: Prag 在字形说明里观察道:"字母 A 既见折角横, 也见直横; E 既作高瘦的标准 E, 也只作中横一笔的简化形; V 既作 V 也作 Y。间隔点 (interpunct) 多数省略, 只在末两行 VC 与 CONS 之后才出现。" 字高自上而下变化:第 1 行 60—80 毫米, 第 2—4 行 40—60 毫米, 第 5—7 行 50—70 毫米,刻工先缩字, 临到末尾空间将尽, 又把字放大了回去。
Latin grammar drives the encoding拉丁语法决定编码方式
If the stone simply read ddd with three full-sized letters and no convention of plural marking, you'd use the naïve encoding. But ddd in this context is a known formal device with a known meaning. The encoding should reflect what is grammatically true, not what looks superficially like three letters.
若石上仅刻三个等大 d 且无复数标记惯例, 才可用朴素编码。但此处 ddd 是一种已知的格式化装置, 具有明确含义。编码应反映语法事实, 而非表面看起来的“三个字母”。
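The encoding tells the reader which letters spell the word and which letters express a convention. The two do different jobs.编码告诉读者: 哪些字母拼出单词, 哪些字母表达惯例。二者作用不同。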
Editor's commentary on cons · eiusdem (line 7): Jonathan Prag notes that "the precise significance of cons eiusdem is extensively debated and several alternatives have been suggested: the most likely interpretation is consularis eiusdem, signifying consularis of the same city — i.e. Catania, and by extension the province of Sicily." Encoding it as cons(ularis) rather than (e.g.) cons(ul) records the editor's decision, and links the inscription to the late-Roman provincial governance vocabulary.编者关于第七行 cons · eiusdem 的注: Jonathan Prag 指出, cons eiusdem 的精确所指学界争论已久; 较可信的释读是 consularis eiusdem, 即"同一城的执政官级总督",此处指卡塔尼亚, 延伸而指整个西西里行省。在编码中把它展为 cons(ularis) 而非(例如)cons(ul), 即是把编者的判断写进了 XML, 也把这件铭文挂上了晚期罗马行省治理术语的索引。
v, c, cons三个简单缩写
<expan><abbr>v</abbr><ex>ir</ex></expan> <expan><abbr>c</abbr><ex>larissimus</ex></expan> <expan><abbr>cons</abbr><ex>ularis</ex></expan>
Single-letter (v, c) and truncation (cons) abbreviations both use the standard <expan> shape. No <am> needed — these letters genuinely are parts of vir, clarissimus, consularis.
单字母缩写 (v, c) 与截短缩写 (cons) 都使用标准 <expan> 形态, 无需 <am>,这些字母确实是 vir、clarissimus、consularis 的组成部分。
<persName> can span <lb><persName> 可跨 <lb>
The dedicator's name occupies lines 5 and 6: Facundus Porfyrius / Mynatidius. EpiDoc is structural, not visual — one name stays inside one <persName>, even when a line break interrupts:
奉献者之名横跨第 5、6 行:Facundus Porfyrius / Mynatidius。EpiDoc 关注的是结构, 不是版式,一个人的名字依然装在一个 <persName> 之内, 即使中间被换行所打断:
<lb n="5"/><persName type="attested"> <name>Facundus</name> <name>Porfyrius</name> <lb n="6"/><name>Mynatidius</name> </persName>
The <lb> sits between two siblings of the parent element. Two separate <persName> blocks would mean two different people — which is wrong: this is one man with three names.
<lb> 夹在父元素的两个子元素之间。若分成两个 <persName> 就意味着两个不同的人,实际上这是同一人三段名。
break="no" for word continuationWhen a word breaks across lines单词跨行
Line 3 ends with ur-, line 4 begins with bis. Together: urbis ("of the city"). The line-break is internal to the word. Mark it:
第 3 行末为 ur-, 第 4 行开首为 bis; 合而读之即 urbis(“城的”, 属格)。换行发生在单词内部。如此标记:
<lb n="3"/>Genio splendidae ur <lb n="4" break="no"/>bis <placeName>Catinae</placeName>
@break="no" is the EpiDoc equivalent of a hyphen at the end of a printed line. A renderer can show urbis as a linked word (with the original split visible) or as a single token, depending on the user's preference.
@break="no" 相当于印刷品行末连字号。渲染器可根据用户偏好将 urbis 显示为完整单词 (但可见原始分行) 或单一词项。
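How a renderer might rejoin urbis can be sketched as follows. This is an assumption about one possible approach, not the behaviour of the EpiDoc stylesheets: whitespace in front of an <lb break="no"/> is dropped so the two halves touch, while an ordinary <lb/> is treated as a word boundary. The fragment is un-namespaced and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

LINES = """<ab><lb n="3"/>Genio splendidae ur
<lb n="4" break="no"/>bis <placeName>Catinae</placeName></ab>"""


def flatten(node, parts):
    """Append text to parts in document order, handling <lb> as we go."""
    if node.tag == "lb":
        if node.get("break") == "no":
            # Remove whitespace before the break so the word halves join.
            while parts and not parts[-1].strip():
                parts.pop()
            if parts:
                parts[-1] = parts[-1].rstrip()
        else:
            parts.append(" ")  # an ordinary line break separates words
    if node.text:
        parts.append(node.text)
    for child in node:
        flatten(child, parts)
        if child.tail:
            parts.append(child.tail)


def words(ab_xml):
    """Return the words of an <ab>, joined across break="no" line breaks."""
    parts = []
    flatten(ET.fromstring(ab_xml), parts)
    return "".join(parts).split()


print(words(LINES))
```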
Exercise 03 assembled完整编码
<div type="edition" subtype="primary" xml:space="preserve" xml:lang="la">
<ab>
<lb n="1"/>Vernantibus
<lb n="2"/>saeculis <expan><abbr>d<am>dd</am></abbr><ex>ominorum</ex></expan>
<expan><abbr>n<am>nn</am></abbr><ex>ostrorum</ex></expan>
<lb n="3"/>Genio splendidae ur
<lb n="4" break="no"/>bis <placeName>Catinae</placeName>
<lb n="5"/><persName type="attested"><name>Facundus</name> <name>Porfyrius</name>
<lb n="6"/><name>Mynatidius</name></persName>
<expan><abbr>v</abbr><ex>ir</ex></expan>
<expan><abbr>c</abbr><ex>larissimus</ex></expan>
<g ref="#interpunct">·</g>
<lb n="7"/><expan><abbr>cons</abbr><ex>ularis</ex></expan>
<g ref="#interpunct">·</g> eiusdem
</ab>
</div>
Editions: Mommsen, CIL X.2 (1883) 7014 · Dessau, ILS (1892) 3778 · Manganaro 1959, 5–10 fig.1 · Wilson 1990, 187 fig.156.b · Korhonen 2004, 7. Editor: Jonathan Prag (I.Sicily, last rev. 19 Jan 2021).主要版本:Mommsen《CIL》 X.2 (1883) 7014 · Dessau《ILS》 (1892) 3778 · Manganaro 1959, 5–10 图 1 · Wilson 1990, 187 图 156.b · Korhonen 2004, 7。本电子版编者:Jonathan Prag (I.Sicily, 末次修订 2021 年 1 月 19 日)。
Now build it — practice in the playground动手练习 · 在工坊里编码 Genius of Catania
In the playground, type a few common abbreviations and watch the engine split them into <abbr>+<ex>+<am>.在工坊试验场中, 输入几个常见缩写, 看引擎把它们拆成 <abbr>+<ex>+<am>。
1. Switch to the Playground tab.切到试验场选项卡。
2. Type Imp(erator), Cae̅s(ar), and D(omino) n(ostro), one per line.逐行输入:Imp(erator)、Cae̅s(ar)、D(omino) n(ostro)。
3. Notice how Cae̅s (with the overline) gets <am> inserted; the others get only <expan>.注意 Cae̅s(带横线者)被加上 <am>; 其他两个只生成 <expan>。
4. Read the rendered output: it should look like a printed edition again, brackets and all.读右栏的还原结果:它应再次像印本,方括号、圆括号一应俱全。
04Nabor SurniaNabor Surnia 之墓志
Latino-Punic · 3rd–4th century CE拉丁字母布匿文 · 公元 3–4 世纪
A Tripolitanian funerary inscription that mixes Latin script with Neo-Punic vocabulary. Introduces D M S as a divine address, <date dur="P80Y"> for machine-readable lifespans, <gap> with quantitative attributes, and <orig> for untranslated stretches.的黎波里塔尼亚的一篇墓志, 拉丁字母与新布匿(Neo-Punic)词汇混用。本题引入四件新工具:D M S 作为神祇呼告、<date dur="P80Y"> 用于机器可读的寿命数据、带数量属性的 <gap>、以及用 <orig> 标记暂时无法翻译的片段。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A Latino-Punic funerary inscription — Latin script, Neo-Punic vocabulary. Pay attention to how the encoding handles language at word level, not just at file level.一篇拉丁字母布匿文墓志,字母是拉丁的, 词汇是新布匿的。看编码如何在词这一层处理语言切换, 而非只在文件层。
- Leiden view: D · M · S at the top is a fixed funerary formula, three words in three letters.Leiden 视图:开头 D · M · S 是固定丧葬套语,三字三词。
- Web view: hover on the lifespan to see the ISO-8601 machine-readable duration (P80Y = 80 years).Web 视图:把鼠标悬停在寿命上, 看见 ISO-8601 格式的机器可读时长(P80Y = 八十年)。
- Database: the orig rows mark Neo-Punic vocabulary that has no Latin translation.Database 视图:orig 行标出无拉丁译文的新布匿词汇。
- XML: notice <gap> with quantitative attributes; the editor counted the missing characters.XML 视图:留意带数量属性的 <gap>,编者数过了缺失的字数。
One script, two languages一种字母, 两种语言
The Punic-speaking populations of Roman Tripolitania kept writing their Semitic language using Latin letters long after Phoenician script went out of use. This is analogous to modern Cantonese speakers sometimes writing in romanization. The technical label is Latino-Punic; its language tag is xpu-Latn.
罗马时代的的黎波里塔尼亚, 操布匿语(闪族语)的居民, 即便腓尼基字母早已弃用, 仍长期以拉丁字母书写自己的母语。这种情形, 一如今日有些粤语使用者以罗马字记下口语。学界称之为“拉丁字母布匿文”(Latino-Punic), ISO 代码 xpu-Latn。
<div type="edition" xml:lang="xpu-Latn" xml:space="preserve">
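The xml:lang value is a BCP 47 language tag: the ISO 639-3 code xpu (Punic) followed by the ISO 15924 script subtag Latn, which records that the text is written in a script other than the language's usual one. The same convention gives zh-Hant for Traditional Chinese or sr-Latn for Latin-alphabet Serbian.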
罗马治下的的黎波里塔尼亚, 布匿语使用者长期用拉丁字母书写其闪族语言, 远晚于腓尼基字母的废止。这种现象学术上称拉丁字母布匿文 (ISO 代码 xpu-Latn)。同样的写法在 zh-Hant (繁体中文)、sr-Latn (拉丁字母塞尔维亚语) 中亦可见。
D M S as divinityThe standard funerary formula墓志套语
Roman tombs nearly always open with Dis Manibus ("to the divine shades") or Dis Manibus Sacrum ("sacred to the divine shades"). The Manes are the gods of the underworld — addressees of the dedication.
罗马墓志几乎都以 Dis Manibus(“献给亡灵”)或 Dis Manibus Sacrum(“献给亡灵, 神圣不可侵”)开篇。Manes 是冥府之神,此处即是奉献的对象。
<persName type="divine" ref="#manes"> <w lemma="deus"><expan><abbr>D</abbr><ex>is</ex></expan></w> <w lemma="manes"><expan><abbr>m</abbr><ex>anibus</ex></expan></w> </persName> <w lemma="sacer"><expan><abbr>s</abbr><ex>acrum</ex></expan></w>
type="divine" classifies the Manes as gods. @ref="#manes" links to the project's authority list — every D M across the corpus resolves to the same divine entity. Sacrum stays outside the <persName>: it's an adjective qualifying the dedication, not part of the gods' name.
type="divine" 将 Manes 归为神祇类。@ref="#manes" 连接到项目权威表,语料库中所有 D M 都解析到同一神祇实体。Sacrum 留在 <persName> 之外: 它是形容词, 修饰整个奉献行为, 不是神名的一部分。
ISO 8601 duration on <date>机器可读的寿命数据
The stretch auo sanu n LXXX records the deceased's age — 80 years. Wrap as:
石上 auo sanu n LXXX 一段, 记录的是亡者的寿数,八十岁。如此包装:
<date type="life-span" dur="P80Y"> <w lemma="auo">auo</w> <lb n="5"/><w lemma="sanuth">sanu</w> <lb n="6"/><w lemma="numerus"><expan><abbr>n</abbr><ex>umero</ex></expan></w> <num value="80">LXXX</num> </date>
@dur="P80Y" is ISO 8601 duration format: "Period of 80 Years." Now «find every tombstone of someone who died older than 70» becomes one SPARQL query.
@dur="P80Y" 是 ISO 8601 时段格式: “80 年之期”。“找出所有 70 岁以上的墓主”即变为一条 SPARQL 查询。
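The slide speaks of a SPARQL query; the same question can be asked of the XML directly. The sketch below uses Python instead and makes several assumptions: the <corpus> wrapper and the xml:id values are invented for the example, the third entry is hypothetical, and only date-only durations (years, months, days) are handled.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Date-only ISO 8601 durations such as P80Y or P2Y6M (no time part).
DUR = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")

# Invented wrapper; "nabor" and "neikais" use the durations from the exercises.
CORPUS = """<corpus>
<text xml:id="nabor"><date type="life-span" dur="P80Y"/></text>
<text xml:id="neikais"><date type="life-span" dur="P15Y"/></text>
<text xml:id="hypothetical-child"><date type="life-span" dur="P2Y6M"/></text>
</corpus>"""


def years(dur):
    """Whole years in a date-only ISO 8601 duration."""
    match = DUR.match(dur)
    if not match:
        raise ValueError(f"unsupported duration: {dur!r}")
    return int(match.group(1) or 0)


def older_than(corpus_xml, limit):
    """xml:ids of texts whose life-span exceeds `limit` whole years."""
    hits = []
    for text in ET.fromstring(corpus_xml).iter("text"):
        for date in text.iter("date"):
            dur = date.get("dur")
            if date.get("type") == "life-span" and dur and years(dur) > limit:
                hits.append(text.get(XML_ID))
    return hits


print(older_than(CORPUS, 70))
```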
Saying how much is missing陈述缺失多少
The bottom-left of Nabor's stone is damaged. The editor estimates 2, 2, and 4 lost characters at the start of lines 8, 9, 10:
Nabor 之石的左下角已残。编者据残痕估计:第 8、9、10 行开首分别缺失约 2、2、4 个字母:
<lb n="8"/><gap reason="lost" quantity="2" unit="character"/><orig>milim e</orig> <lb n="9"/><gap reason="lost" quantity="2" unit="character"/><orig>duo</orig><space quantity="1" unit="character"/> <lb n="10"/><gap reason="lost" quantity="4" unit="character"/><orig>s</orig>
The triple @reason / @quantity / @unit tells software what kind of absence (lost vs. illegible vs. omitted), how much (a count), and of what (characters, lines, words). This is the standard EpiDoc lacuna shape.
三元组 @reason / @quantity / @unit 告诉软件: 是什么类型的缺失 (失落、不可辨、省略)、缺失多少 (计数)、缺失什么 (字符、行、词)。这是 EpiDoc 标记残缺的标准形态。
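Because the three attributes are machine-readable, a script can total the loss per line. A minimal sketch, assuming an un-namespaced copy of the fragment above; lost_characters_by_line is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

PANEL = """<ab>
<lb n="8"/><gap reason="lost" quantity="2" unit="character"/><orig>milim e</orig>
<lb n="9"/><gap reason="lost" quantity="2" unit="character"/><orig>duo</orig><space quantity="1" unit="character"/>
<lb n="10"/><gap reason="lost" quantity="4" unit="character"/><orig>s</orig>
</ab>"""


def lost_characters_by_line(ab_xml):
    """Map each line number to the characters the editor marked as lost."""
    totals = {}
    line = None  # number of the most recent <lb>
    for node in ET.fromstring(ab_xml).iter():
        if node.tag == "lb":
            line = node.get("n")
            totals.setdefault(line, 0)
        elif (node.tag == "gap"
              and node.get("reason") == "lost"
              and node.get("unit") == "character"
              and node.get("quantity")):
            totals[line] = totals.get(line, 0) + int(node.get("quantity"))
    return totals


print(lost_characters_by_line(PANEL))
```

Gaps without a @quantity (for instance extent="unknown") are skipped rather than guessed at.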
Editor's commentary (Reynolds & Ward-Perkins, IRT 1952): "Lines 8–10: the left-hand margin of the stone is missing but it is not clear whether any letters have been lost." This is the canonical reason to use <gap reason="lost" extent="unknown"/> rather than a quantified gap — the editor honestly doesn't know.编者注 (Reynolds 与 Ward-Perkins, IRT 1952): "第 8—10 行:石头的左缘已残, 但是否确有字母丢失, 尚不明朗。" 这正是该用 <gap reason="lost" extent="unknown"/> 而非带具体数量的阙文的标准理由,编者老实承认自己不知道丢了几个字。
<orig> as a semantic placeholder<orig> 作为语义占位
The lower lines of Nabor's stone contain Punic words (nsath fo, milim e, duo, trailing s) that scholars cannot securely translate. Mark them as original-as-received:
Nabor 之石下半部分有几组学者尚不能确译的布匿词汇(nsath fo、milim e、duo, 以及末尾的 s)。将其标记为“原貌, 一仍其旧”:
<orig>nsath <unclear>fo</unclear></orig>
<orig> says: "I'm keeping this exactly as the stone has it, without normalising or interpreting." Its sibling <reg> would supply a regularised reading. Using <orig> alone is honest about ignorance — and leaves the door open for a future Punic specialist to add <reg> later.
<orig> 的含义是: “原样保留, 不正规化, 不解释”。其同伴标签 <reg> 提供规范化读法。仅用 <orig> 即为诚实地承认无知, 并为将来精通布匿语的学者预留补充 <reg> 的空间。
The little panel on the side侧面板
Nabor's stone has a second small inscribed panel (b) that carries only ny / fo. Encode each panel as a <div type="textpart">:
Nabor 之石另有一小面板(称为 b 面板), 仅刻 ny / fo 数字。两个面板各以一个 <div type="textpart"> 编码:
<div type="edition" xml:lang="xpu-Latn" xml:space="preserve">
<div subtype="section" n="a" type="textpart"><ab>…ten lines…</ab></div>
<div subtype="section" n="b" type="textpart">
<ab><lb n="1"/><orig>ny</orig> <lb n="2"/><orig>fo</orig></ab>
</div>
</div>
The @n="a" / @n="b" labels let apparatus entries refer to specific panels: <app loc="b.1">. We'll meet this pattern again in Exercise 07.
@n="a" / @n="b" 标签让校勘记可针对具体面板引用 (如 <app loc="b.1">)。此模式在练习 07 中将再次出现。
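A locator such as b.1 can be resolved by splitting it into a textpart label and a line number. The sketch below is one possible reading of that convention, not a documented EpiDoc API; the content of panel a is a one-line stand-in for the ten lines elided above, and the fragment is un-namespaced.

```python
import xml.etree.ElementTree as ET

EDITION = """<div type="edition">
<div subtype="section" n="a" type="textpart"><ab><lb n="1"/>D M S</ab></div>
<div subtype="section" n="b" type="textpart"><ab><lb n="1"/><orig>ny</orig> <lb n="2"/><orig>fo</orig></ab></div>
</div>"""


def events(node):
    """Yield ('lb', n) and ('text', s) events in document order."""
    if node.tag == "lb":
        yield "lb", node.get("n")
    if node.text:
        yield "text", node.text
    for child in node:
        yield from events(child)
        if child.tail:
            yield "text", child.tail


def resolve(edition_xml, loc):
    """Return the text of the line addressed by a 'part.line' locator."""
    part_n, line_n = loc.split(".")
    for div in ET.fromstring(edition_xml).findall("div"):
        if div.get("type") == "textpart" and div.get("n") == part_n:
            active, out = False, []
            for kind, value in events(div):
                if kind == "lb":
                    active = value == line_n  # switch on at the wanted line
                elif active:
                    out.append(value)
            return "".join(out).strip() or None
    return None


print(resolve(EDITION, "b.1"))
```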
The main text — what IRT calls panel a — fills the moulded frame from top to bottom. But look at line 7: where the cutter ran out of words, someone later squeezed in a second small inscription, panel b, in smaller letters in the vacant space.主文(IRT 称之为“a” 面板)在凹陷的框线内自上而下占满整个版面。但看第七行,主文写不下了, 留下一小块空白, 后来又有人用更小的字, 在那块空白里挤进了第二段铭文, 即“b” 面板。
Panel b reads: NY / FO, two lines of two letters each. The editors' verdict: "not translatable." J. M. Reynolds's English translation records exactly that and nothing more: "b. (not translatable.)"“b”面板所刻: NY / FO,两行, 每行两个字母。J. M. Reynolds 的翻译至此搁笔, 只写了一句 "b. (not translatable.)",不可译。
In the XML this is encoded as a second <div type="textpart" subtype="addition">, sibling to the main text. The encoding records the social fact: a stone's text can grow after the first carving.在 XML 中, 这一段被编为第二个 <div type="textpart" subtype="addition">, 与主文并列。编码所记录的, 不仅是文本, 还是一桩社会事实:一块石头上的字, 在首刻之后, 还会继续生长。
Now build it — practice in the playground动手练习 · 在工坊里编码 Nabor Surnia
This is your first multilingual exercise. Open the Multilingual editor and add Latin + a Punic placeholder.本题是你第一道多语题。打开多语编辑器, 添加拉丁与一个布匿占位语言。
1. Open Multilingual editor → pick "Blank template".打开多语编辑器, 选 "Blank template"。
2. Click "+ Add language": pick Latin, then Hebrew (closest to Neo-Punic in the menu).点 "+ Add language",选 Latin, 再选 Hebrew (菜单中最接近新布匿的语言)。
3. Type the D · M · S formula in the Latin pane.在拉丁栏中输入 D · M · S 套语。
4. Notice the lang-badges next to each field: every translation now carries a language tag.注意每个字段旁的语言徽标,每条译文都带上了语言标签。
Transcribed by R. Goodchild (1948) → IRT 1952 894 (Reynolds & Ward-Perkins) → Elmayer 1997, 60 → IRT 2009 894 → EDH 059737 → Kerr 2010, 192. Photos courtesy Ward-Perkins Archive, British School at Rome.转录传承:R. Goodchild (1948) → 《IRT 1952》894 号 (Reynolds 与 Ward-Perkins) → Elmayer 1997, 60 → 《IRT 2009》894 → EDH 059737 → Kerr 2010, 192。图像由英国罗马学院 Ward-Perkins 档案提供。
05NeikaisNeikais 之墓志
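Greek funerary inscription from Cyrenaica · 1st century BCE to 1st century CE · the first all-Greek exercise昔兰尼加希腊文墓志 · 公元前 1 世纪–公元 1 世纪 · 全篇希腊文之首题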
Three new skills: (a) Greek dates in Egyptian-style format, (b) the year-symbol that replaces the word ἔτος, (c) restoring Greek damage with <supplied reason="lost"> across line breaks.
三项新技能: (a) 埃及式希腊文日期格式; (b) 代替 ἔτος 字的“年”符号; (c) 用 <supplied reason="lost"> 跨行补缺。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A Greek funerary inscription from Cyrenaica. Includes a Greek L-symbol for "year" — the workshop's tightest single-glyph encoding lesson.一篇昔兰尼加希腊文墓志, 含希腊文 "L 符号"表示"年",工坊里最紧凑的单符号编码之课。
- Leiden: spot the L-symbol — three lines down. Looks like a regular Greek letter but is not.Leiden 视图:第三行可见 "L" 符,看似希腊字母, 其实不是。
- XML: that one glyph becomes six nested elements: <date>, <num>, <g>, plus attributes.XML 视图:这一个符号被装入六层嵌套,<date>、<num>、<g>, 还有诸属性。
- Web: hover on the date to see godot.date; every date here has a stable URL.Web 视图:把鼠标悬停在日期上, 看见 godot.date,这里每个日期都有一个稳定 URL。
- Database: note how godot.date IDs join across the whole corpus.Database 视图:看 godot.date 的 ID 如何在整个语料库内串连。
What the stone shows石上所刻
A rock-cut tomb at Taucheira (modern Tocra) on the Cyrenaican coast. The façade carries multiple panels (T.451–T.459), recording family burials. This panel commemorates Νείκαις, child of Ἀνδροσθένης, who lived 15 years.
昔兰尼加海岸 Taucheira (今 Tocra) 的岩凿墓。正立面上多块面板 (T.451–T.459) 记录家族葬。本面板纪念 Νείκαις (Ἀνδροσθένης 之子), 享年 15 岁。
ὶ─
Νείκαις
Ἀν[?δ?]ροσ-
θένευς
L ιε
The leading L on lines 1 and 6 is the year-symbol. The dashes mark missing parts of the date and the patronymic.
第 1 和第 6 行前的 L 是“年”字符号; 破折号表示日期与父名中残缺的部分。
The panel measures roughly 24.5 × 33 cm, outlined in low relief, with letters about 4 cm high. L appears as the symbol for ἔτους ("year") and ἐτῶν ("aged"); the sigma is squared. Surface erosion has eaten most of the year and month name.面板约 24.5 × 33 厘米, 四周有浅浮雕边线, 字高约 4 厘米。L 作为符号同时代表 ἔτους(年)与 ἐτῶν(岁); sigma 作方形。石面已严重风化, 大半的年序与月份名都被风蚀吃掉。
Interpretive reading (Reynolds 1983):Reynolds 1983 之释文:
(ἔτους) [.] Με[χ-]
ὶ[ρ---]
Νείκαις
Ἀνδ̣[ροσ-]
θένευς
(ἐτῶν) ιε΄
"Year ?, Mecheir ?, Neikai(o)s [scil. son] of Androsthenes, aged 15."
"第?年, Μεχίρ 月某日。Androsthenes 之子 Neikai(o)s, 享年十五。"
One little glyph, six layers of XML一个小符号, 六层 XML
The L-shaped tick at the start of the date is shorthand for ἔτους ("year, gen."). It's an abbreviation, but instead of a letter, the abbreviation IS a glyph. The canonical encoding stacks six elements, the deepest chain running five levels from <w> down to <g>:
日期开头那个 L 形的小符号, 即代表 ἔτους(“年”, 属格)。它确是缩写, 只是缩去的不是字母,缩写本身就是一个图形符号。标准编码共有六层:
<w lemma="ἔτος">
<expan>
<abbr>
<am>
<g ref="symbols.xml#year"/>
</am>
</abbr>
<ex>ἔτους</ex>
</expan>
</w>
Read from inside out: glyph → abbreviation mark → abbreviation → expansion → expansion-device → word. Each layer adds a different kind of information.
由内而外读: 符号 → 缩写标记 → 缩写形 → 编者展开 → 整个缩写装置 → 词项。每一层都贡献不同类型的信息。
Found in context, encoded in isolation: on the rock wall, this L is one of several markers on a single façade that holds nine inscriptions (T.451–459). The encoding strips it from its tomb and gives it six nested elements so that across IRCyr2020's ~2,300 inscriptions, every "year-symbol followed by numeral" can be searched, sorted, and dated as one consistent kind of object.同壁数铭, 各自成档: 在原石上, 这个 L 不过是同一面崖壁(T.451—459 共九块面板)中的一个小记号。EpiDoc 把它从墓壁剥离, 套上六层嵌套, 使 IRCyr2020 全集约 2,300 件铭文中所有"L 符 + 数字"的组合, 都能作为同一类对象被检索、排序、断代。
Each layer earns its place每层皆有所司
- <g>: the glyph itself, defined in the corpus's character list.
- <am>: "this glyph IS the abbreviation mark for the word." Without <am>, the glyph would look like decoration.
- <abbr>: what appears on the stone, in lieu of the spelled-out word.
- <ex>: the editor's expansion, the actual Greek word.
- <expan>: the device as a whole, what enables the renderer to switch between «show the glyph» and «show ἔτους».
- <w>: the word as a lexical unit, with its dictionary form in @lemma.
每一层都对应一种独立的信息: 实物符号 / 标记功能 / 石面所见 / 编者展开 / 整体装置 / 词条归并。缺一不可。
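To see the layers as a machine sees them, the sketch below prints each element of the year-symbol encoding with its nesting depth. It uses an un-namespaced copy of the fragment; outline is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

YEAR = ('<w lemma="ἔτος"><expan><abbr><am><g ref="symbols.xml#year"/></am></abbr>'
        '<ex>ἔτους</ex></expan></w>')


def outline(node, depth=1):
    """Yield (depth, tag) for node and all its descendants, in document order."""
    yield depth, node.tag
    for child in node:
        yield from outline(child, depth + 1)


layers = list(outline(ET.fromstring(YEAR)))
for depth, tag in layers:
    print("  " * (depth - 1) + tag)
```

The outline lists six elements; <ex> sits beside <abbr> inside <expan> rather than inside it, which is what lets a renderer choose between the glyph and the expansion.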
Restoring Μεχίρ补足 Μεχίρ
The month name Μεχίρ (Mecheir, ≈ Jan–Feb) is split across lines 1 and 2 with two letters missing:
月份名 Μεχίρ(Mecheir, 约公历一二月间)跨第 1、2 行, 中间缺失两个字母:
<rs type="month" key="mechir">
<w lemma="Μεχίρ">
Με<supplied reason="lost">χ</supplied>
<lb n="2" break="no"/>
ὶ<supplied reason="lost">ρ</supplied>
</w>
</rs>
Two <supplied reason="lost"> elements restore the missing letters. The <lb n="2" break="no"/> says the word continues across the line break. The whole sequence is wrapped in <rs type="month" key="mechir"> — a referring string identifying this stretch as the Egyptian month Mecheir.
两处 <supplied reason="lost"> 补出缺失字母; <lb n="2" break="no"/> 标示单词跨行延续; 整段包于 <rs type="month" key="mechir"> 中, 指明此为埃及历 Mecheir 月。
Why "Mecheir" (Μεχίρ)? An Egyptian month — the sixth in the calendar reformed under Augustus — roughly 26 January to 24 February. Reynolds saw the first two letters on the stone and supplied the rest. Mecheir dating in Cyrenaica is a thin but real strand of the calendar continuum between Egypt and the Greek-speaking world.为何作 "Mecheir" (Μεχίρ)? 此乃埃及历的第六月; 奥古斯都改历之后, 大致对应公历 1 月 26 日—2 月 24 日。Reynolds 在石上仅辨出 Mecheir 的头两个字母, 余皆由她补足。在昔兰尼加, Mecheir 系年的铭文虽稀, 却确是埃及与希腊语世界之间历法接续的一脉。
When you can't count, say so数不清时, 明言之
After the month-name, the day-number is gone. We don't know whether it was 7, 17, or 27. Use extent="unknown", not a fake quantity:
月份名之后, 日期数字已佚。我们无从分辨是初七、十七还是二十七。此时应用 extent="unknown", 而不是凭空填一个 quantity:
<gap reason="lost" extent="unknown" unit="character"/>
This is editorial honesty made queryable. A statistical study can later ask «what fraction of dates in this corpus have lost day-numbers?» and get a real answer.
这是把编者诚实变为可查询的属性。统计研究可问“语料库中有多大比例的日期缺失日数?”并获得真实答案。
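The statistical question can be sketched in a few lines. The SAMPLE below is invented for illustration and does not come from IRCyr2020; unknown_share is a hypothetical helper that reports, for one @reason, what fraction of gaps carry extent="unknown".

```python
import xml.etree.ElementTree as ET

# Invented sample, un-namespaced, for illustration only.
SAMPLE = """<ab>
<gap reason="lost" quantity="1" unit="character"/>
<gap reason="lost" extent="unknown" unit="character"/>
<gap reason="lost" quantity="2" unit="character"/>
<gap reason="illegible" extent="unknown" unit="line"/>
</ab>"""


def unknown_share(xml, reason="lost"):
    """Fraction of gaps with the given @reason whose extent is unknown."""
    gaps = [g for g in ET.fromstring(xml).iter("gap") if g.get("reason") == reason]
    if not gaps:
        return None  # nothing to measure
    unknown = sum(1 for g in gaps if g.get("extent") == "unknown")
    return unknown / len(gaps)


print(unknown_share(SAMPLE))
```

A faked quantity would silently move a gap from the "unknown" count into the "known" one, which is why the attribute choice matters for later statistics.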
godot.date — every date has a URLgodot.date,每个日期都有 URL
The whole date construction sits inside a <date> element whose @ref points to godot.date, an authority project that gives each ancient date a stable identifier:
整段日期被包入一个 <date> 元素, 其 @ref 指向 godot.date,一个为每一古代年代分配稳定标识符的权威项目:
<date ref="https://godot.date/id/BcwJL846woVx95NLsfd9PK"> <!-- L (year-symbol) gap-1 Mεχίρ gap-unknown --> </date>
Even though this date is partly lost (we don't know the regnal year or the day), the editor's best estimate is tied to a canonical godot.date entry. Linked data in practice: two independent scholarly projects (the inscription corpus and the date authority) talk through stable identifiers.
尽管此处日期部分残缺 (王年和日期皆缺), 编者的最佳估计仍挂在 godot.date 的规范条目上。这就是实践中的链接数据: 铭文语料库与日期权威, 通过稳定标识符对话。
This inscription's actual godot.date URL: godot.date/id/BcwJL846woVx95NLsfd9PK ↗. Click and you see: regnal year unknown, month Mecheir ("month 6 of the Cyrenaican / Alexandrian calendar"), every date attestation Reynolds 1983 records, and links to other inscriptions Reynolds dated the same way.本铭对应的 godot.date 链接: godot.date/id/BcwJL846woVx95NLsfd9PK ↗。点开即可见:在位年序不详、月份为 Mecheir(昔兰尼加—亚历山大历第六月)、Reynolds 1983 所记的所有相关日期标注, 以及由她同样断代的其他铭文之链接。
Nested <persName>嵌套 <persName>
Greek family identity is a chain of genitives: "Neikais of Androsthenes of Lysippos of …". EpiDoc encodes this with nested <persName>:
希腊式的家世表述是一条属格的链:“Neikais, Androsthenes 之子, Androsthenes 又为 Lysippos 之子……”EpiDoc 用嵌套的 <persName> 编码这种关系:
<persName type="attested" key="lgpn:V1-60818">
<name nymRef="Νίκαιος">Νείκαις</name>
<persName type="attested" key="lgpn:V1-60027">
<name nymRef="Ἀνδροσθένης">Ἀν<unclear>δ</unclear>
<supplied reason="lost">ροσ</supplied>
<lb n="5" break="no"/>θένευς</name>
</persName>
</persName>
Outer <persName> = the deceased. Inner = the father. A query for «all sons of Androsthenes in the Cyrenaican corpus» can find them even when the father appears only as a genitive modifier.
外层 <persName> = 死者; 内层 = 父亲。“在昔兰尼加语料库中找出所有 Androsthenes 的儿子”即可查到, 哪怕父亲名仅以属格修饰语形式出现。
A genitive of paternity: on a Greek funerary stone, X (son/daughter) of Y is the standard formula. Here Neikais' father is Androsthenes; the stone shows only "Ἀνδ̣[ροσ-]θένευς": genitive case, with the δ uncertain and the medial letters ροσ supplied. Nesting Androsthenes' inner <persName> inside the outer one for Neikais means a SPARQL query can find "all children of Androsthenes in Cyrenaica" without ever parsing prose.属格表父名: 希腊墓志的标准套语是“某人, 某人之子(女)”。此处 Neikais 之父名为 Androsthenes, 石上仅刻 "Ἀνδ̣[ροσ-]θένευς",取属格, δ 存疑, 中间的 ροσ 由编者补出。在 XML 中, Androsthenes 的 <persName> 嵌套于 Neikais 的 <persName> 之内; 这使日后用 SPARQL 查询"昔兰尼加所有 Androsthenes 之子女"时, 不必再解析散文文本, 一查即得。
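The nesting can be queried without SPARQL too. The sketch below is an assumption about how such a lookup might work: it matches the father by his @key and returns the @key of each enclosing person. The father's name is simplified here (the <unclear>, <supplied> and <lb> markup is left out) and the fragment is un-namespaced.

```python
import xml.etree.ElementTree as ET

PERSONS = """<ab>
<persName type="attested" key="lgpn:V1-60818"><name nymRef="Νίκαιος">Νείκαις</name>
<persName type="attested" key="lgpn:V1-60027"><name nymRef="Ἀνδροσθένης">Ἀνδροσθένευς</name></persName>
</persName>
</ab>"""


def children_of(xml, father_key):
    """Keys of every <persName> that directly contains the father's <persName>."""
    found = []
    for person in ET.fromstring(xml).iter("persName"):
        for inner in person.findall("persName"):  # direct children only
            if inner.get("key") == father_key:
                found.append(person.get("key"))
    return found


print(children_of(PERSONS, "lgpn:V1-60027"))
```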
ιε = 15ιε = 15
Greek used its letters as numerals: α = 1, β = 2, …, ι = 10, ε = 5, so ιε = 15. Wrap as <num value="15">ιε</num> inside a life-span date:
希腊文以字母兼作数字:α = 1、β = 2、……、ι = 10、ε = 5, 故 ιε = 15。在记寿数的日期之内, 包装为 <num value="15">ιε</num>:
<lb n="6"/><date type="life-span" dur="P15Y"> <w lemma="ἔτος"><expan><abbr><am><g ref="symbols.xml#year"/></am></abbr><ex>ἐτῶν</ex></expan></w> <num value="15">ιε</num> </date>
The same L-symbol now expands to the genitive plural ἐτῶν (not the singular ἔτους) because the numeral 15 calls for "years" in the plural.
同一个 L 符号, 此处展开为属格复数 ἐτῶν(而非单数属格 ἔτους),因为接在数字之后, 与"年"取复数。
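The arithmetic behind ιε = 15 is simple addition over a fixed table. The sketch below covers additive numerals below 1000 and assumes the usual letter values (with ϛ for 6, ϙ for 90 and ϡ for 900); it can also check that a <num> element's @value agrees with its letters. Both function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Standard values of the Greek alphabetic numerals, units to hundreds.
UNITS = dict(zip("αβγδεϛζηθ", range(1, 10)))
TENS = dict(zip("ικλμνξοπϙ", range(10, 100, 10)))
HUNDREDS = dict(zip("ρστυφχψωϡ", range(100, 1000, 100)))
VALUES = {**UNITS, **TENS, **HUNDREDS}


def greek_numeral(text):
    """Value of an additive Greek numeral below 1000; raises KeyError on other letters."""
    letters = text.lower().rstrip("΄ʹ'")  # drop a trailing numeral sign if present
    return sum(VALUES[ch] for ch in letters)


def num_is_consistent(num_xml):
    """True when a <num>'s @value matches the letters it wraps."""
    num = ET.fromstring(num_xml)
    return int(num.get("value")) == greek_numeral(num.text)


print(greek_numeral("ιε"))
print(num_is_consistent('<num value="15">ιε</num>'))
```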
Now build it — practice in the playground动手练习 · 在工坊里编码 Neikais
Practice typing Leiden into XML — focus on the <supplied> wrapping that recovers a Greek month name.练习把莱顿写成 XML,重点放在 <supplied> 上, 它在此用来补出残缺的希腊月份名。
1. Switch to Playground.切到试验场。
2. Type the Leiden line [Με]χίρ ιε (the month Mecheir, day 15).输入莱顿行 [Με]χίρ ιε(Mecheir 月第 15 日)。
3. Right pane: confirm <supplied reason="lost">Με</supplied> appears, followed by the rest.右栏:确认 <supplied reason="lost">Με</supplied> 出现, 后续字符依次接上。
4. Add L' in front (the L-symbol + apostrophe) and see what happens with the glyph encoder.在最前面加上 L'(L 符 + 撇号), 看符号编码器如何处理。
Transcribed by Joyce Reynolds → Reynolds 1983 16.c → SEG 33.1414 → PHI 324595 → IRCyr2020 T.453 (eds. Ch. Roueché, J. Reynolds & G. Bodard). Photographs from the Tocra archive, courtesy King's Digital Lab, CC-BY.转录传承:Joyce Reynolds → 《Reynolds 1983》16.c → SEG 33.1414 → PHI 324595 → IRCyr2020 T.453 (主编:Ch. Roueché、J. Reynolds、G. Bodard)。图像出自托克拉档案, 由 King's Digital Lab 提供, CC-BY 许可。
06Palmatos献给 Palmatos 的颂诗
Honorific epigram from Aphrodisias · late 5th century CE阿弗洛狄西亚荣誉颂诗 · 公元 5 世纪晚期
The workshop's first verse text. Introduces <lg>/<l> for meter and the three-layer encoding of an «ethnic» place-name-derived noun.
工坊首篇诗歌。引入 <lg>/<l> 标注格律, 以及由地名衍生的“民族名”三层包装。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
An honorific verse in elegiac couplets — your first poem. Look how verse markup (<lg>, <l>) sits inside the same edition structure as prose.一首六五对句体颂诗,你的第一首诗。注意诗体标签(<lg>、<l>)如何与散文共用同一套版式结构。
- Leiden: read the four lines as verse — the layout matters here in a way it didn't for prose.Leiden 视图:把四行当诗去读,此处版式本身就是意义, 与散文不同。
- XML: <lg> wraps the whole quatrain; each <l> is one line of the meter.XML 视图:<lg> 包整首; 每一 <l> 即一行格律。
- Web: the ethnic "the Carians" is a clickable place-name → Pleiades.Web 视图:民族名"卡里亚人"是可点击的地名,链至 Pleiades。
- Database: lemmatized words; every noun has its @lemma.Database 视图:词形已标,每个名词皆有 @lemma。
An elegiac couplet for Palmatos献给 Palmatos 的哀歌对句
Παλμᾶτον ἰθυδίκην τόσσον ἀγασσάμενοι. (pentameter)
Translation (Roueché). «The Carians, remembering many benefits, and greatly admiring the rightly just Palmatos, [set up this statue].»
译文 (Roueché): “卡里亚人感念其无数恩泽, 深慕公允无私之 Palmatos, [立此像]。”
Survives only in the Planudean Anthology (Anth. Pal. 16.35). The Aphrodisias editors believe the epigram was originally inscribed on stone at Aphrodisias and later copied by a Byzantine scholar.
<lg> and <l>诗节组与诗行
Prose stretches go inside <ab>. Verse stretches go inside <lg> ("line group"), with individual lines as <l>:
散文段落装入 <ab>; 韵文段落装入 <lg>(line group, 行组), 每一行单独作 <l>:
<lg met="alegaic?"> <l n="1" met="hexameter">…</l> <l n="2" met="pentameter?">…</l> </lg>
The @met attribute records the metrical pattern. A question-mark in the value ("alegaic?") signals editorial uncertainty — kept literally in the value, not as a separate @cert attribute, by convention in the Aphrodisias corpus.
散文段落用 <ab>; 诗歌段落用 <lg> + 内含的 <l>。@met 记录格律; 值中的问号 ("alegaic?") 标示编者不确定,阿弗洛狄西亚语料库的惯例。
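Software reading these files has to undo the trailing-question-mark convention itself. A minimal sketch, assuming the convention is exactly "a final ? means uncertain"; parse_met is a made-up helper, and the @met values are copied as they appear in the example above.

```python
import xml.etree.ElementTree as ET

VERSE = '<lg met="alegaic?"><l n="1" met="hexameter"/><l n="2" met="pentameter?"/></lg>'


def parse_met(value):
    """Split a @met value into (pattern, certain) using the trailing-? convention."""
    value = value.strip()
    if value.endswith("?"):
        return value[:-1], False
    return value, True


lg = ET.fromstring(VERSE)
lines = {l.get("n"): parse_met(l.get("met")) for l in lg.iter("l")}
print(parse_met(lg.get("met")))
print(lines)
```

A project that used a separate @cert attribute instead would not need this step, which is the trade-off the convention makes.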
«The Carians» — three layers“卡里亚人”,三层包装
<rs type="subject" key="koinon">
<w lemma="Κᾶρ">
<placeName type="ethnic" full="yes" reg="Κάρ">Κᾶρες</placeName>
</w>
</rs>
Three layers, three facts:
三层包装, 各陈一桩事实:
- <rs type="subject" key="koinon">: functionally, this is the dedicator, the koinon (the corporate federation of Carian cities).
- <w lemma="Κᾶρ">: lexically, it's a word.
- <placeName type="ethnic">: grammatically, it's a noun derived from a place-name. «Ethnic» is a technical term in epigraphy for inhabitant-nouns: Athenians, Romans, Spartans, Carians.
三层、三种事实: 功能上是奉献者 (Carian koinon, 卡里亚共同体); 词汇上是一个词; 语法上是由地名衍生的“民族名”。“Ethnic”是铭文学专业术语, 指由地名衍生的居民称谓 (雅典人、罗马人、斯巴达人、卡里亚人)。
@reg recovers the nominative@reg 恢复主格
<persName key="Palmatus" type="aphrodisian" full="yes"> <name reg="Παλμᾶτος">Παλμᾶτον</name> </persName>
The inscribed form Παλμᾶτον is the accusative ("admiring Palmatos"). The @reg attribute supplies the canonical nominative Παλμᾶτος, so a database query for the name finds this attestation despite the case ending.
石上所刻 Παλμᾶτον 是宾格形式(“敬慕 Palmatos”)。@reg 属性补出该名的规范主格 Παλμᾶτος; 这样, 数据库无论以哪种格变化检索, 都能从此铭文中找到这条记录。
Compare with Exercise 05's @nymRef on <name> and key="lgpn:..." on <persName>: different attributes for different authority projects, all serving the same canonicalisation purpose.
石上 Παλμᾶτον 为宾格。@reg 给出规范主格 Παλμᾶτος, 使数据库查询不受格变化影响。比较练习 05 的 @nymRef 与 key="lgpn:...": 不同权威项目用不同属性, 但目的相同,规范化。
One lemma per word每词一个词典原形
Every Greek word in this couplet has a @lemma on its <w> wrapper:
这组对句中的每一个希腊词, 其 <w> 包装层上都带 @lemma 属性:
<w lemma="μνήμων">μνήμονες</w>
<w lemma="πολύς">πολλέων</w>
<w lemma="εὐεργεσία">εὐεργεσιάων</w>
<w lemma="ἰθυδίκης">ἰθυδίκην</w>
<w lemma="τόσος">τόσσον</w>
<w lemma="ἄγαμαι">ἀγασσάμενοι</w>
Without lemmatization, a search for «benefactions» (εὐεργεσία) would miss this inscription because the inflected form on the stone is the genitive plural εὐεργεσιάων. With @lemma, every form of the word is findable through the dictionary entry.
没有词形归并, 搜索“恩泽”(εὐεργεσία) 会漏掉此铭文,石上形式为属格复数 εὐεργεσιάων。有了 @lemma, 任何形式都能通过词典原形找到。
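The difference between searching the inscribed form and searching the lemma can be shown with a short Python sketch. It assumes the six `<w>` elements above wrapped in a single `<ab>`, ignores the TEI namespace, and uses hypothetical helper names; Unicode NFC normalisation is added as a precaution so that visually identical Greek strings compare equal.

```python
import unicodedata
import xml.etree.ElementTree as ET

COUPLET = """
<ab>
  <w lemma="μνήμων">μνήμονες</w>
  <w lemma="πολύς">πολλέων</w>
  <w lemma="εὐεργεσία">εὐεργεσιάων</w>
  <w lemma="ἰθυδίκης">ἰθυδίκην</w>
  <w lemma="τόσος">τόσσον</w>
  <w lemma="ἄγαμαι">ἀγασσάμενοι</w>
</ab>
"""

def nfc(text):
    """Normalise to NFC so precomposed and decomposed accents match."""
    return unicodedata.normalize("NFC", text)

def forms_for_lemma(xml_text, lemma):
    """Return every inscribed form whose @lemma equals the dictionary entry."""
    root = ET.fromstring(xml_text)
    wanted = nfc(lemma)
    return [
        "".join(w.itertext())
        for w in root.iter("w")
        if nfc(w.get("lemma", "")) == wanted
    ]

def forms_by_surface(xml_text, query):
    """Naive comparison against the inscribed form itself, for contrast."""
    root = ET.fromstring(xml_text)
    wanted = nfc(query)
    return [
        form
        for form in ("".join(w.itertext()) for w in root.iter("w"))
        if nfc(form) == wanted
    ]

print(forms_by_surface(COUPLET, "εὐεργεσία"))  # no hit: the stone has the genitive plural
print(forms_for_lemma(COUPLET, "εὐεργεσία"))   # one hit, found through @lemma
```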
The choice of meter is a statement格律之选, 即一种宣言
By the late 5th century CE, prose honorifics were the norm. Choosing a hexameter+pentameter elegiac couplet for Palmatos signals the late-antique cultural ethos of paideia: the honorand is being celebrated as a man of education, and the inscription's medium is itself a tribute to the literary culture he and his honorers shared.
至公元五世纪晚期, 颂辞已多以散文写成。为 Palmatos 题写一组六音步配五音步的哀歌对句, 是在示意一种古代晚期的paideia(古典教养)风尚:受颂者被作为一位“学养之士”来表彰, 而铭文本身的形式, 又正是对他与其颂者共同分享的文学文化的一份致敬。
The vocabulary is consciously Homeric — μνήμων, εὐεργεσία, ἰθυδίκης — placing Palmatos in the line of heroic exemplars. EpiDoc records this through @met + @lemma; the historical reading is the encoder's commentary.
公元 5 世纪晚期, 散文荣誉铭已是常态。选择六步格+五步格的哀歌对句体彰显古代晚期 paideia (博雅教育) 文化精神: 颂主作为有教养之人受表彰, 铭文形式本身即向其与颂主所共享的文学文化致敬。词汇有意采用荷马式 (μνήμων、εὐεργεσία、ἰθυδίκης), 把 Palmatos 置于英雄序列。EpiDoc 通过 @met + @lemma 记录其形式; 历史解读则由编者注释呈现。
Now build it — practice in the playground动手练习 · 在工坊里编码 Palmatos
Try the Document editor with a blank template — practice building <lg> from scratch.在文档编辑器中使用空模板,练习从零构建 <lg>。
- 1. Open Document editor → "Blank template".打开文档编辑器, 选 "Blank template"。
- 2. In the edition pane, type a couplet: two lines of Greek (or Latin) verse.在排印版栏中输入一组对句,希腊文或拉丁文皆可。
- 3. Click "Build XML". See how the editor wraps each line in <l> and the pair in <lg>.点 "Build XML", 看编辑器如何把每行包入 <l>, 整组包入 <lg>。
- 4. Compare with the canonical Palmatos XML in four-views.与 four-views 中 Palmatos 的 XML 视图比对。
07Ausanius执事 Ausanius
From Selinus · Latin + Greek epitaph · late 4th to 5th century CE塞利努斯出土 · 拉丁文 + 希腊文墓志 · 公元四世纪末至五世纪
A bilingual funerary inscription: Latin proper, with two single Greek letters (Α and Ω, framing a Christogram) breaking the Latin line. Introduces <div type="textpart"> for code-switching, <textLang> for the sociolinguistic claim, and <roleName> for the ecclesiastical office.一篇双语墓志:主体为拉丁文, 而两个孤立的希腊字母 Α 与 Ω 夹住一个基督单字符号(☧), 把拉丁文的版式切开。本题引入三件工具:<div type="textpart"> 用以标记语言切换、<textLang> 用以记述社会语言学事实、<roleName> 用以编码教会职务。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A bilingual epitaph: two Greek letters above, the Latin text below, on the same stone, for the same deacon. The XML shows two <div type="textpart"> blocks under a single edition.一篇双语墓志,同一块石上, 上为希腊字母、下为拉丁正文, 为同一位执事而作。XML 中可见同一版本内有两个 <div type="textpart">。
- Leiden: line 1 carries the Greek letters Α and Ω, the remaining lines are Latin, but they are one edition.Leiden 视图:第 1 行为希腊字母 Α 与 Ω, 其余各行为拉丁文,但属于同一版本。
- Web: the language toggle at the top switches the commentary between Latin/Greek contexts.Web 视图:顶部语言切换让评注在拉丁/希腊语境间切换。
- XML: <div type="textpart" xml:lang="grc"> followed by <div type="textpart" xml:lang="la">.XML 视图:<div type="textpart" xml:lang="grc"> 之后紧随 <div type="textpart" xml:lang="la">。
- Database: <textLang> rows record the bilingual fact at corpus level.Database 视图:<textLang> 行在语料库层面记录此处双语事实。
One edition, two textparts一份刻文, 两个语段
<div type="edition" subtype="primary" xml:space="preserve">
<div type="textpart" subtype="section" n="a" xml:lang="grc">
<ab><lb n="1"/><w>ἀ</w> <w>ὠ</w></ab>
</div>
<div type="textpart" subtype="section" n="b" xml:lang="la">
<ab>…Latin lines…</ab>
</div>
</div>
The outer <div type="edition"> intentionally lacks @xml:lang — language is set on each <textpart>. A bilingual renderer can apply different fonts and bidirectional behaviour per panel.
外层 <div type="edition"> 不设 @xml:lang; 语言在每个 <textpart> 上局部声明。双语渲染器可对不同面板应用不同字体和文本方向。
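How a renderer might discover the language of each panel can be sketched as follows. This is an illustrative Python fragment, not the viewer's actual code: the function name `textpart_languages` is hypothetical, and the sample is the edition shown above with the TEI namespace left out. The `xml:` prefix is predefined in XML, so the standard parser exposes `xml:lang` under the XML namespace URI.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

EDITION = """
<div type="edition" subtype="primary" xml:space="preserve">
<div type="textpart" subtype="section" n="a" xml:lang="grc">
<ab><lb n="1"/><w>ἀ</w> <w>ὠ</w></ab>
</div>
<div type="textpart" subtype="section" n="b" xml:lang="la">
<ab>…Latin lines…</ab>
</div>
</div>
"""

def textpart_languages(xml_text):
    """Return (n, language) for every textpart, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (div.get("n"), div.get(XML_LANG))
        for div in root.iter("div")
        if div.get("type") == "textpart"
    ]

root = ET.fromstring(EDITION)
print(root.get(XML_LANG))           # None: the outer edition declares no language
print(textpart_languages(EDITION))  # one (n, language) pair per panel
```

Keeping the language on each textpart means a consumer never has to guess from the script which font or rendering rule applies.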
<textLang> with sociolinguistic analysis语言层面的社会语言学标注
<textLang mainLang="la" otherLangs="grc"
  ana="#bilingualism.bilingual-phenomena.code-switching.tag-switching">
  Latin and Ancient Greek
</textLang>
@ana (analysis) points to a hierarchical taxonomy of bilingualism phenomena. "Tag-switching" is a specific contemporary-linguistics term: inserting a short token from one language while the main text stays in another. The Greek Α Ω is a religious tag inside a Latin biographical inscription. EpiDoc captures this as sociolinguistic data.
@ana 指向一个层级化的双语现象分类。“Tag-switching” (标签转换) 是当代语言学专业术语, 指主体保持一种语言, 仅插入另一语言的短小标签。希腊文 Α Ω 是嵌入在拉丁文传记中的宗教标签。EpiDoc 把它当作社会语言学数据来捕捉。
Why this stone matters sociologically: Selinus in late antiquity sat at a confessional boundary. The Museo Salinas placard cites Rossius (ap. Salinas, then Mommsen, then Bivona) on a hypothesis worth knowing: "The presence of a deacon in a locality that was not an episcopal see and that did not have a prominent role in the Roman period remains problematic. The name Ausanius (Auxanius outside Africa) has been related to Ausana, an episcopal see in proconsular Africa. This has led to the hypothesis that a group of African Christians might have settled in Selinunte after fleeing the persecutions of Arian Vandals." Encoding the bilingualism therefore touches a real historical question.这块石头的社会史价值: 古代晚期的塞利努斯正位于一处宗派分界线上。萨利纳斯博物馆展签转述 Rossius(后经 Salinas、Mommsen、Bivona 沿袭)的一个值得记住的假说:“这位执事(deacon)出现于一处既非主教座、又在罗马时期并不显赫的小地, 颇为蹊跷。Ausanius 这个名字(在非洲以外多作 Auxanius)曾被联系到非洲行省的主教座 Ausana。由此一说便有此推测:一批北非基督徒因逃避阿里乌斯派汪达尔人的迫害, 在塞利努斯定居下来。” 因此, 对这件铭文的双语编码所触及的, 是一桩真实的历史问题。
<expan>: Α Ω is not an abbreviationΑ Ω 不是缩写
The Greek letters Α and Ω on the upper panel might look like they abbreviate Greek words. They don't. They are a theological symbol: «I am the Alpha and the Omega, the beginning and the end» (Rev 1:8). Encoding them with <expan> would falsely imply they're shorthand for spelled-out words.
上部面板上的两个希腊字母 Α 与 Ω, 乍看像是某词的缩写, 其实并不是。它们是一个神学符号:“我是阿尔法, 我是奥米伽, 是始, 是终”(《启示录》 1:8)。若用 <expan> 来编码它们, 便等于误称它们是某个被省略之词的缩写形。
<ab><lb n="1"/><w>ἀ</w> <w>ὠ</w></ab>
The religious meaning lives in <div type="commentary">. The encoding records only what the stone shows: two letters wrapped as words.
希腊字母 Α 与 Ω 看似缩写, 实非。它们是神学符号: “我是阿尔法, 我是俄梅伽, 是首先的, 是末后的”(《启示录》1:8)。若用 <expan> 会错误地暗示它们缩写了某词。神学意义放在 <div type="commentary"> 里; 编码只记录石上所见: 两个被包装为“词”的字母。
The Α / Ω formula in late-antique Christian epigraphy: drawn from Revelation 1:8 ("I am the Alpha and the Omega"), it stood for Christ as beginning and end. Carved as two solitary Greek letters around a Christogram, Α and Ω are quotations, not abbreviations: there is no Latin word being shortened. Encoding them as plain <w> tokens in a Greek textpart rather than inside <expan> records this distinction, which matters when a corpus query later asks "how many Sicilian inscriptions cite Revelation?"晚期基督教铭文中 Α / Ω 公式的来历: 出自《启示录》1:8 “我是阿尔法, 我是奥米伽”, 借以宣告基督为始与终。两个孤立的希腊字母围绕一个基督单字符号被刻出, 它们是引文, 不是缩写,没有任何拉丁文词被它们省略。在 XML 中, 它们作为普通的 <w> 装入希腊文 textpart, 而非 <expan>,这一区分日后查询"西西里铭文中有多少件引用《启示录》"时就派上用场了。
<roleName> for the deacon用 <roleName> 标记执事
<roleName type="religious" subtype="diaconus">
  <w>diaconus</w>
</roleName>
<roleName> wraps functions, ranks, offices — anything someone does, as opposed to what they are named. @type distinguishes religious roles from civic (magistracy), military, or honorific. @subtype pinpoints the specific office.
A query like «every early Christian deacon attested in Sicily» becomes one SPARQL filter.
<roleName> 包装职能、品级、官职,一个人做什么而非叫什么。@type 区分宗教/市政/军事/荣誉; @subtype 精确到具体职位。“西西里早期所有有铭文证据的执事”即可一查而得。
"Diaconus" — a clerical ladder rung: in the 4th-5th-century Latin Church, a diaconus was the third rank, below bishop and presbyter. Encoding "Diaconus" as <roleName type="religious"> rather than as part of the name lets a prosopographical query distinguish deacons from priests from bishops, and lets the same XML feed a future map of clerical offices across late-Roman Sicily.“Diaconus” — 教阶中的一级: 在四、五世纪的拉丁教会中, diaconus(执事)是第三级, 居主教与司铎之下。在 XML 中把 "Diaconus" 装入 <roleName type="religious"> 而非视作名字的一部分, 便能让人物志检索得以区分执事、司铎、主教; 同一份 XML 日后也可用来绘制晚期罗马西西里的教会职务地图。
The middle is supplied, the ends are on the stone头尾在石上, 中间由编者补
Line 7 has Ian s: the Ian is on the stone, then a gap (the middle of Ianuarias is left out), then s is on the stone again. EpiDoc handles this with multiple <abbr> and <ex> children inside one <expan>:
第 7 行有 Ian s:石上刻 Ian, 中间空一段(Ianuarias 的中段未刻), 然后又刻 s。EpiDoc 把这种结构编为一个 <expan> 之内多个 <abbr> 与 <ex> 交替排列的样子:
<w><expan><abbr>Ian</abbr><ex>uaria</ex><abbr>s</abbr></expan></w>
The reader sees Ian(uaria)s, with parentheses only around the supplied middle. The structure says: stone, editor, stone — three children, alternating.
读者看到 Ian(uaria)s, 括号只包中间补出部分。结构含义: 石上、编者、石上,三个子元素交替排列。
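The "stone, editor, stone" alternation maps directly onto a rendering rule: print `<abbr>` as it stands and wrap `<ex>` in parentheses. The Python sketch below illustrates that rule only; `render_expan` is a hypothetical helper, not the function used by the workshop's viewer, and it ignores the TEI namespace.

```python
import xml.etree.ElementTree as ET

def render_expan(xml_text):
    """Render one <expan> in Leiden style: abbr as-is, ex in parentheses."""
    expan = ET.fromstring(xml_text)
    parts = []
    for child in expan:
        text = "".join(child.itertext())
        if child.tag == "abbr":
            parts.append(text)              # letters carved on the stone
        elif child.tag == "ex":
            parts.append("(" + text + ")")  # letters supplied by the editor
    return "".join(parts)

print(render_expan(
    "<expan><abbr>Ian</abbr><ex>uaria</ex><abbr>s</abbr></expan>"
))
```

Because the function walks the children in order, it handles any number of alternating segments, not only the three-part case.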
Now build it — practice in the playground动手练习 · 在工坊里编码 Ausanius
A perfect case for the Multilingual editor — Latin + Greek on the same stone.本题正合多语编辑器之用,同石之上, 拉丁与希腊并存。
- 1. Open Multilingual editor → "Blank template".打开多语编辑器, 选 "Blank template"。
- 2. Add languages: Latin (la) and Ancient Greek (grc) via "+ Add language".通过 "+ Add language" 添加 Latin (la) 与 Ancient Greek (grc)。
- 3. In the edition pane, type a one-line Latin and a one-line Greek text.在排印版栏中输入一行拉丁文与一行希腊文。
- 4. Build XML, confirm two <div type="textpart"> blocks with distinct xml:lang.生成 XML, 确认出现两个 <div type="textpart">, 各自带不同的 xml:lang。
Editions: NSc 1882, 333–334 · Mommsen, CIL X.2 (1883) 7201 · Bivona, Iscrizioni latine lapidarie del Museo di Palermo, Sikelia 5 (1970) 44 · Wilson, Sicily under the Roman Empire (1990) 319 n.19 · Manganaro, «Greco nei pagi e latino nelle città» (1993) 588–589 fig. 31. Editor: Jonathan Prag (I.Sicily, last rev. 6 Dec 2025).主要版本:NSc 1882, 333—334 · Mommsen《CIL》 X.2 (1883) 7201 · Bivona《巴勒莫博物馆拉丁铭文》(Sikelia 5, 1970) 44 · Wilson《罗马帝国治下的西西里》(1990) 319 注 19 · Manganaro〈Greco nei pagi e latino nelle città〉(1993) 588—589 图 31。本电子版编者:Jonathan Prag (I.Sicily, 末次修订 2025 年 12 月 6 日)。
08Christian Greek epitaph基督教希腊文断片
4th century CE · Catania · the most heavily restored exercise公元 4 世纪 · 卡塔尼亚 · 补字密度最高的练习
Marble plaque broken on the left, ten partially preserved lines of Greek. Mentions two deceased (one named Basilis) and a Christian curse against tomb-violators invoking the Pantokrator. Introduces cert="low", supraline numerals, and the apparatus criticus.
大理石板, 左侧残缺, 十行不完整希腊文。提两位死者 (一位名 Basilis), 含呼告 Pantokrator 的诅咒套语。引入 cert="low"、上划线数字、校勘记。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A fragmentary Christian epitaph. About half the inscription is supplied — perfect for studying how the editor declares uncertainty.一篇基督教希腊文残片, 约半数字符为编者所补,正适合学习编者如何明白宣称"我不确定"。
- Leiden: square brackets dominate; that is the visual signature of heavy restoration.Leiden 视图:方括号密布,大量补字的视觉特征。
- XML: the least secure <supplied> elements also carry @cert="low"; the editor flags uncertainty per element.XML 视图:把握最小的 <supplied> 同时带 @cert="low"; 编者就每一处单独宣称确信度。
- Web: hover over any restoration to see the editor's notes and confidence level.Web 视图:悬停任一补字, 即见编者的批注与确信度。
- Database: filter for cert="low"; these are the rows you should treat as conjectural.Database 视图:可按 cert="low" 筛选,这些行皆为推测。
What survives, what's restored残存与补字
1. [ — ] τὸν κόσμον ἅπαντα Ε+
2. [ — ]εως καὶ Βασιλίδος
3. [ἀμέμ]πτως καὶ ἀκαταγνώ-
4. [στως ἔζησ]αν μοι ἔτη ιδ̄ ἐν
5. [— ζήσ]ασα ἔτη κη̄ ὁρκί-
6. [ζω τὸν Παντο]κράτορα καὶ
7. [— τὸν μέλλ]οντα αἰῶνα
8. [— μη]δὲν ἀνῦξαι
9. [— ἐκεί]νων κατ[ά-
10. [θεσιν —]
Translation: «…all the world…of (name) and of Basilis. They lived perfectly and blamelessly with me fourteen years…lived twenty-eight years. I beseech in the name of the Almighty…for all eternity…let no one open [the tomb]…»
译:“……整个世界……(某人)与 Basilis 的。他们与我同居十四年, 全然无可指摘……享年二十八岁。我以全能者之名恳请……直至永远……勿使人启 [此墓]……”
Ages: ιδ̄ (14), κη̄ (28)年龄: 14 与 28
<num value="14"><hi rend="supraline">ιδ</hi></num>
<num value="28"><hi rend="supraline">κη</hi></num>
The supraline (overline) is the standard Greek scribal convention for flagging letters that function as a numeral. Without it, a sequence such as ιδ could be taken for ordinary letters of a word rather than the number 14; the overline disambiguates. EpiDoc records both layers: the visual mark with <hi rend="supraline">, the numeric meaning with @value.
字母上方的上横线(supraline)是希腊抄写者的标准惯例, 用以提示“此字此处作数字”。若无上横线, ιδ 这样的字母组合可能被当作某词的普通字母, 而非数字 14; 上横线即用来消除这种歧义。EpiDoc 同时记录两层:视觉符号以 <hi rend="supraline"> 标出, 数字含义则由 @value 携带。
Notice the deliberate <space quantity="1" unit="character"/> surrounding each numeral on the stone: the engraver gave them breathing room as visual markers.
上划线是希腊文标示“此字母作数字用”的标准方式。若无上划线, ιδ 可能被读作普通字母而非数字 14, 上划线消除歧义。EpiDoc 同时记录视觉标记 (<hi rend="supraline">) 与数值含义 (@value)。注意数字两侧的 <space>,刻工有意留出空白以作视觉强调。
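Since the letters and the @value say the same thing twice, the two can be checked against each other. The Python sketch below decodes simple additive Greek alphabetic numerals and compares the result with @value. It is an illustration under assumptions: the helper names are hypothetical, thousands markers and fractions are not handled, and the TEI namespace is ignored.

```python
import xml.etree.ElementTree as ET

# Standard Greek alphabetic (Milesian) numeral values, lower-case letters only.
GREEK_NUMERAL_VALUES = {
    "α": 1, "β": 2, "γ": 3, "δ": 4, "ε": 5, "ϛ": 6, "ζ": 7, "η": 8, "θ": 9,
    "ι": 10, "κ": 20, "λ": 30, "μ": 40, "ν": 50, "ξ": 60, "ο": 70, "π": 80, "ϙ": 90,
    "ρ": 100, "σ": 200, "τ": 300, "υ": 400, "φ": 500, "χ": 600, "ψ": 700, "ω": 800, "ϡ": 900,
}

def numeral_value(letters):
    """Sum the letter values; raises KeyError on a letter that is not a numeral."""
    return sum(GREEK_NUMERAL_VALUES[ch] for ch in letters)

def value_matches_letters(num_xml):
    """Check that @value on a <num> agrees with the letters it wraps."""
    num = ET.fromstring(num_xml)
    letters = "".join(num.itertext()).strip()
    return int(num.get("value")) == numeral_value(letters)

print(numeral_value("ιδ"), numeral_value("κη"))
print(value_matches_letters('<num value="14"><hi rend="supraline">ιδ</hi></num>'))
```

A check of this kind catches a common slip in hand encoding, where the letters are corrected but the @value is left at its old number.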
<supplied> meets <unclear>: word-internal continuations单词跨行延续
Lines 3–4 carry the formula ἀμέμπτως καὶ ἀκαταγνώστως, «blamelessly and irreproachably.» On the stone only …πτως καὶ ἀκαταγνώ- survives. The encoding restores the lost letters and marks the word-internal line break:
第 3、4 行有一句套语 ἀμέμπτως καὶ ἀκαταγνώστως(“无可指摘, 无可责备”)。石上残存的只有 …πτως καὶ ἀκαταγνώ-。编码把佚失字母补回, 并标记词内的换行:
<lb n="3"/><w><supplied reason="lost">ἀμέμ</supplied><unclear>π</unclear>τως</w>
<w>καὶ</w>
<w>ἀκαταγνώ<lb n="4" break="no"/><supplied reason="lost">στως</supplied></w>
The opening of ἀμέμπτως is supplied and its π is unclear; the ending στως of ἀκαταγνώστως, carried over onto line 4, is also supplied. <lb break="no"/> ties the two halves into one word.
ἀμέμπτως 词首由编者补出, 其中的 π 标为存疑; ἀκαταγνώστως 跨行到第 4 行的词尾 στως 亦由编者补出。<lb break="no"/> 把两半连缀为一词。
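Because the line break sits inside the `<w>`, a consumer can recover whole words by concatenating the text of each `<w>` and skipping the empty `<lb/>`. The Python sketch below illustrates this with the two lines above wrapped in a single `<ab>`; the helper names are hypothetical and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Lines 3-4 from the slide, written without whitespace inside words.
LINES_3_4 = (
    '<ab><lb n="3"/>'
    '<w><supplied reason="lost">ἀμέμ</supplied><unclear>π</unclear>τως</w> '
    '<w>καὶ</w> '
    '<w>ἀκαταγνώ<lb n="4" break="no"/><supplied reason="lost">στως</supplied></w>'
    '</ab>'
)

def words(xml_text):
    """Full text of each <w>, regardless of supplied/unclear/lb children."""
    root = ET.fromstring(xml_text)
    return ["".join(w.itertext()) for w in root.iter("w")]

def words_split_across_lines(xml_text):
    """Words that contain an <lb break="no"/>, i.e. continue on the next line."""
    root = ET.fromstring(xml_text)
    return [
        "".join(w.itertext())
        for w in root.iter("w")
        if any(lb.get("break") == "no" for lb in w.iter("lb"))
    ]

print(words(LINES_3_4))
print(words_split_across_lines(LINES_3_4))
```

The same word list would be much harder to rebuild from the Leiden text alone, where the hyphen and the bracket both interrupt the word.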
cert="low": when you're guessing, say so推测时, 明言之
<lb n="9"/><gap reason="lost" extent="unknown" unit="character"/>
<w><supplied reason="lost" cert="low">ἐκεί</supplied>νων</w>
<w>κατ<supplied reason="lost">ά</supplied><lb n="10" break="no"/><supplied reason="lost" cert="low">θεσιν</supplied></w>
Lines 9–10 restore [ἐκεί]νων κατ[ά-/θεσιν]. Other scholars proposed alternatives (Ferrua suggested κατετέθη «was laid down»; Manganaro suggested κατάγειον μνημείον). The chosen reading goes in the edition with cert="low"; competing readings go in the apparatus.
第 9、10 行所补的 [ἐκεί]νων κατ[ά-/θεσιν] 并非唯一方案。Ferrua 提议 κατετέθη(“被葬下”), Manganaro 则提议 κατάγειον μνημείον(“地下纪念室”)。本电子版编者选取其一入释读栏, 并标 cert="low"; 其余诸说则置于校勘栏。
@cert is honesty made queryable. A statistical study can ask «what fraction of this corpus's supplements are speculative?» — and get an answer.@cert 把学术诚实变为可查询。统计研究可问“语料库中有多大比例的补字是推测性的?”并获得答案。
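The statistical question can be made concrete with a small Python sketch. It counts `<supplied>` elements and reports the share marked cert="low", using lines 9–10 as sample data. The function name is hypothetical, the TEI namespace is omitted, and the measure counts elements rather than characters, which is only one of several reasonable definitions.

```python
import xml.etree.ElementTree as ET
from fractions import Fraction

# Lines 9-10 from the slide, wrapped in <ab> to give the snippet one root.
LINES_9_10 = (
    '<ab><lb n="9"/><gap reason="lost" extent="unknown" unit="character"/> '
    '<w><supplied reason="lost" cert="low">ἐκεί</supplied>νων</w> '
    '<w>κατ<supplied reason="lost">ά</supplied>'
    '<lb n="10" break="no"/><supplied reason="lost" cert="low">θεσιν</supplied></w>'
    '</ab>'
)

def low_cert_fraction(xml_text):
    """Share of <supplied> elements flagged cert="low"; None if there are none."""
    root = ET.fromstring(xml_text)
    supplied = list(root.iter("supplied"))
    if not supplied:
        return None
    low = [s for s in supplied if s.get("cert") == "low"]
    return Fraction(len(low), len(supplied))

print(low_cert_fraction(LINES_9_10))
```

Run over a whole corpus, the same count would give the proportion of speculative supplements that the paragraph above asks about.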
Where the scholarly conversation lives学术对话之所
<div type="apparatus" resp="#martaf">
<listApp>
<app loc="line 1"><note>Manganaro: ---[τ]ὸ<unclear>ν</unclear> κόσμον ἐ<unclear>ν</unclear>. After ε, Korhonen reads…</note></app>
<app loc="line 2"><note>Libertini, Ferrua, Korhonen: [βασιλ]έως καὶ βασιλίδος; Manganaro: [τύμβῳ (name)]εως καὶ βασιλίδος</note></app>
<app loc="lines 4-5"><note>Libertini: ἐν | [ζήσ]ασα; Ferrua: ἐν | [γάμῳ ἅπ]ασα; Manganaro: ἕν, (?) | [ἑτέρα πληρώσ]ασα</note></app>
</listApp>
</div>
Modern editions rarely show only one reading. The chosen reading goes in <div type="edition">; competing readings are recorded in <app> entries with @loc pointing to specific lines. The encoder's job is to make both layers machine-readable.
现代版本几乎不会只展示一种读法。所选读法置于 <div type="edition">; 其他读法记录在 <app> 中, 用 @loc 指向具体行。编码者的任务是让两层都机器可读。
The «μηδὲν ἀνῦξαι» formula“不要打开 (墓)”之套语
Lines 6–8 invoke the Pantokrator (Almighty) and threaten anyone who opens the tomb: «I beseech the Almighty…that no one open [it].» This is a well-known late-antique Christian feature, studied notably by Denis Feissel, BCH 104 (1980): 464–470.
Three reasons curses appeared in Christian epitaphs:
基督教墓志何以频频出现这种诅咒之辞, 大致有三层缘由:
- Theological. The body matters for the bodily resurrection; disturbing it carries eschatological consequences.
- Practical. Tomb reuse and grave-good theft were widespread; legal sanctions were weak.
- Cultural. The curse genre was inherited from pagan funerary practice; Christianity adapted rather than abolished it.
基督教墓志诅咒侵墓者, 古代晚期常见现象 (经典研究: Denis Feissel, 《BCH》104 (1980): 464–470)。三个动机: 神学 (身体复活之重要性)、实用 (盗墓常见而法律弱势)、文化 (沿用异教葬俗的诅咒传统)。
Now build it — practice in the playground动手练习 · 在工坊里编码 Christian Greek
Practice supplying restorations of varying certainty — cert="high" for confident moves, cert="low" for guesses.练习按不同确信度补字,把握之处用 cert="high", 推测之处用 cert="low"。
- 1. Switch to Playground.切到试验场。
- 2. Type [Διο]νύσιος (a confident restoration of a common name).输入 [Διο]νύσιος(常见名字的把握补字)。
- 3. Now type [υἱὸς ?] with a question mark and see how the engine handles uncertainty markers.再输入 [υἱὸς ?](带问号), 看引擎如何处理不确定标记。
- 4. Manually add cert="low" to the second supplied, then re-render and compare.手动给第二个补字加上 cert="low", 重渲染并比对。
09Trajan's Arch图拉真拱门铭文
Lepcis Magna · Latin building inscription · 109–110 CE大莱普提斯 · 拉丁文营造铭 · 公元 109—110 年
A Roman senator's cursus honorum in three dense lines. The exercise drills the structural truth that nested <persName>, <roleName>, <orgName>, and <placeName> wrappers turn one block of prose into ~12 prosopographical facts a database can search.三行密集的罗马元老 cursus honorum(履历铭)。本题要把握的结构性事实是:嵌套使用 <persName>、<roleName>、<orgName>、<placeName> 等元素, 能把一块连贯的散文, 拆出大约十二条可被数据库检索的人物志事实。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
A dense Roman cursus honorum — the career inscription of an imperial senator. Every line is a tightly packed list of offices, names, and dates.一篇密集的罗马 cursus honorum,帝国元老的官职履历铭。每行皆是官职、人名、年代的紧密堆叠。
- Leiden: the line-by-line layout mirrors the senator's career chronologically.Leiden 视图:逐行版式按元老履历的时间顺序排列。
- XML: <persName> wraps the tria nomina; <roleName> wraps each office; <orgName> wraps each legion.XML 视图:<persName> 包三段名; <roleName> 包每一职务; <orgName> 包每支军团。
- Database: each role and legion becomes its own row; this turns one inscription into ~12 prosopographical facts.Database 视图:每个职务、每支军团各成一行,一篇铭文变出十余条人物志数据。
- Web: one name has been erased; find it in the <del rend="erasure"> wrapper.Web 视图:其中一个名字被抹除,在 <del rend="erasure"> 中找到它。
What is a Roman career inscription?罗马“履历铭”之意涵
A Roman senator's career was a ladder of public offices held in a more-or-less fixed sequence — quaestor, tribunus plebis, praetor, consul, plus optional priesthoods, military commands, and provincial governorships. An honorific inscription listing these offices in order was a cursus honorum, «course of honours.»
The inscription thus functions as a CV in stone. It is the principal source for Roman prosopography: who held what office when, and in what sequence.
罗马元老的仕途是一系列公职的阶梯, 大致按固定顺序晋升: 财务官 → 平民保民官 → 法务官 → 执政官, 另可兼任祭司、军事指挥与行省总督等。荣誉铭中按序排列这些官职即为“荣誉之路”。铭文实际上充当石上简历, 是罗马人物志研究的主要史料,谁、何时、按怎样的顺序担任何职。
Roman tria nomina罗马三段名
<persName type="attested" key="pir-p-0749">
  <name type="praenomen" nymRef="Quintus"><expan><abbr>Q</abbr><ex>uintus</ex></expan></name>
  <name type="gentilicium" nymRef="Pomponius">Pomponius</name>
  <name type="cognomen" nymRef="Rufus">Rufus</name>
</persName>
- praenomen — personal first name (Quintus, often abbreviated as Q.)
- gentilicium — clan name (Pomponius — the gens)
- cognomen — family-branch name (Rufus — the cognomen)
- praenomen(个人名),即位于最前的第一个名字(此处为 Quintus, 常缩为 Q.)。
- gentilicium(氏族名),表家族支系所属(Pomponius — gens 之名)。
- cognomen(家支别名),标明此人所属的家族分支(Rufus 即此人的 cognomen)。
@key="pir-p-0749" links to Prosopographia Imperii Romani P 0749 — the canonical prosopographical reference for senators of the imperial period.
praenomen 为个人之名 (Quintus, 常缩写为 Q.); gentilicium 为氏族名 (Pomponius); cognomen 为家族分支名 (Rufus)。@key="pir-p-0749" 连接到《罗马帝国人物志》P 0749 条目,帝国时期元老的规范参考。
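One way to see what the typed `<name>` children buy is to turn the element into a flat record. The Python sketch below is illustrative only: `person_record` is a hypothetical helper, the TEI namespace is left out, and the full name is assembled from @nymRef rather than from the abbreviated text on the stone.

```python
import xml.etree.ElementTree as ET

PERSON = """
<persName type="attested" key="pir-p-0749">
  <name type="praenomen" nymRef="Quintus"><expan><abbr>Q</abbr><ex>uintus</ex></expan></name>
  <name type="gentilicium" nymRef="Pomponius">Pomponius</name>
  <name type="cognomen" nymRef="Rufus">Rufus</name>
</persName>
"""

NAME_ORDER = ("praenomen", "gentilicium", "cognomen")

def person_record(xml_text):
    """Flatten a <persName> into the key, the typed name parts and a full name."""
    person = ET.fromstring(xml_text)
    parts = {n.get("type"): n.get("nymRef") for n in person.iter("name")}
    full_name = " ".join(parts[t] for t in NAME_ORDER if t in parts)
    return {"key": person.get("key"), "parts": parts, "full_name": full_name}

record = person_record(PERSON)
print(record["key"], "->", record["full_name"])
```

The record is the same whether the praenomen is abbreviated on the stone or written out, which is the point of carrying @nymRef on each part.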
This man, in prose: Quintus Pomponius Rufus. Reynolds' translation reads "Quintus Pomponius Rufus, consul, priest, member of the priestly college for cult of the Flavians, curator of public works, imperial legate with propraetorian powers in the provinces of Moesia, Dalmatia and Hispania, commander of Legion Five, prefect of the coasts of nearer Spain, and Gallia Narbonensis in the war which Emperor Galba fought for the [Republic], proconsul of the province of Africa, through the agency of Lucius Asinius Rufus, propraetorian legate [---]." Cross-referenced in the standard prosopography as PIR P 0749. The XML @key on the outer <persName> points there, and a SPARQL query can pull every inscription mentioning this senator across the EpiDoc-encoded Mediterranean.这个人, 用散文来说: Quintus Pomponius Rufus,Reynolds 译为 “Quintus Pomponius Rufus, 执政官, 司祭, 弗拉维家族祭祀团成员, 公共工程总监, 莫西亚、达尔马提亚、西班牙诸行省具副执政官权之元首特使, 第五军团团长, 加尔巴元首为[共和国]而战时之近西班牙与高卢纳尔波内西斯沿海防务长官, 非洲行省代行总督, 经其副执政官特使 Lucius Asinius Rufus 之手 [---]。” 在标准人物志中, 他的条目编号为 PIR P 0749。外层 <persName> 的 @key 即指此, 一条 SPARQL 查询便可把整个 EpiDoc 编码的地中海世界中所有提到他的铭文一网打尽。
cos for consulcos 缩写 consul
The Latin word consul was conventionally abbreviated as co·s (with an interpunct between, but readable as cos). The encoder treats this as a discontinuous abbreviation, just like Ian(uaria)s in Exercise 07:
拉丁文 consul 一词惯常缩写为 co·s(中间夹一间隔点, 但实际读作 cos)。编码视其为不连续缩写, 与练习 07 中的 Ian(uaria)s 同理:
<expan><abbr>co</abbr><ex>n</ex><abbr>s</abbr><ex>ul</ex></expan>
Reader sees co(n)s(ul): co on stone, n supplied, s on stone, ul supplied. The same pattern appears again for procos = proconsul.
拉丁词 consul 习惯缩写为 co·s。编码作间断式: co 石上, n 编者补, s 石上, ul 编者补。procos = proconsul 同此处理。
The Fifth Legion as <orgName>第五军团作为机构
<orgName type="military" ref="#legio5">
<w lemma="legio">
<expan><abbr>leg</abbr><ex>ionis</ex></expan>
</w>
<num value="5"> V</num>
</orgName>
<orgName type="military"> wraps the legion as an organisation. Inside, the word legionis is expanded, and the numeral V is given its @value="5". The @ref="#legio5" ties this attestation to a corpus-wide entity, so a query for «every inscription naming the Fifth Legion» retrieves all of them at once.
<orgName type="military"> 把军团作为机构包装。内部包含 legionis 的展开与数字 V。@ref="#legio5" 把此处归并到语料库范围的同一实体, 便于“所有提及第五军团的铭文”一次性检索。
Why "leg(atus) leg(ionis) V" matters: a senator commanded a single specific legion before he became proconsul. In this inscription Pomponius Rufus served as legatus (commander) of Legio V. Six legions in the Roman army carried the numeral V over time (Alaudae, Macedonica, Gallica, Urbana, Apollinaris, Iovia), and which one is meant matters historically — but the inscription is silent on the surname. Encoding "Legion Five" as <orgName ref="trismegistos:orgs/..."/> lets the corpus stay honest about the ambiguity while still attaching the unit to a queryable record.“leg(atus) leg(ionis) V”何以紧要: 元老成为行省总督前, 必先指挥过某一支具体的军团。本铭中, Pomponius Rufus 为第五军团之 legatus(团长)。罗马军中曾先后有六支军团带编号“V”(Alaudae、Macedonica、Gallica、Urbana、Apollinaris、Iovia), 究系何者, 史家意见分歧, 而铭文本身不具姓号。把“第五军团”编码为 <orgName ref="trismegistos:orgs/..."/>, 既诚实保留了这种含混, 又让这支军队能挂上一条可检索的记录。
A name a senator usually erased通常被元老抹去的名字
<persName type="emperor" ref="#galba">
  <name type="cognomen" nymRef="Galba">G<supplied reason="lost">a</supplied>lba</name>
</persName>
<w lemma="pro">pro</w>
<w lemma="republica"><supplied reason="lost">re<expan><abbr>p</abbr><ex>ublica</ex></expan></supplied></w>
<w lemma="gero">gessit</w>
«…in the war which Emperor Galba waged for the [Republic].» Galba was the loser of the 68–69 CE civil war; most senators who had served him later erased the reference. Pomponius Rufus kept it, some forty years later, under Trajan. This is one of the few epigraphic attestations of senatorial service under Galba.
“……加尔巴元首为[共和国]而战之役。”加尔巴是公元 68—69 年内战的败者;凡曾在其麾下任职的元老, 多在后来悄悄抹去这段经历。Pomponius Rufus 偏偏没抹,约四十年后, 在图拉真朝, 仍把这段履历刻在石上。这便成了关于加尔巴朝任职的极少数铭文见证之一。
Note the nested supplements + expansion on [re(p(ublica))]: the editor supplies re-, then inside the supplement an <expan> records that the rest is an abbreviation. Three-deep nesting; perfectly legal EpiDoc.
注意 [re(p(ublica))] 处嵌套的补字加展开: 编者补出 re-, 补字内部还含一个 <expan> 标记其余为缩写。三层嵌套, 完全合法的 EpiDoc。
The corpus as prosopographical database语料库作为人物志数据库
When every office is encoded as <w lemma="…"> + <expan> + optional <placeName>/<orgName>, the corpus answers:
一旦把每一职务都编码为 <w lemma="…"> + <expan>, 必要时再加 <placeName> 或 <orgName>, 整个语料库便能直接回答如下问题:
- Which senators held the consulship before becoming proconsul of Africa?
- Which provinces were governed by men who had previously commanded the Fifth Legion?
- Was the post of curator operum publicorum held earlier or later in a typical senatorial career?
- 有哪些元老在出任非洲行省总督之前, 已先任执政官?
- 有哪些行省的总督, 曾先指挥过第五军团?
- “公共工程总监”(curator operum publicorum)一职, 在元老的标准履历中通常位于早段还是晚段?
These are real historical questions. Without structural encoding, each requires hand-counting through thousands of inscriptions. With it, a SPARQL query returns the answer in seconds.
每一官职若编为 <w lemma="…"> + <expan> + 选附 <placeName>/<orgName>, 语料库即能回答: 阿非利加总督之前先任执政官者有谁? 由前第五军团长官治理之行省有哪些? 公共工程总监一职通常在仕途较早还是较晚阶段?,这些是真实的历史问题。无结构化编码, 每问皆需手工查阅; 有结构化编码, 一条 SPARQL 即得。
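EpiDoc lifts an inscription from a literary object to relational data. The text is the same; the range of questions it can answer grows enormously.EpiDoc 把铭文从文学对象提升为关系数据。文字相同; 可回答的问题大为扩展。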
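The first of the questions above, who held the consulship before the proconsulate of Africa, reduces to an ordering test once careers are available as data. The Python sketch below shows the logic on invented placeholder records; the names `senator-A` to `senator-C` and their careers are hypothetical and correspond to no real person. A production corpus would more likely run the equivalent query in SPARQL over a triple store, and the order in which an inscription lists offices is not always the order in which they were held.

```python
from dataclasses import dataclass, field

@dataclass
class Career:
    person: str
    # Offices in the order held (placeholder data, not drawn from any inscription).
    offices: list = field(default_factory=list)

def held_before(career, earlier, later):
    """True if both offices occur and `earlier` precedes `later`."""
    if earlier not in career.offices or later not in career.offices:
        return False
    return career.offices.index(earlier) < career.offices.index(later)

CAREERS = [
    Career("senator-A", ["quaestor", "praetor", "consul", "proconsul Africae"]),
    Career("senator-B", ["quaestor", "praetor", "proconsul Africae"]),
    Career("senator-C", ["quaestor", "proconsul Africae", "consul"]),
]

def consuls_before_africa(careers):
    """People whose consulship precedes their proconsulate of Africa."""
    return [c.person for c in careers if held_before(c, "consul", "proconsul Africae")]

print(consuls_before_africa(CAREERS))
```

The hard part in practice is producing the ordered office lists, which is exactly what the structural encoding of each `<roleName>` supplies.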
The second man in the inscription: L. Asinius Rufus, propraetorian legate. Q. Pomponius Rufus governed Africa per — "through the agency of" — Lucius Asinius Rufus, his deputy. Asinius is likely PIR A 1248 (or 1250). So one inscription names two senators, in a chain of delegation. Encoded properly, this single block of stone produces: 2 persons, 2 PIR cross-refs, ~9 roles, ~5 provinces, 1 legion, 1 historical war (Galba's), 1 proconsulate-of-Africa, 1 building-event for an arch in Lepcis Magna — about fifteen prosopographical facts. That is the corpus-as-database moment.铭文中第二个出现的人:L. Asinius Rufus, 副执政官特使。 Q. Pomponius Rufus 治理非洲, 是 per,“经其手”,Lucius Asinius Rufus 之代行。Asinius 一人, 大概对应 PIR A 1248(或 1250)。也就是说, 此铭名两位元老, 一条委派之链。若编码到位, 一块石头便可产出:二人、二条 PIR 互引、约九种官职、约五处行省、一支军团、一场历史性战争(加尔巴之战)、一任非洲行省代行总督、一座大莱普提斯之拱门,约十五条人物志事实。这就是"语料库即数据库"的瞬间。
Now build it — practice in the playground动手练习 · 在工坊里编码 Trajan's Arch
Try the Document editor — use the form fields to record one role and one organization at a time.在文档编辑器中, 用表单字段, 一次记一职、一团。
- 1. Open Document editor → "Blank template".打开文档编辑器, 选 "Blank template"。
- 2. In the edition pane, type Imp(eratori) Caes(ari) Nervae Traiano Aug(usto).在排印版栏中输入 Imp(eratori) Caes(ari) Nervae Traiano Aug(usto)。
- 3. Build XML. Observe how the engine generates several <expan> wrappers in a chain.生成 XML, 看引擎连续生成多个 <expan>。
- 4. Edit by hand: wrap the whole imperial nomenclature in <persName> with a Trismegistos @ref.手动:把整组元首名号包入一个 <persName>, 加上 Trismegistos 的 @ref。
Transmitted from Delaporte 1836, 7–8 → CIL VIII (1881) 13 → CIL VIII suppl. 4 (1916) 22670 → Romanelli 1940, 99 ff., figs. 10–11 → AE 1948.3 → ILS 1014 → Guey 1951 → AE 1952, 36 → IRT 1952 537 (Reynolds & Ward-Perkins) → IRT 2009 537 → EDH 019665. Archival photos: Ward-Perkins Archive, BSR (Sopr. CLM 930 & 932).流传脉络:Delaporte 1836, 7—8 → 《CIL》 VIII (1881) 13 → 《CIL》 VIII 补编 4 (1916) 22670 → Romanelli 1940, 99 ff., 图 10—11 → 《AE》 1948.3 → 《ILS》 1014 → Guey 1951 → 《AE》 1952, 36 → 《IRT 1952》 537 (Reynolds 与 Ward-Perkins) → 《IRT 2009》 537 → EDH 019665。档案照片:Ward-Perkins 档案, 英国罗马学院 (BSR), Sopr. CLM 930 与 932。
10Pytheas献给 Pytheas 的挽歌 · 毕业作品
Aphrodisias · late 5th century CE · elegiac quatrain阿弗洛狄西亚 · 公元 5 世纪晚期 · 哀歌四行体
The capstone. By now you have encoded names, dates, abbreviations, supplements, ethnic-nouns, verse, bilingualism, and a senatorial cursus. This text is shorter than IRT0537 but carries the full vocabulary of EpiDoc on a single piece of literature.
毕业作品。至此你已编码过人名、日期、缩写、补字、民族名、诗歌、双语、元老履历。本题虽短于 IRT0537, 但在一首文学作品上承载了 EpiDoc 的完整词汇。
Before you encode, look at it four ways编码之前, 先从四面看它一遍
The capstone exercise — a four-line Greek elegiac quatrain. Two couplets, hexameter alternating with pentameter, mourning a young man.毕业作品,一首四行希腊文哀歌四行体。两组对句, 六音步交替五音步, 为一位早逝青年所作。
- Leiden: read it aloud — the meter is doing emotional work the prose can't match.Leiden 视图:朗读一遍,格律本身在做情感的工作, 散文做不到。
- XML: <lg type="elegiac"> wraps the quatrain; each <l> carries @met (hexameter or pentameter).XML 视图:<lg type="elegiac"> 包整首; 每一 <l> 带 @met(六音步或五音步)。
- Web: the lemmatized text lets you click each word and read its grammatical analysis.Web 视图:词形已标; 可点击每词, 阅其语法解析。
- Database: the workshop's richest single record — 60+ rows of person, place, date, lemma, meter, attestation.Database 视图:工坊最丰富的单条记录,60 余行, 含人、地、日期、词形、格律、出处。
Four lines for Pytheas四行献予 Pytheas
2. ἀλλ' ἔτι σῆς ψυχῆς ἀγλαὰ πάντα μένει, (pent)
3. ὅσσ' ἔλαχές τε φύσει, μῆτιν πανάριστε· (hex)
4. τῷ ῥα καὶ ἐς μακάρων νῆσον ἔβης, Πυθέα. (pent)
«Not even after death have you lost your fine reputation in the whole earth, but still all the splendid [achievements] of your soul remain — both those which you inherited, and those which you learnt, according to your nature, most excellent in intellect. So now, Pytheas, you have also gone to the Island of the Blest.»
“纵使身死, 你卓著之名遍传寰宇, 未曾湮灭; 你灵魂的一切光辉犹存,那些你天生秉受的, 那些你顺其本性所学得的; 你心智之卓越无人能及。Pytheas 啊, 于是你也步入了极乐之岛。”
Two couplets stacked: H/P/H/P两组对句叠合: 六/五/六/五
<lg met="elegiac">
  <l n="1" met="hexameter">…</l>
  <l n="2" met="pentameter">…</l>
  <l n="3" met="hexameter">…</l>
  <l n="4" met="pentameter">…</l>
</lg>
One outer <lg> wraps the whole quatrain. No nested <lg> for each couplet — the alternation of @met values implicitly marks the couplet structure.
一个外层 <lg> 包整个四行。不必为每组对句单独嵌套,@met 值的交替已隐含对句结构。
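That the couplets are recoverable from the @met values alone can be checked with a short Python sketch. It pairs each hexameter with the pentameter that follows it. The function name `couplets` is hypothetical, the sample omits the TEI namespace and the attribute on `<lg>`, and the ellipses stand in for the verse text as in the slide.

```python
import xml.etree.ElementTree as ET

QUATRAIN = """
<lg>
  <l n="1" met="hexameter">…</l>
  <l n="2" met="pentameter">…</l>
  <l n="3" met="hexameter">…</l>
  <l n="4" met="pentameter">…</l>
</lg>
"""

def couplets(xml_text):
    """Pair consecutive lines into (hexameter n, pentameter n) couplets."""
    lg = ET.fromstring(xml_text)
    lines = [(l.get("n"), l.get("met")) for l in lg.iter("l")]
    pairs = []
    for i in range(0, len(lines) - 1, 2):
        (first_n, first_met), (second_n, second_met) = lines[i], lines[i + 1]
        # Only accept the elegiac pattern; anything else is left unpaired.
        if first_met == "hexameter" and second_met == "pentameter":
            pairs.append((first_n, second_n))
    return pairs

print(couplets(QUATRAIN))
```

If a project later needs the couplets as explicit units, the same pairing logic could generate nested `<lg>` wrappers automatically rather than by hand.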
@reg on both levels两层皆带 @reg
<persName reg="Πύθεας" key="Pytheas" type="aphrodisian" full="yes">
  <name reg="Πυθέας">Πυθέα</name>
</persName>
The inscribed form Πυθέα is the vocative («Pytheas!»). Two regularisations:
石上所刻 Πυθέα 是呼格形式(“Pytheas 啊!”)。两层正字化:
- <persName reg="…">: the canonical «person tag» as the corpus database represents this individual. Sometimes uses an idiosyncratic transliteration.
- <name reg="…">: the canonical «word tag», the dictionary nominative form of the Greek noun.
They can differ in spelling conventions; both serve canonicalisation but at different levels of abstraction.
二者在拼写约定上可能略有差异; 都是规范化, 但抽象层级不同。
两个 @reg 含义不同: <persName reg="…"> 是语料库中规范化的“人物标签”; <name reg="…"> 是规范化的“词汇标签”(即希腊文名词的主格)。两者拼写可能略有差异; 都做规范化, 但抽象层级不同。
What NOT to encode不必编码的部分
- No <apparatus>: the text comes through manuscript transmission (the Greek Anthology), not from a surviving stone. No carving variants to record.
- No <gap> or <supplied>: the Byzantine scribe transmitted the text complete. Nothing to restore.
- Minimal <facsimile>: no photograph; no diplomatic transcription.
- 无 <apparatus>:此文是通过抄本(《希腊文选》)传下的, 并非出自现存的石头, 故无刻文异读可记。
- 无 <gap> 与 <supplied>:拜占庭抄写者把全篇完整传下, 无所遗失, 故无须补字。
- 极简 <facsimile>:没有照片, 也没有逐字转录。
EpiDoc encodes what is editorially relevant, not what could in principle be encoded. If you find yourself adding tags that don't say something true about the text or its transmission, delete them.
EpiDoc 只编码对编辑而言相关的, 不编码所有可编码的。若加的标签未陈述文本或流传的真相, 请删去之。
The cultural meaning of the choices编码选择背后的文化意义
- κλέος ἐσθλόν (line 1) — straight Homeric vocabulary. The «glory that survives death.» Pytheas placed alongside Achilles.
- μακάρων νῆσον (line 4) — the Homeric/Hesiodic «Isle of the Blest» reserved for heroes (Hesiod, Works and Days 168–173). A Christian writer would normally use Christian eschatology; choosing the Homeric afterlife is a cultural statement.
- μῆτιν πανάριστε (line 3) — μῆτις is Odysseus's defining virtue; πανάριστος is a Homeric superlative. Pytheas placed in the line of Odysseus.
- κλέος ἐσθλόν(第 1 行),道地的荷马用语, 即“死后不朽的荣名”。把 Pytheas 置于阿喀琉斯之列。
- μακάρων νῆσον(第 4 行),荷马与赫西俄德笔下专为英雄而设的“至福者之岛”(参《工作与时日》168—173)。基督徒作者通常用基督教终末论, 此处偏取荷马式来世, 是一种有意为之的文化姿态。
- μῆτιν πανάριστε(第 3 行),μῆτις(“机谋”)是奥德修斯的标志性德性; πανάριστος 又是荷马式的最高级。Pytheas 由此置于奥德修斯之列。
EpiDoc encodes the form via @met + @lemma; the cultural reading belongs in the encoder's <div type="commentary">. Both layers are part of the edition.
EpiDoc 通过 @met + @lemma 记录形式; 文化解读则在编者的 <div type="commentary"> 中呈现。两层都是版本的一部分。
Now build it — practice in the playground动手练习 · 在工坊里编码 Pytheas
Build a small elegiac couplet from scratch — the workshop's final test of your encoding muscle.从零编一组小哀歌对句,工坊最后的编码功夫之试。
- 1. Open Document editor → "Blank template".打开文档编辑器, 选 "Blank template"。
- 2. In the edition pane, type two lines of Greek verse: one hexameter, one pentameter.在排印版栏中输入两行希腊文,一行六音步, 一行五音步。
- 3. Build XML. Hand-add type="elegiac" to <lg> and @met to each <l>.生成 XML 后手动:为 <lg> 加 type="elegiac", 为每一 <l> 加 @met。
- 4. Download. Submit. You are done.下载, 提交。完成。
Synthesis综合回顾
What you learned · how the parts connect · where to go next学到了什么 · 各部分如何连缀 · 下一步何往
What EpiDoc actually doesEpiDoc 到底在做什么
Separation of layers分层
What's on the stone, what the editor adds, and what the text means each live in distinct elements. One file, many views.石上所见、编者所补、文本所指, 各占独立元素。一份文件, 多种读法。
<abbr> · <supplied> · <persName>
Structural truth, not visual layout结构真相, 不囿版式
A name that spans two lines is still one name. A discontinuous abbreviation is still one expansion. Encoding reflects grammar and meaning, not how the text happens to be laid out on the page.跨行的名字仍是一名; 间断的缩写仍是一展。编码所反映的是语法与语义的真相, 不是纸面上的样子。
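A minimal sketch of the first case, using a made-up name broken across two lines. Because <lb> sits inside <persName>, reading the element's text reassembles the name regardless of where the stone breaks it.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical example: the name runs over the end of line 1 into line 2.
SAMPLE = (
    f'<ab xmlns="{TEI}"><lb n="1"/>'
    '<persName>Aure<lb n="2" break="no"/>lius</persName></ab>'
)

def names(xml_text):
    """Return the text of every <persName>, ignoring line breaks inside it."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter(f"{{{TEI}}}persName"):
        text = "".join(el.itertext())        # <lb/> contributes no text
        found.append(" ".join(text.split()))  # normalise whitespace
    return found

print(names(SAMPLE))
```

Had the encoding followed the layout instead, with one element per line, a search for the name would find only two meaningless halves.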
<persName> spans <lb> · <expan> spans gaps
Queryable encoding可检索之编码
Every name, place, office, date, and number carries machine-readable attributes — @ref, @key, @value, @dur. The corpus becomes a relational database that answers prosopographical and historical questions.每个人名、地名、官职、日期、数字都带机器可读属性,@ref、@key、@value、@dur。语料库由此化为可回答人物志与历史问题的关系数据库。
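The step from attributes to a database can be sketched with an in-memory SQLite table. The document ID, the `example.org` identifier, the sample text, and the table layout are all hypothetical; a real project would point @ref at an authority such as LGPN or Pleiades.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
QUERYABLE = ("ref", "key", "value", "dur")
PERSON = "https://example.org/person/42"  # placeholder, not a real authority URI

SAMPLE = f"""
<ab xmlns="{TEI}">
  <persName ref="{PERSON}">Pytheas</persName> vixit annos
  <num value="30">XXX</num>
</ab>
"""

def index_entities(doc_id, xml_text, conn):
    """Store one row per machine-readable attribute found in the document."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        tag = el.tag.split("}")[-1]
        text = " ".join("".join(el.itertext()).split())
        for attr in QUERYABLE:
            if el.get(attr) is not None:
                conn.execute(
                    "INSERT INTO entity VALUES (?, ?, ?, ?, ?)",
                    (doc_id, tag, text, attr, el.get(attr)),
                )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (doc TEXT, tag TEXT, text TEXT, attr TEXT, val TEXT)")
index_entities("demo-1", SAMPLE, conn)

# Where is this person attested, and under what spelling?
people = conn.execute(
    "SELECT doc, text FROM entity WHERE attr = 'ref' AND val = ?", (PERSON,)
).fetchall()
# Which numerals occur, with their normalised values?
numbers = conn.execute(
    "SELECT text, val FROM entity WHERE tag = 'num'"
).fetchall()
print(people, numbers)
```

Run the same indexing over a whole corpus and the first query lists every inscription that mentions the person, whatever the spelling on the stone.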
@ref → LGPN · Pleiades · godot.date
Read → Encode → Review读 → 编 → 互评
Read读
In four-views.html: see how the same XML becomes Leiden / Web / Database / XML simultaneously. Notice what each view privileges, and what only the source can show.在 four-views.html 中, 看同一份 XML 如何同时变成 Leiden / Web / Database / XML 四种视图。每一视图各重一面, 而源文件本身又见前三者所不见之处。
Encode编
In leiden-playground.html: type Leiden, watch XML emerge; or open the document/multilingual editor and build a complete inscription from the stub. The schema catches invalid TEI before you submit.在 leiden-playground.html 中:打莱顿, 看 XML 浮现; 或开文档/多语编辑器, 从骨架编出整篇铭文。提交前, 模式会先把无效的 TEI 挡住。
Review评
Trade submissions with a peer. Same exercise, fresh eyes. The quality of each review counts toward the reviewer's own grade, which keeps reviews honest.与同伴互换提交。同题异目, 一双新眼。审稿之质量也计入审稿者自己的成绩,这是让互评不流于敷衍的关键。
What you can do next毕业之后
Encode a new inscription编一篇新铭文
From your own photographs. Submit it to I.Sicily or another EpiDoc project as a proposed contribution.用自己拍的照片, 提交给 I.Sicily 或其他 EpiDoc 项目, 作为一份正式投稿。
Query the corpus查询语料库
Download the I.Sicily TM/EDR/EDH/EDCS cross-IDs and the IAph/IRT/IRCyr corpora; load into a triple-store and ask prosopographical questions.下载 I.Sicily 的 TM / EDR / EDH / EDCS 交互索引, 以及 IAph、IRT、IRCyr 等语料库; 载入三元组库后, 即可作人物志层面的检索。
Build a focused study做一项专题
A semester's worth of student encoding over ~200 Christian funerary inscriptions can produce publishable observations about late-antique formulae, the spread of depositus, or the geography of the Pantokrator curse.一个学期内由学生们合作编码约 200 件基督教墓志, 即可对古代晚期套语、depositus 一词的传播, 或 "Pantokrator 之诅咒" 的地理分布, 得出可发表的观察。
Contribute to EpiDoc为 EpiDoc 作贡献
The standard is maintained by an open community at epidoc.stoa.org; new elements and refinements are proposed on GitHub.EpiDoc 由一个开放社群在 epidoc.stoa.org 上维护; 新元素与规则改进皆在 GitHub 上提交。
All workshop texts are reproduced under Creative Commons Attribution licences.工坊所用全部文本依照 Creative Commons Attribution 许可使用。
Thank you谢谢
Happy encoding. 愿编码顺利。
© 2026 Wu Ching-Yuan 吴靖远 · part of Matrix Hub on magalia.wiki · 收录于 magalia.wiki 之 Matrix Hub
<teiHeader> is not the inscription ① <teiHeader> 不是铭文本身
Everything above <text> is metadata: who edited the file, under what licence, with what conventions, who funded it, when it was last revised. It is the “front matter” of every EpiDoc file. Exercise 02 (IRT0102) is specifically about reading this part.<text> 之上的一切, 全是元数据:谁编辑了这份文件、采用什么许可、采用哪些规约、由谁资助、上次修订是哪天。它好比一份 EpiDoc 文件的“卷首”。练习 02(IRT0102)的全部要点, 就是怎么读懂这一层。
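A small sketch of reading that front matter programmatically. The file below is a trimmed, hypothetical skeleton (not IRT0102, and not schema-valid, since required parts such as <sourceDesc> are omitted); the title, editor name, and helper name `read_header` are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily trimmed file: header first, inscription after.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Hypothetical funerary inscription</title>
        <editor>A. N. Editor</editor>
      </titleStmt>
      <publicationStmt>
        <availability>
          <licence target="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</licence>
        </availability>
      </publicationStmt>
    </fileDesc>
  </teiHeader>
  <text><body><div type="edition"><ab>(inscription text)</ab></div></body></text>
</TEI>
"""

def read_header(xml_text):
    """Collect a few metadata fields from <teiHeader>, ignoring <text>."""
    root = ET.fromstring(xml_text)
    header = root.find("tei:teiHeader", NS)
    return {
        "title": header.findtext(".//tei:title", namespaces=NS),
        "editor": header.findtext(".//tei:editor", namespaces=NS),
        "licence": header.find(".//tei:licence", NS).get("target"),
    }

print(read_header(SAMPLE))
```

None of these fields touches the <div type="edition">. The header describes the file, while the inscription itself lives under <text>.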