Week 13 · Formulaic Language: the Premise of Prediction

§I — Why formulaic?

Pre-modern public-facing inscriptions are conventional speech-acts. Each move in the document is realised by a formula whose form is determined by genre, not by the unique content of the moment. The Athenian decree begins ἔδοξεν τῇ βουλῇ καὶ τῷ δήμῳ; the Roman SC begins Quod consules verba fecerunt; the laudatio funebris begins with uxor optima; the tribute-list opens with τάδε τοῖς ἑλληνοταμίαις παρέδοσαν. Predictability is a property of the genre, not of the stone. Restoration is therefore re-derivation, not invention.

The same slot-position across four cases

Each cell below shows one slot from one case: the opening identification-slot for cases B, C and D, and the virtue-catalogue (slot 4) for case A. The slot is genre-supplied; the filling is the philological residue.

Case A · Laudatio · slot 4 (virtue-catalogue)
[Quid de p]ietate, mode[stia, comitate], iucundissima [aequitate], sed[ula lanificio], reli[gione sine] superstit[ione], orna[tu non conspi]ciendo, cu[ltu modico]?
F.VIRTUE_CATALOGUE_MATRONA
parallel pool: 5 / 5 (every laudatio mulierum in Mantzilas 2017 carries this slot in canonical order)

Case B · Segesta · slot 1 (prescript)
ἔδοχσεν τει βολει καὶ τοι δέμοι · Ἀκαμαντὶς ἐπρυτάνευε · [Χαρ]ίας ἐγραμμάτευε · Τιμόνοθος ἐπεστάτε · [Hά/Ἀν]τι̣φ̣ον ἤρχ̣ε
F.PRESCRIPT_ATTIC_5C
parallel pool: ~ 350 attested 5th-c. BCE Athenian decree prescripts (per Henry 1977)

Case C · ATL · slot 1 (year-prescript)
[θεοί · τάδε τοῖς ἑλληνοταμίαις παρέδοσαν, hοῖς] {SECRETARY} ἐγραμμάτευεν · [ἀπαρχέν τὲν θ̣εὸν τὲν Ἀθεναίαν]
F.YEAR_PRESCRIPT_APARCHE
parallel pool: 39 / 39 (every annual quota-list reuses this slot; only the secretary-name and year change)

Case D · Persicus · slot 1 (proconsular preamble)
[Paullus Fabius Persicus, proconsul provinciae Asiae, dicit] · Παῦλλος Φάβιος Περσικός · ἀνθύπατος · [δοκιμάζω καὶ κελεύω καὶ προστάσσω]
F.PROCONSULAR_BILINGUAL_PREAMBLE
parallel pool: 6 in-folder proconsular-Ephesus edicts (Persicus + 5 siblings, 27 BCE → AD 138)
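The square brackets in the four cells follow the Leiden convention: text inside [ ] is restored, text outside is preserved on the stone. A minimal sketch of how a reader might separate the two and measure how much of a slot is restoration is given here. The function names `split_leiden` and `restored_share` are illustrative, not part of any tool named on this page, and the sketch ignores underdots, alternative readings such as Hά/Ἀν, and every siglum other than square brackets.

```python
from __future__ import annotations
import re

BRACKETED = re.compile(r"\[([^\]]*)\]")

def split_leiden(text: str) -> tuple[str, str]:
    """Return (preserved, restored) text of a reading with [ ] restorations."""
    restored = "".join(BRACKETED.findall(text))
    preserved = BRACKETED.sub("", text)
    return preserved, restored

def restored_share(text: str) -> float:
    """Fraction of letters that are editorial restoration (0.0 to 1.0)."""
    preserved, restored = split_leiden(text)
    n_preserved = sum(ch.isalpha() for ch in preserved)
    n_restored = sum(ch.isalpha() for ch in restored)
    total = n_preserved + n_restored
    return n_restored / total if total else 0.0

# First phrase of the Case A cell: 7 restored letters, 6 preserved.
print(split_leiden("[Quid de p]ietate"))
print(round(restored_share("[Quid de p]ietate"), 3))
```

A high restored share is not in itself a weakness: in a dense parallel pool the formula can carry most of the line.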

The provocation: if the formula is the gap-filler, what work is left for the philologist? Week 14 will re-pose this question probabilistically (the Ithaca neural model).

§II — Closed-vocabulary anatomy

The Formula Dossier's bank holds 58 closed-vocabulary slots tagged across the 6,617 normalised decrees in the matrix-hub corpus. Each formula is a slot-name, not a fixed string: the slot has a typical Latin or Greek realisation, but that realisation can vary by genre, period, and region. Each case instantiates a subset of these slots, and each slot has its own parallel-pool size.
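The distinction between a slot-name and its realisations can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the slot name `F.PRESCRIPT_ENACTMENT` and the genre, period and region labels are hypothetical and do not reproduce the Formula Dossier's actual schema. The two Greek strings are the ones quoted in §I and in the Case B cell.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Realisation:
    text: str
    genre: str
    period: str   # illustrative label, not a Dossier field
    region: str

@dataclass
class FormulaSlot:
    name: str
    realisations: list[Realisation] = field(default_factory=list)

    def for_context(self, **ctx: str) -> list[str]:
        """Return the realisations whose attributes match every given key."""
        return [r.text for r in self.realisations
                if all(getattr(r, key) == value for key, value in ctx.items())]

# Hypothetical slot: one name, two spellings of the same enactment formula.
slot = FormulaSlot("F.PRESCRIPT_ENACTMENT", [
    Realisation("ἔδοχσεν τει βολει καὶ τοι δέμοι", "decree", "5c BCE", "Attica"),
    Realisation("ἔδοξεν τῇ βουλῇ καὶ τῷ δήμῳ", "decree", "normalised", "Attica"),
])

print(slot.for_context(period="5c BCE"))
```

Keeping the slot-name separate from its strings is what lets one slot be counted across a corpus even when its spelling shifts.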


Latin abbreviation lookup (with 8 Laudatio pins)

D.M. · S.P.Q.R. · uxor optima · l. m. · vix. ann. · opt. · posuit · pietas

Dictionary harvested from EDCS expansions (SDAM ETL).
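An abbreviation lookup of this kind reduces to normalising the typed form and consulting a dictionary. The sketch below is a minimal illustration, not the page's own lookup: the four expansions are standard textbook ones rather than entries taken from the EDCS-derived dictionary, and the names `EXPANSIONS`, `normalise` and `expand` are hypothetical.

```python
import re
from typing import Optional

# Standard expansions of four of the pins; an assumption-level stand-in
# for the EDCS-derived dictionary, which is not reproduced here.
EXPANSIONS = {
    "dm": "Dis Manibus",
    "spqr": "Senatus Populusque Romanus",
    "lm": "libens merito",
    "vixann": "vixit annos",  # "annis" also occurs; one form chosen for the sketch
}

def normalise(abbrev: str) -> str:
    """Lower-case the input and drop dots, spaces and other non-letters."""
    return re.sub(r"[^a-z]", "", abbrev.lower())

def expand(abbrev: str) -> Optional[str]:
    """Return the expansion of an abbreviation, or None if it is unknown."""
    return EXPANSIONS.get(normalise(abbrev))

print(expand("D.M."))
print(expand("l. m."))
```

Normalising before lookup means "D.M.", "D M" and "d.m" all reach the same entry, which matters because punctuation of abbreviations varies from stone to stone.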

§III — The corpus at scale

EDCS (the Clauss/Slaby Epigraphik-Datenbank at Frankfurt-Eichstätt-Zurich) holds 537,262 Latin inscriptions in its 2022-09 snapshot. The four cases sit at very different points of this distribution. The Laudatio belongs to Roma, the top province by count; Persicus belongs to Asia/Ephesus, which is sparse in EDCS because EDCS under-counts Greek-language inscriptions. The same logic applies in reverse to PHI for Greek epigraphy. Formula-driven restoration succeeds because the parallel pool is dense exactly where the document being restored sits in the corpus.

Kaše et al. 2022 · published-figure paraphrase (declared-from-PDF mode). Their Fig. 1 maps occupation-density across the western Roman cities; Fig. 4 shows the inscription-driven occupation-frequency distribution. Per the paper's own caption: 'an accumulation of tertiary sector occupations in large cities'. The Laudatio's Augustan-Rome setting sits in the upper-percentile band of inscription-density per Kaše's metric: its dedicator was a senatorial-class male in Rome, exactly the population EDCS over-represents.

Rendered from the figures the paper publishes; no values are inferred (declared-from-PDF mode per WK13-BUILD-PLAN.md §7).

ATL stoichedon overlay — case-C-specific row-restoration

One column of an ATL quota-list. Three squares in row 3 are blanked. The restoration logic of Meritt vol II is summarised below.

Surviving letters: HEPM...EΣ. Row position: in the alphabet-block for cities beginning with H/Ε. Quota: 60 dr (at the one-sixtieth aparche rate, an assessment of 3,600 dr). Meritt vol II (ATL II, pp. ~80–90) argues for HEPM[ION]EΣ = Hermioneis (Argolis) on three grounds: (1) row-correspondence with the year-1 and year-6 lists; (2) the alphabet-block fit; (3) the quota-amount consistency. The restoration was reaffirmed in 1953 (ATL IV) and again by Paarmann 2007.
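Because a stoichedon grid fixes one letter per square, the gap has a known width, and any candidate must match both the surviving letters and the total length. The following sketch shows that constraint only. It assumes a Latin transliteration of the letters (HERM, three blanks, ES, corresponding to the restoration HEPM[ION]EΣ), and the candidate pool is hypothetical, not a list drawn from ATL. The other two grounds, row-correspondence and quota consistency, are not modelled.

```python
def fits_gap(surviving: str, candidate: str, gap: str = ".") -> bool:
    """True if candidate has the same length as the row and agrees with
    every surviving letter; each gap character stands for one lost square."""
    if len(surviving) != len(candidate):
        return False
    return all(s == gap or s == c for s, c in zip(surviving, candidate))

# Transliterated row: four surviving letters, three blanked squares, two more.
SURVIVING = "HERM...ES"

# Hypothetical candidate pool, for illustration only.
CANDIDATES = ["HERMIONES", "HERMAIOI", "HERMONASA"]

matches = [c for c in CANDIDATES if fits_gap(SURVIVING, c)]
print(matches)
```

The second candidate fails on length and the third on its final letters, so the grid alone narrows this toy pool to one name; on a real list the pool would be the attested ethnics in the same alphabet-block.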

§IV — Four worked restorations

Each case demonstrates a different mechanism of formula-driven prediction: left-margin restoration (A); chronology crux (B); stoichedon row-restoration (C); bilingual proconsular bridging (D). For each slot, the conventional filling is paired with the corpus signal that predicts it.

§V — Read Kaše 2022 with us

Three verbatim quotations from Kaše, Heřmánková & Sobotková (2022), Division of labor, specialization and diversity in the ancient Roman cities: a quantitative approach to Latin epigraphy, PLOS ONE 17(6): e0269869. The paper is the current methodology benchmark for EDCS-driven quantitative Latin epigraphy; the second quote below is the field's own confirmation of the Week 13 thesis.

The study is based on a recently published dataset of Latin inscriptions of the Roman Empire (LIRE, N = 136,190). LIRE is an aggregate of inscriptions from two public epigraphic databases: Epigrafik Datenbank Clauss-Slaby (EDCS, N = 500,618) and Epigraphic Database Heidelberg (EDH, N = 81,476). However, LIRE contains only those inscriptions from EDCS and EDH that satisfy the following criteria: (1) records contain valid geospatial coordinates, (2) coordinates fall within the boundaries of the Roman Empire at its largest extent in 117 CE, (3) metadata contain the most plausible date of creation, and (4) date of an inscription intersects with the timespan of the Roman Empire (arbitrarily set to 50 BCE through 350 CE). Kaše, Heřmánková & Sobotková 2022 · § Epigraphic dataset · pp. 4–5
Establishes the corpus scale: EDCS = 500,618; EDH = 81,476; LIRE aggregate = 136,190 after spatio-temporal filtering. Our own EDCS snapshot used for §III above has 537,262 records as of 2022-09, roughly 7% more than the EDCS figure Kaše et al. report.
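The four inclusion criteria in the quotation amount to a record-level filter. The sketch below is one way to express them and is not LIRE's actual pipeline: the `Record` fields are hypothetical, and criterion (2) is reduced to a precomputed boolean because a real test needs the polygon of the Empire's boundary in 117 CE.

```python
from dataclasses import dataclass
from typing import Optional

# Timespan from the quotation: 50 BCE through 350 CE (negative = BCE).
EMPIRE_START, EMPIRE_END = -50, 350

@dataclass
class Record:
    lat: Optional[float]
    lon: Optional[float]
    in_empire_117: bool          # hypothetical flag standing in for a polygon test
    not_before: Optional[int]    # earliest plausible year
    not_after: Optional[int]     # latest plausible year

def passes_lire_filters(r: Record) -> bool:
    """Apply criteria (1) to (4) in order; any failure excludes the record."""
    if r.lat is None or r.lon is None:                           # (1) coordinates present
        return False
    if not (-90 <= r.lat <= 90 and -180 <= r.lon <= 180):        # (1) coordinates valid
        return False
    if not r.in_empire_117:                                      # (2) inside the Empire
        return False
    if r.not_before is None or r.not_after is None:              # (3) dated
        return False
    # (4) the date interval overlaps the chosen timespan
    return r.not_before <= EMPIRE_END and r.not_after >= EMPIRE_START

rome = Record(41.9, 12.5, True, -20, 14)
print(passes_lire_filters(rome))
```

Criterion (4) is an interval-overlap test, so an inscription dated 100 BCE to 1 CE is kept even though part of its range falls before 50 BCE.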

Latin is a morphologically rich language. In such a case, many common computational text analysis methods require the language data to be morphologically preprocessed, ideally representing individual words in their dictionary-like (lemmatized) form, otherwise repeated occurrences of the same word cannot be detected. However, considering the fact that the language of inscriptions is highly formulaic, with missing sentence division, and full of alternative spellings and inconsistent word-order in compounds, the standard machine-learning models for lemmatization (pre-trained on Latin literary texts) do not perform well. Kaše, Heřmánková & Sobotková 2022 · § Discussion · pp. 18–19
The field's own confirmation of Week 13's thesis. Kaše et al. explicitly note that 'the language of inscriptions is highly formulaic'. Their methodological consequence is the inverse of ours: where they have to fall back from neural lemmatization to manual and rule-based detection, we treat the same formulaic character as the affordance that makes left-margin restoration, the chronology crux, and row-restoration tractable.
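Rule-based detection of a formula that tolerates alternative spellings can be sketched with a few regular expressions over normalised text. This is an illustration of the general idea, not the procedure Kaše et al. used: the rule names are hypothetical, and the patterns cover only two formulae mentioned on this page with a handful of spelling variants (for instance uxsor for uxor, optumae for optimae).

```python
import re

# Hypothetical rules; each pattern runs on lower-cased, punctuation-free text.
RULES = {
    "DIS_MANIBUS": re.compile(r"\bd(?:is)?\s+m(?:anibus)?\b"),
    "UXOR_OPTIMA": re.compile(r"\bu(?:x|xs)or\w*\s+opt[iu]m\w*"),
}

def normalise(text: str) -> str:
    """Lower-case and replace every run of non-letters with one space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()

def detect(text: str) -> list:
    """Return the sorted names of all rules that match the text."""
    clean = normalise(text)
    return sorted(name for name, rule in RULES.items() if rule.search(clean))

print(detect("D. M. Iuliae uxsori optumae"))
```

The patterns match on stems and ignore endings, which sidesteps lemmatization at the cost of some false positives; a production rule set would need review against the corpus.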

Our results are based on a series of inferences and proxy data ranging from archeology-derived population estimates to fragmentary propaganda texts rather than systematic samples or Bureau of Labor census data… We measure the reality that has percolated into epigraphic evidence, survived, was documented, digitized, and passed our quality checks. Kaše, Heřmánková & Sobotková 2022 · § Discussion · p. 20
The philological hedge. Formula-driven restoration succeeds only against the corpus that survived. The inscriptions that did not survive are the dark matter of epigraphy. Both Kaše et al.'s quantitative method and our pedagogical method share this honest limit.

Further reading: SDAM · the JDH 2021 paper companion page; the Formula Dossier (54 formulae × 6,617 decrees); the SDAM databases atlas.

Hands-on next

Open the Restoration Workbench and try the 6 exercises; then continue to Week 14 to see Ithaca (DeepMind's deep neural model) attempt the same task probabilistically.

Laudatio Turiae

Segesta decree

Athenian Tribute Lists

Fabius Persicus on Artemision
