Paolo Monella My talk and hackathon materials for workshop Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale





This page includes the materials for my talk and hackathon contribution at the workshop Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale, organized by Frederike Neuber (BBAW) and Patrick Sahle (BUW) at the Bergische Universität Wuppertal (BUW), February 6-7, 2020, Room I 13.41 (campus map).






  • Slides for the talk An ontology for digital graphematics and philology:
  • Audio recording of the talk (in English). In some parts (minutes 00:00-04:30, 06:00-06:45, 12:00-12:30) the volume of the audio is low.


  1. Slides for the pitch talk on the data provided for the hackathon of February 7: PDF, ODP
  2. Ursus edition
    1. Visualization page
    2. Ursus GitHub repository. Relevant files:
      1. casanatensis.xml (TEI manuscript transcription before lemmatization)
      2. lemmatized_casanatensis.xml (TEI manuscript transcription, lemmatized)
      3. GToS.csv (Graphematic table of signs)
      4. jsparser.js (JS script to parse and visualize the transcription based on the XML files and the GToS)
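The pipeline above pairs a transcription with a Graphematic Table of Signs consumed by a JS parser. As a minimal sketch of that idea, the snippet below loads a toy GToS-like table and resolves graphemes to alphabemes; the column names (`grapheme`, `alphabeme`, `type`) are hypothetical illustrations, not the actual schema of GToS.csv:

```javascript
// Parse a tiny CSV string into an array of row objects.
function parseGToS(csv) {
  const [header, ...rows] = csv.trim().split("\n").map(l => l.split(","));
  return rows.map(cells =>
    Object.fromEntries(header.map((col, i) => [col, cells[i]])));
}

// A toy table: each transcribed grapheme maps to an alphabeme and a type.
// In Latin graphematics, e.g., the graphemes "u" and "v" can both
// represent the single alphabeme "u".
const gtos = parseGToS(`grapheme,alphabeme,type
u,u,alphabetic
v,u,alphabetic
.,.,punctuation`);

// Look up the alphabeme behind a transcribed grapheme.
const alphabemeOf = g => gtos.find(row => row.grapheme === g)?.alphabeme;
```

Keeping this mapping in a separate table, rather than hard-coding it in the parser, is what lets the same script handle different graphematic systems.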

Talk abstract

The heuristic potential of hyperdiplomatic transcription can be fully developed only if the transcription is based on a well-designed and shared data model.

To give just one example, algorithms for statistics or structured queries should consume data in which the distinction between objects such as "grapheme" and "alphabeme" is clearly modelled: in Latin "vi" we have two graphemes that may represent the corresponding alphabemes (the word "vi", "with force") or may represent a number (the Roman numeral VI). Similar issues arise, for instance, when dealing with transliterations, abbreviations, and non-alphabetic (e.g. alphasyllabic) scripts. Thus, the modelling of concepts such as glyph, allograph, grapheme, grapheme type (alphabetic, diacritic, punctuation etc.), alphabeme, abbreviation, phoneme and word impacts data, algorithms and their heuristic value alike.
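The "vi" case can be made concrete in code. In this hypothetical sketch (the class and property names are mine, not part of the model's published vocabulary), the ambiguity lives in the reading, not in the graphemes themselves, so two readings can share one grapheme sequence:

```javascript
// A sequence of graphemes as transcribed from the source.
class GraphemeSequence {
  constructor(graphemes) { this.graphemes = graphemes; }
}

// One interpretation of a grapheme sequence.
class Reading {
  constructor(seq, kind, value) {
    this.seq = seq;     // the underlying GraphemeSequence
    this.kind = kind;   // e.g. "word" or "numeral"
    this.value = value; // alphabemic string, or numeric value
  }
}

const vi = new GraphemeSequence(["v", "i"]);
const asWord = new Reading(vi, "word", "vi");    // ablative of "vis"
const asNumeral = new Reading(vi, "numeral", 6); // Roman numeral VI
```

A statistics or query algorithm can then count graphemes and readings separately, which is exactly the distinction the paragraph above argues for.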

Since TEI aims to be theory-agnostic, it does not provide a shared model for the concepts listed above (glyph, allograph etc.), but instead suggests a pragmatic, Unicode-based approach to the representation of ancient ("non-standard") graphemes. As a consequence, each project produces transcriptional data following its own editorial conventions and, therefore, its own data model. The resulting loss of interoperability is dramatic if we want to step up from mere data visualization for human reading to machine analysis.

Ontologies are among the best current DH practices for formalizing a data model, and Linked Open Data is the best way to share those formalizations as open, reusable and modular objects. In my talk, I will discuss my data model for digital graphematics, oriented towards the digital transcription and editing of pre-modern handwritten textual sources. My current project at the Venice Centre for Digital and Public Humanities aims to formalize that model into a LOD ontology and to implement it in classes of an object-oriented programming language.
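What "implementing it in classes of an object-oriented programming language" could look like is sketched below, under my own naming assumptions (the actual class design belongs to the project): each level of the model, from abstract alphabeme down to concrete glyph, points to the level it instantiates.

```javascript
// Abstract unit of the alphabet, e.g. "u".
class Alphabeme {
  constructor(name) { this.name = name; }
}

// Graphematic unit of the writing system; distinct graphemes
// (e.g. "u" and "v") may represent the same alphabeme.
class Grapheme {
  constructor(shapeName, alphabeme) {
    this.shapeName = shapeName;
    this.alphabeme = alphabeme;
  }
}

// Positional or stylistic variant of a grapheme.
class Allograph {
  constructor(label, grapheme) {
    this.label = label;
    this.grapheme = grapheme;
  }
}

// One concrete occurrence on the page.
class Glyph {
  constructor(allograph, locus) {
    this.allograph = allograph;
    this.locus = locus; // e.g. "f. 1r, l. 1"
  }
}

const u = new Alphabeme("u");
const v = new Grapheme("v", u);
const initialV = new Allograph("initial v", v);
const token = new Glyph(initialV, "f. 1r, l. 1");
```

The same chain of references (glyph → allograph → grapheme → alphabeme) is what a LOD ontology would express as properties between classes.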

My use case for the hackathon will be my edition of Ursus Beneventanus, built upon that model.