Paolo Monella My talk at the TEI Conference, Rome 2013


General information

Here I am publishing the materials for the talk that I gave at the TEI Conference 2013 in Rome. The talk's title is A Saussurean approach to graphemes declaration in charDecl for manuscripts encoding.


This is the slideshow that I used at TEI 2013 on October 3, 2013.


The current TEI approach to the issue of grapheme encoding consists in recommending the use of the Unicode standard. On the practical side, this is sufficient when we encode printed documents based on post-Gutenberg writing systems, whose set of graphic signs (graphemes, diacritics, punctuation etc.) can be considered standard and implicitly assumed to be known.

However, each historical textual document, such as a medieval manuscript or an ancient inscription, features a specific writing system, different from the standard that emerged after the invention of print.

This implies that the TEI 'Unicode-compliance' principle is not sufficient to define graphemes in pre-print writing systems. Let us assume that manuscript A has two distinct graphemes, 'u' and 'v', while manuscript B has only one grapheme, 'u'. If we identified both the 'u' of manuscript A and the 'u' of manuscript B with the same Unicode code point (U+0075), our encoding would imply that they are the same grapheme, which they are not. Each is instead defined contrastively by the net of relations within its own writing system, and the net of contrastive relations of manuscript A differs from that of manuscript B, since the latter has no 'u/v' distinction. This is even more evident with other graphic signs such as punctuation, whose expression (shape) and content (value) have varied enormously over time.
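The contrast between manuscripts A and B can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the `Grapheme` class and the `inventory` helper are hypothetical names introduced here, and modelling a grapheme as "a sign plus the set of signs it contrasts with" is a deliberate simplification of the Saussurean idea, not a TEI data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grapheme:
    """A sign defined by its place in one writing system (hypothetical model)."""
    system: str           # identifier of the writing system, e.g. a manuscript siglum
    character: str        # Unicode character used to transcribe the sign
    contrasts: frozenset  # the other signs it is opposed to in the same system


def inventory(system, signs):
    """Build the grapheme inventory of one writing system from its set of signs."""
    signs = frozenset(signs)
    return {s: Grapheme(system, s, signs - {s}) for s in signs}


ms_a = inventory("A", {"u", "v"})  # manuscript A distinguishes 'u' and 'v'
ms_b = inventory("B", {"u"})       # manuscript B has a single 'u'

# Both signs are transcribed with the same code point, U+0075 ...
same_code_point = ord(ms_a["u"].character) == ord(ms_b["u"].character) == 0x75

# ... but their nets of contrastive relations differ, so they are not the same grapheme.
same_contrasts = ms_a["u"].contrasts == ms_b["u"].contrasts
same_grapheme = ms_a["u"] == ms_b["u"]
```

Here `same_code_point` is true while `same_contrasts` and `same_grapheme` are both false: the shared code point hides the fact that the 'u' of manuscript A is opposed to 'v', whereas the 'u' of manuscript B is not.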

This is why Tito Orlandi (2010) suggests formally declaring and defining, for each document edited (e.g. a manuscript), each graphic sign that the encoder decides to distinguish, identify and encode in his or her digital edition. The natural place for this description seems to be the charDecl element within the TEI Header.

However, a specific technical issue arises, which I shall discuss in this paper: the TEI gaiji module only allows for a description of 'non-standard characters', i.e. graphemes and other signs not included in Unicode. To my knowledge, there is currently no formal way in TEI to declare the specific set of 'simple' Unicode characters used in a digital edition and to define the specific value of the corresponding graphemes in the ancient document's writing system.
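To show what such a declaration could make possible, here is a hedged Python sketch that compares the characters used in a transcription with those declared in charDecl. Several things in it are assumptions made for illustration only: the sample document is a simplified fragment that is not schema-valid, the use of a char element with a mapping of type "codepoint" to declare a plain Unicode character is a hypothetical convention (the gaiji module is meant for non-standard characters), and `undeclared_characters` is a name introduced here.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Simplified, not schema-valid fragment. Declaring the plain character 'u' in
# charDecl through <mapping type="codepoint"> is a hypothetical convention.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <charDecl>
        <char xml:id="u">
          <desc>Single grapheme with both vocalic and consonantal value</desc>
          <mapping type="codepoint">u</mapping>
        </char>
      </charDecl>
    </encodingDesc>
  </teiHeader>
  <text><body><p>uinum uetus</p></body></text>
</TEI>"""


def undeclared_characters(tei_xml):
    """Return the sorted non-whitespace characters used in <text>
    that have no corresponding <mapping> inside <charDecl>."""
    root = ET.fromstring(tei_xml)
    declared = {
        m.text
        for m in root.iterfind(".//tei:charDecl/tei:char/tei:mapping", TEI_NS)
        if m.text
    }
    text = root.find(".//tei:text", TEI_NS)
    used = {ch for ch in "".join(text.itertext()) if not ch.isspace()}
    return sorted(used - declared)


missing = undeclared_characters(SAMPLE)
```

For the sample above, `missing` is `['e', 'i', 'm', 'n', 's', 't']`: only 'u' has been declared, so every other sign in the transcription still lacks a formal definition within the manuscript's writing system.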

This is due to the current TEI general approach to the encoding of 'characters'. The TEI Guidelines currently suggest that encoders define as few 'characters' as possible, while I am suggesting that they should declare and define all encoded signs.

Possible solutions to this specific issue will be examined in this paper. I shall discuss possible changes to the TEI schema to allow for Unicode characters to be re-defined in the specific context of TEI transcriptions of ancient textual sources. Finally, I shall suggest how this might change the general approach to the issue of grapheme encoding in the TEI Guidelines. I think that, at least in the case of the encoding of ancient documents, it should be recommended that all graphic signs identified, and not only 'non-standard' ones, be formally declared and defined.