Paolo Monella Post-doc scholarship in Digital Humanities Accademia dei Lincei, Rome 2012

Cologne Dialogue on Digital Humanities 2012: my notes

These are the (unedited) notes I took at the Cologne Dialogue on Digital Humanities 2012, 23-24 April 2012.

Check the Conference schedule for details

April 23rd, 2012

What are the Digital Humanities?

Manfred Thaller, "Setting the Agenda"


A definition of eScience: Science supported by a digital environment.

A definition of eHumanities: Humanities supported by a digital environment.

Infrastructures for the Humanities. In recent years we have lost the specificity of e-infrastructures in the Humanities.

Sub-fields (eHistory, ePhilology etc.) are not well aware of each other. Do they have something in common?

1949 - ca. 1970:
Ad hoc programming in the context of large, funded projects.
Medium: Higher Programming Languages
Ca. 1970 - ca. 1985:
Using method-oriented programming packages... [I didn't catch the rest]

The reaction of practitioners to new technologies.

E. g. 1986 Austria: someone said that personal computers were only useful for printing nice letters.

We are at the beginning of a new cycle: ebooks (largely ignored by Humanities practitioners).

Today Humanities, DH and Computer Science do not understand each other. This is no progress.


Digital critical editions should be actually better (not only sexier) than printed ones [Interesting].

XML: MS Word has converted to it. It is the standard today.

DH today have no formal model of what a text is [Interesting].

DHers often do not understand Information Technology.

Digital persistence has not been taken into account in the construction of today's DH resources.

40% of the text of MSS consists of abbreviations. However, the 'traditional' scholarly work of the last 40-50 years on editing MSS (before DH) has been forgotten by those who edit MSS digitally [Interesting].

1. Infrastructure

10:00 – 11:30 Controversy 1: Do the Digital Humanities have an intellectual agenda or do they constitute an infrastructure?


Willard McCarty, "The residue of uniqueness"


My answer is: agenda before infrastructure.

The DH : artefacts of human culture = Humanities : the human.

The DH research the artefacts of human culture, just like the Humanities research the human (i. e. what makes homo sapiens different).

What makes humans different from animals? (The residue of uniqueness)

Is the idea of humanity threatened by computers?

Turing's 1936 paper (metaphor for his machine) and his 1950 paper. Human intelligence and artificial intelligence.

After Turing we conceive our brain as a computer and new computers as a brain.

Cybernetics reproduces the human.

Warren Weaver's paper "The mathematics of communication": human communication as a mathematical algorithm. Where is the human specificity?

The problem of what is specific to the DH is a form of the very old problem: what is specific to the human?

The computer offers a recursive model of human artefacts.

DH are about modelling what is human: this is the agenda.

We DHers must set our own agenda, not let others do it and tell us what to do.

Computers must not merely be servants to something else (this issue was already raised at a 1962 conference).

Infrastructure is like an ancient Egyptian pyramid. The Humanities are more "Ancient Greek" in this respect: autonomy. "Our disciplines are islands in need of boats".

R. Busa said on his Index Thomisticus that a project needs an institution supporting durability (Egyptian thought).

McCarty: DH must not become merely infrastructural.

Discussion on Controversy 1

Q: We should be more positive about the new horizons opened by the DH: we're having fun!

A: Mechanization of the DH is OK, but not industrialization.

Q: What do the DH replace in the Humanities as opposed to 50 years ago?

A: I don't know.

Q: Infrastructure is not negative (think of museums, libraries etc.)

A: Infrastructure is good. It's a matter of what comes first: agenda must come before infrastructure. If you have libraries perfectly working and no scholars and students, it's an absurdity.

Q: The incompleteness theorem implies that we won't ever be able to build a completely digital system without the human. Question: what is the counterthesis to DH? Non-digital humanities or digital sciences (non-humanities)?

A: The matter is what the DH can give back to the Humanities in exchange for what the DH get from the Humanities.

Dino Buzzetti: I speak on behalf of the Italian DH community. There is a decision that the whole DH community has to take. Funds today are devoted mainly to building infrastructures, not to fostering thinking on the agenda. I think that thinking should be promoted too. In Italy we have DH courses everywhere, but no formal recognition of the discipline as such (as Tito Orlandi knows and has fought for). In the Anglo-Saxon world service centres (like King's CCH) were developed before the very recognition of the discipline. Now in Italy we are very much in need of infrastructures too. What we really need in Italy is a convergence between infrastructure and thinking.

A: I'm not head of the department because I'm not good at infrastructure, and it's rightly so (heads need to attract funding flows). My point is: let's be aware of what we DHers are for. I've been given the blessing of liberty to spend my time caring about thinking.

Jan Christoph Meister: I'll play the advocatus diaboli. The extant tools have shown that humans think irrationally. That machines can't think. Give me a rational argument (not a belief) why we want to carry on a humane project, provided that machines seem to be better at their work than humans.

A: All Humanities (and DH) are based on a desire. I can't make a rational case. A desire for a future not of irrationality in a bad sense but of a-rationality (variety).

Counter-reply (by Jan Christoph Meister): I agree completely, except that we must point out that ours is merely an ethical declaration of allegiance to the "Humane project".

Q: Textual scholarship (German: Philologie) is based on the tools created in ancient Alexandria. Are tools and infrastructure really separable from humans?

A: Sure, tools are part of human activity in themselves. But we have a history of separating them, as we often build tools without thinking of what they are for.

Q: I'm a computer scientist. Maybe we should reposition the Humanities in our digital era.

A: This repositioning is the whole point. This is why we should look very attentively at Computer Sciences and Artificial Intelligence in particular. Turing's scheme can generate indefinitely many types of computing other than the computer as we know it.

Counter-reply: But if this is the case, the opposition methodology vs. infrastructure would disappear.

A: "That would be good".

Q: We have mass book digitization and targeted digitization for Humanities specialists. In the second case the involvement of Humanities and Computer Sciences experts is more important.

A: The Perseus Project has found out that "pre-cooked ideas" of what will interest scholars and what will interest the "general public" prove false. Often the general public also looks at specialist tools ("scholarly devices"). I think that the division between general public and scholarly digital objects should not be made.

Tito Orlandi: I won't put a question, but say how I would have tackled the issue. I'd have taken it from another side. At the core of our problem there is the Turing machine. Do we understand what a Turing machine is, and what it is to us, Humanities scholars? We should not follow Turing's line of research of his last years, by asking ourselves what is the relation between the Turing machine and our mind. The point is that computers only work by the basic principles of the Turing machine. Either one agrees with this assumption or not. The Computer Sciences are simply a way of seeing things, a way to describe things, a formalized way. So the Humanities need infrastructure able to formalize things. Infrastructure should be thought of in a more general sense: XML is infrastructure, although it cannot be seen or touched. Unix is the real background of any useful infrastructure for the Humanities. This is the case because Unix is the closest existing thing to the Turing machine. What is the best infrastructure for Humanities? SGML. Nobody today speaks of SGML anymore. The wonderful idea behind it is that it's totally formal. It does not impose any kind of tag on the data itself. It only says how you can impose the structure upon the data. Then any humanist can decide the tagging to impose upon that. All the rest (including XML, TEI) impose a pre-fixed structure upon the researcher. The researcher must be completely free (as it is with SGML, because it is totally formal).

Q: Did you (McCarty) emphasize the 'human' over infrastructure because you are skeptical that the technical can capture the human?

A: We can keep trying.

Henry M. Gladney (to Tito Orlandi): We are now beyond the Turing machine. Also, XML is just SGML improved. Third, communication is also important.

Dino Buzzetti (to Tito Orlandi): XML is no formal data model in the strict sense.

Manfred Thaller (to Tito Orlandi): XML is what is left of SGML once you have taken out all the good things that were really useful to the Humanities.

2. Approaches towards interdisciplinary research

12:00 – 13:30 Controversy 2: Are all approaches towards interdisciplinary research between the Humanities and Computer Science meaningfully represented by the current concept of Digital Humanities?


Susan Schreibman: "Digital Humanities: Centres and Peripheries"

What do the DH have in common?

The Companion to the Digital Humanities was published in 2005. At the time there were few books addressing the issue.

It is now time for a new Companion (a 2nd edition): this is an announcement.

Starting to work towards the Companion, in 2000 they established an advisory board.

A formerly competing institution decided not to publish a competing Companion, but to join forces and publish one book.

The structure of the Companion to the Digital Humanities.

In 2001 the distinction between DH and New Media Studies seemed clearer. At MLA 2001 there were two Groups: Computers and the Humanities and New Media.

2001: John Unsworth's label "DH" wins over "Humanities Computing".

2003: DH conferences titles tend to focus on keywords like "new" or "corpus".

Around 2001-2003 the Companion to the Digital Humanities was USA-based. Then it became more international in contributors.

In section II of the Companion to the Digital Humanities (on principles), 4 out of 7 chapters deal with text markup and analysis.

The Companion to the Digital Humanities is a snapshot of DH methods as of 2001-2002. It was published in December 2004.

Section III of the Companion to the Digital Humanities is about "applications". Here theory turns into practice. But: is it practice that defines us?

A chapter in section III draws a distinction between resources created by librarians and by scholars.

Section IV of the Companion to the Digital Humanities is about "Production, dissemination, archiving".

2012: DH titles have changed. "DH", "research", "analysis" are predominant. "Text" decreases (though it stays important). "English" decreases, "German, Chinese" increases. "Language" and "analysis" grow.

The distinction between DH and New Media Studies is now blurred.

The publisher is not only thinking of a new edition of the Companion to the Digital Humanities, but also of a way to keep it going instead of creating a static (digital) publication. They'll make it without an advisory board.

Domenico Fiormonte, "Towards a Cultural Critique of the Digital Humanities"


Story: a guy from a 'savage' village emigrates to a 'modern' city, gets a degree in Anthropology, then goes back to his own village and studies it. This is how I feel today, watching myself and the DH from the outside and from the inside, as I study today both DH and Sociology of the DH.

Is there such a thing as "Anglo-american DH"? If so, what do they look like?

The methodological question is what differentiates different DH approaches (e. g. continental vs. Anglo-saxon approaches).

This is true. The point is, though: what is the methodological question?

The "cultural law" of the artefact:

Is there a non Anglo-american DH?

Yes, but the answer is not simple. Non Anglo-american DH are almost invisible.

A scholar wrote a story on the "Chinese dinosaur". In 1997 a US newspaper wrote an article entitled: "In China, a spectacular trove of dinosaur fossils is found".

The news report gave information only about the role of the international scholars involved (nothing about the Chinese people and circumstances involved).

The finding becomes real only when the West gets to know about it.

Bottlenecks of DH:

Geopolitical issues: DH organizations.

Presence of individuals in boards of DH organizations by country of institutional affiliation shows mostly (overwhelmingly) people from USA, Canada and UK.

Multiple or cross-appointments.

The same people appear in different boards (e. g. J. Unsworth and M. Terras appear in 5 different boards).

As a result, M. Terras' infographic on the DH shows an overwhelming predominance of USA and UK.

THAT Camps.

They are today a good chance for peripheral DH communities to gather.

At THAT Camp Florence a new European DH association was created, because we cannot play by the rules of the game, but we can play another game.

The problem of standards.

Who controls the standards? Any standard enforcement creates residual categories, someone who wins and someone who loses (Bowker and Leigh Star 1999; Leigh Star 2006).

(Break-in intervention from the audience at this point: there is however a subsequent decision by the public, e. g. XML was largely used by the public, SGML was not anymore).

E. g.: Unicode has been defined by a board where big corporations were represented (Google, IBM etc.), and no representative of public bodies was there.

Perri 2009 has argued that Unicode arises from a Western, "hypertypographical" approach to writing (different, for instance, from some details of the Indian writing system). [This is interesting for my "Babel" line of research. P. M.]

The countries with the highest linguistic diversity (e. g. India, Tibet, Brazil, Mexico) are often the poorest in the world, and have little influence on the definition of standards.

Let's explore multilingualism and multiculturality!

I have more to say (and proposals to make), but I prefer to hear the debate, and then make my proposals.

Discussion on Controversy 2

S. Schreibman: in the US there is more bottom-up research and initiative than in EU.

Willard McCarty: Canada is a counter-example. It is very important in the DH, yet it's no big country.

Paolo Monella: Also our modelling of alphabet, language, text may be more or less inclusive, more or less "Western-centric". I argue that more sophisticated modelling is more open and inclusive.

Manfred Thaller (to Paolo Monella): Spanish language: one Unicode character for Spanish "ll" (as Spanish people would like to) or two characters (as it was originally in Unicode - now no longer).

Q: Also Open Source/Open Access legal issues are relevant.

Q: The inclusive approach is one of the big things happening in Computer Sciences. E. g. today visualisation is not the output of the research, but an early phase of the analysis (involving the reader).

Elisabeth Burr: Often people outside UK and USA simply don't want to participate.

3. Scope of the DH

15:00 – 16:30 Controversy 3: What is the scope of the Digital Humanities? What is the relationship between individual disciplines served by them?


Jeremy Huggett, "Core or Periphery? Digital Humanities from an Archaeological Perspective"


What are the DH?

Most people trying to answer don't mention the single disciplines involved.

When some people mention disciplines, some disciplines are not (or hardly) represented.

Day of DH 2012 did not emphasize disciplines, but it emphasized a text-based approach (e. g. Classics and Philosophy are little represented). Linguistics and Literary studies, instead, are represented.

The non-text-based disciplines therefore have difficulties in seeing the DH agenda.

Digital Classics.

G. Crane, M. Terras, H. Cayless define Digital Classics very differently (in an early or in an advanced state).

Google n-grams.

"Humanities Computing" peaks later than other terms (like "Historical computing", "literary computing" etc.). It peaks about 2000-2005, while the others peak about 1990.

Instead, the terms connected with "digital" (not "computing"), including "DH", peak all together after 2000.

As we move from "computing" to "digital", single disciplines maintain disciplinary identity.

Digital archaeology papers tend to appear in mainstream archaeological journals.

Instead, DH in other sub-fields are not accepted in mainstream journals etc.

Stuart Dunn has recently argued that the relationship between Digital Archaeology and DH has been particularly lucky. Probably because Archaeology is interdisciplinary in itself (like the DH).

There is a large number of centers for Digital Archaeology throughout the world.

Like DH, Digital Archeology is often defined in terms of practices or "methodological commons" (Harold Short).

The volume "Understanding DH" does not mention Digital Archaeology, but the volume "Thinking beyond the tool" does.

"Archaeology deals with long-past pre-literate societies, so it fits poorly within a logo-centric DH"

The use of GIS in the Humanities ignores the knowledge gathered by Digital Archaeology, so it looks quite simplistic to the eye of an archaeologist (e. g.

Anxiety Discourse.

The DH are anxious about defining themselves. Just low-prestige technical support to other disciplines? The same has happened to Geography and Archaeology.

DH and Digital Archaeology are manoeuvring around each other and not really talking to each other.

The same goes with Digital Geography.

Funding cuts, search for relevance, disciplinary anxiety: the different sub-disciplines in the DH should join forces. They're stronger together and weaker apart.

Jan Christoph Meister, "DH is us, or on the unbearable lightness of a shared methodology"

There are two ways of looking at the relationship between disciplines and DH. How a discipline contributes to the DH and how the DH contribute to a discipline. Today I'll do neither. I'll do something different: I'll look at the very idea of what a discipline is.

Changes in name (Humanities - eHumanities, Philology - ePhilology).

Scientific communities are also language communities and develop a specific terminology.

The introduction of the new term "DH" implies that the identifying commonality is "digital" (this is shared by all DH sub-disciplines). But what does it mean to be "digital"?

It would not make sense to speak of "Blue Humanities" because "blue" is not defined. "Critical Humanities" does make sense, as "critical" has a definition.

What does it mean for something to be "digital"? There's no easy answer.

"Digital" means conceptualized/conceptualizing in a specific way: discreteness. 1 vs. 0, "on" vs. "off".

Digital and analogue both have their pros and their cons.

The "Digital" is completely reusable and reconstructable. It has a power to transcend its original methodological context.

The "Analog" cannot be reassembled ad libitum. It is open to interpretation, it is a multi-layer model of something, it depends on the intellectual context.

Digital vs. analogue conceptualization.

The Achilles and the tortoise paradox, in digital and analogue:

The humanists have always been the flag-bearers of the analogue.

It is hard to convince the "traditional" humanists of the utility of the DH.

Thesis 1: The scope of the DH is universal - but its practice should not be.

Some things are not doable technologically today, but they will be tomorrow.

Does it make sense to approach any Humanities research question from a digital angle? It's a matter of cost/benefits. [Interesting for my "Why?" line of research]

"DH needs to invest more energy into philosophical and methodological self-reflection and critique to engage in a constructive dialogue with the Humanities".

Thesis 2: DH confronts the traditional disciplines with the 'unbearable lightness' of a shared methodology.

Academic disciplines as discrete objects are an invention of the XIX Century. They are social institutions, defined by:

"DH offers the chance of communication through the use of a new conceptual lingua franca: digital conceptualisation"

A lingua franca across the Humanities or even across all sciences (like when everyone spoke Latin).

XIX Century disciplines were nation-oriented. The digital lingua franca could be international.

DH cuts across and affects all dimensions defining the traditional disciplines: it negates disciplinary identity.

Many colleagues of mine don't think that I'm in the German Studies any longer. Sometimes I share their doubt.

Discussion on Controversy 3


Jeremy Huggett: We agree in many points. The tension between digital and analogue changes when some cultural objects start to be born digital. This is starting in Archaeology right now.

Jan Christoph Meister: Some books still exist both in print and in digital. However, my students just don't go to the library and only read digital texts. Also, Humanities students learning some DH ask the big question: "How does this relate to my own experience?". I invented the term "embedded DH". I spend a lot of time conceptualizing the traditional Humanities questions in formal ways. We must be aware that the digital approach cannot solve every problem. We must say: "We'll tackle this Humanities issue digitally up to this point and no further, as further than this it cannot be tackled digitally". It often happens that 'traditional' humanists come to me and ask me to add a 'digital angle' to their research project funding application. This is no healthy way of working. In 50 years our academic disciplinary infrastructure will become completely obsolete. It's a XIX Century thing, not working any longer.

Dino Buzzetti: the focus of DH today is on presentation. Father R. Busa's focus was on processing (before graphical interfaces). It is time that the DH turn back to processing information, rather than to presenting it. A second point: between digital and analogue there is the issue of indetermination.

Jeremy Huggett: GIS at the beginning was just a way to represent information, rather than to create new understandings. We need to make the leap now (towards creating new understandings).

Jan Christoph Meister: Alan Renear in the '90s spoke of markup critically. We don't like representational markup (duplicating the appearance of print books): it would be like a 1:1 map of the world - useless [Interesting metaphor]. Second point, about the leap: we cannot do it all of a sudden. It must be gradual. The really difficult question won't ever be answered by the machine.

Chaim Zinn: The DH (however you conceive them) do not constitute a defined "field of knowledge", but only an academic framework for discussing questions related to digitization in the Humanities.

Jan Christoph Meister: Even better, considering that the "disciplines" system is going to collapse. I don't see the benefit of trying to define the DH as a discipline.

Q: I'm an archaeologist and I do not know any colleagues who work without digital tools, so I ask whether in the future Digital Archaeology will just become the standard form of Archaeology.

Jeremy Huggett: It's true that most archaeologists have gone digital, but they use digital tools in a purely instrumental way to answer purely archaeological questions. This does not make them digital archaeologists. Then, there are people who are interested in the tools themselves: they are the digital archaeologists [Interesting]. They are in danger of becoming the technicians of their department, but there are ways around it. Also, they won't ever become useless, as there will always be new tools to develop [Interesting].

Tito Orlandi: I am afraid that "digital" is just a more fashionable word for "electronic". But never mind. On the question whether disciplines are alive or not, and whether DH are a discipline or not: let's take the example of Epigraphy and Papyrology. They're accepted as disciplines today, but in fact they're not.

Manfred Thaller: Some say that using formal methods in philology is useless, as you won't ever have a completely formal conclusion, so it's better not to try it at all.

Jan Christoph Meister: Small intensive philological operations won't be digitized. But digital analysis of words (as signs of concepts) through n-grams on large historical corpora of texts will be useful to Humanities practitioners in general.
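As an illustration of the kind of n-gram analysis mentioned here, the following is a minimal sketch in Python, not a description of any tool discussed at the conference. It assumes naive whitespace tokenization and lowercasing, which a real historical corpus would need to replace with proper tokenization and normalization; the function names `ngrams` and `ngram_counts` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the consecutive n-token tuples of a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_counts(texts, n=2):
    """Count n-grams over a collection of texts.

    Assumption: tokens are whitespace-separated and compared in lowercase.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))
    return counts


corpus = ["The digital humanities", "the digital turn"]
counts = ngram_counts(corpus, n=2)
print(counts.most_common(1))
```

Counting per period (for instance one `Counter` per decade) would give the kind of frequency curves shown by Google n-grams.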

17:00 – 18:30 Discussion

Manfred Thaller: The talks have already been put online on the Cologne Dialogues' website. Contributions from others (notes, comments, papers) will also be added (through login on the website). Then the print proceedings will be published. Finally, if a controversy receives enough comments, we can organize another specific dialogue on that topic.

Willard McCarty: It would be great if papers had circulated before the conference or had been handed to participants in print during the conference.

Manfred Thaller: Not all speakers submitted their talks in time; also, if attendees have a hard copy in front of them they don't listen attentively to the presentation.

Paolo Monella: If you are willing to undertake the burden of moderating the comments, the more open the comments area is, the better it is. Think of the book Debates in the DH, where the book produced talks, talks produced blog posts and blog posts produced comments.

Manfred Thaller: Yes, we are willing to undertake that burden, but only for a limited amount of time.

Q: The online commentators could become peer reviewers of the papers.

Manfred Thaller: Interesting. But we want to have proceedings ready by the next dialogue, which should be in December.

Susan Schreibman: We could have a built-in system in the website allowing commentators to comment on specific passages of the text.

Manfred Thaller: Dinner is reachable by train. Go to the train station, then take S12 to Cologne central station.

April 24th, 2012

Making the Digital Humanities work: Tools, infrastructures, technology and conceptual work.

4. Role of markup

9:00 – 10:30 Controversy 4: What is the appropriate role of markup?


Espen S. Ore, "Document Markup – Why? How?"


Markup is not clearly separate from text: word division is markup as much as it is text.

Inscribed stones without word separation do not have even this very basic markup that would help us to understand the text.

Rune writing is a complex form of coding of language: not just a string of characters (one slide shows numbering involved in deciphering an inscription)

Classical (or Hellenistic) markup: they marked spurious Homeric lines with a sign.

"This standoff markup was brought closer to the text as scholia in medieval manuscripts". There are examples of markup inseparable from data.

Other example: a print book 1846-1785 with a dialect collection. Italics, bold, indentation have a markup function. Typography is used to give some information. It's no 'formal' system as it breaks its own rules, so we can't use it to extract information automatically, but it's a markup system.

Example: digital edition of Ibsen letters. Started in TEI/SGML, then TEI/XML, now TEI/XML P5.

As the project has had such a long life, there are 'fossil' markup features belonging to P3 that are not currently used.

The markup is paper-oriented. It includes typographical information (E. Ore did not agree with this choice). (Slide now shows examples of the inline markup used in the project).

TEI/XML allows one to mark up textual variants (strikeouts, substitutions etc.). E. Ore now shows the digital image, the markup and the visualization effect for the end reader.

Overlapping hierarchies: one for the text flow and one for the lines on the page.

Standoff encoding: in a different file. The markup can be exported to a relational database.

Standoff encoding: pros and cons.

It makes overlapping hierarchies easier.

But there is no standard for data interchange.

A solution might be: using standoff inside the project and then "Project the data into a suitable hierarchy for XML (TEI) export" [Interesting: this is what Isabella Bonincontro said about TEI as merely an exchange format; tension between interoperability and "deep" markup?].
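The idea of keeping standoff annotations inside the project and projecting them into one hierarchy for export can be sketched as follows. This is only an illustration under stated assumptions, not the Ibsen project's actual pipeline: annotations are assumed to be `(start, end, tag)` tuples with character offsets inside the text, the tag names are invented, and the hypothetical function `project_to_xml` simply refuses annotations that overlap, since a single XML tree cannot hold them.

```python
from xml.sax.saxutils import escape


def project_to_xml(text, annotations, root="text"):
    """Serialize standoff annotations as one XML hierarchy.

    annotations: iterable of (start, end, tag), offsets within `text`.
    Raises ValueError if an annotation overlaps an open element.
    """
    # Outer spans first: sort by start offset, then longer spans first.
    spans = sorted(annotations, key=lambda a: (a[0], -a[1]))
    out = []
    stack = []  # (end, tag) of currently open elements
    pos = 0     # offset of the next character not yet written

    def close_until(limit):
        nonlocal pos
        # Close every open element that ends at or before `limit`.
        while stack and stack[-1][0] <= limit:
            end, tag = stack.pop()
            out.append(escape(text[pos:end]))
            out.append(f"</{tag}>")
            pos = end

    for start, end, tag in spans:
        close_until(start)
        if stack and end > stack[-1][0]:
            raise ValueError(f"<{tag}> {start}-{end} overlaps an open element")
        out.append(escape(text[pos:start]))
        out.append(f"<{tag}>")
        pos = start
        stack.append((end, tag))

    close_until(len(text))
    out.append(escape(text[pos:]))
    return f"<{root}>{''.join(out)}</{root}>"


letter = "Dear friend, farewell."
print(project_to_xml(letter, [(0, 22, "p"), (5, 11, "name")]))
```

Choosing which annotation sets to export, and which to drop because they overlap, is exactly the projection decision: the standoff store keeps everything, the XML export keeps one hierarchy.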

Conclusion: "What are text and markup really?"

Desmond Schmidt, "The Role of Markup in the Digital Humanities"


Layout of this talk:

  1. Background - Why markup matters
  2. Six problems with the standard Markup model
  3. Design of a solution
  4. Conclusions and collaborations

(Very interesting and clear slides!)

Part 1. Background - Why markup matters

The growth of repositories for cultural data (examples: Dariah, Project Bamboo, Europeana, TextGrid, HuNi in Australia).

They contain digital surrogates of human artefacts (textual or not).

The role of markup in the big picture. (See interesting slides - no. 12-15 - with points and definitions).

TEI history (1987 foundation, 1993 guidelines, 2002 XML).

Part 2. Six problems with the standard Markup model
1. problem: too many tags.
TEI: 545 elements: too many, both for encoders and for software developers.
2. problem: usability.
Even tools like Oxygen don't make it easy for humanists.
3. problem: overlap.
Renear et al 1993: "What is Text Really?". Texts are not really hierarchical.
  • variation in document structure between versions
  • change of interpretive angle of the encoder
4. problem: interlinking.
It seems a good way to solve the 'overlap' problem, but there are issues (see slides 39-42).
5. problem: variation.
Alterations to a text, different editions etc. "Poorly represented by embedded markup".
"Can't represent overlap, changes to markup structure, many versions" [Interesting!].
6. problem: interoperability.
The most serious and unaddressed problem.
SVG (vector graphics) has achieved the goal of interoperability in its field (it's been used to make these slides), but 25 years of TEI have not achieved it.
"Each 'standardised' markup scheme is different - it has to be". So interoperability with TEI "is impossible".

Part 3. Design of a solution

An abstract design for texts and markup.

"Markup sets" separated from "text".

"Nmerge" project. Multi-version document (MVD).

It replaces a lot of human effort with automatic computation.

Interesting slide on a MVD example. It copes with dislocation of text portions.
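To give a rough idea of what a multi-version document stores, here is a minimal sketch that merges two versions into segments labelled with the versions they belong to, so that shared text is stored once. It is not Nmerge's algorithm: it uses Python's standard `difflib.SequenceMatcher` on whitespace-separated words, handles only two versions, and does not detect dislocated (transposed) text. The names `merge_versions` and `read_version` are hypothetical.

```python
from difflib import SequenceMatcher


def merge_versions(a, b, names=("A", "B")):
    """Merge two versions into a list of (versions, text) segments."""
    words_a, words_b = a.split(), b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    segments = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Text shared by both versions is stored only once.
            segments.append((frozenset(names), " ".join(words_a[i1:i2])))
            continue
        if i2 > i1:  # text found only in the first version
            segments.append((frozenset({names[0]}), " ".join(words_a[i1:i2])))
        if j2 > j1:  # text found only in the second version
            segments.append((frozenset({names[1]}), " ".join(words_b[j1:j2])))
    return segments


def read_version(segments, name):
    """Rebuild one version by following the segments that belong to it."""
    return " ".join(text for versions, text in segments if name in versions)


mvd = merge_versions("the quick brown fox", "the slow brown fox jumps")
for versions, text in mvd:
    print(sorted(versions), text)
```

Reading a version back means walking the segments and keeping those tagged with that version, which is why no human collation step is needed once the merge is computed.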

Standoff markup (not entirely satisfactory)
Markup is outside the text.
Standoff markup invented by linguists in the early 1990s.
It has a limit: separate sets of markup can be selected, but not mixed.
"Standoff Properties"
Better approach: "Standoff Properties".
Overlapping hierarchies can be mixed.
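A minimal sketch of the standoff properties idea, under my own assumptions about the data model (the talk's actual format is not recorded in these notes): each property is a named character range kept outside the text, so sets coming from different hierarchies can be merged freely even when their ranges overlap. The class `Property` and the functions `properties_at` and `overlaps` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    name: str
    start: int  # offset of the first character
    end: int    # offset just past the last character


def properties_at(props, offset):
    """Names of all properties covering the character at `offset`."""
    return sorted(p.name for p in props if p.start <= offset < p.end)


def overlaps(p, q):
    """True if two ranges intersect but neither contains the other."""
    intersect = p.start < q.end and q.start < p.end
    p_in_q = q.start <= p.start and p.end <= q.end
    q_in_p = p.start <= q.start and q.end <= p.end
    return intersect and not (p_in_q or q_in_p)


text = "one two three four"
layout = [Property("line", 0, 7), Property("line", 8, 18)]  # lines on the page
structure = [Property("phrase", 4, 13)]                     # "two three"

merged = layout + structure  # mixing two hierarchies is just concatenation
print(properties_at(merged, 5))
print(overlaps(layout[0], structure[0]))
```

The phrase starts on the first line and ends on the second, which a single embedded XML hierarchy cannot express without workarounds.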

We should not worry about losing our standard XML tools, because they're not many and they can be replaced.

"DHers need tools they can really share".

Part 4. Conclusions and collaborations


Conclusion: embedded XML markup is neither very interoperable nor very tractable.

Discussion on Controversy 4

Espen Ore: Wouldn't such a project (with intensive SW development) be too expensive?

Desmond Schmidt: Right. To avoid excessive costs we must use already available tools.

Jan Christoph Meister: OK with standoff markup. Right: TEI/XML is complex (the manual is over 1000 pages). Markup goes back to the old, unstructured idea of annotation. My question is: how do we help users to map tag X in markup vocabulary A with tag Y in markup vocabulary B? Should such links be based on statistics rather than on taxonomy? Should we have also a second paradigm for markup, that is not taxonomy but statistics?

Espen Ore: If we want to build editions that may be converted into elements of large text archives, for that TEI/XML is OK. We don't want to use TEI for annotation.

Desmond Schmidt: TEI is not wrong in defining a shared vocabulary to talk about texts. It's the formal part of it that's wrong. But sharing tags is OK.

Dino Buzzetti: My worry is that we do not have a clear idea of what text is. How do you (Schmidt) relate features that relate to the "physical" layer of text encoding with properties that refer to content? [Interesting question!]. Markup has been invented by people who were concerned about the print outcome of texts, and this original bias still weighs. Embedded markup is most suitable to represent the physical appearance of printed text.

Q: I work on digital images. Semantics is utterly uncomputable.

Willard McCarty: Markup is only (maybe) useful for undisputable features (line numbers), but not for interpretive features. The latter are subjective.

Domenico Fiormonte: Is SW with TEI/XML built on markup? What comes first? SW or markup?

Q: I'm working on Averroes, so we have the 'same' text in many languages. How do you deal with it?

Desmond Schmidt: You can have a "standoff properties markup set" with information about linking of chunks of the text among different versions and languages. I think (but I'm not sure) that this problem can be solved through standoff properties.
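
A rough sketch of what such a linking set could look like, assuming each link simply pairs a range in one version with a range in another. The sample sentences, the version ids `la` and `en`, and the helpers `chunk` and `counterpart` are all made up for illustration. Whether this scales to real multilingual traditions is exactly the open point in the answer above.

```python
from typing import Dict, List, Optional, Tuple

# A span is (version id, start offset, length) into that version's plain text.
Span = Tuple[str, int, int]

# Invented sample texts standing in for two versions of the 'same' passage.
texts: Dict[str, str] = {
    "la": "anima est forma",
    "en": "the soul is form",
}

# The linking set lives outside both texts, like any other standoff set.
links: List[Tuple[Span, Span]] = [
    (("la", 0, 5), ("en", 4, 4)),    # anima <-> soul
    (("la", 6, 3), ("en", 9, 2)),    # est   <-> is
    (("la", 10, 5), ("en", 12, 4)),  # forma <-> form
]


def chunk(span: Span) -> str:
    """Return the text a span points at."""
    version, start, length = span
    return texts[version][start:start + length]


def counterpart(version: str, offset: int, target: str) -> Optional[str]:
    """Find the chunk in `target` linked to the chunk at `offset` in `version`."""
    for a, b in links:
        for src, dst in ((a, b), (b, a)):
            inside = src[1] <= offset < src[1] + src[2]
            if src[0] == version and dst[0] == target and inside:
                return chunk(dst)
    return None


print(counterpart("la", 2, "en"))
print(counterpart("en", 13, "la"))
```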

Tito Orlandi: W. McCarty's question is very important. We need tools to analyze the text. We tend to give markup a role that's not quite correct. It's true: markup is present even in ancient inscriptions. In a way, some literary criticism essays are a form of markup. When I read one of those essays, I want to understand the rationale underlying that critic's essay ('markup'). Also, when we use markup of a previous digital humanist we have to put some effort in understanding the rationale under their markup. Also: why must markup be formally correct? All that counts is that you know the rationale behind whatever form of markup: then you'll build SW to compute that markup. This brings interpretive markup in again and solves McCarty's doubt on the usability of interpretive markup.

Desmond Schmidt: My model allows you to comment on embedded markup. A markup set is outside the text.

Malte Rehbein: The two speakers have focused on the digital presentation of a digital edition. But is this really what we want to do? The potential of computation is much more than just presentation on a screen. On annotation: we hope that "digital conceptualization" is to provide a lingua franca for the Humanities (C. Meister's hope).

Desmond Schmidt: If you work with plain text, there are text mining technologies that can do amazing things with it.

Q: DH are collaborative, so different people need to add their own layer of interpretation to a text.

Desmond Schmidt: Annotations can be linked to texts independently of the kind of markup we are talking about today.

5. Which type of infrastructure?

11:00 – 12:30 Controversy 5: Big structures or lightweight webs. What is the most sensible technical template for research infrastructures for the Digital Humanities?


Sheila Anderson, "Taking the Long View: From e-Science Humanities to Humanities Digital Ecosystems"


(Interesting and informative slides).

I've been researching what research infrastructures are and how they relate to traditional infrastructures (libraries, museums etc.).

History of Technology.

What's "technology": "techne" = art; "logy" = a branch of learning.

Systems, not technology (Thomas Hughes).

Hughes says that technological evolution proceeds this way:

You can't separate research infrastructures from actual research evolution.

A universe of information:

We need research infrastructures. Some historical milestones:

E-Science Programme: Atkins et al, 2009.

Problem: EU and other funding bodies do not understand clearly why the Humanities also need a big (digital) infrastructure.

Everybody still thinks of the Humanities in the traditional way (individual close reading etc.).

Franco Moretti has argued that Humanities are not only about close reading. He says that most scholars study only the 'canon' (about 200 literary works).

Arts and Humanities e-Science Programme:

The disruptions of DH projects not technically working made us reflect about the Humanities themselves, rethink about our own practices.

What do we need to do to enable 'traditional' scholars to use the digital methods?

"Digging into Data" project "addresses how 'big data' changes the research landscape for the Humanities and social sciences".

Old Bailey Online project. When, after the end of the project, the person responsible for it reflected on the project (which had been very successful), he said that he was worried about losing the sense of individual voices while using statistics.

Visualizing our universe:

A rallying cry to the Humanities:

Joris J. van Zundert: "If you build it, will we come? Large scale digital infrastructures as a dead end for digital humanities"

I think that big infrastructures like Dariah are a dead end.

We'll end up with huge highways that nobody uses.

Much of the successful innovation is brought about by:

Standards are useful, but they hinder innovation. And big infrastructures are based on well-defined standards.

We have too many standards (see interesting slide). No general infrastructure can cope with all those standards.

If you first set the standard and then encode digitally, you can't further model these objects.

Distributed Digital Scholarly Editions.

Digital editions is what I (van Zundert) have been working on. [Interesting!]

In 5-10 years a digital edition might become a multi-service and multi-server system. E. g., annotations should be served by a different service. Not just a single scholar, but many will be working on it.

Sustainability of our digital objects.

The approach of large infrastructures is to institutionalize that sustainability. This is a threat to Humanities institutions, which possibly won't have the strength and skills to fulfill the promise of ensuring long-term sustainability.

What happens when a DH project runs out of funds?

A proposal: let's use the Internet technology that's been there from the '70s!

We haven't used the potential of the Internet so far.

We can think of DH objects as chunks of information that are self-sustainable and self-replicating on the Internet. We're not there yet, but we're working towards that.

We could cut our DH work into little application-agnostic chunks, independently implemented for the web.

Distribution, redundancy, reuse.

Discussion on Controversy 5

Sheila Anderson: Your concept of large infrastructures is misguided. They aim to offer traditional humanists digital tools to use. They build an ecosystem model where interaction is fostered. In a way the "cloud" is an infrastructure in itself. Also: am I going to turn down 20 million Euros of funds that have been given to large infrastructures? Also, grid computing is useful as it provides computing power.

Joris van Zundert: How do we democratize access to research? Big infrastructures are about big projects. Also: people working in DH big infrastructures tend to think that their specific infrastructure is everything, it is the cloud etc.

Sheila Anderson: Large infrastructures do not coincide with technology. E. g. Dariah has no Dariah cloud. It's more about debate and discussion. Large digital infrastructures have the potential to reach out to Humanities scholars who are 'non digital'. To me large infrastructures are about a social and political agenda.

Joris van Zundert: How do large infrastructures support scholars working independently or in small institutions?

Sheila Anderson: E. g. Dariah provides occasions, conferences, workshops for scholars to come together and discuss. Much small-scale collaboration is happening that does not hit the mainstream. We can help to bring this to the attention of the larger community.

Joris van Zundert: True: Clarin and Dariah have facilities to bring people together. But the way we bring people together with InterEdition is that we share our tools at virtually no cost. And you do "express methodology into the code". Does Dariah facilitate that kind of collaboration?

Sheila Anderson: Dariah aims at leveraging what's happening and seeing what's working and what's not working.

Willard McCarty: I'll tell a story. When the University of Toronto wanted to buy a supercomputer (many years ago), the humanists said that they didn't have much data, so they didn't need the computer. My question is: do we really need supercomputing? Do we really have big data to deal with? Or are we just looking for funding? We should avoid a "funding argument".

Sheila Anderson: I'll give an example of big data for which you needed computing power [I didn't pick the project's name]. Another example would be analyzing the data coming out of the social media. This is not only data useful for the social sciences, but also for the Humanities.

Joris van Zundert: Once we asked to use a supercomputer and said that our data was below a terabyte. They told us "This is noise for us. We'll process it and won't even charge you for that".

Henry M. Gladney: Maybe Joris has inflated and conflated a number of personal frustrations in building a project without much support. I've worked both on small scale personal projects and with big infrastructures. I think that it's a personal issue and no general issue at all.

Dino Buzzetti: I agree that it's a political, social and cultural agenda. I think that there is more to do to reach out to the 'traditional' humanities. InterEdition, with very little financing, has been able to produce very useful tools. In Italy there is an issue about funding: the only way to get funding is to converge on a research, political, cultural agenda.

Sheila Anderson: I'm aware of the risk that big infrastructures attract all the funding to the detriment of small scale projects. We need to find a balance for that. Also: Dariah will work towards identifying services (the scholar creates an account, logs in etc.).

Desmond Schmidt: I'm not convinced of the usefulness of the microservices approach (that Joris talked about: cutting the code into chunks). If the steps are too small they're not useful.

Joris van Zundert: The advantage is reuse. For example, regularization works differently for Armenian than for English. That's better than trying to come up with a large project that tries to cope with everything.

Q: I work in Clarin and the Language Archive. We could have a multi-player system: big infrastructures on the one side, and small scale collaboration on the other side. We need both. Without big infrastructures we won't have sustainability.

Sheila Anderson: I don't want to deny how valuable InterEdition is. The advantage of large infrastructures is that they reach out to 'non-technological' humanists, while projects like InterEdition only involve digital humanists.

Joris van Zundert: Talking of Clarin, and of the Dutch Clarin in particular, which I know better, it ends up producing research that is not really innovative.

Sheila Anderson: Dariah ensures that there is one big international research group, e. g., about the Holocaust, and that many different nations fund the articulations of this research project.

Q: About the medieval research. DH tools are experimental: modelling and experimenting.

Q: About sharing. In my university we have two supercomputers. They keep asking historians if they have computations to feed them with. Will big infrastructures bring about changes? Will they give us material to feed the supercomputers with?

Sheila Anderson: We should rather start from another starting point and ask ourselves: what are your research problems?

Manfred Thaller: We shouldn't only think of texts. DH are bigger than textual studies. Think of large numbers of images of MSS.

6. Balance between conceptual work and technology

14.00 – 15.30 Controversy 6: "Digital curation" or "digital preservation" is a topic, which has originated within the world of digital libraries; recently it has been drawn closer and closer to the Digital Humanities. What is the balance between conceptual work and technology?


Helen R. Tibbo, "Placing the Horse before the Cart: Conceptual and Technical Dimensions of Digital Curation"


What's the balance between theory and practice in the DH?

Digital curation may be considered a discipline of its own.

It has been argued that all curation tomorrow will be digital curation.

RLG/CPA Report 1996.

Technology is one of the requirements for digital curation, but is the last of a long list.

Henry M. Gladney, "Long-Term Digital Preservation: a Digital Humanities Topic?"

I am a computer scientist and an engineer.

"For DH to be considered an independent academic topic, it must include an explicit research agenda that is not already addressed by some accepted academic faculty such as Information Science!"

The issue of digital preservation has already been solved from a Computer Sciences perspective.

Preservation means:

Most cultural and academic work focuses on repositories. TDO ("Trustworthy Digital Objects") methodology focuses on information objects.

We need a schema and tools working for every individual, for every institution and for every kind of data.

Q: "Long-term" to me means that we preserve digital objects for so long that the user cannot talk to whoever created the object, because the latter is dead or otherwise unavailable because too much time has passed.

The best option to date is a tool called "Preserved Information Package: TDO".

Discussion on Controversy 6

Helen Tibbo: there are many document data formats. How does your tool cope with them?

Henry Gladney: We must build an application for any data type.

Q: How do you cope with the preservation of computer games?

Q: Digital preservation has two aspects: how we preserve the Humanities artefacts and how we preserve the objects of DH. The Rationale of knowledge preservation. Digitization is not enough for preservation. Accessibility of digital resources.

Willard McCarty (to Gladney): How do you build trust in new objects (socially)?

Q (to Gladney): The demand for digital preservation will grow in time, and institutions can give an answer (e. g. universities).

Henry Gladney: My tool is also designed for individuals to preserve their own files.

7. Digital libraries

16:00 – 17:30 Controversy 7: "Digital Libraries" have started their life as an answer to opportunities created by a specific stage of technical development. Where are they now, between Computer Science and the Digital Humanities?


Hans-Christoph Hobohm, "Do digital libraries generate knowledge?"


I'm a library and information scholar, with a PhD in French Literatures.

Diderot's Encyclopédie. They described the world using taxonomies. But this wasn't necessarily related to the real world.

Semantic web today: we cannot describe the world formally using formal semantics (triples etc.) because this kind of semantics is only a relation between objects.

How can we make the information stored in (digital) libraries become knowledge?

A document (also a digital document) has information and evidence.

The Shannon-Weaver model of communication.

Three levels of communication problems:

In digital library engineering, mostly only the technical problem is considered. The semantic problem emerged. The 2005 book "The Turn. Integration of information seeking and retrieval in context" by P. Ingwersen and K. Järvelin posed the semantic problem.

Book "Acting with technology. Activity theory and interaction design".

Blended library takes into account the fact that users have a body and their cognitive processes happen through their body. Users are more 'immersed' than with Google.

Some people say that digital libraries are simply augmented search engines.

Discussion on Controversy 7

Hans-Christoph Hobohm: I don't think that formal semantics are useless. They will be reused in the future. But they do not produce new knowledge.

Hans-Christoph Hobohm: Today we have folksonomies, Facebook. This produces knowledge as it describes relations. The users on Facebook describe their life. Facebook's interface has recently turned into a chronicle of the user's life. The Shannon-Weaver model of communication does not describe how people communicate.

Willard McCarty: Research is not research if you know what you are going to find.

Chaim Zinn: There are many distinctions and definitions of "data", "information" and "knowledge". Some say that knowledge is in the mind of man, while information is outside it. I think that digital libraries do not generate knowledge but represent knowledge.

Hans-Christoph Hobohm: Right, but currently digital libraries are too focussed on a static representation of information. As we open big public digital libraries (Europeana, or the German national one), we want to integrate them in the reading, learning and collaboration process. Until today they only collect and represent objects (Shannon-Weaver model). We should go beyond it.

Willard McCarty: We should not call the people in the library "users". This definition has a passive aspect and affects us heavily.

Q: What makes digital libraries better than traditional libraries in generating knowledge?

Hans-Christoph Hobohm: Nothing, except that digital libraries have larger flows of information.

Manfred Thaller: Another advantage of digital libraries is the hypertextual linkage.

Continuing the agenda … Planning for round 2

Manfred Thaller: People can comment on papers on the website. The papers on the website have stable URLs. We'll put materials on Google Docs. Thank you all.