Papers

ÁLVAREZ CARBAJAL, Francisco Javier (École des Hautes Études en Sciences Sociales, France)

Keywords: Digital Diplomatics, Charter Encoding Initiative, medieval charters, Counts of Luna, encoding standards.

  • Session: Encoding historical data in the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Fugier

Diplomatics is a consolidated field in the study and edition of historical sources. In spite of having its own scientific vocabulary (Vocabulaire International de la Diplomatique) and dealing with documents of legal validity (internally-structured, formulaic and thus, presumably easier to encode texts) there is not a TEI module nor any other standard covering specifically the encoding of diplomatic documents. Although there is a noticeable number of projects dealing with the diplomatic editions of charters, none, except from the Charters Encoding Initiative (CEI), has tried so far to export their encoding model to other sources in order to establish a standard. And what is worse, this multiplicity has lead to the current scenario of encoding fragmentation and loss of interchangeability among the created editions. So far, the CEI has been the most solid (although unfortunately incomplete) attempt to achieve the above mentioned standard and integrate it into TEI. Therefore, this paper aims to continue the task initiated by CEI and recover its original purpose of creating a TEI based standard for the encoding of charters. In order to do so, I will present a case study: a TEI ODD designed for the encoding of a corpus of late medieval Castilian charters from the County of Luna. The goal is to document how the ODD works within the framework provided by TEI, paying special attention to the encoding of charter metadata (dates, calendar, tradition, physical description, etc.), diplomatic discourse and means of authentication. Finally, by sharing the model with other scholars with similar interests, I hope to get the necessary feedback to improve it, and consequently generate the common ground required to foster a community, which is the first step to (hopefully) create a TEI SIG orientated towards the development of an specific diplomatics module within the TEI.

ARMASELU, Florentina (CVCE, Luxembourg)

Keywords: TEI visualisation, TEI publication, digital editions, European integration studies

  • Session: Encoding historical data in the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Fugier

The proposal deals with the creation of a TEI publication framework for documents in European integration studies, starting from a small-scale collection from the Western European Union archives on armament issues (less than 100 documents – notes, minutes of meetings, studies and memoranda), intended to be published as a digital documentary edition on the Web. The project has included OCR, conversion to XML-TEI P5, enrichment by means of XSLT and Named Entity Recognition, as well as corpus analysis. For the publication of the collection, the adaptation of an existing platform – EVT (Edition Visualization Technology) – has been chosen, which allows side-by-side visualisation of the digital facsimile and its transcription, and page by page navigation. Besides its particular research goals, the project has also been serving as a pilot for the design and development of a more general platform intended to accommodate a larger variety and number (of the order of tens of thousands) of documents in European integration history, such as treaties, press articles, handwritten notes, letters, video and audio archives. For the majority of these materials, multilingual, high quality transcripts already exist (e.g. in TXT, RTF, SRT formats). The ongoing experiments are oriented towards different visualisation modes, including side-by-side view, linking of text and image or synchronisation of audio/video sequences with corresponding transcripts, alignment of multiple variants or multilingual versions, as well as free-text search and faceted navigation. Ones of the main challenges in the platform design consist in fine-tuning the amount of manual/automatic work for the transcripts conversion to XML-TEI (in a form appropriate to its subsequent publication) and in creating a modular enough architecture to support a multi-project purpose and a gradual development. The presentation would include an overview of the pilot project and of the methodology related to the design of the TEI publication framework.

AZEMART, Ghislaine (Chaire Unesco-ITEN, France); BEN HENDA, Mokhtar (Chaire Unesco-ITEN, France; MICA, Université Bordeaux Montaigne); HUDRISIER, Garance (Lycée Marcelin Berthelot de Toulouse); HUDRISIER, Henri (Chaire Unesco-ITEN, France); LABORDERIE, Arnauld (Chaire Unesco-ITEN, France); LEHMANS, Anne (ESPE d’Aquitaine); LIQUETE, Vincent (ESPE d’Aquitaine); ROMARY, Laurent (INRIA)

Keywords: Pédagogie par TEI, crowd-sourcing TEI, Enseignement/recherche, écologie recherche/patrimoine

  • Session: Presentation and engagement with the TEI
  • Date: 2015-10-30
  • Time: 16:00 – 17:30
  • Room: Amphi Laprade

Nous rendons compte de la phase de démarrage du projet HD Muren (Humanité digitales multiculturelles en mode crowdsourcing via la recherche et l’enseignement) mis en place dans le cadre d’Idéfi-Créatic à Paris 8, (Initiative d’excellence en formation innovante ayant entre autres pour objectif de lier enseignement et recherche). Il s’agit de démontrer la possibilité d’une production participative autour de patrimoines littéraires numériques par la TEI. Cette production participative est fonctionnellement complémentaire et synergique entre recherche en littérature et enseignement de la discipline des lettres au lycée et à l’université (didactique ou recherche en littérature). La TEI est ici expérimentée comme support de pédagogie ouverte des lettres dès le niveau de l’enseignement secondaire, mais aussi des premières années universitaires (notamment formation des professeurs de littérature). On se focalise sur la fabrication d’un atelier cadre permettant d’expérimenter l’aptitude des élèves à s’approprier une lecture « savante » instrumentée en TEI et pour des professeurs à s’être aussi appropriés les mêmes compétences mais en sachant transmettre et animer une cohorte d’élèves : plans de cours, mise en séquences et en exercices, prévision du collaboratif, énoncés d’exercices traditionnels mais numériques (dissertation, analyse ou commentaire de texte). Les corpus en vers ou de théâtre sont choisis en priorité car ils permettent de mettre plus facilement en évidence les correspondances d’analyse fond-forme La transmission et l’animation d’une telle pédagogie expérimentale a pour objectifs : – de transmettre un savoir et une créativité de l’analyse littéraire, – d’accompagner, recentrer et redéployer les aptitudes des élèves « digital natives » (s’appropriant naturellement les tâches de scan, d’OCR et même d’en codage XML pour les élargir vers la lecture savante instrumentée en TEI, – de s’assurer de l’acquisition d’une litteracy numérique par les élèves qui n’en auraient pas – d’assurer la synergie et la collaboration enseignement-recherche-patrimoine. Pour le milieu de la recherche en littérature et des patrimoines, une telle démarche permet : – de consacrer le temps des chercheurs (ou des bibliothécaires) à des tâches à forte valeur ajoutée, – de bénéficier rapidement à moindre coût de gros corpus significatifs, – de donner une visibilité aux travaux de recherches et à la mise en valeur des patrimoines en apportant (gagnant-gagnant) leur compétence et leur prestige dans des rencontres avec élèves et professeurs. Pouvoir proposer aussi très en amont des thématiques ou des fonds particuliers. Les partenaires de la phase de démarrage étant : – la Chaire ITEN-Unesco-Paris 8 et le LEDEN – Le laboratoire Mica (MSH Aquitaine) – l’ESPE aquitaine, l’Université de Bordeaux – Un voire plusieurs lycées pilotes. Ce projet prolonge des projets antérieurs (HumanitéDigitMagheb, T&MEIeuroMed, Bibliothèque Numérique Franco-Berbère) qui enrichiront sa problématique dès après la phase de démarrage : cultures euro-méditerranéennes, lien TEI et MEI, apprentissage des langues modernes ou anciennes.

Bibliographie
  • Burnard, Lou. (2014). What is the text Encoding Initiative ? How to add intelligent markup to digital resources. Open Edition press,108 p.
  • Dacos, M. (2010). Manifeste des Digital humanities [Billet]. Consulté à l’adresse http://tcp.hypotheses.org/318
  • Ferry, J. (2014). Jules Ferry 3.0, Bâtir une école créative et juste dans un monde numérique Conseil national du numérique (No. 3.0). Paris. Consulté à l’adresse http://www.cnnumerique.fr/education-2/
  • Hudrisier, H. (1999). La « lecture assistée par ordinateur » et ses applications savantes ou pédagogiques dans un contexte interactif normalisé : la TEI (Text Encoding Initiative). Passerelles (Revue de l’Université de Paris 8), (24), 57‑64.
  • – HumaNum. (s. d.). TGIR Huma-Num Très Grande Infrastructure de Recherche dédiée aux humanités numériques. Consulté 30 mai 2015, à l’adresse http://www.huma-num.fr/
  • Ide, N., & Véronis, J. (1996). Présentation de la TEI. Cahiers GUTenberg, TEI : Text Encoding Initiative(24), 4‑10.
  • TEI Consortium. (2015). Guidelines for Electronic Text Encoding and Interchange. (originally edited by Sperberg-McQueen, C.M. and Burnard, L., Version 2.8.0. avril 2015). Consulté 30 mai 2015, à l’adresse http://www.tei-c.org/Guidelines/P5/
  • Université Paris 8. (s. d.). IDEFI-CréaTIC – Université Paris 8. Consulté 30 mai 2015, à l’adresse http://www.univ-paris8.fr/IDEFI-CreaTIC

BERETTA, Francesco (Pôle histoire numérique – Laboratoire de recherche historique Rhône-Alpes – LARHRA, France)

Keywords: ontologies, historical structured data, semantic markup, digital history, semantic web

  • Session: Encoding historical data in the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Fugier

The Système modulaire de gestion de l’information historique (SyMoGIH) is a project developed since 2008 by the Pôle histoire numérique of the LARHRA (CNRS UMR 5190, Universités de Lyon et Grenoble) in order to store, analyse and publish historical structured data on a modular and collaborative platform. More then 50 scholars and students, as well as 10 research programs, used or are currently using the platform for research purposes. Some of the data collected in the collaborative database are published on the main project website and on websites devoted to different research programs, like Siprojuris. These data will soon be available on the semantic web using a project specific ontology. The symogih.org ontology can be used, on the one hand, to extract structured historical data from texts and store them in databases or triple-stores. On the other hand, if digitized texts are available it is also possible to use the TEI’s markup language in combination with the symogih.org ontology in order to encode structured data directly in XML texts. Two research programs and some scholars and PhD students are currently using this approach to produce historical data. We intend to promote this practice in our platform and I have introduced a simplified version of this approach in my teaching about digital tools for historians at master’s level. In this paper, I will present the markup concepts we adopted to encode structured historical data in TEI texts. Ontology and texts are connected using an approach similar to « method A » presented by Øyvind Eide in a recent article of the TEI Journal. I also intend to present an example of encoding a corpus of biographical records so that you can visualise and analyse historical data directly marked up in the text using the symogih.org ontology. Francesco Beretta is CNRS research fellow since 2005. He is responsible of the Pôle histoire numérique (digital history department) within the Laboratoire de recherche historique Rhône-Alpes (LARHRA CNRS UMR 5190 – Universités de Lyon et Grenoble) since 2009. Specialist in the history of Roman Inquisition, in intellectual history of catholicism and history of science, he has taught at different universities in Fribourg, Lausanne and Paris (EPHE, EHESS) and currently teaches a master’s level course on digital tools for historians at University Lyon 3.

BESHERO-BONDAR, Elisa Eileen (University of Pittsburgh at Greensburg, United States of America); TRIPLETTE, Stacey E. (University of Pittsburgh at Greensburg, United States of America)

Keywords: translation, alignment, early modern Spain, nineteenth-century England, ur-text, parallel editions

  • Session: Workflows and the TEI
  • Date: 2015-10-31
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

Our project deploys TEI alignment markup to document, quantify, and analyze methods taken to compress the length of a medieval Spanish text in a nineteenth-century English translation. We are designing markup methods for interdisciplinary research on cross-cultural translations spanning gaps of centuries and genres in order to reveal the extent to which translation processes introduce formal innovations on earlier texts. Our long-range project is to document changes in structure and content from Garci Rodríguez de Montalvo’s 1508 Amadís de Gaula to the 1547 Sevilla printing of Montalvo’s Amadís to the 1803 English translation by Robert Southey. We chart multiple shifts in form and content of the romance through the efforts of the two textual scholars positioned centuries apart. We begin with a smaller project to discuss at the TEI 2015 conference. We align Southey’s text with Montalvo’s and study through TEI alignment markup the omissions, reductions, and semantic shifts applied in Southey’s translation process. Southey’s translation abridges the early modern Spanish prose romance by half to restore a theoretical “medieval” or “primitive” state. That is, Southey decided to remove elements from his source text that he conjectured were superadded by Montalvo to produce in English a simpler, imagined Ur-text. In his introduction to the translation, Southey alleges that he has condensed the romance through linguistic compression, without introducing a “modern style” or eliminating episodes. We will discuss our markup and data extraction methods and share visualizations to help compare the divergent shapes of the two texts, not only in a parallel text edition but also through graphs and charts. We expect to reveal how the Amadis text transformed in ways that are more than aesthetic (despite Southey’s claims), and to show the shifting social, political, and moral dimensions that make Southey’s Amadís a hybrid of a medieval and modern text.

BIA, Alejandro (Universidad Miguel Hernández, Spain)

Keywords: TEI document production, extended-markdown, automatic tagging

  • Session: Publishing and the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

Creating new XML documents from scratch, or from plain-text, can be a difficult, time consuming and error prone task, specially when the markup vocabulary used is rich and complex, as is the case of the TEI. It usually takes a good amount of time to make the document validate for the first time. In the old times, SGML allowed certain freedom to document encoders, which were meant to save time and effort, like leaving certain tags open, or not using quotes for all attribute values. In this sense, SGML was more permissive than XML. This was good for human encoders, but made it difficult for programmers to create parsers and applications that fully complied SGML’s permissive set of rules and inferences. On the contrary, XML was made to very restrictive, and in turn more predictable, which makes parsing and processing easier, and contributed to the fast popularity that XML gained soon after its introduction. In the Wiki world, a myriad of Wiki languages appeared, with the purpose of simplifying, or completely avoiding HTML markup. Among them, Markdown is a recent and very successful shorthand notation to avoid writing HTML tags while still keeping text legibility intact. By joining the spirit of good-old SGML, and the ideas behind Markdown, we came to the idea of the downTEI project, which consists of an extension to the markdown syntax meant for the creation of TEI-XML documents, and the corresponding parsers needed to perform such conversion. With this approach, it is easy to obtain a valid TEI document in a very short time, avoiding going through a long list of validation errors. This approach, however, has some limitations. It is meant to process the most common tags, like the ones used for prose and verse, and the most commonly used for the teiHeader (in short, the most frequent tags from teixlile.dtd). For specialized applications (like manuscripts, for instance), further tagging is necessary after the initial conversion, but even in such cases a significant amount of time is saved in the process. In the presentation we will describe the extended markdown notation, the parsing process, and the resulting TEI documents. We will also discuss the benefits and limitations of this approach.

AUTHOR’S BIO:

Alejandro Bia has a PhD in Software Engineering. He studied at ORT University, Oxford University, and at the University of Alicante. Currently he is a full time lecturer at the Miguel Hernández University, within the Department of Statistics, Mathematics and Computer Science and researcher at the Operations Research Center (CIO). He has lectured for the Cultural Heritage Digitization Course at FUNED (2013), the Master in Digital Humanities (2005-2011), and the Master in Web Technology (2005-2007), at the University of Castilla La Mancha, for the Department of Languages and Information Systems (2002-2004) and the Department of Fundamentals of Economic Analysis (2002) of the University of Alicante, and at ORT University (1990-1996). His lecture topics are: text markup using XML and TEI, software engineering, project management, computer forensics, information security, web application design, concurrent programming, operating systems, computer architecture, computer networks and English for computer sciences. At present, he is in charge of the TRACEsofTools project (software tools for contrastive text analysis in parallel bilingual corpus), and continues to develop the Digital Humanities Workbench (DHW) project. In 2005, he has done consultancy work for the National Library of Spain. From 1999 to 2004, he has been Head of Research and Development of the Miguel de Cervantes Digital Library at the University of Alicante. Previously, he has worked as Special-Projects Manager at NetGate (1996), and as Documentation Editor of the GeneXus project at ARTech (Advanced Research and Technology) (1991-1994). His current research interests are text alignment, text-mining, stylometry and visualization methods applied to text corpora. Previously, he worked on the application of software engineering methods and techniques to digital libraries and to enhance document structure design, multilingual markup, digitisation automation by computer means, digital preservation, digitisation metrics and cost estimates. He also worked on neural networks training and developed the ALOPEX-B optimization method. Currently he is a member of AIPO (Association for Human Computer Interaction).

BISSON, Marie (MRSH, Pôle Document Numérique, Université de Caen, Basse-Normandie, France; Equipex Biblissima)

Keywords: Indexation, Outil collaboratif, Base prosopographique, Environnement TEI, Autorités

  • Session: Interchange and the TEI
  • Date: 2015-10-30
  • Time: 14:00 – 15:30
  • Room: Amphi Fugier

Dans la cadre de l’equipex Biblissima et du Centre Michel de Boüard, un outil collaboratif d’indexation a été mis en place : il permet en particulier d’accumuler une réserve de notices-personnes, occupations, lieux, et œuvres. Cet outil permet donc à une communauté de chercheurs et éditeurs de constituer des bases de références communes pour l’indexation des sources anciennes et d’accumuler de la connaissance de façon collaborative et à distance. Conçues en XML-TEI, les notices sont désignées d’un identifiant unique et rassemblées dans une base BaseX. Un outil conçu au pôle du Document Numérique de la Maison de la recherche en Sciences Humaines de l’Université de Caen permet de les créer, les éditer et les modifier : elles sont ensuite sauvegardées sur un serveur. Un deuxième outil conçu par la même équipe permet de les lire et les interroger. En nous appuyant sur l’exemple du thésaurus « Personnes », nous proposons de montrer le fonctionnement de l’outil collaboratif d’indexation en TEI. Générique, il est utilisé pour la conception des notices-personnes, lieux et œuvres, mais il est également utilisable pour l’écriture collaborative de tous documents encodés en XML-TEI. Nous proposons de présenter l’outil du point de vue du chercheur éditeur : nous montrerons les applications pour le travail sur les sources anciennes : quelle granularité a minima a été retenue pour l’indexation des personnes, quelle souplesse la TEI et l’outil nous apportent pour l’accumulation de connaissance sur celles-ci ; comment se réalise le processus d’indexation des personnes de la création de la notice à son indexation sur la source ; nous présenterons aussi le changement qui s’opère dans le cadre de cette indexation par rapport à l’ancienne manière de concevoir les index, notamment la possible réalisation de base prosopographique et leurs liens avec les bases d’autorité nationales et internationales (BNF, VIAF).

BLEIER, Roman (Trinity College Dublin, Ireland – TCD); HADDEN, Richard (National University of Ireland Maynooth, Ireland – NUIM); SPINAZZÈ, Linda (National University of Ireland Maynooth, Ireland – NUIM)

Keywords: Crowdsourcing, editing, correspondence, normalisation, automation

  • Session: Correspondence in the TEI
  • Date: 2015-10-29
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

The Letters of 1916 is a project to create a collection of correspondence written around the time of the Easter Rising in Ireland or by Irish people. The project uses a crowdsourcing approach: not only experts, but anybody can contribute by uploading images of letters and transcribing them.

Since September 2013, when the project was launched, a large number of letter imageshave been uploaded and transcribed via our website (http://dh.tcd.ie/letters1916/) and stored in a relational database. The next stage of the project is it to make the collected images, transcriptions, and metadata available in the form of a digital edition.

The transition from our crowdsourcing environment to a digital scholarly edition is challenging on many levels. One of the biggest challenges is it to ensure normalisation and accuracy of the TEI encoding. Our workflow can be broken down into the following stages: firstly, extraction of transcriptions and metadata from the relational databases in which they are currently stored; secondly, insertion of metadata and transcriptions into TEI templates; and, finally, both automated and manual error checking and proofing to ensure the transcriptions are consistently encoded, valid TEI documents.

After a general view on our crowdsourced collection process for metadata and transcriptions and issues related to it, our paper will discuss strategies and methodologies we use to create valid and meaningful TEI transcriptions. Essentially the TEI encoding has to be general enough to be worthwhile (and useful as TEI data for future usage) and consistent enough to be ingested back into a relational database that is the “digital edition”.

Brief biography of authors

Roman Bleier is PhD student on the Digital Arts and Humanities programme in Trinity College Dublin. His research focuses on digital editing with TEI and he works on a digital edition of Saint Patrick’s writings under the supervision of Professor Seán Duffy.

Richard Hadden is a PhD student on the DiXiT progamme, studying the intersection between Digital Scholarly Editing and Mass Digitisation processes. The Letters of 1916 project is serving as a case­study for his research. He has an MA in Digital Humanities and a BA in Modern European Languages.

Linda Spinazzè is currently working on the Letters of 1916 project as a DiXiT postdoctoral fellow. After completing her first degree in Medieval Latin Literature and gaining a second degree in Computer Science & Humanities, she obtained her PhD in Medieval and Classical Philology with an experimental dissertation on Elegies by Maximianus investigating an alternative model of digital scholarly editing.

keywords

Crowdsourcing, editing, correspondence, normalisation, automation

BROUGHTON, Misha (University of Cologne, Germany)

Keywords: standoff, OHCO, data modelling, xpointer, xpath

  • Session: Interchange and the TEI
  • Date: 2015-10-30
  • Time: 14:00 – 15:30
  • Room: Amphi Fugier

The Ordered Hierarchy of Content Objects model of text casts a long shadow over our current text encoding practice. Even as our practice experiments with increasingly innovative methods to manage the necessity of overlapping hierarchies and tags (from the original CONCUR declarations of SGML to our current use of milestone elements in XML), we continue to do so in a markup language that assumes, as its defining function, an OHCO model of text and markup. This rooting of TEI in XML (and, by extension, in SGML and the OHCO model) presents problems both for front-end encoding of texts and for back-end processing and data interchange of those encodings. While both standoff markup and hierarchy-free encoding have been proposed as a solution to these problems, this paper contends that a move to a completely OHCO-free markup would throw out the baby with the bathwater: we would lose the advantage of explicitly organized content along with the problems of overlapping hierarchies and ambiguous parsing. This paper proposes, instead, a different approach: rather than abandoning OHCO and its associated technologies, as the proponents of standoff seem to suggest, or retrofitting our markup practice to cheat the constraints built into XML itself, this paper seeks to remodel our concept of text, neither as a single Ordered Hierarchy of Content Objects nor as an undifferentiated mishmash of highlighted features. Rather, building from Patrick Sahle’s “Wheel of the Text,” this paper proposes that text can be considered as a matrix of inter-related – and inter-dependent – Ordered Hierarchies of Content Objects, each containing and describing different aspects of the textual reality. This paper will explore the possibilities of extending our markup practice using such an extended OHCO model as well as the ease of data interchange allowed by it, and conclude with a model of how this model might be realized using current TEI practice.

BRÜNING, Gerrit (Goethe University Frankfurt am Main, Germany)

Keywords: variants, manuscript transcription, textual criticism, overlapping hierarchies

  • Session: Textual variants and the TEI
  • Date: 2015-10-30
  • Time: 14:00 – 15:30
  • Room: Amphi Laprade

Chapter 11 of the TEI Guidelines (Representation of Primary Sources) has undergone profound changes with TEI P5 Version 2. Equally significant changes are likely to occur in the following chapter (Critical Apparatus). This paper is aimed to elucidate connections between the issues of encoding manuscript revisions and multiple witnesses. It is part of a broader discussion of the current state of chapter 11 that is among the envisaged outcomes of the DFG-NEH project “Diachronic Markup and Presentation Practices”. Traditionally we distinguish differences between two or more witnesses (variants) from revisions within a single manuscript (‘corrections’). This methodological distinction is reflected in the coexistence of two different modules, transcr and textcrit. For the purpose of modelling variance in a more abstract way, however, it is useful to put aside whether variants occur within or between witnesses. It is well known and has been much complained about for a long time that variants among bigger textual units or even divisions cannot (or cannot satisfyingly) be expressed within the framework of the Text Criticism module in its current state. An analogous problem is faced in the Transcription module: <addSpan/> and <delSpan/> are globally available but are not designed to express paradigmatic relations, as app and subst do). Thus, if we consider making <app>, <lem>, and <rdg> available above the chunk-level, it would be consistent to treat <subst>, <del>, and <add> in the same way.

This would be a more radical change than the introduction of the considerable number of new elements and attributes that we have already seen in chapter 11, as the traditional hierarchy of divisions, inter-level, chunk and phrase-level elements was left untouched. However radical, allowing elements above the chunk level still falls short of what is needed when it comes to variants that cut across structural divisions and units of the text, and especially when it is the structure itself that either varies between witnesses or is changed within one witness. In such cases the conventional distinction between what is ‘text’ and what is ‘markup’ is abandoned, as it is the markup of divisions and units that is in turn to be marked up.

Ein wechselnd [Weben
Ein glühend] Leben!

On the basis of practical examples like this one from Goethe’s Urfaust (lines 154–5), the paper will demonstrate that it is indeed necessary to modify the traditional hierarchy, if the structure of texts is fluid rather than static, and if it is this very fluidity that an encoder wishes to represent. Furthermore, the paper will explore how this can happen as gently as possible, how the problem of overlap can be handled, and how the resulting markup can be transformed into a traditional TEI-Syntax.

BUSCH, Hannah (University of Trier/Trier Center for Digital Humanities, Germany)

Keywords: manuscript description, ontology, medieval manuscripts, annotation, codicology

  • Session: Encoding manuscripts in the TEI
  • Date: 2015-10-30
  • Time: 16:00 – 17:30
  • Room: Amphi Fugier

The development of tools for the automatic and manual tagging digitized sources is one of the main goals in digital humanities research projects dealing with medieval manuscripts. The TEI module “msdescription” already provides a good standard regarding descriptive metadata from manuscript catalogues. However, for a more detailed description of the page layout a more profound mark-up is sometimes required. In order to obtain easily machine-readable metadata for quantitative analysis of the sources the use of continuous text in prose needs to be avoided. Referring to an external ontology might be a good option, but the range of dictionaries and other terminological tools for codicology is still rather unsatisfactory. This applies not only to English and German controlled vocabularies with explicit codicological terms but also to other European languages. Therefore, within the project eCodicology – Algorithms for the Automatic Tagging of Medieval Manuscripts[1] a bilingual[2] codicological terminology[3] has been created and will now become the basis a SKOS data model[4] for the description of codicological information such as layout features and illuminations.

The paper will present this data model and its application in the meta data management5 of eCodicology. Some examples of statistical analyses will be given as well. Furthermore, I would like to discuss the possibility of future collaborations within the TEI community to establish a common reference model for manuscript terminology.

[1] http://www.ecodicology.org

[2] English and German

[3] Based on the works of Denis Muzerelle (French), Marilena Maniaci (Italian), M.P. Brown (English) and inspired by the discussions about the need of an updated and complete multilingual terminology of Christine Jacobi-Mirwald and Marilena Maniaci in Wolfenbüttel 2011.

[4] http://www.w3.org/2004/02/skos/

[5] A meta data schema according to TEI P5 is used.

BÉRANGER, Marine (PROCLAC UMR7192 Research Lab – EPHE and CNRS, Paris, France); HEIDEN, Serge (ICAR UMR5191 Research Lab – Lyon University and CNRS, France); LAVRENTIEV, Alexei (ICAR UMR5191 Research Lab – Lyon University and CNRS, France)

Keywords: Akkadian language, cuneiform writing, tablets corpus, TXM tool, linguistic analysis

  • Session: Correspondence in the TEI
  • Date: 2015-10-29
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

This paper presents a project involving TEI encoding of Akkadian tablets for their further analysis with TXM software. The goal of the project is to analyze the vocabulary, spelling and syllabary of a corpus of Akkadian letters, to outline the different Mesopotamian scribal traditions and to understand the complexity of a letter’s writing. The corpus is currently composed of 350 letters written in the Old Babylonian dialect between 2002 BC and 1595 BC. All the letters have been transliterated in Latin characters following the standards established by the Archibab team (http://www.archibab.fr). The transcriptions (previously stored in a relational database) were encoded in TEI for this project. Every word is tagged with a <w> element and annotated with @ana. The <g> element surrounds every transliterated sign, using @ref for mapping to its Rykle Borger’s syllabary identification number and Unicode codepoint. The transcription also encodes damage and conjecture elements <del>, <supplied>, <unclear>, <corr>, <surplus>, etc. Special XSLT stylesheets were designed to preprocess the TEI source transcriptions for TXM import via a generic XML import module with tokenization at word or cuneiform character levels optimized for different kinds of queries. It is for example possible to compare different letters by their vocabulary or orthography according to various metadata parameters, to study the different (transliterated) values of the cuneiform signs that are not damaged on the clay tablet or to obtain a kwic concordance of the cuneiform signs that were erased by the scribe during the writing of the letter. Correspondence analysis allows identifying the vocabulary which is characteristic to a place of composition, a circumstance or a period, and visualizing the similarity or dissimilarity of the letters. A sample corpus will be made available under open license at the TXM demo portal (http://portal.textometrie.org/demo) by the time of the TEI conference.

CASENAVE, Joana (Université de Montréal, Canada)

Keywords: édition critique électronique, philologie numérique, corpus médiévaux, extension TEI

  • Session: Textual variants and the TEI
  • Date: 2015-10-30
  • Time: 14:00 – 15:30
  • Room: Amphi Laprade

Dans le cadre de notre recherche doctorale, nous menons un travail d’édition critique électronique d’un corpus de fables médiévales : l’Isopet 1-Avionnet. Ce recueil de fables possède une particularité : il a fait l’objet de deux rédactions successives assez différentes l’une de l’autre et chaque rédaction est présente dans trois manuscrits témoins. De plus, au sein même de chaque rédaction, il est possible de constater des divergences structurelles entre les manuscrits : en effet, il arrive parfois qu’un manuscrit témoin compte des vers supplémentaires, tandis que d’autres en omettent. Or, dans le fichier XML/TEI, nous avons fait le choix d’encoder ensemble pour chaque unité de texte – en l’occurrence, pour chaque fable, toutes les informations concernant chaque témoin d’une même rédaction. Il nous faut alors baliser les fables de telle sorte qu’il soit possible d’indiquer les informations propres à l’établissement du texte (variantes sémantiques et variantes orthographiques) de même que les informations propres à chaque manuscrit (abréviations, fautes de copies, lacunes). Ainsi, à partir de l’encodage d’une fable, nous voulons générer aussi bien son édition critique que la transcription de chaque manuscrit témoin. Le texte critique établi par l’éditeur constitue pour nous la matrice de l’édition critique électronique et les manuscrits témoins sont encodés en fonction de cette matrice, ce qui permet de mettre à jour leurs divergences structurelles. La question qui se pose est alors la suivante : en XML/TEI, comment rendre compte explicitement des structures divergentes qui figurent dans notre corpus ? Au travers du module textcrit, la TEI permet un balisage des variantes d’un texte. Mais, dans le cadre de notre travail, il nous faut indiquer, en plus de ces variantes, les divergences structurelles entre les manuscrits témoins. Nous proposons donc de baliser spécifiquement ces divergences structurelles au moyen d’une extension TEI.

CHATEAU, Emmanuel (Labex Passés dans le présent); BEAUGIRAUD, Valérie (Institut d’histoire de la pensée classique – UMR 5037); BOSCHETTO, Sylvain (Laboratoire de recherche historique Rhone-Alpes – UMR 5190); BOULAI, Carole (Triangle – UMR 5206); GEDZELMAN, Séverine (Laboratoire de recherche historique Rhone-Alpes – UMR 5190; Triangle – UMR 5206); INGARAO, Maud (Institut d’histoire de la pensée classique – UMR 5037); JALLUD, Pierre-Yves (TGIR Huma-Num); MORLOCK, Emmanuelle (Histoire et Sources des Mondes Antiques – UMR 5189); PONS, Philippe (Labex Passés dans le présent); SAÏDI, Samantha (Triangle – UMR 5206); MAGUÉ, Jean-Philippe (Interaction, Corpus, Apprentissages, Représentations – UMR 5191)

Keywords: publication, XML database, Xquery, BaseX

  • Session: Publishing and the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

Various approaches have been proposed to publish TEI corpora on line, but no standard software solution has emerged yet. As a possible next step after the markup of an edition, publication is still a difficult issue for many projects in Digital humanities. Initiated by the Digital humanities network of ENS Lyon (Atelier des Humanités Numériques), SynopsX is a lightweight framework which aimed is to easily publish and expose XML corpora. It’s a full XQuery web application built with the native XML database BaseX. The sources of the project are published under GNU on GitHub (https://github.com/ahn-ens-lyon/synopsx). SynopsX has been conceived as a scalable and easily customizable solution for XML publication of TEI files. It allows full control on the URL scheme to build real REST applications and we are planning to use it to expose XML corpora in the Linked Open Data. The software brings a templating system for various renderings of resources according to predefined or customized mappings from XML data to output formats. As different partners were involved in the development, three principles have guided the conception of the software : collaboration, mutualization and genericization. At least from this point of view, we can say that SynopsX is a fruitful exemple of non-institutionnal collaboration around the building of an open source software in the field of the TEI.

CIOTTI, Fabio (Università Roma Tor Vergata, Italy); PERONI, Silvio (Università di Bologna, Italy); TOMASI, Francesca (Università di Bologna, Italy); VITALI, Fabio (Università di Bologna, Italy)

Slides (new!!)

Keywords: Ontology, markup semantic, linked data, EARMARK, OWL, modelling, encoding theory

  • Session: Hermeneutics and the TEI
  • Date: 2015-10-31
  • Time: 11:00 – 12:30
  • Room: Amphi Fugier

This paper is directly connected with Ciotti and Tomasi presentation at the TEI Meeting 2014 [1]. There we envisioned the adoption of ontological modelling as a mean to define formally the intended meaning of TEI markup constructs and content. In this paper we present the first results of the work conducted in the last months, after the research team has been extended to include Fabio Vitali and Silvio Peroni. To summarize we discuss two main achievements of our work, one general and theoretical and the second practical and technical. On the former level we have had to rethink our initial optimistic approach. TEI real usage in the community in fact is largely oriented by pragmatic factors. This has produced a fuzzy and proteic multitude of applications mutually related by a sort of “family resemblance” (in Wittgensteinian sense), that is impossible to reduce to a unique formal definition. In this multitude though we can identify a subset of shared assumptions that shape a common ground of semantic notions about the role and meaning of TEI markup. We now think that this subset can be object of an ontological formalization. As we suggested last year we can “prima facie” take the TEI Simple customization as an approximation of this common ontology. Given these main contextual reflections we have started to work on the formal ontology. The paper describe some overall design and methodological aspects of our modelization and the development of an OWL2 DL ontology enabling the formal specification of the TEI Simple semantics. Our requirements for building such ontology are the following: – the ontology should express only one point of view, i.e., a particular characterisation of TEI Simple elements’ semantics; – the ontology should define a precise semantics of the elements having a clear characterisation in the official TEI documentation (e.g., the element “p”), while it should relax the semantical constraints if the elements in consideration can be used with different semantic connotations depending on the context (e.g., the element “span”); – it should be possible to extend the ontology, reuse it and define alternative characterisation of elements semantics without compromising the consistency of the ontology itself. The approach we have chosen for this is to adopt the LA-EARMARK framework [2] aligning the Linguistic Act Ontology (defining the semantics of entities by means of well-known semiotic theories) and the EARMARK Ontology (providing an ontologically precise definition of markup).

References
  • 1. Ciotti F., Tomasi F. (2014). “Formal ontologies, Linked Data and TEI semantics”. TEI Conference and Members Meeting 2014. Evanston (IL), October 22-24, 2014. http://tei.northwestern.edu/files/2014/10/Ciotti-Tomasi-22p2xtf.pdf.
  • 2. Peroni S., Gangemi A., Vitali F. (2011). “Dealing with Markup Semantics”. I-SEMANTICS 2011 Proceedings, 111-118. New York, ACM. DOI: 10.1145/2063518.2063533.

CLÉRICE, Thibault (Humboldt Chair of Digital Humanities, Universität Leipzig, France); ALMAS, Bridget (Tufts University); BEAULIEU, Marie-Claire (Tufts University); DEE, Stella (Tufts University)

Keywords: CTS, continuous, integration, sustainability, contribution

  • Session: Interoperability and the TEI
  • Date: 2015-10-29
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

The Open Philology Project (OPP) at Leipzig and its US affiliate, the Perseus Digital Library at Tufts (PDL), has years of experience developing extensive infrastructures for managing textual data for historical languages. With around 100 million words available on PDL, and millions more words coming through OPP, in a context of opening contributions from wide ranging communities of users, dealing with ingestion of new texts is a matter of security, flexibility and efficiency.

Over the last few years, PDL and OPP have been moving forward in implementing the Canonical Text Service URN norm and the Epidoc subset guidelines to allow for better interoperability and citability of its texts. We are now working towards supporting a scalable workflow centered on continuous curation of these texts, from both within and outside the PDL/OPP ecosystem. Key requirements for such a workflow are ease of maintenance and speedy deployment of texts for use by a wide variety of analytical services and user interfaces.

Drawing on software engineering best practices, we are building an architecture meant for continuous integrations[1]: analogous to the way Travis[2] integrates with Github[3], we are developing a customizable service that test individual files upon each contribution made to our public git repositories. The services can be configured to test and report status on a variety of checkpoints ­ from schema compliance to CTS­ready markup.

With a strong continuous integration service, we should be able to deal not only with a wide range of genres and languages, but also with a diversity of contributors. We can delegate the tedious tasks of checking markup to the machine, leaving curators free to focus on the scholarship. We also expect that automating checks on the integrity and the adaptability of textual objects for specific frameworks can reduce the error rate and allow for shorter feedback loops to contributors and users of our corpora.

[1] https://en.wikipedia.org/wiki/Continuous_integration

[2] https://travis­ci.org/

[3] https://github.com/

COJANNOT-LE BLANC, Marianne (HAR-Mod Histoire des Arts et représentations (EA4414); Labex les Passés dans le Présent, France); CHATEAU, Emmanuel (HAR-Mod Histoire des Arts et représentations (EA4414); Labex les Passés dans le Présent, France)

Keywords: histoire de l’art, web sémantique, ecdotique, corpus

  • Session: Presentation and engagement with the TEI
  • Date: 2015-10-30
  • Time: 16:00 – 17:30
  • Room: Amphi Laprade

Tandis que la Text Encoding Initiative demeure peu utilisée en France dans le domaine de l’histoire des arts, le projet “Guides de Paris” vise à encoder un corpus de textes, qui constituent des sources incontournables pour l’histoire des arts à Paris aux XVIIe et XVIIIe siècles (architecture, peinture, sculpture, urbanisme, collections etc.), en vue d’une exploitation plus systématique, et dans l’espoir de mettre à jour de nouvelles questions de recherche. Le traitement systématique de ces textes doit répondre à plusieurs ambitions. Il s’agit tout d’abord d’offrir des points d’accès multiples et pertinents pour faciliter la conduite de recherches ponctuelles (par noms d’artiste, type d’oeuvre etc.). Il doit en outre mettre en évidence la qualité de corpus (phénomènes d’intertextualité ou de copie ; analyse du traitement différencié d’un même objet d’une édition à l’autre etc.). Enfin, les guides faisant référence à des œuvres conservées ou décrites par ailleurs, il s’agit de produire un corpus connecté qui tire parti des ressources en ligne récemment produites dans le cadre des grands programmes de numérisation patrimoniale. Cette communication se propose de montrer comment la mise en œuvre d’un balisage XML-TEI à partir de la transcription de ces sources textuelles permet la constitution d’une véritable base de connaissance sur le patrimoine parisien. Nous examinerons notamment comment la TEI peut être employée à profit pour la constitution d’un référentiel qui repose sur de fines stratégies d’indexation mais aussi sur la création de liens avec les ressources en histoire de l’art bientôt – ou déjà – disponibles dans le Linked Open Data.

COULON, Laurent (HiSoMA – CNRS, University of Lyon); ELWERT, Frederik (CERES – Ruhr-University Bochum); MORLOCK, Emmanuelle (HiSoMA – CNRS, University of Lyon); POLIS, Stéphane (F.R.S.-FNRS – University of Liège); RAZANAJAO, Vincent (Griffith Institute – University of Oxford); ROSMORDUC, Serge (CNAM – Paris); SCHWEITZER, Simon (BBAW – Berlin); WERNING, Daniel (EXC Topoi – Berlin)

Keywords: ancient Egyptian-Coptic; digital epigraphy; TEI/XML – Epidoc; interchange format; controlled vocabularies; stand-off annotations

  • Session: Interchange and the TEI
  • Date: 2015-10-30
  • Time: 14:00 – 15:30
  • Room: Amphi Fugier

Sharing digital textual resources is an actual challenge for scholars working on Ancient Egyptian-Coptic (3000 BC-1350 AD). There are two types of reasons for this: first, the different writing systems that have been used throughout the history of this language (hiero-glyphic and hieratic scripts, demotic, Coptic) led to various solutions as regards the encoding of texts; second, the diverging aims and scopes of the projects involved in creating annotated corpora of Ancient Egyptian-Coptic generated representation formats with few characteristics in common. As a result, the resources themselves cannot be shared, and no standard tool can be used for encoding, annotating, querying or analyzing these resources. In order to overcome these issues, several leading projects in the field (1) join forces and introduce a TEI compliant interchange data model that has the following characteristics:

(a) The ancient Egyptian-Coptic TEI interchange data model represents an agree-ment on a subset of the EpiDoc schema towards which the textual data of each project can be converted. Project specific annotations are dealt with either using stand-off markup that refers to tokens of transliterated texts (Bański 2010; Pose et al. 2014), or on the basis of data models that are true expansions of the kernel interchange data model.

(b) The specialized metadata elements and attributes referring to Egyptological concepts are based on controlled vocabularies that are shared and enriched collaboratively by the projects.

(c) These metadata apply either to physical text-bearing objects, inscribed physical features, witnesses (on documents) or texts (Morlock & Santin 2014). As the conceptualization of the relationship between these entities is shared between projects, coherence and precision when describing both the material, philological and linguistic dimensions of textual resources can be obtained.

Note (1) : The project “Cachette de Karnak” (IFAO-SCA; http://www.ifao.egnet.net/bases/cachette/about/; Razanajao, Morlock & Coulon 2013), the Ramses Project (Polis, Honnay & Winand 2013), the Rubensohn Project (http://elephantine.smb.museum), and the Thesaurus Linguae Aegyptiae (http://aaew.bbaw.de/tla/; Dils & Feder 2013).

References
  • Bański, P. 2010. Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless, in Proceedings of Balisage: The Markup Conference. Vol. 5 of Balisage Series on Markup Technologies.
  • Dils, Peter & Feder, Frank. 2013. The Thesaurus Linguae Aegyptiae. Review and Perspectives, in Polis & Winand (eds.), p. 11-23.
  • Morlock, Emmanuelle & Santin, Eleonora. 2014. The Inscription between text and object: The deconstruction of a multifaceted notion with a view of a flexible digital representation, in Orlandi, Santucci, Casarosa, Liuzzo (eds). First EAGLE International Conference on Information Technologies for Epigraphy and Cultural Heritage, Paris, 27th September-1st October. p. 325-350. <halshs-01141856>
  • Polis, Stéphane, Anne-Claude Honnay & Jean Winand. 2013. Building an Annotated Corpus of Late Egyptian. The Ramses Project: Review and Perspectives, in Polis & Winand (eds.), p. 25-44.
  • Polis, Stéphane & Winand, Jean (eds.), Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010, Liège, Ægyptiaca Leodiensia 9.
  • Pose, Javier, Lopez, Patrice & Romary, Laurent. 2014. A Generic Formalism for Encoding Stand-off annotations in TEI. <hal-01061548>
  • Razanajao, Vincent, Emmanuelle Morlock & Laurent Coulon. The Karnak Cachette Texts On-Line: the Encoding of Transliterated Hieroglyphic Inscriptions. TEI Conference and Members’ Meeting 2013, Oct 2013, Rome, Italy. <http://www.tei-c.org/Vault/MembersMeetings/2013/>. <halshs-01141540>

CUMMINGS, James (University of Oxford); RAHTZ, Sebastian (University of Oxford); TURSKA, Magdalena (University of Oxford); PYTLIK ZILLIG, Brian (University of Nebraska – Lincoln); MUELLER, Martin (Northwestern University)

Keywords: TEI Simple, Processing Models, TEI ODD, Output

  • Session: Publishing and the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

A Processing Model for TEI Simple TEI Simple is project funded by the Mellon foundation and the TEI Consortium, developing a customisation and extension of the TEI Guidelines. It has created a highly-constrained and prescriptive subset of the TEI suitable for a basic representation of standard Early Modern and Modern books, but this sort of customisation is nothing new in the TEI community. What is more important is that it has developed a notation for documenting intended processing models for TEI documents. TEI elements are generally descriptive of the interpreted semantics of the source text (‘this is a title’, ‘this is a quotation’), where the interpretation is often based on a human interpretation of layout. The rend attribute is sometimes, but not necessarily, used to describe that layout. The resulting encoded text is very amenable to analysis, but for the common case of re-presenting the text on the web, the TEI model is incomplete. The new processing model notation allows each element to be assigned to a structural category (<model>, and given an outline rendition description (<rendition>). This allows a processor to know whether to handle the element or not, and broadly speaking how to display or otherwise process it. The model and rendition instructions are embedded in a TEI ODD customization. This paper will touch on the TEI Simple project, but concentrate on in-depth explanation of the new ODD method of documenting intended processing models using practical and straight-forward examples. By the time of the TEI Conference we expect to have worked with the TEI Technical Council to incorporate the necessary changes to TEI ODD into the TEI. All of the work for the Simple project is undertaken openly through a public github repository: https://github.com/TEIC/TEI-Simple and an open mailing list at https://web.maillist.ox.ac.uk/ox/info/teisimple.

DAVIS, Matthew (Indepedant Scholar, United States of America)

Keywords: TEI, modeling, transcription

  • Session: Tooling and the TEI
  • Date: 2015-10-31
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

Invariably, digitization projects using TEI concentrate on the content of items through transcription of the text into an abstracted format which is then most often reconstituted into a similacrum of the original alongside a stack of images from which the user of the tool is supposed to intuit the original form. While useful and necessary, this approach also obscrures what is lost in digitization – the material aspects of the codex book, the documentary tapestry, or the physical inscription – in favor of providing tools that are readable by machines, but less so by people. Thus the process of digitization, abstraction, and reconstitution serves the same function as a technological black box. The item to be digitized enters the box, the online version of that item appears on the other side, but the processes it undergoes are opaque to the average online viewer. Moreover, these practices threaten to alter the initial conception of of these artifacts, as that digital surrogate is often the first encounter students and scholars have.

Using both digitized medieval manuscript sources and the Clopton chantry chapel at Holy Trinity, Long Melford – itself an example of where the physical reality of a medieval artifact belies easy categorization and display digitally – I will explain ways that favoring the zone-based TEI schema in combination with thoughtful display practices might recover some of what is lost through the process of digitization and the wealth of legacy codicological information that might be utilized more effectively – ensuring that both form and content are given their due.

Biography

Dr. Matthew Evan Davis (Texas A&M Univeristy, 2013) most recently was the Council for Library and Information Resources/Mellon Postdoctoral Fellow in Data Curation for Medieval Studies at North Carolina State University. There, he worked as part of the team on two TEI-based projects — the Piers Plowman Electronic Archive and the Siege of Jerusalem Electronic Archive, as well the Manuscript DNA project and the Medieval Electronic Scholarly Alliance, an aggregator and discussion space for digital scholarly and cultural heritage work regarding the Middle Ages. Continuing to serve as a consultant and Technical Editor on the former two project, he is also currently the editor of The Minor Works of John Lydgate, a new project seeking to digitize, transcribe, and make available the works of the 15th century poet. The site in its current form may be seen at www.minorworksoflydgate.net.

Besides his work on Lydgate, Dr. Davis is also very interested in the staging of medieval drama, cultural transmission through translation and reception, the history of the book, and material and digital curation as a means of preserving both the material object and the connections between the object, the content contained by that object, and its cultural milieu.

DUMONT, Stefan (Berlin-Brandenburg Academy of Sciences and Humanities, Deutschland)

Keywords: letters, meta data, interchange, API

  • Session: Correspondence in the TEI
  • Date: 2015-10-29
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

Letters are an important historical source: First, they may contain information about the most different topics, events, and issues. Second, letters allow insights about connections and networks between correspondence partners. So, questions occur which can only be answered across the borders of scholarly letter editions due to the fact that these editions are usually focussed on partial correspondences (on a certain person or on a correspondence between two specific persons). But this needs time-consuming searches across various letter editions. This has been a well-known problem for quite some time, now. It has lead Wolfgang Bunzel, a Romanticism researcher, to request “the creation of a decentralized, preferably open digital platform, based on HTML/XML and operating with minimal TEI standards“ for connecting divers scholarly editions of letters. With “correspSearch” (http://correspSearch.bbaw.de) this paper will present a web service, which takes a step in this direction by aggregating metadata of letters from various (digital or printed) scholarly editions and providing them collectively via open interfaces. Each project can provide their metadata in an online available and free licensed TEI XML file, which is conform to the Correspondence Metadata Interchange (CMI) format. The CMI format was developed by the TEI Correspondence SIG and based mainly on the new TEI element correspDesc, but in a restricted and reductive manner. To identify persons and places authority controlled IDs are used (e.g. VIAF, GND etc.). The web service collect these TEI XML files automatically and periodically and offers all researchers a central web interface to search for letters in divers scholarly editions and repositories. Via an Application Programming Interface (API) the gathered data can also be queried and retrieved by other web applications, e.g. digital scholarly editions. Thus, researchers can explore letters in scholarly editions as parts of larger correspondence networks.

EMSLEY, Iain (University of Oxford, United Kingdom); DE ROURE, David (University of Oxford, United Kingdom)

Keywords: sonification, hypertext

  • Session: Abstracting the TEI
  • Date: 2015-10-30
  • Time: 09:00 – 10:30
  • Room: Amphi Laprade

Sonification is a complementary technique to visualization that uses sound to describe data. Kramer defines sonification as “the use of nonspeech audio to convey information. More specifically, sonification is the transformation of data relations into perceived relations in an acoustic signal for the purposes of facilitating communication or interpretation.” [13] While providing new opportunities for communicating through the human perceptual and cognitive apparatus, sonification poses challenges with presenting the exploratory patterns in data to the user as it is a less familiar medium for this purpose.

We describe work to sonify variants of Hamlet to aid exploratory textual analysis. The sonification presented focuses on using pitch and tones to help the user listen to differences in the structure between variations of a text or texts encoded in Text Encoding Initiative (TEI) XML. Our approach is inspired by the Hinman Collator, an opto-mechanical device originally used to highlight print variants in Shakespeare texts, whereby visual differences between two texts literally stood out through a stereoscopic effect [5]. Using an audio stream for each text, this project aims to produce a stereo audio image of the text, so creating an audio version of the stereoscopic illusion used in collating machines. The timing and frequencies are extracted for storage and transformation into alternate formats or to repeat the analysis.

We present initial work on XML variants of Shakespeare’s Hamlet using the Bodleian Libraries’ First Folio XML and their earlier work on the Quartos. We extracted document entities such as act, scenes, lines, and stage directions for the analysis. These are viewed as hyperstructures that may be separated from the text for sonification and comparison with other variants. Analytical perceptions can be altered through the presentation of the tones, pitches and icons. Audio displays demand the creator to rethink how structural data is presented to the user, and about the hyperstructures extracted to give potential for conversion of the analysis into hypermedia using visualization as well as sonification. Early results show promise for the auditory comparison.

We look at related work and present the case study. We then consider the use of audio beacons to help the user locate within the document, and discuss the integration with visualization. Finally we look at future work and conclude the paper.

Related Work

Sonification on exploratory data patterns has been explored in several projects. For example, work on stock market data [3,10] discusses the use of volume and pitch to alert to changes in the data, rather than relying on purely visual stimuli. It demonstrates the use of auditory displays for pattern analysis in exploratory data using a rule system, and is closely associated with visualization.

The Listening to Wikipedia project(1) presents and audio and visual display of edits made to Wikipedia pages. Using circles and rule-based sounds, it presents the recent changes feed to the user including new users and the type of user making the edit. This work provides an elegant interface to the user data but it is limited to one stream.

The TEI-Comparator(2) was developed to compare paragraphs and visualize the changes [9, 14] for the Holinshed Chronicles(3) project, illustrating a collation approach applied to TEI. This visualization work does not render the text into audio signals, and it was designed for a particular text. It focuses on the text rather than the editorial structures.

Sonification of hyperstructures is explored in [11], where an authored hypertextual structure is sonified using the techniques of algorithmic composition. In contrast, we present work that develops the notion of sonifying the hyperstructure, or hyperstructures, extracted and transformed from the editorial matter.

Sonifying versions of Hamlet

We present work on creating an auditory display from Shakespeare’s Hamlet. This began with the Bodleian’s work on the First Folio [5] and their earlier work on the Quartos with the British Library.

Initially we convert a selection of TEI XML elements, relating to acts, scenes, stage directions, lines and speaker, into a series of numbers. The process uses the XPointers for the characters to match the speaker to the line. These are read by the sonification software and mapped to relevant tones and sounds before being recorded as a music file, played to the user, or both actions. The different versions of TEI encodings pose challenges to ensure that each play has the same characters encoded and that the encodings can be mapped to the same number via a rule.

figure_1

Figure 1: Example transform of TEI XML structure into sound

This work focuses on an alternative presentation to Hinman’s Collator where two texts are transposed in stereoscope to show the differences between them. Our eyes use variations between images to interpret depth in 3D vision; similarly, our ears use subtle timing and phase variations to establish a stereo stage. Using an audio stream for each text, the project aims to produce a stereo audio image of the text with auditory beacons to guide the user within the stream. Playing a synchronized audio stream per text in each ear helps the listener’s brain to hear any subtle differences between two versions.

Displaying the hyperstructures of the texts such as the speakers of a line element allows the listener to hear whether editorial changes have been made to the textual structure and to hint at variations of the same text.

By way of example, in the 1603 Quarto edition [7] the first stage direction and first lines are:

<stage rend="italic, centred" type="entrance">Enter two Centinels.
	<add place="margin-right" type="note" hand="#ab" resp="#bli">
		<figure>
			<figDesc>Brace.</figDesc>
		</figure>now call&#x0027;d
		<name type="character" ref="#bar">Bernardo</name> <lb/>&#x0026;
		<name type="character" ref="#fra">Francisco</name> &#x2014;
	</add>
</stage>
<sp who="#sen">
	<speaker>1.</speaker>
	<l><c rend="droppedCapital">S</c>Tand: who is that?</l>
</sp>
<sp who="#bar">
	<speaker>2.</speaker>
	<l>Tis I.</l>
</sp>

In the 1605 Quarto edition [8], the stage direction and first lines are:

<stage rend="italic, centred" type="entrance">Enter
	<name type="character" ref="#bar">Barnardo</name>, and
	<name type="character" ref="#fra">Francisco</name>, two Centinels.
</stage>
<sp who="#bar">
	<speaker rend="italic">Bar.</speaker>
	<l><c rend="droppedCapital">VV</c>Hose there?</l>
</sp>
<sp who="#fra">
	<speaker rend="italic">Fran.</speaker>
	<l>Nay answere me. Stand and vnfolde your selfe.</l>
</sp>

Although the sentinels are identified as Barnardo and Francisco in the stage direction, the text and markup specify different characters. In our software, this would create separate sounds for the first line but not the second. The latter line would create the stereoscopic illusion where the first line breaks it.

Auditory Beacons

Acts and scenes provide useful beacons for the listener to understand which section of the text is being presented. As audio is an unfamiliar medium for this work, there is a need to help the listener identify their position within the document structure. Simple auditory icons are used to aid the listener in understanding the presented event, and research is ongoing to improve these.

In early versions of the sonification, the acts and scenes were produced with different instruments and pitches to allow the user to identify them as this element. This means that the user has to be taught what the sound means and how to associate events within the display. The present version of the software uses simple tones. We are considering the development of auditory icons to help identify the type of element event being presented. In [12], the author discusses the debates in musicology about the use of period and modern instruments in the playing of period music. This sets up a tension in the use of sound. As the text may not be modern, what sound should be represented: one that is contemporary to the text or to the user?

The stage element provides greater detail to use within the display. The ‘type’ and ‘who’ attributes help to design the type of sound. The sounds associated with the ‘who’ attribute can be linked to the speakers but present a different issue. The speaker attribute is associated with one person but the stage directions may have more than one person interacting with the direction. This changes the note from being a single note to a chord or progression. The volume for each speaker is slightly raised as they continue speaking, helping the user identify that the speaker has not changed. When comparing two streams, the listeners will identify any textual changes when both tone and volume alter. Using the two parameters of note and volume provides the user with two axes to understand the data.

Visualization

We created an early prototype visualization showing symbolic representations of the events, using the Processing language used for coding in the visual arts(4) The note data was sent to the visualization server to show an abstract image or text based on the note received, displayed in near real-time to the sound. It did aid comprehension of the audio display, but the use of abstract symbols like the circles for speakers, poses the same challenge as the sonification where the symbol must be understood.

figure_2

Figure 2: Early visualisations to aid the sonification

User feedback suggests that further refinement is required to help make the displays more useful. This may include the use of text and being developed for the Web.

Future Work and Conclusion

We have demonstrated the potential of sonification as a tool to help the user identify differences between textual variants. Auditory displays are known in exploring data though new for analytical tools. The medium allows the designer to use multiple parameters simultaneously to add meaning to an event by changing tone, pitch, sound or volume. This presents challenges in finding ways of making the technique understandable.

The use of stereo playback indicates that further work with spatial displays is possible to aid the comprehension of the data with a richer display. The timing data is being written out with the frequencies are the sounds being created. This provides the potential for integration of the TEI data with SMIL duration markup and transformation into HTML Media Fragments so that the text can be displayed in the browser with links to the sound or converted into Music Encoding Initiative(5) to be visualized in a novel fashion.

Words and lines may be auralized using the tone associated with the speaker. The sonification would then render the associated tones. This does pose the issue of how a word is sonified: is it by length or some other metric? The choice element from the Text Encoding Initiative provides the options for an original element and a variation. The sonification would then have to associate a similar tone with the choices. It may be that the original text would be the expected tone given the word change but that the variation is a sharp or minor tone played as a chord.

Further work is needed to create better auditory icons that work across streams and to integrate audio and visual displays. We have not explored this area fully. Contextual questions include the type of sound that would be typical in a dramatic context or physical one, such as the construction of places of performance. It also demands knowledge of the practices of staging. We intend to research the use of the sex attribute of the person element, contemporary auditory icons and conduct user testing.

We believe that the use of sound provides an exciting way of exploring textual structures to determine differences between them as an alternative workflow. The novelty in this area is a major challenge but we strongly believe that it has relevance in the exploration of variants between texts marked up with TEI.

References

[1] Gregory Kramer. 1993. Auditory Display: Sonification, Audification, and Auditory Interfaces. Perseus Publishing.

[2] `The Search for the “Killer Application”: Drawing the Boundaries around the Sonification of Scientific Data`, Supper, Alexandra in The Oxford Handbook of Sound Studies, Pinch, Trevor, and Bijsterveld, Karin 2012. New York: Oxford University Press, New York, p253

[3] Keith V. Nesbitt and Stephen Barrass, Finding Trading Patterns in Stock Market Data, IEEE Computer Graphics and Applications 24:5, IEEE Computer Society, pp 45-55, 2004

[4] Digital facsimile of the Bodleian First Folio of Shakespeare’s plays, Arch. G c.7, First Folio home page, http://firstfolio.bodleian.ox.ac.uk/

[5] C Hinman, Mechanized collation; a preliminary report., Papers of the Bibliographical Society of America 41 (1947): 99-106.

[6] Smith, Steven Escar. 2000. “‘The Eternal Verities Verified’: Charlton Hinman and the Roots of Mechanical Collation.” Studies in Bibliography 53. 129-62.

[7] The tragedy of Hamlet Prince of Denmarke: an electronic edition, Hamlet, First Quarto, 1603. British Library Shelfmark: C.34.k.1, http://www.quartos.org/XML_Orig/ham-1603-22275x-bli-c01_orig.xml

[8] The tragedy of Hamlet Prince of Denmarke: an electronic editionHamlet, Second Quarto Variant, 1605. British Library Shelfmark: C.34.k.2, http://www.quartos.org/XML_Orig/ham-1605-22276a-bli-c01_orig.xml

[9] The Holinshed Project: Comparing and linking two editions of Holinshed’s Chronicle, James Cummings and Arno Mittelbach, International Journal of Humanities and Arts Computing. Volume 4, Issue 1-2, Page 39-53, ISSN 1753- 8548, Available Online October 2010, http://dx.doi.org/10.3366/ijhac.2011.0006

[10] Keith V. Nesbitt and Stephen Barrass, of a Multimodal Sonification and Visualisation of Depth of Market Stock Data,. Nakatsu and H. Kawahara (eds), International Conference on Auditory Display (ICAD), 2002 , pp2—5 [11] De Roure, David C., Cruickshank, Don G., Michaelides, Danius T., Page, Kevin R. and Weal, Mark J. (2002) On Hyperstructure and Musical Structure. The Thirteenth ACM Conference on Hypertext and Hypermedia (Hypertext 2002), Maryland, USA, 11 – 15 Jun 2002. ACM, 95-104.

[12] Holden, Claire 2012. Recreating early 19th- century style in a 21st-century marketplace: An orchestral violinist’s perspective. Presented at: Institute of Musical Research DeNote Seminar, Senate House, London, 30 January 2012.

[13] Sonification report: Status of the field and research agenda Prepared for the National Science Foundation by members of the International Community for Auditory Display (1997) by G. Kramer, B. Walker, T. Bonebright, et al., http://sonify.psych.gatech.edu/publications/pdfs/1999-NSF-Report.pdf

[14] Lehmann, L., Mittelbach, A., Cummings, J., Rensing, C., & Steinmetz, R. 2010. Automatic Detection and Visualisation of Overlap for Tracking of Information Flow. In Proceedings I-Know.

Notes

(1) http://listen.hatnote.com/

(2) http://tei-comparator.sourceforge.net/

(3) http://www.cems.ox.ac.uk/holinshed/about.shtml

(4) https://processing.org/

(5) http://music-encoding.org/

GENGNAGEL, Tessa (Cologne Center for eHumanities – CCeH, Germany)

Keywords: art history, iconography, images, digital scholarly editions, hermeneutics

  • Session: Hermeneutics and the TEI
  • Date: 2015-10-31
  • Time: 11:00 – 12:30
  • Room: Amphi Fugier

Although TEI is short for Text Encoding Initiative, reflecting a historical primacy of textual hermeneutics, it has grown to encompass ways of encoding and annotating images as well. Iconographic content is often identified by referencing classification systems such as Iconclass. When it comes to scholarly editing, however, it has to be considered that illustrations are not always a mere accessory to the text but sometimes a crucial part of text-image units, meaning that illustrations are transmitted in a relatively stable, iterative manner alongside the text, showing a variance in pictorial elements reminiscent of textual variance. Attempts have been made in printed editions to construct something akin to a critical apparatus for images but more so than texts, illustrations are dependent on being seen to be understood and cannot be transcribed in the same way.

The digitization of source material and with it the changed parameters for including faksimiles in an edition allow for a re-evaluation of the difficulties involved in editing iconographic programmes. This paper will a) briefly survey the status quo of marking up iconography, including taxonomies, tools and projects, b) discuss the conceptual difficulties of formalizing iconographic descriptions based on the method by Erwin Panofsky, and c) propose a solution for semantically marking up iconographic variants in a machine-readable way, using the TEI. This solution will involve a concept of superstructures that provide a frame of reference and, at the same time, a pattern of organization. The Ascende calve pope prophecies from the 14th century with their particularly interdependent text-image units will serve as an illustrative example throughout.

In conclusion, the paper will address the question of whether and how this solution is applicable to a wider range of use cases, beyond the specific editorial problems it originated from.

GLORIEUX, Frédéric (Université Paris-Sorbonne, France)

Keywords: French litterature, Molière, stand-off annotation, intertextuality

  • Session: Presentation and engagement with the TEI
  • Date: 2015-10-30
  • Time: 16:00 – 17:30
  • Room: Amphi Laprade

La littérature française n’a pas son Dante, son Cervantès ou son Goethe ; si bien que les humanités numériques en français n’ont pas l’équivalent d’un “Shakespeare first folio”. Pour Molière par exemple, des fichiers circulent, généralement repris d’éditions du XIXe, ils ne conservent pas la ponctuation et les majuscules d’époque, pourtant précieux pour retrouver le phrasé original. Par ailleurs, après plus de trois siècles, le texte et l’image de l’auteur sont brouillés et chargés de légendes contradictoires, au gré des goûts pédagogiques et politiques de chaque génération. Le projet Molière du LABEX OBVIL, sous la direction de Georges Forestier, a pour objectif d’établir une édition numérique de référence de Molière, sur laquelle articuler les sources de l’époque (http://www.moliere.paris-sorbonne.fr/base.php), mais aussi, les préfaces, notices, notes, tout le paratexte postérieur jusque 1950. Le texte de Molière sera complètement établi en juin 2015 http://obvil.paris-sorbonne.fr/corpus/moliere/moliere, la numérisation de la tradition critique est entamée, l’infrastructure TEI est en place : production, réseau d’identifiants, prototypes d’interface, et programme d’expériences textométriques (ex : contre la théorie du complot “molière corneille”, http://www.moliere-corneille.paris-sorbonne.fr/). Ce projet pilote inspire des entreprises de même ambition pour La Fontaine et Apollinaire au sein du LABEX OBVIL, avec déjà des différences structurelles notables (variété générique des œuvres, problématiques des traditions), mais aussi, avec le partage de structures de contenu transversales. Des lettres et des poèmes dans les romans, du théâtre cité dans une chronique dramatique, des dictionnaires dans une édition critique ; la confusion des genres peut paraitre exceptionnelle à l’échelle d’un livre, elle est la règle lorsque l’on confronte plusieurs œuvres intégrales. TEI s’impose comme un modèle textuel nécessaire, à la condition de se l’approprier et de l’articuler en modules plus digestes.

HAAF, Susanne (Berlin-Brandenburg Academy of Sciences and Humanities – BBAW, Germany); THOMAS, Christian (Berlin-Brandenburg Academy of Sciences and Humanities – BBAW, Germany)

Keywords: TEI Encoding, Corpus Linguistics, Manuscript Annotation, Interoperability, Digital Humanities

  • Session: Encoding manuscripts in the TEI
  • Date: 2015-10-30
  • Time: 16:00 – 17:30
  • Room: Amphi Fugier

In this presentation we would like to introduce the latest extension to the DTA Base Format (DTABf), which allows for the TEI-conformant annotation of manuscripts. The DTABf is a TEI-subset for the consistent, yet unambiguous annotation of large amounts of historical texts. It has been successfully applied to the 2,558 texts (~14 Mio. tokens) within the DTA corpus. The DTABf has continuously been subject to further adaptations for specific annotation needs, e.g. for scholarly editions or the representation of historical newspapers (cf. Haaf et al. 2015). The latest of these adaptations, DTABf for manuscripts (DTABf-M), is aimed at providing external projects with a fully TEI-conformant tagset for the annotation of historical handwritten sources which can then serve as valuable additions to the DTA corpora. For this purpose the existing DTABf tagset is applied wherever possible (cf. figures 1&2). Extensions were only carried out where demanded by manuscript specific phenomena (cf. figures 3&4; table 1). They were then carefully selected and applied with the established principles of simplicity, consistency and the avoidance of ambiguousness in mind. Consistent encoding of all—printed or handwritten—documents regardless of their original project context leads to a growing set of TEI-encoded, truly interoperable documents in the DTA. In our presentation we will elaborate on the creation process of the DTABf-M, the encoding problems which occured and the adaptations which were finally carried out. We will also discuss possible ways of implementing the proposed DTABf extensions on the schema level. The DTA project is an example for the application of the TEI guidelines to large scale corpora compiled from various sources. In this context as well as in the broader context of CLARIN interoperability and interchange between resources play an important role. The DTABf and its extensions are meant to contribute to this aspect.

HOMENDA, Nicholas (Indiana University Bloomington Libraries, United States of America); PEKALA, Shayna (Indiana University Bloomington Libraries, United States of America)

Keywords: open access publishing, OJS, journals

  • Session: Publishing and the TEI
  • Date: 2015-10-30
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

The Indiana University Libraries have a long history of using the TEI markup standard to encode and publish electronic texts, but choosing the best publishing platform has been challenging for certain projects. Prior to formally launching an open access journal publishing program in 2008, the Libraries collaborated with two scholarly journals to provide open access publishing using P3 SGML and P4 XML TEI encoding delivered through the DSpace and XTF platforms. Both journals used complex encoding, transformation, and delivery workflows that required copious amounts of custom code and developer time. As these journals aged, the time and effort required to maintain them steadily increased. In 2013, the Libraries began plans to migrate these journals into the Open Journals Systems (OJS) platform while preserving the TEI markup. The success of these migration projects hinged on the ability to leverage the OJS XML Galley plugin, which allows journal managers the ability to upload a customized XSLT file to actively and seamlessly render XML articles into HTML.

Both journals are now publishing using the OJS platform. The Indiana Magazine of History (IMH) (http://scholarworks.iu.edu/journals/index.php/imh/) was successfully launched in OJS in August 2014, and The Medieval Review (TMR) (http://scholarworks.iu.edu/journals/index.php/tmr/) was launched in June 2015. Both journals continue to encode articles in TEI, although for consistency and ease of migration nearly 4000 TMR articles were updated to P5 TEI encoding, along with their ongoing encoding workflow. Publishing in this manner leverages IU Libraries’ strengths in electronic text projects and XML workflows within an easy-to-use, flexible platform that journal editors appreciate. The success of these migrations presents alternative frameworks for future TEI-based XML publishing of open access journals at Indiana University.

Speaker Biographies

Nicholas Homenda is the Digital Initiatives Librarian at Indiana University Bloomington Libraries where he manages digital projects, services, and initiatives in the Digital Collections Services department. Nick has a Master of Science in Information Studies from the University of Texas at Austin and previously worked as a music librarian and an orchestral clarinetist.

Shayna Pekala is the Scholarly Communication Librarian at Indiana University Bloomington, where she oversees the Libraries’ open access publishing services and initiatives. She holds an M.L.S. with a specialization in Digital Libraries from Indiana University Bloomington, and has been involved with digital publishing projects since 2012.

JULOUX, Vanessa (EPHE, France)

Keywords: philology, textmining, methodology, gender studies, collaborative

  • Session: Workflows and the TEI
  • Date: 2015-10-31
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

Hypatie est un projet ‘open’ science, privilégiant une approche qualitative et quantitative, pour l’étude des relations entre les entités (divines ou mortelles) et l’étude de l’agency, dans les corpus principalement du Proche Orient ancien. Sous leurs formes flexionnelles, les morphèmes se révèlent être des facteurs adéquats pour analyser les relations et déterminer l’agency des entités — qui se détermine principalement selon le nombre d’occurrences. Hypatie se décompose en deux grandes étapes distinctes : données (A) saisies directement par l’utilisateur dans un formulaire 2REleR (“Relevant Reading Elements during Reading”), données (B) encodées dans un fichier XML (contrainte pour l’utilisateur d’utiliser un modèle TEI spécifique “Inventory of Lexemes” [IoL]). Ces données sont stockées dans une SGBDR MySQL. L’extraction de ces données (B) se fait par l’intermédiaire de XSLT, autrement dit il s’agit d’une opération de ‘text-mining’. Ce fichier XSLT comprend une sélection pré-définie de ‘templates’ et de variables (ex. @type=‘verb’, #type=‘interpret’, @ana, <gramGrp>, etc). A chaque mise à jour du ou des fichiers TEI, une nouvelle ‘upload’ sur le serveur est nécessaire à partir d’un formulaire sur Hypatie, afin que Cron puisse être exécuté pour générer un nouveau fichier XSLT ; quand ce processus est terminé, les données sont importées dans MySQL et ajoutées aux données (A). Les morphèmes provenant des fichiers TEI (B) sont comparés aux autres morphèmes enregistrés dans la base de données commune Hypatie — ceux issus des notices bibliographiques, de l’inventaire des corpus non-encodés, et corpus encodés TEI. Les notices bibliographiques ne reflètent pas le point de vue de l’utilisateur sauf ses annotations, à l’inverse de l’analyse des lexèmes liés aux entités, issus des corpus non-encodés ou encodés en TEI. Ainsi, Hypatie favorise selon des variables mesurables, à la fois une étude des relations entre les entités, à la fois une lecture plus précise de l’agency.

LAPIN, Hayim (University of Maryland, United States of America); STÖKL BEN EZRA, Daniel (EPHE, Paris, France)

Keywords: Mishnah, Hebrew, Manuscript, Digital alignment, Morphological markup

  • Session: Tooling and the TEI
  • Date: 2015-10-31
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

The Mishnah (ca 200 CE/AD) is the underlying text of the Talmud and a central cultural heritage text for Jews. It is also an under-appreciated document for the social and cultural history of the Roman provinces as an extended legal from the Roman Empire composed in a language other than Greek or Latin. Although published many times in many different formats, no critical edition exists. Two projects, CT-Mishnah (mishna.huma-num.fr) and the Digital Mishnah Project (www.digitalmishnah.umd.edu) have merged to form a single TEI-based project with three principal foci. 1.New data: The creation of a dynamic, born digital, critical edition based on automated collation of manuscripts. 2. Access: Providing tools such as morphological tagging, aligned translations, geographical and onomastic data to allow users to understand and investigate the text. 3. Extensibility: Creating a framework for extending our work to a much larger universe of rabbinic works linked by intertextual citation and commentary, to other Semitic literature and other scripture-citing literatures of late antiquity, and building the infrastructure to add our project to the universe of open linked data. To date, we have developed a demo publication that allows readers to page through a transcribed manuscript or generate collated manuscript readings on the fly, provides users (including the project editors!) with a tool to edit the collations, and offers several options for viewing the collations. On a second track, we have integrated morphological markup and translation alignment for an extended portion of a common print edition and for the corresponding portion of one manuscript that is often used as the basis of critical studies, and a series of satellite resources including glossaries and grammatical indices.

LAVRENTIEV, Alexei (ICAR Lab, CNRS, France); STUTZMANN, Dominique (IRHT Lab, CNRS, France)

Keywords: allograph, multi-layer transcription, normalization

  • Session: Codicology and the TEI
  • Date: 2015-10-30
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

This paper discusses the optimal TEI-format to store information of both textual and graphical nature about letterforms, from theory to implementation. The Oriflamms research project (http://oriflamms.hypotheses.org) aims at establishing an ontology of letterforms in medieval Latin and vernacular writing systems. It uses both previously created and new manuscript TEI-encoded transcriptions, which however use slightly different tagsets. These transcriptions are converted to a common fully TEI conformant format, tokenized at word and character level and automatically aligned to zones in manuscript images. The research needs defining an optimal TEI-format and to reassess the nature of graphical differences in manuscripts and in transcriptions, such as dots, accents, apostrophes (grammar or phonetics [ou/où]), capitals (names, sentence, abbreviation), letter variants (“allographs”, later specialized according to phonetics, [i/j], [u/v], [ss/ß], or not [s/ſ]). Some input transcriptions apply traditional character normalization rules, other not. Several solutions may be used at word or letter level (<choice>/<orig>/<reg> allowed in <w>, not in <c>): normalised/imitative/neutral transcriptions with attributes/pointers/elements to give graphical/semantic/phonetic information. Yet, the ‘perfect’ format needs ‘perfect’ information, not provided by extant transcriptions. This paper presents more use cases and an in-depth discussion of encoding strategy and methodological underpinnings. Need for consistency, progressive enhancement, and backward compatibility raises the issue of inline/stand-off markup and of defining a usable TEI-compliant format. It will evidence the possible role of the <g> elements (in association with <glyph>). Its use can be extended from representing “a glyph, or a non-standard character” (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-g.html) to distinguish the allograph of the source from the “regular” character in the edition (<g ana=”ori:dipl-small”>D</g>ieu), even if both have a corresponding Unicode code point, so as to adapt the view according to the edition scope.

LEBLOND MARTIN, Sylvaine (Chaire ITEN Unesco, France)

Keywords: TEI, MEI, musique orale, médiation numérique, norme

  • Session: Encoding orality and performance in TEI
  • Date: 2015-10-31
  • Time: 09:00 – 10:30
  • Room: Amphi Laprade

Depuis 2013, d’abord avec le projet HumanitéDigitMaghreb, puis avec M&TEIeuroMed, nous avons entrepris d’encoder en MEI des corpus musicaux du Maghreb.

Or, le patrimoine musical du Maghreb fait partie du patrimoine immatériel de l’humanité dans le sens où sa culture musicale est essentiellement orale et que, si cette musique est abondamment commentée par des textes théoriques arabes, elle n’est pas notée musicalement à l’origine.

Cependant, nous avons constaté que, depuis la fin du XIXe siècle, plusieurs tentatives intéressantes ont été faites, d’une part, de noter et de publier cette musique, d’autre part, de réaliser des enregistrements sonores des divers concerts au Maghreb.

Ce travail d’investigation et ce nouveau type de conservation de cette musique orale a permis la constitution d’un matériel important de ressources musicales autorisant les musicologues et ethnomusicologues à approfondir leurs recherches scientifiques dans l’univers du patrimoine musical singulier de Tunisie, du Maroc et d’Algérie.

Notre problématique a donc été de considérer le rôle de la notation musicale dans ce contexte particulier et comment le répertoire musical du Maghreb pouvait bénéficier de l’encodage MEI et TEI.

Par exemple, cette année 2015, après discussion avec Perry Roland et Andrew Hankinson de la communauté MEI, notre équipe a développé une stratégie de relation entre la partition musicale et différents enregistrements sonores de cette partition, inscrits directement dans les fichiers MEI, ceci afin de respecter la variété d’interprétations possibles d’une même pièce. En effet, plusieurs caractéristiques musicales d’une pièce peuvent changer d’un pays à l’autre, en plus d’autres critères qui peuvent s’ajouter.


Aujourd’hui, dans le cadre du projet MEI AE NORMA de la Chaire ITEN Unesco, nous souhaitons expérimenter l’articulation TEI/explications théoriques et MEI/partition, afin de refléter ainsi l’imbrication des aspects textuels et sonores de la musique et tenter d’approcher différemment la structuration interne des corpus musicaux originaux.

Courte Biographie

Sylvaine Leblond Martin est Docteur en Sciences de l’Information et de la Communication et compositrice de musique contemporaine, agréée du Centre de musique canadienne. Elle occupe un poste d’ingénieure de recherches dans le programme IDEFI CréaTIC de l’Université Paris 8 et est membre de la Chaire ITEN Unesco où elle assiste Jean-Pierre Dalbéra dans les projets de « médiation et de valorisation du patrimoine culturel » et dans les projets de musique.

Mots clés

TEI, MEI, musique orale, médiation numérique, norme.

LIÉGEOIS, Loïc (Laboratoire Ligérien de Linguistique, Université d’Orléans,France); ÉTIENNE, Carole (ICAR, CNRS, Lyon, France); PARISSE, Christophe (Modyco/INSERM, CNRS, Paris Ouest Nanterre, France); CHANARD, Christian (LLACAN, CNRS, Villejuif, France); BENZITOUN, Christophe (ATILF, Université de Lorraine, France)

Keywords: oral corpora, multimodal corpora, corpora interoperability, corpora exchange, TEI improvement

  • Session: Encoding orality and performance in TEI
  • Date: 2015-10-31
  • Time: 09:00 – 10:30
  • Room: Amphi Laprade

Linguistic research (for example about language development, interaction, sociolinguistics, gestural languages, language typology) has already produced quite a large number of corpora. Moreover, most of them are available for research and learning purposes. The transcripts often contain [8], at the same time, annotations of multiple levels to describe orthography, phonology, prosody, gesture, situation, syntax etc. Such a diversity requires a large set of metadata to describe a large amount of corpora characteristics (i.e. speakers, setting, languages, annotations…). One of the main specificity of work on oral and gestural languages is that it cannot be done without using simultaneously the original media (the sound and/or the video) and the actual transcript of the data, taking into account a subset of metadata. Researchers use dedicated softwares to annotate data (such as ELAN, CLAN, PRAAT, Transcriber…) and most of the time combine them, which causes problems in the management of these specific tools formats. The goal of the IRCOM workgroup about oral and multimodal corpora was to make a proposal about a pivot format that could be used to make it easier to share existing (and future) data. The TEI emerged as a natural choice for creating this pivot format. Indeed, the TEI feature includes both a header dedicated to the metadata and a text section dedicated to the transcript with a large choice of tagset [5, 9]. Some members of our group had already some experience in using the TEI format in their own projects [1, 4, 6] and others plan to do so [2, 3, 7]. However, a common initiative will facilitate the choice and the documentation of a common subset of elements and deal with missing elements or using elements initially made for written corpora. Our workgroup contributes to ISO TEI European group and we have already made some suggestions to improve some elements.

Bibliography
  • [1] Groupe ICOR (Bert, M., Bruxelles S., Etienne C., Jouin-Chardon E., Lascar J., Mondada L., Teston S. Traverso V.), 2010, Grands corpus et linguistique outillée pour l’étude du français en interaction (plateforme CLAPI et corpus CIEL), Pratiques, 147-148, 17-35.
  • [2] Chanard, C. (2006). Base de données des “Alphabets de 200 langues africaines” [http://sumale.vjf.cnrs.fr/phono].
  • [3] Debaisieux, J.-M., Benzitoun, C. & Deulofeu, J. (in press). Le projet ORFEO : un corpus d’étude pour le français contemporain, In Avanzi M., Béguelin M.-J. & Diémoz F. (eds), Corpus de français parlés et français parlés des corpus, Cahiers Corpus.
  • [4] Etienne,C. (2009). La TEI dans le Projet CLAPI, Corpus de langues parlées en interaction. TEI Council, Lyon
  • [5] Liégeois, L. (2013). De l’analyse au partage des données, quel(s) format(s) choisir ? L’exemple d’un corpus d’interactions parents-enfant. In Damiani M., Dolar K., Florez-Pulido C., Loth R., Magnier J. & [1] Pegaz A. (dir.) Traitement de corpus (Actes de Coldoc 2012). Paris : Modyco, 128- 142. [http://hal.archives-ouvertes.fr/hal-00850172]
  • [6] Liégeois, L., Chanier, T. & Chabanal, D. (2014). Corpus globaux ALIPE : Interactions parents-enfant annotées pour l’étude de la liaison. Nancy : Ortolang. [http://hdl.handle.net/11041/alipe-000853]
  • [7] Mettouchi, A. & Chanard, C. (2010) “From Fieldwork to Annotated Corpora : the CorpAfroAs Project”, Faits de Langue – Les Cahiers, 2, 255-265. [http://corpafroas.huma-num.fr/fichiers/Mettouchi_Chanard.pdf]
  • [8] Morgenstern, A. & Parisse, C. (2007). Codage et interprétation du langage spontané d’enfants de 1 à 3 ans. Corpus, 6, 55-78.
  • [9] Parisse, C. & Morgenstern, A. (2010). A multi-software integration platform and support for multimedia transcripts of language. LREC 2010, Proceedings of the Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, 106-110.

NYSTRÖM, Eva (Uppsala University, Sweden); GRANHOLM, Patrik (Uppsala University, Sweden)

Keywords: manuscript description, cataloguing, digitization, ODD schema, eXist-db

  • Session: Encoding manuscripts in the TEI
  • Date: 2015-10-30
  • Time: 16:00 – 17:30
  • Room: Amphi Fugier

In this paper we will give a presentation of the ongoing project to catalogue and digitize all Greek manuscripts in Swedish libraries and archives (120 manuscripts, ca 36 000 pages). The Greek manuscripts, in the form of bound parchment and paper volumes, include a rich and diverse collection of texts from antiquity and the Byzantine period. They originate mainly from the Byzantine cultural area from the tenth century onwards, but some are Renaissance or early modern manuscripts from Western Europe. Manuscript research and codicology is a relatively young and sprouting field of research. An important, and fairly recent, reorientation is to pay more attention to the codicological units from which most manuscripts are composed (Andrist et al. 2013). A manuscript may have been written and composed, put together, and perhaps taken apart, in several stages and processes, and the description needs to reflect this historical stratigraphy of production and use. In the present project the manuscript descriptions are encoded in TEI using a customised ODD schema which provides cataloguing guidelines with examples, and documentation of the elements and attributes used. The manuscript descriptions are structured around the notion of codicological units: the intellectual content, physical description, and history, where applicable, of each unit are described in separate <msPart> elements, whereas information common to all units, e.g. the binding, provenance, and bibliography, is described outside the <msPart> elements, directly under the <msDesc> element. The manuscript descriptions and the images are hosted on the project web site. The framework for the catalogue is built with eXist-db which provides powerful search and browsing capabilities. To facilitate user-friendly orientation, the images are displayed adjacent to the manuscript description, with links to specific folios of the digitized manuscripts. The TEI-files, schema, style sheets, and source code for the eXist-db interface are also stored on GitHub. Speakers Eva Nyström Ph.D. in Greek (Uppsala University). Specialized in codicology and Byzantine book history. Has also worked as a cataloguer of early modern manuscripts and of early printed books and bindings. In the present project mainly responsible for the cataloguing and encoding of manuscript descriptions. Patrik Granholm Ph.D. in Greek (Uppsala University). Specialized in textual criticism and manuscripts studies. Mainly responsible for the technical implementation in the present project. Knowledgeable in web design, XML, XSLT, XQuery, and Linux server administration.

Bibliography

PIERAZZO, Elena (University ‘Stendhal’ Grenoble 3)

Keywords: XML, modeling, future.

  • Session: Interoperability and the TEI
  • Date: 2015-10-29
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

Since its beginning, the TEI has been expressed in a markup language: first SGML, then XML. This was for very good reasons: long term preservation, flexibility, independence from hardware and software and the possibility of using existing generic tools and languages. Yet the TEI has also been constrained by this choice: every new development must allow for the limits of the technology of choice. One of the most deplored issues connected to the use of XML is, of course, the problem of overlapping hierarchies, which makes the TEI cumbersome in some cases, and unusable in some others. Despite our best efforts, the TEI is still largely dependent on the technology it is built on. In the past decade the TEI has developed into a powerful modeling tool; it has provided new ways to talk and think about texts, and to imagine what we can do with them. As Allen Renear put it “the TEI succeeded […] in […] the development of a new data description language that substantially improves our ability to describe textual features, not just our ability to exchange descriptions based on current practice” (2005). However, the TEI-as-model conflates two different types of models: a conceptual model of text and textuality, and also its implementation in the shape of an XML schema. This has enabled the TEI’s establishment among scholars and practitioners, but it has also prevented it from having a deeper influence on both scholarship and implementation. The next few years will be crucial for the survival and expansion of the TEI: in order to survive and overcome the new challenges that come with the fast-evolving world of data representation it will have to part ways with XML as a sole technological implementation and while becoming more abstract offer concrete solutions for those problems that have accompanied its whole life.

PYTLIK ZILLIG, Brian L. (Univ of Nebraska-Lincoln, United States of America)

Keywords: tei, xslt, svg, visualization, drama

  • Session: Abstracting the TEI
  • Date: 2015-10-30
  • Time: 09:00 – 10:30
  • Room: Amphi Laprade

As the number of texts in TEI/XML collections has grown, so too have the ways in which they can be analyzed and visualized. The author’s recent two-fold investigations into: (1) part-of speech (POS) n-gram sequences in a selection of Early Modern Drama texts have yielded data about differences in patterns of usage between the playwrights, and (2) showed promise in representing the tree-like structure of each work. Besides its use in transforming TEI, XSLT represents an excellent tool for querying texts. Moreover, with XSLT results can then be represented in SVG (another application of XML) for display and analysis. The present goal is to test this approach with the works of five early modern dramatists: William Shakespeare, James Shirley, Ben Jonson, Thomas Middleton, and Christopher Marlowe. Tanya Clement has written, in what may be an understatement for large corpora that would take months or longer to read, “Sometimes the view facilitated by digital tools generates the same data human beings . . . could generate by hand, but more quickly.” In the case of n-gram sequences, both word and POS sequences would be very difficult indeed to create with speed and consistency. And as Dana Solomon has observed, “[d]ue in large part to its often powerful and aesthetically pleasing visual impact, relatively quick learning curve … and overall ‘cool,’ the practice of visualizing textual data has been widely adopted by the digital humanities.” All data in the current research were generated and graphically rendered by XSLT, with part-of-speech tagging accomplished by Morphadorner, and application developed by Philip Burns at Northwestern University. Texts were adorned at Northwestern, but mostly originating in the work of the Text Creation Partnership. Martin Mueller’s assistance was key, as he developed NUPOS, the POS-tagging scheme used by Morphadorner, and offered additional texts for this work.

ROJAS CASTRO, Antonio (Universitat Pompeu Fabra, Spain)

Keywords: Critical Apparatus, Version, Spanish Literature

  • Session: Textual variants and the TEI
  • Date: 2015-10-30
  • Time: 14:00 – 15:30
  • Room: Amphi Laprade

The TEI has recently made great improvements to represent manuscripts and genetic editions but the module devoted to the Critical Apparatus has not been reviewed since its creation. The Guidelines neither provide a detailed account on how to encode born digital critical apparatus that contain all kind of variants such as scribal errors, authorial variants, omissions, additions, deletions, substitutions, and conjectures. The encoding of Góngora’s Solitudes might offer the possibility to review many of the issues that scholarly editors must face in creating a non preexisting apparatus. This work is a Spanish long poem (about 2100 verses) that was composed from 1612 to 1626 across nine stages. The base text of the edition is the definitive and longest version of the poem contained in the Chacón manuscript whereas the shortest is only 700 verses. Since there are no autographs the only remaining evidences are manuscripts and print editions that transmit authorial variants, scribal errors and different extents. Should the editor edit two different texts according to each version? Or should the editor try to relate all the versions in the critical apparatus instead? Which are the pros and cons of each approach? This paper, however, doesn’t present the results of the encoding as it is a work in progress project. Its aim is to offer some reflections on the ongoing process and a few suggestions about how to reach an agreement on the encoding of born digital apparatus and the editing of different versions of the same work.

SANTIN, Eleonora (CNRS – UMR 5189 HiSoMA, France)

Keywords: verse module; EpiDoc; epigraphy; epigraphic poetry

  • Session: Tooling and the TEI
  • Date: 2015-10-31
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

The TEI schema has been used for editing verse on several projects (cf. a partial list in Gonzáles Blanco 2014). Different encoding modes have been chosen for each of these programs, within the rich set of tags offered by the TEI verse module. To this day, the effectiveness and applicability of the TEI schema have been tested on medieval, modern and contemporary poems, but rarely on a corpus of ancient epigraphic texts. Therefore, the project Musa epigraphica : new approaches for studying and publishing epigraphic poetry, which I’m leading and which is just at its early stage, opens up an almost uncharted testing area (for other electronic editions of epigraphic poetry see GVCyr, in progress, and the CLE Hispaniae project, completed).

Like any edition of texts written on a three-dimensional surface, the digital edition of verse inscriptions mostly raises issues related to the physical dimension of the medium itself. We discussed said issues, and tried to solve them, in a recent paper (Morlock-Santin 2014). But, while laying the groundwork to define a suitable encoding schema for a specific category of objects bearing text, the basic question that is to be raised is one of hermeneutic nature. How can we make a TEI edition a place for knowledge creation, as well as a research tool that would help researchers to better answer the scientific matters raised by the ancient “street poetry”? (Panciera 2012).

The focus of my paper will be, on the one hand, to draw up a report regarding the current state of the TEI-EpiDoc subset in the matter of the verse inscriptions encoding, in order to identify any potential gaps, and to report potentially helpful additions for the encoders. In other words, my intention is to submit a rational proposal to improve the current schema (8.21), which would go further than a simple request asked in a thread. On the other hand, as a specialist of a particular branch of epigraphic research and as a TEI user, my wish would be to determine, within the scientific literature, the major topics about the epigrammatic genre, and to transpose them into a searchable TEI edition. Due to limited speaking time, I will only address the issues tied to the mark-up of the layout and text formatting, especially the graphic features adopted to display the literary nature of the epigraphic text and convey it to the reader/viewer, i.e. :

  • 1. Line/verse connection
  • 2. The link between rhythmic breaks (caesuras and dieresis) and line breaks (Agosti 2010);
  • 3. The layout of the text on its support: columnar layout; indented pentameter (Lougovaya 2012), indentation of other metric segment (clauses or hemistichs);
  • 4. The use of dividers, multifunctional symbols and vacat, with a significant function to mark the metric or textual structure (Bodel 2012; Monella 2013);
  • 5. The imitation of calligraphic style of literary papyrus.

Each question will be illustrated by one or more examples.

My presentation does not pretend to give conclusive answers, but rather to establish the terms of a debate in front of an experienced audience, and to suggest some possible solutions.

Selective bibliography
  • Agosti 2010 = Gianfranco Agosti, « Eisthesis, divisione dei versi, percezione dei cola negli epigrammi epigrafici di età tardo antica », Segno e testo, Cassino, 2010.
  • Bodel 2012 = John Bodel, « Paragrams, punctuation, and system in ancient Roman script », Stephen D. Houston, ed., The Shape of Script, Santa Fe, New Mexico, 2012.
Biography

Dr. Eleonora Santin is epigrapher and philologist. She works in Lyon as researcher at the CNRS (UMR 5189 HiSoMA). She is member of the team which works on the epigraphy and the history of the ancient Thessaly. Her PhD dissertation in Ancient History (University of Rome La Sapienza) focuses on the funerary epigrams of Thessaly and the new epigraphic group of signed epigrams within the larger category of artists’ signatures. Her primary research interests are the cultural history of ancient Greek society, in particular the authorship questions, the epigraphic poetry, the digital edition of inscriptions. She is author of a book, Autori di epigrammi sepolcrali greci su pietra (2009), of about ten peer reviewed publications and she is coeditor of the proceedings L’Épigramme dans tous ses états : épigraphiques, littéraires, historiques : actes du colloque international, 3-4 Juin 2010, ENS de Lyon, Lyon, ENS de Lyon Éditions (forthcoming).

Personal web-page : http://www.hisoma.mom.fr/annuaire/santin-eleonora

SCHOLGER, Martina (Centre for Information Modelling – Austrian Centre for Digital Humanities, University of Graz, Austria)

Keywords: Genetic Editing, Semantic Web, Ontologies, Art History, Digital Edition

  • Session: Encoding orality and performance in TEI
  • Date: 2015-10-31
  • Time: 09:00 – 10:30
  • Room: Amphi Laprade

The contribution will present a digital, genetic and semantically enriched edition of the notebooks by the Austrian artist Hartmut Skerbisch (1945-2009). Digital scholarly editions are a widely neglected method in art history. Hence, a focus of this project is on applying edition methods to art historical source material and demonstrating the value of handwritten sources as relevant primary sources for art historical research. Additional emphasis lies on the use of semantic technologies which allow for revealing interconnections between individual entities.

The goal is to reconstruct the artist‘s association processes in the course of his developing individual artworks, exhibitions and events. Therefore it is essential to take a closer look at the artist’s inspirations – from literature, music and the visual arts – and to trace these influences in the notebooks. Thus, it will be possible to demonstrate how a specific idea evolved and changed over time, paying special attention to the media transition from ephemeral idea to text to manifestation. A thematic and chronological order is applied to the notebook entries and text fragments, sketches and formulas from the notebook are linked to actual works: a digital representation is best suited if not indispensable for dealing with such witnesses.

The notebooks were annotated using the TEI, with special consideration given to the recommendations for editing origination processes. However, this project is not merely concerned with the genesis of the text itself, but with the development of artists’ ideas. For semantic enrichment, annotated entities are linked to formal ontologies based on the artist’s inventory. The digital repository GAMS supports the presentation of the TEI encoded text in different forms and comes with a triple store for RDF representations of the content.

The combination of these methods and technologies will help to reconstruct the artist’s association processes and reveal the genesis of his work.

Bibliography

Brüning, G., Henzel, K. and Pravida, D. (2013). Multiple Encoding in Genetic Editions: The Case of ‘Faust’, Journal of the Text Encoding Initiative. http://jtei.revues.org/697 (accessed 4 May 2015).

Burnard, L., Jannidis, F., Pierazzo, E. and Rehbein, M. (2008-2013). An Encoding Model for Genetic Editions. http://www.tei-c.org/Activities/Council/Working/tcw19.html (accessed 4 May 2015).

De la Iglesia, M. and Göbel, M. (2013). From entity description to semantic analysis: The case of Theodor Fontane’s notebooks. In: Ciotti, F. and Ciula, A. (eds.), The Linked TEI: Text Encoding in the Web. TEI Conference and Members Meeting 2013, Rome: UniversItalia, 24-29.

Fenz, W. (1994). Hartmut Skerbisch. Werkauswahl 1969-1994. Graz: Neue Galerie.

Pierazzo, E. (2009). Digital Genetic Editions. The Encoding of Time in Manuscript Transcription. In: Deegan, M. and Sutherland, K. (eds.), Text Editing, Print and the Digital World. Farnham: Ashgate, 169-186.

Stigler, J. and Steiner, E. (2014-2015). GAMS and Cirilo Client. Policies, documentation and tutorial. http://gams.uni-graz.at/doku (accessed 4 May 2015).

Vogeler, G. (2014). Modelling digital edition of medieval and early modern accounting documents. Digital Humanities 2014 (Lausanne). http://dharchive.org/paper/DH2014/Paper-181.xml (accessed 4 May 2015).

Biography

Martina Scholger is a researcher at the Centre for Information Modelling – Austrian Centre for Digital Humanities at the University of Graz. She is currently working on her PhD project “Hartmut Skerbisch – Artists’ notebooks as a digital genetic and semantically enriched edition”. In addition to teaching data and text modelling to humanities students, she is responsible for the conceptualisation and implementation of digital editions in various cooperation projects. Since 2014, she has been a member of the Institute for Documentology and Digital Editing (IDE).

SCHOPPER, Daniel (Austrian Academy of Sciences, Austrian Centre of Digital Humanities – ACDH); BOWERS, Jack (French Institute for Research in Computer Science and Automation – INRIA); WANDL-VOGT, Eveline (Austrian Academy of Sciences, Austrian Centre of Digital Humanities – ACDH)

Keywords: lexicography, LOD, non-standard languages

  • Session: Interoperability and the TEI
  • Date: 2015-10-29
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

This paper discusses the standardisation of heterogeneous dialectal data using the example of the Database of Bavarian dialects in Austria (dboe). The dboe is the digital representation of the most extensive analogue collection (an estimated 4 million unique paper slips) documenting the Bavarian dialects of the Habsburg monarchy (as of 1911) and of Austria, from the beginnings of the German language (c. 800) until the late 20th century. Originally built (from 1993) as a TUSTEP database to support the compilation of the Dictionary of the Bavarian dialects in Austria, it is currently being transformed into a tightly interlinked LOD resource to create multilingual access via concepts (project exploreAT! – exploring Austria’s culture through the language glass) in the framework of the European Network of e-Lexicography (ENeL). Although the modular architecture of the TEI has been widely applied to non-standard linguistic datasets, the dboe’s structure leads into areas where the Guidelines have not yet been fully tried and tested: it is a corpus in a broad sense of the term, in that it contains primary material of the Bavarian dialects, yet it is constructed by drawing on a variety of types of sources, each requiring its own model of data and metadata. Being extracted from a larger text corpus of various sources (questionnaires and literary texts, among others), the data in the database is at once highly fragmentary, annotated and interlinked. Representing the hybrid nature of the database in a TEI-conformant model proves challenging – yet it offers good opportunities for TEI+LOD experiments, a combination likely to increase in demand as semantic web technologies and resources continue to evolve. Focusing on the dialectal core data, exemplified through names for living organisms, the authors discuss a use case in this emerging area that will help to adapt TEI-compatible resources to the LOD paradigm.
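
To give a hedged flavour of one possible target representation (the element choices, the dialect form and the concept URI are illustrative assumptions, not the project’s settled model), a slip’s material might be rendered with the TEI dictionaries module and linked to a language-independent concept:

  <!-- One dialectal attestation as a TEI entry; all values invented. -->
  <entry xml:id="slip-0001">
    <form type="lemma"><orth>Himmelmuada</orth></form>
    <sense>
      <usg type="geo">Lower Austria</usg>
      <def xml:lang="en">ladybird</def>
      <!-- concept link enabling multilingual, onomasiological access -->
      <xr type="concept"><ref target="http://example.org/concepts/ladybird"/></xr>
    </sense>
  </entry>

The concept pointer is what carries the LOD scenario: several such entries, in different languages or dialects, can converge on the same concept resource.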

SEGUY, Robin (University of Pennsylvania, United States of America)

Keywords: Indexing; Reference; Hermeneutics; Proper names; Pound, Ezra

  • Session: Hermeneutics and the TEI
  • Date: 2015-10-31
  • Time: 11:00 – 12:30
  • Room: Amphi Fugier

Within a standardised editorial process, the creation of terminological indexes is a task which, if it does not go entirely without saying – as witnessed by its frequent delegation, in the publishing world, to professionals – is nonetheless regarded as relatively unproblematic. Whether the index covers proper names (whose questions of definition and extension, once the glory days of analytic philosophy from Frege to Kripke, now lie more or less fallow), subjects or objects, its construction is most often governed by implicit criteria, justified – if at all – by a few allusive lines. Only the field of lemmatisation, the object of sustained effort in linguistic research, seems to escape this intuitive pragmatism. Thus, at the level of proper-name indexing alone, implementing the ontological paradigms the TEI presupposes (the existence of “persons”, the division of personal names into clearly defined categories, the opposition between “political” and “natural” places, the arrangement of names into hierarchical categories) seems to offer a prospect of disambiguation conducive to the rational exploitation of corpora and to the development of ad hoc analytical tools. Our own research has led us to test the TEI’s “referential” reading grid against a relatively complex literary example: Ezra Pound’s The Cantos. This limit case – an epic staging more than 9,000 proper names – raises a series of questions which, going beyond the problem of reference (true or false, assignable or not), call for as many hermeneutic choices, choices that condition in many respects the reading of the poem. It is a number of these amphibologies and shimmerings of identity (historical, linguistic, graphematic, idiolectal…), and the treatment they ought to receive, that we examine here.
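
A minimal sketch of the disambiguation mechanism at issue (the identifiers are invented; Sigismondo Malatesta is one of the poem’s historically attested figures): the TEI lets each occurrence point to exactly one candidate identity, so that the hermeneutic choice becomes explicit and machine-readable:

  <!-- Hypothetical: the pointer commits the editor to one reading
       of the name among several plausible identities. -->
  <persName ref="#malatesta-sigismondo">Sigismundo</persName>
  ...
  <listPerson>
    <person xml:id="malatesta-sigismondo">
      <persName>Sigismondo Pandolfo Malatesta</persName>
      <note>Lord of Rimini, 1417-1468.</note>
    </person>
  </listPerson>

It is precisely where no such single pointer can honestly be assigned that the hermeneutic questions raised above begin.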

STOKES, Peter Anthony (King’s College London, United Kingdom)

Keywords: Modelling, codicology, documentary editions, manuscript studies

  • Session: Codicology and the TEI
  • Date: 2015-10-30
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

A not uncommon challenge, particularly in medieval studies, lies in trying to reconstruct previous order(s) of pages in a book which has been disbound and rearranged at one or more points in its history. One example is the Exon Domesday book, produced in southern England probably around the 1080s. It is clear that the pages we have today are not in their original order, and indeed they have probably been rearranged more than once. These changes are a subject of considerable interest because they may provide important clues about the book’s production and use. For instance, did these arrangements reflect the sequence of lands that were visited when compiling the contents? The owners of the lands? The activity of particular scribes? The structure of other related manuscripts such as Great Domesday Book? To help scholars address these questions, the ‘Conqueror’s Commissioners’ project is creating a digital edition of Exon Domesday that allows users to dynamically change the order of the pages and immediately see how aspects of the text and its composition change as a result. However, the set of possible orderings is not arbitrary but is constrained by the physical makeup of a book. For instance, two pages which are part of a single sheet of parchment must always have been so; one cannot insert a page between the recto and verso (front and back) of the same sheet; a single sheet of parchment cannot have two ‘hair’ sides, and so on. In this paper I will therefore present some example research contexts for which this is relevant, introduce a new codicological model that includes these constraints, and then show how the TEI can be used with Schematron to build such an edition in practice.
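
By way of a hedged illustration (the element and attribute names below – leaf, @conjoin, @recto-side – are invented for this sketch, not the project’s actual model), one such codicological constraint could be stated as a Schematron rule: a bifolium has one hair side and one flesh side, so two conjoint leaves cannot both present hair on the recto.

  <sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <!-- Hypothetical markup: each leaf records which parchment surface
         its recto shows and points to its conjoint leaf. -->
    <sch:rule context="leaf[@conjoin]">
      <sch:assert test="@recto-side != //leaf[@xml:id = current()/@conjoin]/@recto-side">
        Conjoint leaves must show opposite parchment sides on the recto.
      </sch:assert>
    </sch:rule>
  </sch:pattern>

Validating a proposed reordering against a battery of such rules is what keeps the user’s dynamic rearrangements within the space of physically possible collations.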

TOMASEK, Kathryn (Wheaton College, Norton, Massachusetts, USA); VOGELER, Georg (Karl-Franzens-Universität Graz, Austria); PINDL, Kathrin (University of Regensburg, Germany); SPOERER, Mark (University of Regensburg, Germany)

Keywords: books of account, semantically enriched, digital edition

  • Session: Codicology and the TEI
  • Date: 2015-10-30
  • Time: 09:00 – 10:30
  • Room: Amphi Fugier

Account books have long been used as primary sources for economic and social history, since they allow scholars to explore the development of economic behaviour on both a macro- and a microstructural level. In the field of digital editing the TEI has become a standard for the transcription and encoding of multiple aspects of texts. Accounts, however, have only recently entered the field of digital scholarly editions.[1] They pose new problems, in particular the encoding of the content of these documents and not only their text.[2] The recently funded MEDEA project (DFG/NEH) will bring together economic historians, scholarly editors, and technical experts to discuss emerging methods for the semantic markup of account books.

The MEDEA project supports the development of broad standards for semantically enriched digital editions of accounts, since common data models can help to create scholarly, verifiable, and exchangeable data. One main reference point for producing such data is the Guidelines of the Text Encoding Initiative (TEI).

The core questions of MEDEA to be presented at the TEI conference include:

  • How might we model the economic activities recorded in historical documents? What models of bookkeeping were followed historically and how can they be represented formally? Are data models developed for modern business reporting helpful?
  • Can we establish common resources on metrics and currencies or even the value of money which can be reused in other projects? Is it possible to build common taxonomies of commodities and services to facilitate the comparison of financial information recorded at different dates and places? That is, can we develop references on the order of name authorities and standards for georeferencing?
  • How might we integrate the topological information of the transcription with its financial interpretation? Is the “table” an appropriate method? What possibilities are offered by the TEI manuscripts module and the use of the tei:zone element? (A sketch of one candidate encoding follows this list.)
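
As a hedged illustration of the kind of semantic markup under discussion (the entry, values and attribute choices are invented; the approach follows the direction of Tomasek and Bauman, cited in note 2), a single account entry might carry machine-readable measures alongside the transcription:

  <!-- One account entry: prose transcription plus interpreted values. -->
  <item>
    <date when="1558-03-12">12 March 1558</date> paid to the carpenter for timber
    <measure type="commodity" unit="cartload" quantity="2" commodity="timber">ij cartloads</measure>,
    <measure type="currency" unit="shilling" quantity="8">viij s.</measure>
  </item>

Whether such inline measures, a tabular model, or zone-based topology best captures historical bookkeeping practice is precisely the kind of question MEDEA is meant to settle.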

The first MEDEA workshop will be held the weekend preceding the TEI Members Meeting in Lyon, and we would be pleased to offer our TEI colleagues a late-breaking report in the form of either a paper or a poster.

[1] See for example the Comptes des châtellanies savoyardes <http://www.castellanie.net/> and the Jahrrechnungen der Stadt Basel 1535-1610 <http://gams.uni-graz.at/srbas>.

[2] See Kathryn Tomasek and Syd Bauman, « Encoding Financial Records for Historical Research », Journal of the Text Encoding Initiative [Online], Issue 6, December 2013, online since 22 January 2014, accessed 20 July 2015. URL: http://jtei.revues.org/895; DOI: 10.4000/jtei.895; and Georg Vogeler, « Warum werden mittelalterliche und frühneuzeitliche Rechnungsbücher eigentlich nicht digital ediert? », Zeitschrift für digitale Geisteswissenschaften 1, beta version March 2015, accessed 20 July 2015. URL: http://www.zfdg.de/warum-werden-mittelalterliche-und-fr%C3%BChneuzeitliche-rechnungsb%C3%BCcher-eigentlich-nicht-digital-ediert

Short biographies

Kathryn Tomasek
Associate Professor of History
Co-Director, Wheaton College Digital History Project
Wheaton College
Norton, Massachusetts
@KathrynTomasek tomasek_kathryn@wheatoncollege.edu

Kathryn Tomasek has been teaching undergraduates using the TEI since 2004, and exploring the use of TEI markup for financial records since 2009. She was Project Director for a Start-Up Grant from the NEH in 2011, and a member of the American Historical Association’s Committee on the Professional Evaluation of Digital Scholarship by Historians in 2014-2015.

Ass.-Prof. Dr. Georg Vogeler
Zentrum für Informationsmodellierung – Austrian Centre for Digital Humanities, Universität Graz
Elisabethstr. 59/III, A-8010 Graz
georg.vogeler@uni-graz.at

Georg Vogeler wrote his PhD on late medieval tax accounting in Germany. He has been involved in the field of digital scholarly editing since 2006. He is the technical partner of the digital edition of the Jahrrechnungen der Stadt Basel (http://gams.uni-graz.at/srbas). He has taught many courses on the use of the TEI for digital scholarly editions, is technical director of the monasterium.net project, a supervisor in the DiXiT project (EU 7th Framework, http://dixit.uni-koeln.de/), and a founding member of the Institut für Dokumentologie und Editorik (http://www.i-d-e.de).

Prof. Dr. Mark Spoerer / Kathrin Pindl, M.A.
Universität Regensburg
Lehrstuhl für Wirtschafts- und Sozialgeschichte
93040 Regensburg
mark.spoerer@ur.de / kathrin.pindl@ur.de

Mark Spoerer is a full professor of Economic and Social History at the University of Regensburg.

Kathrin Pindl works as a pre-doctoral research assistant at the Chair of Economic and Social History of the University of Regensburg. Her research interests include pre-modern living standards in European regions, group-specific patterns of consumption, and modeling economic activities as recorded in books of account.

ZANCARINI, Jean-Claude (Triangle, France); GEDZELMAN, Séverine (Triangle, France)

Keywords: Translation, Machiavelli, Text Analysis, Parallel Corpora, Dictionary

  • Session: Workflows and the TEI
  • Date: 2015-10-31
  • Time: 11:00 – 12:30
  • Room: Amphi Laprade

We will present some results obtained with the translation comparison tool HyperMachiavel (a web version of the corpus and its annotations is available at http://hyperprince.ens-lyon.fr/). The tool makes it possible to compare the editio princeps of Machiavelli’s Il Principe (Blado, 1532) with the four French translations of the sixteenth century – by Jacques de Vintimille (1546), Gaspard d’Auvergne and Jacques Cappel (1553), and Jacques Gohory (1571) – and with one seventeenth-century translation, by Amelot de la Houssaie (1683).

1. Presentation of the tool

Inspired by the machine translation and lexicography domains, HyperMachiavel (HM) proposes an annotation environment dedicated to the editing of lexical correspondences and offers different views to assist humanities researchers in interpreting the quality and the specificities of each translator’s work. It provides a synoptic view and equivalence detection (it was designed to support the manual editing of equivalences for aligned corpora and for lexicographic work). Corpora and annotations can be exported directly to TEI, although the encoding of equivalences follows a new, dedicated XML schema. The corpus can be built within the tool by importing and aligning the versions of the same text one by one. For the Hyperprince corpus, alignment was performed on arbitrary segments, decided by the philologist, corresponding to subdivisions of the original text structure (chapters).
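
As a hedged illustration of the underlying idea (standard TEI linking is used here for readability; HM’s dedicated equivalence schema differs, and all identifiers are invented), an alignment of two segments and one lexical equivalence might look like this:

  <!-- Chapter segments aligned across two versions, plus one equivalence. -->
  <linkGrp type="alignment">
    <link target="#blado.c18.s2 #vintimille.c18.s2"/>
  </linkGrp>
  <linkGrp type="equivalence">
    <!-- e.g. an occurrence of 'stato' paired with the translator's rendering -->
    <link target="#blado.c18.s2.w5 #vintimille.c18.s2.w7"/>
  </linkGrp>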

2. First results

2.1 Stato: The “new things” Machiavelli describes are complex, and their “semantic territories” intersect and overlap. HM makes it possible to verify the hypothesis of a permanent polysemy of terms – a polysemy that arises from the way Machiavelli tries to describe new objects and new forms of political action, sometimes using the same words with different meanings.

2.2 The choices of each translator (virtù, ordini): The tool makes it possible to understand the differences in approach between the translators and to highlight their lexical and syntactic choices. We can therefore work towards a description of how each translator translates, showing what is at stake, at every moment, in this or that choice.

ZIMMER, Mary Erica (The Editorial Institute, Boston University, United States of America); O’DONNELL, Molly (University of Nevada, Las Vegas, United States of America); BESHERO-BONDAR, Elisa (University of Pittsburgh at Greensburg, United States of America)

Keywords: TEI tagging, innovation, corpus methods, annotation, digital editions

  • Session: Abstracting the TEI
  • Date: 2015-10-30
  • Time: 09:00 – 10:30
  • Room: Amphi Laprade

How might editors annotate what they cannot identify? Under such circumstances, might a TEI archive’s own markup lead the way to new discoveries? Within Digital Mitford: The Mary Russell Mitford Archive, the challenge of locating the mysterious “Miss James” proves emblematic. Referenced solely by surname, “Miss James” became a topic of conjecture when multiple editors shared questions about the same elusive figure. In letters penned by Mitford in 1819 and after, “Miss James” emerged as Mary Mitford’s trusted friend and advisor. She was also an opinionated humorist, offering assessments of everything from mutual acquaintances to literary works. Yet while her Christian name and profession were later discovered by project editors, her history remains largely unrecovered.

What insights might processing Digital Mitford’s own markup reveal about such a figure? Inspired by Douglas Duhaime’s visualized co-citations in the EEBO-TCP corpus, we view clusters of related data as forms of annotation—ones that, rendered judiciously, aid both scholars and those newer to Mitford’s oeuvre.[1] Working with XQuery on our eXist database of project files, we first assess the prevalence of relational categories tagged by our editors, then use these counts to weight lists of high-frequency tokens in ranges indexed by a key term.[2] Visualized, the resulting bouquets of knowledge suggest lines of inquiry—ones “locating” the unknown while enhancing perspectives the TEI archive itself may offer.
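
As a minimal, invented illustration of the markup such processing draws on (not the archive’s actual files; identifiers and wording are hypothetical), a letter fragment might record both the person reference and its relational context:

  <!-- Hypothetical fragment: the reference is keyed to a site index
       entry whose identity remains an open editorial question. -->
  <p>I have had a long letter from <persName ref="#missJames">Miss
    James</persName>, who pronounces the tragedy very actable.</p>

Counting and weighting such references and their surrounding tokens across the corpus is what lets the archive’s own tagging suggest who “Miss James” might be.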

[1] See Duhaime, Douglas. “Co-Citation Networks in the EEBO-TCP Corpus.” 26 July 2014. <http://douglasduhaime.com/blog/co-citation-networks-in-the-eebo-tcp-corpus>. Our model builds upon Christopher Ricks’ metaphor of scholarly annotation as “supererogation” (Allusion to the Poets, OUP, 2002). While its visualization is in progress, one mock-up may be found at <http://bit.ly/1gJXWsV>.

[2] See The Digital Mitford Codebook <https://docs.google.com/document/d/1r-8NGPJL1pZ20pnfvoX5OT0DkcDi-NBp5urJiZwx1sY/pub>. On ordered lists, see Witmore, Michael. “Finding ‘Distances’ Between Shakespeare’s Plays 2: Projecting Distances onto New Bases with PCA.” 6 July 2015. <http://winedarksea.org/?p=2271>.

Speaker Bios

Mary Erica Zimmer is a Ph.D. Candidate in The Editorial Institute at Boston University whose research addresses editorial theories and methods, histories of the book, and intertextuality. She also has a strong interest in models for undergraduate research. Her work on Digital Mitford’s data visualization team is complemented by her development of an online, browsable model of the bookshops and stalls in London’s Paul’s Cross Churchyard before the 1666 Great Fire. Her dissertation will serve as a companion to the Selected Poems of Geoffrey Hill.

Molly O’Donnell is the University of Nevada, Las Vegas, President’s Foundation Graduate Research Fellow. She has recently contributed to Victoriographies and the Norton Anthology, and was formerly associate faculty at Notre Dame of Maryland University. Her dissertation uses contemporary sociolinguistics to examine the nineteenth-century tales novel as a useful mode for exploration in the areas of genre, narrative, and gender studies.

Elisa Beshero-Bondar, Project Director of the Digital Mitford Archive, is Associate Professor of English at the University of Pittsburgh at Greensburg, where she has taught since 2004. She is the author of Women, Epic, and Transition in British Romanticism (University of Delaware Press, 2011). At Pitt-Greensburg, she helped to launch a Digital Humanities pedagogy and research initiative that engages faculty and students in electronic text markup, text-mining of digital library databases, and digital project development. She has recently been experimenting with network analysis as applied to complex text and paratext structures in Thalaba the Destroyer, an 1801 epic poem by Robert Southey.