Keywords: Digital Diplomatics, Charter Encoding Initiative, medieval charters, Counts of Luna, encoding standards.
- Session: Encoding historical data in the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Fugier
Keywords: TEI visualisation, TEI publication, digital editions, European integration studies
- Session: Encoding historical data in the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Fugier
Keywords: TEI-based pedagogy, TEI crowdsourcing, teaching/research, research/heritage ecology
- Session: Presentation and engagement with the TEI
- Date: 2015-10-30
- Time: 16:00 – 17:30
- Room: Amphi Laprade
Keywords: ontologies, historical structured data, semantic markup, digital history, semantic web
- Session: Encoding historical data in the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Fugier
Keywords: translation, alignment, early modern Spain, nineteenth-century England, ur-text, parallel editions
- Session: Workflows and the TEI
- Date: 2015-10-31
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Keywords: TEI document production, extended-markdown, automatic tagging
- Session: Publishing and the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Creating new XML documents from scratch, or from plain text, can be a difficult, time-consuming and error-prone task, especially when the markup vocabulary used is rich and complex, as is the case with the TEI. It usually takes a good amount of time to make a document validate for the first time. In the old days, SGML allowed document encoders certain freedoms that were meant to save time and effort, such as leaving certain tags open or omitting quotes around attribute values. In this sense, SGML was more permissive than XML. This was good for human encoders, but made it difficult for programmers to create parsers and applications that fully complied with SGML’s permissive set of rules and inferences. By contrast, XML was made to be very restrictive, and in turn more predictable, which makes parsing and processing easier and contributed to the popularity that XML gained soon after its introduction. In the wiki world, a myriad of wiki languages appeared with the purpose of simplifying, or completely avoiding, HTML markup. Among them, Markdown is a recent and very successful shorthand notation that avoids writing HTML tags while keeping text legibility intact. By joining the spirit of good old SGML with the ideas behind Markdown, we came to the idea of the downTEI project, which consists of an extension to the Markdown syntax meant for the creation of TEI-XML documents, together with the corresponding parsers needed to perform the conversion. With this approach, it is easy to obtain a valid TEI document in a very short time, avoiding a long list of validation errors. This approach, however, has some limitations. It is meant to process the most common tags, such as those used for prose and verse, and those most commonly used in the teiHeader (in short, the most frequent tags from teixlite.dtd).
For specialized applications (like manuscripts, for instance), further tagging is necessary after the initial conversion, but even in such cases a significant amount of time is saved in the process. In the presentation we will describe the extended markdown notation, the parsing process, and the resulting TEI documents. We will also discuss the benefits and limitations of this approach.
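The general conversion idea can be illustrated with a minimal sketch. The markdown-like notation below (a "# " prefix for headings, blank lines between paragraphs) and the emitted TEI skeleton are invented for illustration; they are not downTEI's actual syntax or output.

```python
# Hypothetical sketch of a markdown-to-TEI converter: headings become
# <head>, other blocks become <p>, wrapped in a minimal TEI skeleton.
from xml.sax.saxutils import escape

def md_to_tei(md_text, title="Untitled"):
    body = []
    for block in [b.strip() for b in md_text.split("\n\n") if b.strip()]:
        if block.startswith("# "):
            body.append("<head>%s</head>" % escape(block[2:]))
        else:
            # Collapse internal whitespace so each block becomes one paragraph.
            body.append("<p>%s</p>" % escape(" ".join(block.split())))
    return (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
        "<teiHeader><fileDesc><titleStmt><title>%s</title></titleStmt>"
        "<publicationStmt><p/></publicationStmt>"
        "<sourceDesc><p/></sourceDesc></fileDesc></teiHeader>"
        "<text><body><div>%s</div></body></text></TEI>"
        % (escape(title), "".join(body))
    )

print(md_to_tei("# Chapter 1\n\nIt was a dark night.", title="Sample"))
```

Even this toy version shows the appeal: the encoder never sees a validation error for a forgotten teiHeader element, because the skeleton is generated rather than typed.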
AUTHOR’S BIO:
Alejandro Bia has a PhD in Software Engineering. He studied at ORT University, Oxford University, and the University of Alicante. Currently he is a full-time lecturer at Miguel Hernández University, in the Department of Statistics, Mathematics and Computer Science, and a researcher at the Operations Research Center (CIO). He has lectured for the Cultural Heritage Digitization Course at FUNED (2013), for the Master in Digital Humanities (2005-2011) and the Master in Web Technology (2005-2007) at the University of Castilla La Mancha, for the Department of Languages and Information Systems (2002-2004) and the Department of Fundamentals of Economic Analysis (2002) of the University of Alicante, and at ORT University (1990-1996). His lecture topics include text markup using XML and TEI, software engineering, project management, computer forensics, information security, web application design, concurrent programming, operating systems, computer architecture, computer networks, and English for computer science. At present, he is in charge of the TRACEsofTools project (software tools for contrastive text analysis in parallel bilingual corpora), and continues to develop the Digital Humanities Workbench (DHW) project. In 2005, he did consultancy work for the National Library of Spain. From 1999 to 2004, he was Head of Research and Development of the Miguel de Cervantes Digital Library at the University of Alicante. Previously, he worked as Special-Projects Manager at NetGate (1996), and as Documentation Editor of the GeneXus project at ARTech (Advanced Research and Technology) (1991-1994). His current research interests are text alignment, text mining, stylometry, and visualization methods applied to text corpora.
Previously, he worked on the application of software engineering methods and techniques to digital libraries, including document structure design, multilingual markup, automation of digitisation, digital preservation, and digitisation metrics and cost estimates. He also worked on neural network training and developed the ALOPEX-B optimization method. Currently, he is a member of AIPO (the Association for Human-Computer Interaction).
Keywords: Indexing, collaborative tool, prosopographical database, TEI environment, authority records
- Session: Interchange and the TEI
- Date: 2015-10-30
- Time: 14:00 – 15:30
- Room: Amphi Fugier
Keywords: Crowdsourcing, editing, correspondence, normalisation, automation
- Session: Correspondence in the TEI
- Date: 2015-10-29
- Time: 09:00 – 10:30
- Room: Amphi Fugier
The Letters of 1916 is a project to create a collection of correspondence written around the time of the Easter Rising, either in Ireland or by Irish people. The project uses a crowdsourcing approach: not only experts, but anybody can contribute by uploading images of letters and transcribing them.
Since September 2013, when the project was launched, a large number of letter images have been uploaded and transcribed via our website (http://dh.tcd.ie/letters1916/) and stored in a relational database. The next stage of the project is to make the collected images, transcriptions, and metadata available in the form of a digital edition.
The transition from our crowdsourcing environment to a digital scholarly edition is challenging on many levels. One of the biggest challenges is to ensure the normalisation and accuracy of the TEI encoding. Our workflow can be broken down into the following stages: firstly, extraction of transcriptions and metadata from the relational database in which they are currently stored; secondly, insertion of metadata and transcriptions into TEI templates; and, finally, both automated and manual error checking and proofing to ensure the transcriptions are consistently encoded, valid TEI documents.
After a general overview of our crowdsourced collection process for metadata and transcriptions, and the issues related to it, our paper will discuss the strategies and methodologies we use to create valid and meaningful TEI transcriptions. Essentially, the TEI encoding has to be general enough to be worthwhile (and useful as TEI data for future usage) and consistent enough to be ingested back into a relational database that constitutes the “digital edition”.
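The template-filling and automated-checking stages of such a workflow can be sketched as follows. The template and field names are hypothetical, not the project's actual database schema, and a real pipeline would also validate against a TEI schema rather than only checking well-formedness.

```python
# Sketch: fill a TEI template from a database record, and reject
# ill-formed XML before it enters the edition.
import string
import xml.etree.ElementTree as ET

TEI_TEMPLATE = string.Template(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><fileDesc><titleStmt><title>$title</title></titleStmt>"
    "<publicationStmt><p/></publicationStmt>"
    "<sourceDesc><p>$source</p></sourceDesc></fileDesc></teiHeader>"
    '<text><body><div type="letter">$transcription</div></body></text></TEI>'
)

def build_letter(record):
    doc = TEI_TEMPLATE.substitute(record)
    ET.fromstring(doc)  # automated check: raises on ill-formed XML
    return doc

letter = build_letter({
    "title": "Letter, 2 May 1916",
    "source": "Crowdsourced transcription of an uploaded letter image",
    "transcription": "<p>Dear Mary, the city is quiet again.</p>",
})
```

The point of the early parse step is that errors surface per record, where they can be traced back to a single crowdsourced transcription, rather than at publication time.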
Brief biography of authors
Roman Bleier is a PhD student on the Digital Arts and Humanities programme at Trinity College Dublin. His research focuses on digital editing with the TEI, and he works on a digital edition of Saint Patrick’s writings under the supervision of Professor Seán Duffy.
Richard Hadden is a PhD student on the DiXiT programme, studying the intersection between digital scholarly editing and mass digitisation processes. The Letters of 1916 project is serving as a case study for his research. He has an MA in Digital Humanities and a BA in Modern European Languages.
Linda Spinazzè is currently working on the Letters of 1916 project as a DiXiT postdoctoral fellow. After completing her first degree in Medieval Latin Literature and a second degree in Computer Science & Humanities, she obtained her PhD in Medieval and Classical Philology with an experimental dissertation on the Elegies of Maximianus, investigating an alternative model of digital scholarly editing.
Keywords: standoff, OHCO, data modelling, xpointer, xpath
- Session: Interchange and the TEI
- Date: 2015-10-30
- Time: 14:00 – 15:30
- Room: Amphi Fugier
Keywords: variants, manuscript transcription, textual criticism, overlapping hierarchies
- Session: Textual variants and the TEI
- Date: 2015-10-30
- Time: 14:00 – 15:30
- Room: Amphi Laprade
Chapter 11 of the TEI Guidelines (Representation of Primary Sources) has undergone profound changes with TEI P5 version 2. Equally significant changes are likely to occur in the following chapter (Critical Apparatus). This paper aims to elucidate connections between the issues of encoding manuscript revisions and multiple witnesses. It is part of a broader discussion of the current state of chapter 11 that is among the envisaged outcomes of the DFG-NEH project “Diachronic Markup and Presentation Practices”. Traditionally we distinguish differences between two or more witnesses (variants) from revisions within a single manuscript (‘corrections’). This methodological distinction is reflected in the coexistence of two different modules, transcr and textcrit. For the purpose of modelling variance in a more abstract way, however, it is useful to put aside whether variants occur within or between witnesses. It is well known, and has long been complained about, that variants among larger textual units or even divisions cannot (or cannot satisfactorily) be expressed within the framework of the Text Criticism module in its current state. An analogous problem is faced in the Transcription module: <addSpan/> and <delSpan/> are globally available but are not designed to express paradigmatic relations (as <app> and <subst> do). Thus, if we consider making <app>, <lem>, and <rdg> available above the chunk level, it would be consistent to treat <subst>, <del>, and <add> in the same way.
This would be a more radical change than the introduction of the considerable number of new elements and attributes that we have already seen in chapter 11, as the traditional hierarchy of divisions, inter-level, chunk and phrase-level elements was left untouched. However radical, allowing elements above the chunk level still falls short of what is needed when it comes to variants that cut across structural divisions and units of the text, and especially when it is the structure itself that either varies between witnesses or is changed within one witness. In such cases the conventional distinction between what is ‘text’ and what is ‘markup’ is abandoned, as it is the markup of divisions and units that is in turn to be marked up.
Ein wechselnd [Weben
Ein glühend] Leben!
On the basis of practical examples like this one from Goethe’s Urfaust (lines 154–5), the paper will demonstrate that it is indeed necessary to modify the traditional hierarchy, if the structure of texts is fluid rather than static, and if it is this very fluidity that an encoder wishes to represent. Furthermore, the paper will explore how this can happen as gently as possible, how the problem of overlap can be handled, and how the resulting markup can be transformed into a traditional TEI-Syntax.
Keywords: manuscript description, ontology, medieval manuscripts, annotation, codicology
- Session: Encoding manuscripts in the TEI
- Date: 2015-10-30
- Time: 16:00 – 17:30
- Room: Amphi Fugier
The development of tools for the automatic and manual tagging of digitized sources is one of the main goals of digital humanities research projects dealing with medieval manuscripts. The TEI module “msdescription” already provides a good standard for descriptive metadata from manuscript catalogues. However, for a more detailed description of the page layout, a more profound mark-up is sometimes required. In order to obtain easily machine-readable metadata for quantitative analysis of the sources, the use of continuous text in prose needs to be avoided. Referring to an external ontology might be a good option, but the range of dictionaries and other terminological tools for codicology is still rather unsatisfactory. This applies not only to English and German controlled vocabularies with explicit codicological terms but also to other European languages. Therefore, within the project eCodicology – Algorithms for the Automatic Tagging of Medieval Manuscripts[1] a bilingual[2] codicological terminology[3] has been created and will now become the basis of a SKOS data model[4] for the description of codicological information such as layout features and illuminations.
The paper will present this data model and its application in the metadata management[5] of eCodicology. Some examples of statistical analyses will be given as well. Furthermore, I would like to discuss the possibility of future collaborations within the TEI community to establish a common reference model for manuscript terminology.
[1] http://www.ecodicology.org
[2] English and German
[3] Based on the works of Denis Muzerelle (French), Marilena Maniaci (Italian), M.P. Brown (English) and inspired by the discussions about the need of an updated and complete multilingual terminology of Christine Jacobi-Mirwald and Marilena Maniaci in Wolfenbüttel 2011.
[4] http://www.w3.org/2004/02/skos/
[5] A metadata schema according to TEI P5 is used.
Keywords: Akkadian language, cuneiform writing, tablets corpus, TXM tool, linguistic analysis
- Session: Correspondence in the TEI
- Date: 2015-10-29
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Keywords: electronic critical edition, digital philology, medieval corpora, TEI extension
- Session: Textual variants and the TEI
- Date: 2015-10-30
- Time: 14:00 – 15:30
- Room: Amphi Laprade
Keywords: publication, XML database, Xquery, BaseX
- Session: Publishing and the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Keywords: Ontology, markup semantics, linked data, EARMARK, OWL, modelling, encoding theory
- Session: Hermeneutics and the TEI
- Date: 2015-10-31
- Time: 11:00 – 12:30
- Room: Amphi Fugier
Keywords: CTS, continuous integration, sustainability, contribution
- Session: Interoperability and the TEI
- Date: 2015-10-29
- Time: 11:00 – 12:30
- Room: Amphi Laprade
The Open Philology Project (OPP) at Leipzig and its US affiliate, the Perseus Digital Library at Tufts (PDL), have years of experience developing extensive infrastructures for managing textual data for historical languages. With around 100 million words available on PDL, and millions more coming through OPP, in a context of opening up contributions from wide-ranging communities of users, dealing with the ingestion of new texts is a matter of security, flexibility and efficiency.
Over the last few years, PDL and OPP have been moving forward in implementing the Canonical Text Services (CTS) URN standard and the EpiDoc subset guidelines to allow for better interoperability and citability of their texts. We are now working towards supporting a scalable workflow centered on continuous curation of these texts, from both within and outside the PDL/OPP ecosystem. Key requirements for such a workflow are ease of maintenance and speedy deployment of texts for use by a wide variety of analytical services and user interfaces.
Drawing on software engineering best practices, we are building an architecture for continuous integration[1]: analogous to the way Travis[2] integrates with GitHub[3], we are developing a customizable service that tests individual files upon each contribution made to our public git repositories. The service can be configured to test and report status on a variety of checkpoints, from schema compliance to CTS-ready markup.
With a strong continuous integration service, we should be able to deal not only with a wide range of genres and languages, but also with a diversity of contributors. We can delegate the tedious tasks of checking markup to the machine, leaving curators free to focus on the scholarship. We also expect that automating checks on the integrity and the adaptability of textual objects for specific frameworks can reduce the error rate and allow for shorter feedback loops to contributors and users of our corpora.
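The kind of per-file check such a service might run can be sketched as follows. The well-formedness and CTS-URN checks below are illustrative; a real deployment would validate against RELAX NG schemas and the projects' actual CTS configuration.

```python
# Sketch of a per-file CI check: report well-formedness and whether
# the file carries a CTS URN anywhere in its markup.
import re
import xml.etree.ElementTree as ET

def check_file(xml_text):
    report = {}
    try:
        ET.fromstring(xml_text)
        report["well-formed"] = True
    except ET.ParseError:
        report["well-formed"] = False
    report["cts-urn"] = bool(re.search(r"urn:cts:[\w.:]+", xml_text))
    return report

sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>'
    '<fileDesc><publicationStmt><idno type="URN">'
    "urn:cts:greekLit:tlg0012.tlg001</idno></publicationStmt>"
    "</fileDesc></teiHeader><text><body><p/></body></text></TEI>"
)
print(check_file(sample))  # {'well-formed': True, 'cts-urn': True}
```

Hooking such a function into a repository webhook gives contributors a pass/fail report per commit, which is exactly the short feedback loop the abstract describes.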
Keywords: art history, semantic web, ecdotics, corpus
- Session: Presentation and engagement with the TEI
- Date: 2015-10-30
- Time: 16:00 – 17:30
- Room: Amphi Laprade
Keywords: ancient Egyptian-Coptic; digital epigraphy; TEI/XML – Epidoc; interchange format; controlled vocabularies; stand-off annotations
- Session: Interchange and the TEI
- Date: 2015-10-30
- Time: 14:00 – 15:30
- Room: Amphi Fugier
Sharing digital textual resources is a real challenge for scholars working on Ancient Egyptian-Coptic (3000 BC-1350 AD). There are two types of reasons for this: first, the different writing systems that have been used throughout the history of this language (hieroglyphic and hieratic scripts, demotic, Coptic) led to various solutions as regards the encoding of texts; second, the diverging aims and scopes of the projects involved in creating annotated corpora of Ancient Egyptian-Coptic generated representation formats with few characteristics in common. As a result, the resources themselves cannot be shared, and no standard tool can be used for encoding, annotating, querying or analyzing them. In order to overcome these issues, several leading projects in the field (1) have joined forces to introduce a TEI-compliant interchange data model with the following characteristics:
(a) The Ancient Egyptian-Coptic TEI interchange data model represents an agreement on a subset of the EpiDoc schema towards which the textual data of each project can be converted. Project-specific annotations are dealt with either using stand-off markup that refers to tokens of transliterated texts (Bański 2010; Pose et al. 2014), or on the basis of data models that are true expansions of the kernel interchange data model.
(b) The specialized metadata elements and attributes referring to Egyptological concepts are based on controlled vocabularies that are shared and enriched collaboratively by the projects.
(c) These metadata apply either to physical text-bearing objects, inscribed physical features, witnesses (on documents) or texts (Morlock & Santin 2014). As the conceptualization of the relationship between these entities is shared between projects, coherence and precision can be obtained when describing the material, philological and linguistic dimensions of textual resources.
Note (1) : The project “Cachette de Karnak” (IFAO-SCA; http://www.ifao.egnet.net/bases/cachette/about/; Razanajao, Morlock & Coulon 2013), the Ramses Project (Polis, Honnay & Winand 2013), the Rubensohn Project (http://elephantine.smb.museum), and the Thesaurus Linguae Aegyptiae (http://aaew.bbaw.de/tla/; Dils & Feder 2013).
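The stand-off principle mentioned in (a) can be sketched in miniature: annotations live apart from the text and point at token identifiers. The tokens, identifiers and labels below are invented for illustration, not drawn from the projects' corpora.

```python
# Sketch of stand-off annotation: the transliterated text is a set of
# addressable tokens; annotations reference them without touching the text.
tokens = {"t1": "ḥtp", "t2": "dj", "t3": "nswt"}  # transliterated tokens

annotations = [
    {"targets": ["t1"], "type": "lemma", "value": "ḥtp"},
    {"targets": ["t1", "t2", "t3"], "type": "phrase", "value": "offering formula"},
]

def resolve(annotation):
    # Join the surface forms of the tokens an annotation points at.
    return " ".join(tokens[t] for t in annotation["targets"])

print(resolve(annotations[1]))  # ḥtp dj nswt
```

Because each project can layer its own annotation lists over the same shared tokens, the kernel interchange text stays untouched while project-specific analyses diverge freely.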
References
- Bański, P. 2010. Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless, in Proceedings of Balisage: The Markup Conference. Vol. 5 of Balisage Series on Markup Technologies.
- Dils, Peter & Feder, Frank. 2013. The Thesaurus Linguae Aegyptiae. Review and Perspectives, in Polis & Winand (eds.), p. 11-23.
- Morlock, Emmanuelle & Santin, Eleonora. 2014. The Inscription between text and object: The deconstruction of a multifaceted notion with a view of a flexible digital representation, in Orlandi, Santucci, Casarosa, Liuzzo (eds). First EAGLE International Conference on Information Technologies for Epigraphy and Cultural Heritage, Paris, 27th September-1st October. p. 325-350. <halshs-01141856>
- Polis, Stéphane, Anne-Claude Honnay & Jean Winand. 2013. Building an Annotated Corpus of Late Egyptian. The Ramses Project: Review and Perspectives, in Polis & Winand (eds.), p. 25-44.
- Polis, Stéphane & Winand, Jean (eds.). 2013. Texts, Languages & Information Technology in Egyptology. Selected papers from the meeting of the Computer Working Group of the International Association of Egyptologists (Informatique & Égyptologie), Liège, 6-8 July 2010, Liège, Ægyptiaca Leodiensia 9.
- Pose, Javier, Lopez, Patrice & Romary, Laurent. 2014. A Generic Formalism for Encoding Stand-off annotations in TEI. <hal-01061548>
- Razanajao, Vincent, Emmanuelle Morlock & Laurent Coulon. The Karnak Cachette Texts On-Line: the Encoding of Transliterated Hieroglyphic Inscriptions. TEI Conference and Members’ Meeting 2013, Oct 2013, Rome, Italy. <http://www.tei-c.org/Vault/MembersMeetings/2013/>. <halshs-01141540>
Keywords: TEI Simple, Processing Models, TEI ODD, Output
- Session: Publishing and the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Keywords: TEI, modeling, transcription
- Session: Tooling and the TEI
- Date: 2015-10-31
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Invariably, digitization projects using TEI concentrate on the content of items through transcription of the text into an abstracted format, which is then most often reconstituted into a simulacrum of the original alongside a stack of images from which the user of the tool is supposed to intuit the original form. While useful and necessary, this approach also obscures what is lost in digitization – the material aspects of the codex book, the documentary tapestry, or the physical inscription – in favor of providing tools that are readable by machines, but less so by people. Thus the process of digitization, abstraction, and reconstitution serves the same function as a technological black box. The item to be digitized enters the box, the online version of that item appears on the other side, but the processes it undergoes are opaque to the average online viewer. Moreover, these practices threaten to alter the initial conception of these artifacts, as the digital surrogate is often the first encounter students and scholars have with them.
Using both digitized medieval manuscript sources and the Clopton chantry chapel at Holy Trinity, Long Melford – itself an example of a medieval artifact whose physical reality belies easy categorization and digital display – I will explain how favoring the zone-based TEI schema, in combination with thoughtful display practices, might recover some of what is lost through the process of digitization, and how the wealth of legacy codicological information might be utilized more effectively – ensuring that both form and content are given their due.
Biography
Dr. Matthew Evan Davis (Texas A&M University, 2013) most recently was the Council for Library and Information Resources/Mellon Postdoctoral Fellow in Data Curation for Medieval Studies at North Carolina State University. There, he worked as part of the team on two TEI-based projects — the Piers Plowman Electronic Archive and the Siege of Jerusalem Electronic Archive — as well as the Manuscript DNA project and the Medieval Electronic Scholarly Alliance, an aggregator and discussion space for digital scholarly and cultural heritage work regarding the Middle Ages. Continuing to serve as a consultant and Technical Editor on the former two projects, he is also currently the editor of The Minor Works of John Lydgate, a new project seeking to digitize, transcribe, and make available the works of the 15th-century poet. The site in its current form may be seen at www.minorworksoflydgate.net.
Besides his work on Lydgate, Dr. Davis is also very interested in the staging of medieval drama, cultural transmission through translation and reception, the history of the book, and material and digital curation as a means of preserving both the material object and the connections between the object, the content contained by that object, and its cultural milieu.
Keywords: letters, meta data, interchange, API
- Session: Correspondence in the TEI
- Date: 2015-10-29
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Keywords: sonification, hypertext
- Session: Abstracting the TEI
- Date: 2015-10-30
- Time: 09:00 – 10:30
- Room: Amphi Laprade
Sonification is a complementary technique to visualization that uses sound to describe data. Kramer defines sonification as “the use of nonspeech audio to convey information. More specifically, sonification is the transformation of data relations into perceived relations in an acoustic signal for the purposes of facilitating communication or interpretation.” [13] While providing new opportunities for communicating through the human perceptual and cognitive apparatus, sonification poses challenges with presenting the exploratory patterns in data to the user as it is a less familiar medium for this purpose.
We describe work to sonify variants of Hamlet to aid exploratory textual analysis. The sonification presented focuses on using pitch and tones to help the user listen to differences in the structure between variations of a text or texts encoded in Text Encoding Initiative (TEI) XML. Our approach is inspired by the Hinman Collator, an opto-mechanical device originally used to highlight print variants in Shakespeare texts, whereby visual differences between two texts literally stood out through a stereoscopic effect [5]. Using an audio stream for each text, this project aims to produce a stereo audio image of the text, so creating an audio version of the stereoscopic illusion used in collating machines. The timing and frequencies are extracted for storage and transformation into alternate formats or to repeat the analysis.
We present initial work on XML variants of Shakespeare’s Hamlet using the Bodleian Libraries’ First Folio XML and their earlier work on the Quartos. We extracted document entities such as acts, scenes, lines, and stage directions for the analysis. These are viewed as hyperstructures that may be separated from the text for sonification and comparison with other variants. Analytical perceptions can be altered through the presentation of the tones, pitches and icons. Audio displays demand that the creator rethink how structural data is presented to the user, and how the extracted hyperstructures might be converted into hypermedia using visualization as well as sonification. Early results show promise for the auditory comparison.
We look at related work and present the case study. We then consider the use of audio beacons to help the user locate within the document, and discuss the integration with visualization. Finally we look at future work and conclude the paper.
Related Work
Sonification on exploratory data patterns has been explored in several projects. For example, work on stock market data [3,10] discusses the use of volume and pitch to alert to changes in the data, rather than relying on purely visual stimuli. It demonstrates the use of auditory displays for pattern analysis in exploratory data using a rule system, and is closely associated with visualization.
The Listening to Wikipedia project(1) presents an audio and visual display of edits made to Wikipedia pages. Using circles and rule-based sounds, it presents the recent-changes feed to the user, including new users and the type of user making each edit. This work provides an elegant interface to the user data, but it is limited to one stream.
The TEI-Comparator(2) was developed to compare paragraphs and visualize the changes [9, 14] for the Holinshed Chronicles(3) project, illustrating a collation approach applied to TEI. This visualization work does not render the text into audio signals, and it was designed for a particular text. It focuses on the text rather than the editorial structures.
Sonification of hyperstructures is explored in [11], where an authored hypertextual structure is sonified using the techniques of algorithmic composition. In contrast, we present work that develops the notion of sonifying the hyperstructure, or hyperstructures, extracted and transformed from the editorial matter.
Sonifying versions of Hamlet
We present work on creating an auditory display from Shakespeare’s Hamlet. This began with the Bodleian’s work on the First Folio [5] and their earlier work on the Quartos with the British Library.
Initially we convert a selection of TEI XML elements, relating to acts, scenes, stage directions, lines and speakers, into a series of numbers. The process uses the XPointers for the characters to match the speaker to the line. These are read by the sonification software and mapped to relevant tones and sounds before being recorded as a music file, played to the user, or both. The different versions of the TEI encodings pose challenges: we must ensure that each play has the same characters encoded and that the encodings can be mapped to the same number via a rule.
Figure 1: Example transform of TEI XML structure into sound
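The element-to-number mapping step can be sketched roughly as follows. The pitch assignments are invented for illustration and do not reproduce the project's actual rules.

```python
# Sketch: walk a TEI fragment in document order and map selected
# elements to MIDI note numbers (pitch values are illustrative).
import xml.etree.ElementTree as ET

NS = "{http://www.tei-c.org/ns/1.0}"
PITCH = {NS + "div": 48, NS + "stage": 72, NS + "sp": 60, NS + "l": 64}

def sonify(xml_text):
    notes = []
    for elem in ET.fromstring(xml_text).iter():  # document order
        if elem.tag in PITCH:
            notes.append(PITCH[elem.tag])
    return notes

sample = (
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    "<stage>Enter two Centinels.</stage>"
    "<sp><l>Stand: who is that?</l></sp>"
    "<sp><l>Tis I.</l></sp></div>"
)
print(sonify(sample))  # [48, 72, 60, 64, 60, 64]
```

The resulting number series is what gets rendered to tones, recorded to a music file, or both.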
This work focuses on an alternative presentation to Hinman’s Collator, in which two texts are superimposed stereoscopically to show the differences between them. Our eyes use variations between images to interpret depth in 3D vision; similarly, our ears use subtle timing and phase variations to establish a stereo stage. Using an audio stream for each text, the project aims to produce a stereo audio image of the text, with auditory beacons to guide the user within the stream. Playing a synchronized audio stream per text in each ear helps the listener’s brain to hear any subtle differences between two versions.
Displaying the hyperstructures of the texts such as the speakers of a line element allows the listener to hear whether editorial changes have been made to the textual structure and to hint at variations of the same text.
By way of example, in the 1603 Quarto edition [7] the first stage direction and first lines are:
<stage rend="italic, centred" type="entrance">Enter two Centinels.
  <add place="margin-right" type="note" hand="#ab" resp="#bli">
    <figure>
      <figDesc>Brace.</figDesc>
    </figure>now call'd <name type="character" ref="#bar">Bernardo</name>
    <lb/>& <name type="character" ref="#fra">Francisco</name> —
  </add>
</stage>
<sp who="#sen">
  <speaker>1.</speaker>
  <l><c rend="droppedCapital">S</c>Tand: who is that?</l>
</sp>
<sp who="#bar">
  <speaker>2.</speaker>
  <l>Tis I.</l>
</sp>
In the 1605 Quarto edition [8], the stage direction and first lines are:
<stage rend="italic, centred" type="entrance">Enter <name type="character" ref="#bar">Barnardo</name>, and <name type="character" ref="#fra">Francisco</name>, two Centinels.</stage>
<sp who="#bar">
  <speaker rend="italic">Bar.</speaker>
  <l><c rend="droppedCapital">VV</c>Hose there?</l>
</sp>
<sp who="#fra">
  <speaker rend="italic">Fran.</speaker>
  <l>Nay answere me. Stand and vnfolde your selfe.</l>
</sp>
Although the sentinels are identified as Barnardo and Francisco in the stage direction, the text and markup specify different characters. In our software, this would create separate sounds for the first line but not the second: the second line would preserve the stereoscopic illusion, where the first breaks it.
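The break in the illusion can be sketched as a positional comparison of the speaker references extracted from the two streams. This is a hypothetical simplification of the matching, and the sequences below are invented for illustration:

```python
from itertools import zip_longest

def divergences(left, right):
    """Return the positions at which the two streams carry different speaker
    references and would therefore sound different tones, breaking the
    stereo illusion."""
    return [i for i, (a, b) in enumerate(zip_longest(left, right)) if a != b]

# Invented speaker sequences for two encodings of the same passage:
stream_a = ["#sen", "#bar", "#bar"]
stream_b = ["#bar", "#bar", "#bar"]
divergences(stream_a, stream_b)  # only the first position breaks the illusion
```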
Auditory Beacons
Acts and scenes provide useful beacons for the listener to understand which section of the text is being presented. As audio is an unfamiliar medium for this work, there is a need to help the listener identify their position within the document structure. Simple auditory icons are used to aid the listener in understanding the presented event, and research is ongoing to improve these.
In early versions of the sonification, the acts and scenes were produced with different instruments and pitches to allow the user to identify each type of element. This means that the user has to be taught what each sound means and how to associate it with events within the display. The present version of the software uses simple tones, and we are considering developing auditory icons to help identify the type of element event being presented. In [12], the author discusses the debates in musicology about the use of period and modern instruments in the playing of period music. This sets up a tension in the use of sound: as the text may not be modern, which sound should be represented, one contemporary to the text or one contemporary to the user?
The stage element provides greater detail to use within the display. The ‘type’ and ‘who’ attributes help to determine the type of sound. The sounds associated with the ‘who’ attribute can be linked to the speakers but present a different issue: the who attribute of a speech is associated with one person, but a stage direction may have more than one person interacting with it. This changes the sound from a single note to a chord or progression. The volume for each speaker is slightly raised as they continue speaking, helping the user identify that the speaker has not changed. When comparing two streams, listeners will identify any textual changes when both tone and volume alter. Using the two parameters of note and volume gives the user two axes along which to understand the data.
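A minimal sketch of these two devices, a chord for a multi-person stage direction and a slowly rising volume for a continuing speaker, assuming invented per-character base tones and step sizes:

```python
# Hypothetical per-character base tones (invented for illustration).
BASE_FREQ = {"#bar": 220.0, "#fra": 247.5, "#ham": 261.6}

def stage_chord(who):
    """A stage direction's who attribute may list several characters:
    one reference yields a single note, several yield a chord."""
    return [BASE_FREQ[ref] for ref in who.split()]

def speech_volumes(speakers, base=0.5, step=0.05):
    """Raise the volume slightly while the same speaker continues,
    and reset it when the speaker changes."""
    volumes, run, prev = [], 0, None
    for ref in speakers:
        run = run + 1 if ref == prev else 0
        volumes.append(round(base + run * step, 2))
        prev = ref
    return volumes

stage_chord("#bar #fra")                  # two entering sentinels -> a chord
speech_volumes(["#bar", "#bar", "#fra"])  # rises, then resets on the change
```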
Visualization
We created an early prototype visualization showing symbolic representations of the events, using the Processing language for coding in the visual arts (4). The note data was sent to the visualization server, which displayed an abstract image or text based on the note received, in near real-time with the sound. It did aid comprehension of the audio display, but the use of abstract symbols, such as the circles for speakers, poses the same challenge as the sonification: the symbol must be understood.
Figure 2: Early visualisations to aid the sonification
User feedback suggests that further refinement is required to make the displays more useful. This may include the use of text and development for the Web.
Future Work and Conclusion
We have demonstrated the potential of sonification as a tool to help the user identify differences between textual variants. Auditory displays are established for exploring data, though they are new as analytical tools. The medium allows the designer to use multiple parameters simultaneously to add meaning to an event by changing tone, pitch, sound or volume. This presents challenges in finding ways of making the technique understandable.
The use of stereo playback indicates that further work with spatial displays could aid the comprehension of the data through a richer display. The timing data is written out along with the frequencies as the sounds are created. This offers the potential to integrate the TEI data with SMIL duration markup and to transform it into HTML Media Fragments, so that the text can be displayed in the browser with links to the sound, or to convert it into Music Encoding Initiative (5) markup to be visualized in a novel fashion.
Words and lines may be auralized using the tone associated with the speaker. The sonification would then render the associated tones. This does pose the issue of how a word is sonified: is it by length or some other metric? The choice element from the Text Encoding Initiative provides the options for an original element and a variation. The sonification would then have to associate a similar tone with the choices. It may be that the original text would be the expected tone given the word change but that the variation is a sharp or minor tone played as a chord.
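One possible reading of this idea, sketched in Python: keep the expected tone for the original reading and add a semitone-sharpened note, played as a chord, when a variant is present. The fragment, the orig/reg pairing and the semitone rule are illustrative assumptions, not the project’s implementation:

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# A hypothetical fragment with one <choice> between an original reading
# and a regularised variant.
FRAGMENT = """<l xmlns="http://www.tei-c.org/ns/1.0">Whose
  <choice><orig>there</orig><reg>there?</reg></choice></l>"""

SEMITONE = 2 ** (1 / 12)

def choice_tones(xml_text, base=220.0):
    """For each <choice>, emit the expected base tone; if the variant
    differs from the original, add a sharpened note so the pair sounds
    as a chord."""
    root = ET.fromstring(xml_text)
    tones = []
    for choice in root.iter(TEI + "choice"):
        orig = choice.find(TEI + "orig")
        reg = choice.find(TEI + "reg")
        chord = [base]
        if reg is not None and (orig is None or reg.text != orig.text):
            chord.append(round(base * SEMITONE, 1))  # sharpened variant note
        tones.append(chord)
    return tones

choice_tones(FRAGMENT)  # one chord: base tone plus sharpened variant
```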
Further work is needed to create better auditory icons that work across streams and to integrate the audio and visual displays; we have not explored this area fully. Contextual questions include the type of sound that would be typical in a dramatic or physical context, such as the construction of places of performance, which also demands knowledge of staging practices. We intend to research the use of the sex attribute of the person element and of contemporary auditory icons, and to conduct user testing.
We believe that the use of sound provides an exciting way of exploring textual structures to determine differences between them as an alternative workflow. The novelty in this area is a major challenge but we strongly believe that it has relevance in the exploration of variants between texts marked up with TEI.
References
[1] Gregory Kramer. 1993. Auditory Display: Sonification, Audification, and Auditory Interfaces. Perseus Publishing.
[2] Alexandra Supper, “The Search for the ‘Killer Application’: Drawing the Boundaries around the Sonification of Scientific Data”, in Trevor Pinch and Karin Bijsterveld (eds), The Oxford Handbook of Sound Studies, New York: Oxford University Press, 2012, p. 253.
[3] Keith V. Nesbitt and Stephen Barrass, Finding Trading Patterns in Stock Market Data, IEEE Computer Graphics and Applications 24:5, IEEE Computer Society, pp 45-55, 2004
[4] Digital facsimile of the Bodleian First Folio of Shakespeare’s plays, Arch. G c.7, First Folio home page, http://firstfolio.bodleian.ox.ac.uk/
[5] C. Hinman, Mechanized Collation: A Preliminary Report, Papers of the Bibliographical Society of America 41 (1947): 99-106.
[6] Smith, Steven Escar. 2000. “‘The Eternal Verities Verified’: Charlton Hinman and the Roots of Mechanical Collation.” Studies in Bibliography 53. 129-62.
[7] The tragedy of Hamlet Prince of Denmarke: an electronic edition, Hamlet, First Quarto, 1603. British Library Shelfmark: C.34.k.1, http://www.quartos.org/XML_Orig/ham-1603-22275x-bli-c01_orig.xml
[8] The tragedy of Hamlet Prince of Denmarke: an electronic edition, Hamlet, Second Quarto Variant, 1605. British Library Shelfmark: C.34.k.2, http://www.quartos.org/XML_Orig/ham-1605-22276a-bli-c01_orig.xml
[9] James Cummings and Arno Mittelbach, The Holinshed Project: Comparing and linking two editions of Holinshed’s Chronicle, International Journal of Humanities and Arts Computing, Volume 4, Issue 1-2, pages 39-53, ISSN 1753-8548, available online October 2010, http://dx.doi.org/10.3366/ijhac.2011.0006
[10] Keith V. Nesbitt and Stephen Barrass, Evaluation of a Multimodal Sonification and Visualisation of Depth of Market Stock Data, in R. Nakatsu and H. Kawahara (eds), Proceedings of the International Conference on Auditory Display (ICAD), 2002, pp. 2-5.
[11] De Roure, David C., Cruickshank, Don G., Michaelides, Danius T., Page, Kevin R. and Weal, Mark J. (2002). On Hyperstructure and Musical Structure. The Thirteenth ACM Conference on Hypertext and Hypermedia (Hypertext 2002), Maryland, USA, 11-15 Jun 2002. ACM, 95-104.
[12] Holden, Claire 2012. Recreating early 19th- century style in a 21st-century marketplace: An orchestral violinist’s perspective. Presented at: Institute of Musical Research DeNote Seminar, Senate House, London, 30 January 2012.
[13] G. Kramer, B. Walker, T. Bonebright, et al., Sonification Report: Status of the Field and Research Agenda, prepared for the National Science Foundation by members of the International Community for Auditory Display (1997), http://sonify.psych.gatech.edu/publications/pdfs/1999-NSF-Report.pdf
[14] Lehmann, L., Mittelbach, A., Cummings, J., Rensing, C., & Steinmetz, R. 2010. Automatic Detection and Visualisation of Overlap for Tracking of Information Flow. In Proceedings I-Know.
Notes
(1) http://listen.hatnote.com/
(2) http://tei-comparator.sourceforge.net/
Keywords: art history, iconography, images, digital scholarly editions, hermeneutics
- Session: Hermeneutics and the TEI
- Date: 2015-10-31
- Time: 11:00 – 12:30
- Room: Amphi Fugier
Although TEI is short for Text Encoding Initiative, reflecting a historical primacy of textual hermeneutics, it has grown to encompass ways of encoding and annotating images as well. Iconographic content is often identified by referencing classification systems such as Iconclass. When it comes to scholarly editing, however, it has to be considered that illustrations are not always a mere accessory to the text but sometimes a crucial part of text-image units, meaning that illustrations are transmitted in a relatively stable, iterative manner alongside the text, showing a variance in pictorial elements reminiscent of textual variance. Attempts have been made in printed editions to construct something akin to a critical apparatus for images but more so than texts, illustrations are dependent on being seen to be understood and cannot be transcribed in the same way.
The digitization of source material, and with it the changed parameters for including facsimiles in an edition, allows for a re-evaluation of the difficulties involved in editing iconographic programmes. This paper will a) briefly survey the status quo of marking up iconography, including taxonomies, tools and projects, b) discuss the conceptual difficulties of formalizing iconographic descriptions based on the method by Erwin Panofsky, and c) propose a solution for semantically marking up iconographic variants in a machine-readable way, using the TEI. This solution will involve a concept of superstructures that provide a frame of reference and, at the same time, a pattern of organization. The Ascende calve pope prophecies from the 14th century with their particularly interdependent text-image units will serve as an illustrative example throughout.
In conclusion, the paper will address the question of whether and how this solution is applicable to a wider range of use cases, beyond the specific editorial problems it originated from.
Keywords: French litterature, Molière, stand-off annotation, intertextuality
- Session: Presentation and engagement with the TEI
- Date: 2015-10-30
- Time: 16:00 – 17:30
- Room: Amphi Laprade
Keywords: TEI Encoding, Corpus Linguistics, Manuscript Annotation, Interoperability, Digital Humanities
- Session: Encoding manuscripts in the TEI
- Date: 2015-10-30
- Time: 16:00 – 17:30
- Room: Amphi Fugier
Keywords: open access publishing, OJS, journals
- Session: Publishing and the TEI
- Date: 2015-10-30
- Time: 11:00 – 12:30
- Room: Amphi Laprade
The Indiana University Libraries have a long history of using the TEI markup standard to encode and publish electronic texts, but choosing the best publishing platform has been challenging for certain projects. Prior to formally launching an open access journal publishing program in 2008, the Libraries collaborated with two scholarly journals to provide open access publishing using P3 SGML and P4 XML TEI encoding delivered through the DSpace and XTF platforms. Both journals used complex encoding, transformation, and delivery workflows that required copious amounts of custom code and developer time. As these journals aged, the time and effort required to maintain them steadily increased. In 2013, the Libraries began planning to migrate these journals to the Open Journal Systems (OJS) platform while preserving the TEI markup. The success of these migration projects hinged on the OJS XML Galley plugin, which allows journal managers to upload a customized XSLT file to render XML articles into HTML seamlessly.
Both journals now publish on the OJS platform. The Indiana Magazine of History (IMH) (http://scholarworks.iu.edu/journals/index.php/imh/) was successfully launched in OJS in August 2014, and The Medieval Review (TMR) (http://scholarworks.iu.edu/journals/index.php/tmr/) was launched in June 2015. Both journals continue to encode articles in TEI, although for consistency and ease of migration nearly 4,000 TMR articles were updated to P5 TEI encoding, as was the journal’s ongoing encoding workflow. Publishing in this manner leverages IU Libraries’ strengths in electronic text projects and XML workflows within an easy-to-use, flexible platform that journal editors appreciate. The success of these migrations suggests alternative frameworks for future TEI-based XML publishing of open access journals at Indiana University.
Speaker Biographies
Nicholas Homenda is the Digital Initiatives Librarian at Indiana University Bloomington Libraries where he manages digital projects, services, and initiatives in the Digital Collections Services department. Nick has a Master of Science in Information Studies from the University of Texas at Austin and previously worked as a music librarian and an orchestral clarinetist.
Shayna Pekala is the Scholarly Communication Librarian at Indiana University Bloomington, where she oversees the Libraries’ open access publishing services and initiatives. She holds an M.L.S. with a specialization in Digital Libraries from Indiana University Bloomington, and has been involved with digital publishing projects since 2012.
Keywords: philology, textmining, methodology, gender studies, collaborative
- Session: Workflows and the TEI
- Date: 2015-10-31
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Keywords: Mishnah, Hebrew, Manuscript, Digital alignment, Morphological markup
- Session: Tooling and the TEI
- Date: 2015-10-31
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Keywords: allograph, multi-layer transcription, normalization
- Session: Codicology and the TEI
- Date: 2015-10-30
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Keywords: TEI, MEI, oral music, digital mediation, standards
- Session: Encoding orality and performance in TEI
- Date: 2015-10-31
- Time: 09:00 – 10:30
- Room: Amphi Laprade
Since 2013, first with the HumanitéDigitMaghreb project and then with M&TEIeuroMed, we have undertaken to encode musical corpora from the Maghreb in MEI.
Yet the musical heritage of the Maghreb is part of the intangible heritage of humanity, in the sense that its musical culture is essentially oral and that, although this music is abundantly commented upon in Arabic theoretical texts, it was not originally notated.
We have observed, however, that since the end of the 19th century several interesting attempts have been made, on the one hand, to notate and publish this music and, on the other, to make sound recordings of various concerts in the Maghreb.
This investigative work and this new way of preserving this oral music have yielded a substantial body of musical resources, enabling musicologists and ethnomusicologists to deepen their research into the singular musical heritage of Tunisia, Morocco and Algeria.
Our problem has therefore been to consider the role of musical notation in this particular context and how the musical repertoire of the Maghreb could benefit from MEI and TEI encoding.
For example, in 2015, after discussion with Perry Roland and Andrew Hankinson of the MEI community, our team developed a strategy for relating a musical score to different sound recordings of that score, recorded directly in the MEI files, in order to respect the variety of possible interpretations of a single piece. Indeed, several musical characteristics of a piece can change from one country to another, along with other criteria that may come into play.
Today, within the MEI AE NORMA project of the ITEN Unesco Chair, we wish to experiment with the articulation between TEI (theoretical explanations) and MEI (score), so as to reflect the interweaving of the textual and sonic aspects of the music and to attempt a different approach to the internal structuring of the original musical corpora.
Short Biography
Sylvaine Leblond Martin holds a doctorate in Information and Communication Sciences and is a composer of contemporary music, accredited by the Canadian Music Centre. She is a research engineer in the IDEFI CréaTIC programme at Université Paris 8 and a member of the ITEN Unesco Chair, where she assists Jean-Pierre Dalbéra with projects on the “mediation and enhancement of cultural heritage” and with music projects.
Keywords: oral corpora, multimodal corpora, corpora interoperability, corpora exchange, TEI improvement
- Session: Encoding orality and performance in TEI
- Date: 2015-10-31
- Time: 09:00 – 10:30
- Room: Amphi Laprade
Keywords: manuscript description, cataloguing, digitization, ODD schema, eXist-db
- Session: Encoding manuscripts in the TEI
- Date: 2015-10-30
- Time: 16:00 – 17:30
- Room: Amphi Fugier
Keywords: XML, modeling, future.
- Session: Interoperability and the TEI
- Date: 2015-10-29
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Keywords: tei, xslt, svg, visualization, drama
- Session: Abstracting the TEI
- Date: 2015-10-30
- Time: 09:00 – 10:30
- Room: Amphi Laprade
Keywords: Critical Apparatus, Version, Spanish Literature
- Session: Textual variants and the TEI
- Date: 2015-10-30
- Time: 14:00 – 15:30
- Room: Amphi Laprade
Keywords: verse module; EpiDoc; epigraphy; epigraphic poetry
- Session: Tooling and the TEI
- Date: 2015-10-31
- Time: 09:00 – 10:30
- Room: Amphi Fugier
The TEI schema has been used for editing verse on several projects (cf. a partial list in González Blanco 2014). Different encoding modes have been chosen for each of these projects, within the rich set of tags offered by the TEI verse module. To this day, the effectiveness and applicability of the TEI schema have been tested on medieval, modern and contemporary poems, but rarely on a corpus of ancient epigraphic texts. The project Musa epigraphica: new approaches for studying and publishing epigraphic poetry, which I am leading and which is at an early stage, therefore opens up an almost uncharted testing area (for other electronic editions of epigraphic poetry see GVCyr, in progress, and the completed CLE Hispaniae project).
Like any edition of texts written on a three-dimensional surface, the digital edition of verse inscriptions mostly raises issues related to the physical dimension of the medium itself. We discussed these issues, and tried to solve them, in a recent paper (Morlock-Santin 2014). But, while laying the groundwork to define a suitable encoding schema for a specific category of text-bearing objects, the basic question to be raised is of a hermeneutic nature. How can we make a TEI edition a place for knowledge creation, as well as a research tool that helps researchers to better answer the scientific questions raised by ancient “street poetry” (Panciera 2012)?
The focus of my paper will be, on the one hand, to draw up a report on the current state of the TEI-EpiDoc subset with regard to the encoding of verse inscriptions, in order to identify any potential gaps and to report potentially helpful additions for encoders. In other words, my intention is to submit a rational proposal to improve the current schema (8.21), going further than a simple request in a discussion thread. On the other hand, as a specialist in a particular branch of epigraphic research and as a TEI user, my wish is to determine, within the scientific literature, the major topics concerning the epigrammatic genre, and to transpose them into a searchable TEI edition. Due to limited speaking time, I will only address the issues tied to the markup of layout and text formatting, especially the graphic features adopted to display the literary nature of the epigraphic text and convey it to the reader/viewer, i.e.:
- 1. The line/verse connection;
- 2. The link between rhythmic breaks (caesuras and dieresis) and line breaks (Agosti 2010);
- 3. The layout of the text on its support: columnar layout; indented pentameters (Lougovaya 2012); indentation of other metric segments (clauses or hemistichs);
- 4. The use of dividers, multifunctional symbols and vacat with a significant function to mark the metric or textual structure (Bodel 2012; Monella 2013);
- 5. The imitation of the calligraphic style of literary papyri.
Each question will be illustrated by one or more examples.
My presentation does not claim to give conclusive answers, but rather seeks to establish the terms of a debate before an experienced audience and to suggest some possible solutions.
Selective bibliography
- Agosti 2010 = Gianfranco Agosti, « Eisthesis, divisione dei versi, percezione dei cola negli epigrammi epigrafici di età tardo antica », Segno e testo, Cassino, 2010.
- Bodel 2012 = John Bodel, « Paragrams, punctuation, and system in ancient Roman script », Stephen D. Houston, ed., The Shape of Script, Santa Fe, New Mexico, 2012.
Biography
Dr. Eleonora Santin is an epigrapher and philologist. She works in Lyon as a researcher at the CNRS (UMR 5189 HiSoMA). She is a member of the team working on the epigraphy and history of ancient Thessaly. Her PhD dissertation in Ancient History (University of Rome La Sapienza) focuses on the funerary epigrams of Thessaly and the new epigraphic group of signed epigrams within the larger category of artists’ signatures. Her primary research interests are the cultural history of ancient Greek society, in particular questions of authorship, epigraphic poetry, and the digital edition of inscriptions. She is the author of a book, Autori di epigrammi sepolcrali greci su pietra (2009), and of about ten peer-reviewed publications, and she is co-editor of the proceedings L’Épigramme dans tous ses états : épigraphiques, littéraires, historiques : actes du colloque international, 3-4 Juin 2010, ENS de Lyon, Lyon, ENS de Lyon Éditions (forthcoming).
Personal web-page : http://www.hisoma.mom.fr/annuaire/santin-eleonora
Keywords: Genetic Editing, Semantic Web, Ontologies, Art History, Digital Edition
- Session: Encoding orality and performance in TEI
- Date: 2015-10-31
- Time: 09:00 – 10:30
- Room: Amphi Laprade
The contribution will present a digital, genetic and semantically enriched edition of the notebooks by the Austrian artist Hartmut Skerbisch (1945-2009). Digital scholarly editions are a widely neglected method in art history. Hence, a focus of this project is on applying edition methods to art historical source material and demonstrating the value of handwritten sources as relevant primary sources for art historical research. Additional emphasis lies on the use of semantic technologies which allow for revealing interconnections between individual entities.
The goal is to reconstruct the artist‘s association processes in the course of his developing individual artworks, exhibitions and events. Therefore it is essential to take a closer look at the artist’s inspirations – from literature, music and the visual arts – and to trace these influences in the notebooks. Thus, it will be possible to demonstrate how a specific idea evolved and changed over time, paying special attention to the media transition from ephemeral idea to text to manifestation. A thematic and chronological order is applied to the notebook entries, and text fragments, sketches and formulas from the notebook are linked to actual works; a digital representation is best suited, if not indispensable, for dealing with such witnesses.
The notebooks were annotated using the TEI, with special consideration given to the recommendations for editing origination processes. However, this project is not merely concerned with the genesis of the text itself, but with the development of artists’ ideas. For semantic enrichment, annotated entities are linked to formal ontologies based on the artist’s inventory. The digital repository GAMS supports the presentation of the TEI encoded text in different forms and comes with a triple store for RDF representations of the content.
The combination of these methods and technologies will help to reconstruct the artist’s association processes and reveal the genesis of his work.
Biography
Martina Scholger is a scientist at the Centre for Information Modelling – Austrian Centre for Digital Humanities at the University of Graz. Currently, she is working on her PhD project “Hartmut Skerbisch – Artists’ notebooks as a digital genetic and semantically enriched edition”. In addition to teaching data and text modelling to humanities students, she is responsible for the conceptualization and implementation of digital editions in various cooperation projects. Since 2014, she has been a member of the Institute for Documentology and Digital Editing (IDE).
Keywords: lexicography, LOD, non-standard languages
- Session: Interoperability and the TEI
- Date: 2015-10-29
- Time: 11:00 – 12:30
- Room: Amphi Laprade
Keywords: Indexing; Reference; Hermeneutics; Proper names; Pound, Ezra
- Session: Hermeneutics and the TEI
- Date: 2015-10-31
- Time: 11:00 – 12:30
- Room: Amphi Fugier
Keywords: Modelling, codicology, documentary editions, manuscript studies
- Session: Codicology and the TEI
- Date: 2015-10-30
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Keywords: books of account, semantically enriched, digital edition
- Session: Codicology and the TEI
- Date: 2015-10-30
- Time: 09:00 – 10:30
- Room: Amphi Fugier
Account books have long been used as primary sources for economic and social history, since they allow scholars to explore the development of economic behavior on both a macro- and a microstructural level. In the field of digital editing, the TEI has become a standard for the transcription and encoding of multiple aspects of texts. Accounts have only recently entered the field of digital scholarly editions [1]. They pose new problems to be solved, in particular in encoding the content of these documents and not only their text [2]. The recently funded MEDEA project (DFG/NEH) will bring together economic historians, scholarly editors, and technical experts to discuss emerging methods for the semantic markup of account books.
The MEDEA project supports the development of broad standards for semantically enriched digital editions of accounts, as common data models can help to create scholarly, verifiable, and exchangeable data. One main reference point for the development of standards for the production of such data is the Guidelines of the Text Encoding Initiative (TEI).
The core questions of MEDEA to be presented at the TEI conference include:
- How might we model the economic activities recorded in historical documents? What models of bookkeeping were followed historically and how can they be represented formally? Are data models developed for modern business reporting helpful?
- Can we establish common resources on metrics and currencies or even the value of money which can be reused in other projects? Is it possible to build common taxonomies of commodities and services to facilitate the comparison of financial information recorded at different dates and places? That is, can we develop references on the order of name authorities and standards for georeferencing?
- How might we integrate topological information of the transcription with its financial interpretation? Is the “table” an appropriate method? What possibilities are offered by the TEI Manuscripts module and use of the tei:zone element?
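As one hypothetical illustration of the first and third questions above (all values and element choices here are assumptions for discussion, not MEDEA decisions), a single account entry might combine a tabular transcription with a machine-readable interpretation carried by the TEI measure element and its measurement attributes:

```xml
<!-- Hypothetical sketch: a tabular transcription of one entry,
     with tei:measure carrying the financial interpretation -->
<table>
  <row>
    <cell>
      <measure quantity="6" unit="quarter" commodity="wheat"
        >vj quarters of wheat</measure>
    </cell>
    <cell>
      <measure type="currency" quantity="3" unit="shilling"
        >iij s.</measure>
    </cell>
  </row>
</table>
```

Whether such a table is an adequate representation, or whether the layout additionally needs tei:zone coordinates on a facsimile, is precisely the question the project raises.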
The first MEDEA workshop will be held on the weekend preceding the TEI Members' Meeting in Lyon, and we would be pleased to offer our TEI colleagues a late-breaking report in the form of either a paper or a poster.
[1] See for example the Comptes des châtellenies savoyardes <http://www.castellanie.net/> and the Jahrrechnungen der Stadt Basel 1535–1610 <http://gams.uni-graz.at/srbas>.
[2] See Kathryn Tomasek and Syd Bauman, « Encoding Financial Records for Historical Research », Journal of the Text Encoding Initiative [Online], Issue 6 | December 2013, online since 22 January 2014, accessed 20 July 2015, URL: http://jtei.revues.org/895, DOI: 10.4000/jtei.895; Georg Vogeler, « Warum werden mittelalterliche und frühneuzeitliche Rechnungsbücher eigentlich nicht digital ediert? », Zeitschrift für digitale Geisteswissenschaften 1, Beta-Version March 2015, accessed 20 July 2015, URL: http://www.zfdg.de/warum-werden-mittelalterliche-und-fr%C3%BChneuzeitliche-rechnungsb%C3%BCcher-eigentlich-nicht-digital-ediert
Short biographies
Kathryn Tomasek
Associate Professor of History
Co-Director, Wheaton College Digital History Project
Wheaton College
Norton, Massachusetts
@KathrynTomasek tomasek_kathryn@wheatoncollege.edu
Kathryn Tomasek has been teaching undergraduates with the TEI since 2004 and exploring the use of TEI markup for financial records since 2009. She was Project Director for a Start-Up Grant from the NEH in 2011 and a member of the American Historical Association's Committee on the Professional Evaluation of Digital Scholarship by Historians in 2014–2015.
Ass.-Prof. Dr. Georg Vogeler
Zentrum für Informationsmodellierung – Austrian Centre for Digital Humanities, Universität Graz
Elisabethstr. 59/III, A-8010 Graz
georg.vogeler@uni-graz.at
Georg Vogeler wrote his PhD on late medieval tax accounting in Germany. He has been involved in the field of digital scholarly editing since 2006. He is the technical partner of the digital edition of the Jahrrechnungen der Stadt Basel (http://gams.uni-graz.at/srbas). He has taught many courses on the use of the TEI for digital scholarly editions, is technical director of the monasterium.net project, a supervisor in the DiXiT project (EU 7th Framework, http://dixit.uni-koeln.de/), and a founding member of the Institut für Dokumentologie und Editorik (http://www.i-d-e.de).
Prof. Dr. Mark Spoerer / Kathrin Pindl, M.A.
Universität Regensburg
Lehrstuhl für Wirtschafts- und Sozialgeschichte
93040 Regensburg
mark.spoerer@ur.de / kathrin.pindl@ur.de
Mark Spoerer is a full professor of Economic and Social History at the University of Regensburg.
Kathrin Pindl works as a pre-doctoral research assistant at the Chair of Economic and Social History of the University of Regensburg. Her research interests include pre-modern living standards in European regions, group-specific patterns of consumption, and modeling economic activities as recorded in books of account.
Keywords: Translation, Machiavelli, Text Analysis, Parallel Corpora, Dictionary
- Session: Workflows and the TEI
- Date: 2015-10-31
- Time: 11:00 – 12:30
- Room: Amphi Laprade
We will present some results of the translation comparison tool “HyperMachiavel” (a web version of the corpus and its annotations is available at http://hyperprince.ens-lyon.fr/). The tool makes it possible to compare the editio princeps (Blado, 1532) with the four French translations of the sixteenth century (Jacques de Vintimille (1546), Gaspard d’Auvergne and Jacques Cappel (1553), and Jacques Gohory (1571)) and one seventeenth-century translation, by Amelot de la Houssaie (1683).
1. Presentation of the tool
Inspired by the machine translation and lexicographic domains, HyperMachiavel (HM) provides an annotation environment dedicated to the editing of lexical correspondences and offers different views to assist humanities researchers in interpreting the quality and the specificities of each translator’s work. It offers a synoptic view and equivalence detection, having been designed to support the manual editing of equivalences for aligned corpora and lexicographic work. Corpora and annotations can be exported directly in TEI, although the encoding of equivalences follows a new, dedicated XML schema. The corpus can be constructed within the tool by importing and aligning versions of the same text one by one. For the Hyperprince corpus, alignment was performed on arbitrary segments, decided by the philologist, corresponding to subdivisions of the original text structure (chapters).
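In standard TEI (as opposed to HM’s own dedicated equivalence schema), such an alignment could be approximated with stand-off links between chapter segments; the identifiers below are hypothetical:

```xml
<!-- Hypothetical sketch: one segment of the editio princeps
     aligned with its counterparts in two translations -->
<linkGrp type="alignment">
  <link target="#blado_ch18_s1 #vintimille_ch18_s1"/>
  <link target="#blado_ch18_s1 #gohory_ch18_s1"/>
</linkGrp>
```

Each link joins a segment of Blado 1532 to the corresponding segment of one translation, so every translation can be aligned to the source text independently.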
2. First results
2.1 Stato: The “new things” that Machiavelli describes are complex, and their “semantic territories” intersect and overlap. HM makes it possible to verify the hypothesis of a permanent polysemy of terms – a polysemy arising from how Machiavelli tries to describe the new objects or forms of political action, sometimes using the same words with different meanings.
2.2 The choices of each translator (virtù, ordini): The tool makes it possible to understand the differences in approach between the translators and to highlight their lexical and syntactic choices. We can therefore work toward a description of how each translator translates, seeing what is at stake at every moment in each choice.
Keywords: TEI tagging, innovation, corpus methods, annotation, digital editions
- Session: Abstracting the TEI
- Date: 2015-10-30
- Time: 09:00 – 10:30
- Room: Amphi Laprade
How might editors annotate what they cannot identify? Under such circumstances, might a TEI archive’s own markup lead the way to new discoveries? Within Digital Mitford: The Mary Russell Mitford Archive, the challenge of locating the mysterious “Miss James” proves emblematic. Referenced solely by patronym, “Miss James” became a topic of conjecture when multiple editors shared questions about the same elusive figure. In letters penned by Mitford in 1819 and after, “Miss James” emerged as Mary Mitford’s trusted friend and advisor. She was also an opinionated humorist, offering assessments of everything from mutual acquaintances to literary works. Yet while her Christian name and profession were later discovered by project editors, her history remains largely buried.
What insights might processing Digital Mitford’s own markup reveal about such a figure? Inspired by Douglas Duhaime’s visualized co-citations in the EEBO-TCP corpus, we view clusters of related data as forms of annotation—ones that, rendered judiciously, aid both scholars and those newer to Mitford’s oeuvre.[1] Working with XQuery on our eXist database of project files, we first assess the prevalence of relational categories tagged by our editors, then use these counts to weight lists of high-frequency tokens in ranges indexed by a key term.[2] Visualized, the resulting bouquets of knowledge suggest lines of inquiry—ones “locating” the unknown while enhancing perspectives the TEI archive itself may offer.
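The weighting step can be sketched as follows — in Python rather than the project’s actual XQuery, and with hypothetical function names and toy data, since the real inputs are counts drawn from the archive’s markup:

```python
from collections import Counter

def weighted_tokens(passages, tag_counts):
    """Weight raw token frequencies by how often each token also
    appears as a tagged relational category in the archive.

    passages: lists of tokens from ranges indexed by a key term.
    tag_counts: relational category -> number of tagged occurrences.
    """
    total = sum(tag_counts.values()) or 1
    # raw frequency of every token across the indexed passages
    freq = Counter(tok for passage in passages for tok in passage)
    # scale each frequency by its (smoothed) share of tagged mentions
    return {
        tok: n * (tag_counts.get(tok, 0) + 1) / total
        for tok, n in freq.items()
    }

# Toy data: two passages indexed by the key term "Miss James"
passages = [["james", "needlework", "reading"], ["james", "reading"]]
weights = weighted_tokens(passages, {"james": 4, "reading": 1})
```

Tokens that are both frequent in the indexed ranges and heavily tagged by editors rise to the top, which is what lets the clusters function as a form of annotation.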
[1] See Duhaime, Douglas. “Co-Citation Networks in the EEBO-TCP Corpus.” 26 July 2014. <http://douglasduhaime.com/blog/co-citation-networks-in-the-eebo-tcp-corpus>. Our model builds upon Christopher Ricks’ metaphor of scholarly annotation as “supererogation” (Allusion to the Poets, OUP, 2002). While its visualization is in progress, one mock-up may be found at <http://bit.ly/1gJXWsV>.
[2] See The Digital Mitford Codebook <https://docs.google.com/document/d/1r-8NGPJL1pZ20pnfvoX5OT0DkcDi-NBp5urJiZwx1sY/pub>. On ordered lists, see Witmore, Michael. “Finding ‘Distances’ Between Shakespeare’s Plays 2: Projecting Distances onto New Bases with PCA.” 6 July 2015. <http://winedarksea.org/?p=2271>
Speaker Bios
Mary Erica Zimmer is a Ph.D. Candidate in The Editorial Institute at Boston University whose research addresses editorial theories and methods, histories of the book, and intertextuality. She also has a strong interest in models for undergraduate research. Her work on Digital Mitford’s data visualization team is complemented by her development of an online, browsable model of the bookshops and stalls in London’s Paul’s Cross Churchyard before the 1666 Great Fire. Her dissertation will serve as a companion to the Selected Poems of Geoffrey Hill.
Molly O’Donnell is the University of Nevada, Las Vegas, President’s Foundation Graduate Research Fellow. She has recently contributed to Victoriographies and the Norton Anthology, and was formerly associate faculty at Notre Dame of Maryland University. Her dissertation uses contemporary sociolinguistics to examine the nineteenth-century tales novel as a useful mode for exploration in the areas of genre, narrative, and gender studies.
Elisa Beshero-Bondar, Project Director of the Digital Mitford Archive, is Associate Professor of English at the University of Pittsburgh at Greensburg, where she has taught since 2004. She is the author of Women, Epic, and Transition in British Romanticism (University of Delaware Press, 2011). At Pitt-Greensburg, she helped to launch a Digital Humanities pedagogy and research initiative that engages faculty and students in electronic text markup, text-mining of digital library databases, and digital project development. She has recently been experimenting with network analysis as applied to complex text and paratext structures in Thalaba the Destroyer, an 1801 epic poem by Robert Southey.