Dariah-SI

XML-TEI Markup Language in the Humanities

Introductory workshop on Digital Humanities 

The workshop took place on October 15th, 2014, in the Prešeren Hall of the Slovenian Academy of Sciences and Arts in Ljubljana.

The videos are in Slovenian.

Introduction to XML and TEI

Tomaž Erjavec

The introductory lecture presented the basics of the XML markup standard. We looked at the structure of XML documents and the XML tagging model, and briefly discussed character encoding, with an emphasis on the Unicode standard. We then presented XML schemas, which allow the grammar and the set of tags for a particular type of document to be formally defined. In the second part of the lecture, we learned about the Text Encoding Initiative (TEI). Its Guidelines define a system for building XML schemas and document in detail the more than 500 elements that the TEI provides for encoding very diverse types of texts and for different kinds of analysis. The lecture concluded with the motivation for establishing the TEI, a historical overview, and the main advantages of using the TEI Guidelines for Electronic Text Encoding and Interchange.
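The tagging model can be illustrated with a short sketch that is not part of the lecture materials. Assuming Python 3 and its standard-library `xml.etree.ElementTree` module, the hypothetical helper `is_well_formed` below checks only whether a string is well-formed XML: one root element, properly nested tags, quoted attribute values. Both sample documents are invented, and the second deliberately closes `<author>` with `</title>`. Checking a document against a schema is a separate step that this sketch does not attempt.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one root element, properly nested tags, an attribute,
# and a non-ASCII character (š) that Unicode handles without escaping.
GOOD = """<poem lang="sl">
  <title>Zdravljica</title>
  <author>France Prešeren</author>
</poem>"""

# The same content with a tagging error: <author> is closed by </title>.
BAD = "<poem><title>Zdravljica</title><author>France Prešeren</title></poem>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (well-formedness only, no schema)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(GOOD))  # True
    print(is_well_formed(BAD))   # False
    print(ET.fromstring(GOOD).find("author").text)  # France Prešeren
```

Because the parser stops at the first error, the helper reports whether a document is well-formed, not every problem it contains.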

Introduction to TEI

Matija Ogrin

The TEI Consortium Guidelines aim to accommodate the diverse needs of humanists whose main object of study is text. The Guidelines set out a comprehensive set of XML tags that can be used to mark up (encode) the varied structures of humanities texts. The tags are grouped into modules for the various areas of work with texts. In this lecture we will learn about the general structure prescribed for TEI documents and, most importantly, about the modules most commonly used by humanists working with texts.
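The general structure of a TEI document can also be sketched in code. The following Python sketch, assuming only the standard library, builds a minimal document in the TEI namespace (`http://www.tei-c.org/ns/1.0`): a `<teiHeader>` whose `<fileDesc>` holds the title, publication and source statements, followed by a `<text>` whose `<body>` holds the content. The function name `minimal_tei` and the placeholder statement texts are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI elements without a prefix


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def minimal_tei(title: str, paragraphs: list[str]) -> ET.Element:
    """Build <TEI> with a minimal <teiHeader> (metadata) and <text> (content)."""
    root = ET.Element(tei("TEI"))

    # Header: fileDesc with its three required parts.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Unpublished example."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Born-digital example."

    # Text: the content itself, here just paragraphs in the body.
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for para in paragraphs:
        ET.SubElement(body, tei("p")).text = para
    return root


if __name__ == "__main__":
    doc = minimal_tei("Sample text", ["First paragraph.", "Second paragraph."])
    print(ET.tostring(doc, encoding="unicode"))
```

Keeping metadata in the header and content in the text lets the same header structure be reused across documents, whatever module the body draws on.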

Use case: description of manuscripts

Matija Ogrin

Manuscripts are one of the most important segments of cultural, and especially literary, heritage, which is why electronic databases are being created that present detailed descriptions of manuscripts together with digital facsimiles of the originals. The TEI Guidelines contain a special module for this field. This lecture presents the range of options the TEI Guidelines offer, from simpler to more complex markup.
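To illustrate the manuscript description module, the sketch below reads a TEI `<msDesc>` with Python's standard library. The element names (`msIdentifier`, `msContents`, `msItem`, `physDesc`, `objectDesc`, `supportDesc`) come from the TEI Guidelines, but the manuscript, its repository and shelfmark, and the `summarise` helper are invented for illustration. The record moves from basic identification to more detailed layers describing contents and physical form.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented manuscript record: identification first, then contents and
# physical description as progressively more detailed layers.
MS_DESC = """<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:id="ms1">
  <msIdentifier>
    <settlement>Ljubljana</settlement>
    <repository>Example Library</repository>
    <idno>Ms. 1</idno>
  </msIdentifier>
  <msContents>
    <msItem n="1"><title>Sermons</title></msItem>
    <msItem n="2"><title>Hymns</title></msItem>
  </msContents>
  <physDesc>
    <objectDesc form="codex">
      <supportDesc material="paper"><extent>120 leaves</extent></supportDesc>
    </objectDesc>
  </physDesc>
</msDesc>"""


def summarise(ms_xml: str) -> dict:
    """Collect the shelfmark, item titles and support material from an <msDesc>."""
    ms = ET.fromstring(ms_xml)
    ident = ms.find("tei:msIdentifier", NS)
    support = ms.find("tei:physDesc/tei:objectDesc/tei:supportDesc", NS)
    return {
        "settlement": ident.findtext("tei:settlement", namespaces=NS),
        "repository": ident.findtext("tei:repository", namespaces=NS),
        "idno": ident.findtext("tei:idno", namespaces=NS),
        "items": [t.text for t in ms.findall("tei:msContents/tei:msItem/tei:title", NS)],
        "material": support.get("material") if support is not None else None,
    }


if __name__ == "__main__":
    print(summarise(MS_DESC))
```

A simpler record could stop after `<msIdentifier>`; the optional layers can be added as the cataloguing work progresses.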

Use case: biographical and prosopographical data

Petra Vide Ogrin

The TEI Guidelines contain a special module for biographical and prosopographical data, such as those found in archival regesta, prosopographies and, above all, lexicographical publications. These guidelines were used to mark up the biographical data on the Slovenian Biography web portal, which contains three lexicons: the Slovenian Biographical Lexicon (1925-1991), the Slovene Biographical Lexicon of the Littoral (1974-1994) and the New Slovenian Biographical Lexicon (2013). This presentation describes how TEI markup is used for detailed encoding of personal names and their variants, titles and nobility predicates, place names, dates, occupations, and family ties, together with their particularities.
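A comparable sketch for biographical data: the hypothetical `<listPerson>` below records one person with a main and a variant name, a title in `<roleName>`, birth and death dates, a birthplace and an occupation, plus a second person linked to the first through a `<relation>`. All names, dates and the helpers `people` and `parents_of` are invented; the markup is one plausible TEI encoding, not necessarily the one used on the Slovenian Biography portal.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Invented persons; element choices are one plausible TEI encoding.
PERSONS = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="p1">
    <persName type="main">
      <roleName type="nobility">baron</roleName>
      <forename>Janez</forename>
      <surname>Novak</surname>
    </persName>
    <persName type="variant"><forename>Johann</forename><surname>Nowak</surname></persName>
    <birth when="1801-03-04"><placeName>Ljubljana</placeName></birth>
    <death when="1870"/>
    <occupation>landowner</occupation>
  </person>
  <person xml:id="p2">
    <persName type="main"><forename>Marija</forename><surname>Novak</surname></persName>
  </person>
  <listRelation>
    <relation name="parent" active="#p2" passive="#p1"/>
  </listRelation>
</listPerson>"""


def people(list_xml: str) -> dict:
    """Map each person's xml:id to their names, dates, birthplace and occupation."""
    root = ET.fromstring(list_xml)
    result = {}
    for person in root.findall("tei:person", NS):
        birth = person.find("tei:birth", NS)
        death = person.find("tei:death", NS)
        result[person.get(XML_ID)] = {
            # Join the name parts (title, forename, surname) in document order.
            "names": [" ".join(part.text for part in pn)
                      for pn in person.findall("tei:persName", NS)],
            "born": birth.get("when") if birth is not None else None,
            "died": death.get("when") if death is not None else None,
            "birthplace": person.findtext("tei:birth/tei:placeName", namespaces=NS),
            "occupation": person.findtext("tei:occupation", namespaces=NS),
        }
    return result


def parents_of(list_xml: str, person_id: str) -> list[str]:
    """Return the ids of persons recorded as parents of person_id."""
    root = ET.fromstring(list_xml)
    return [rel.get("active").lstrip("#")
            for rel in root.findall(".//tei:relation[@name='parent']", NS)
            if rel.get("passive") == "#" + person_id]


if __name__ == "__main__":
    print(people(PERSONS)["p1"])
    print(parents_of(PERSONS, "p1"))  # ['p2']
```

Recording family ties as separate `<relation>` elements that point to `xml:id` values keeps each person's entry self-contained while still allowing the ties to be queried.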

Use case: born-digital and structured data

Andrej Pančur

The TEI Guidelines were originally created to mark up digitised printed (analogue) texts, but in recent years they have also been used to mark up born-digital texts, including scholarly publications. This presentation addresses the strengths and weaknesses of electronic publishing in the humanities with the TEI Guidelines compared with other markup languages widespread in the publishing industry (DocBook, XHTML, HTML5). It also shows how structured data from relational tables and databases can be included in born-digital texts.
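Including relational data in a born-digital TEI text could look like the sketch below. It assumes an SQLite database (here an in-memory one with an invented `census` table) and converts a query result into a TEI `<table>`, marking the header row with `role="label"`. The table, its figures and the `query_to_tei_table` helper are hypothetical and not necessarily the approach presented in the lecture.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def query_to_tei_table(conn: sqlite3.Connection, sql: str, params=()) -> ET.Element:
    """Run a query and return its result as a TEI <table> of <row>/<cell> elements."""
    cursor = conn.execute(sql, params)
    columns = [d[0] for d in cursor.description]
    table = ET.Element(tei("table"))
    header = ET.SubElement(table, tei("row"), role="label")  # column names
    for name in columns:
        ET.SubElement(header, tei("cell")).text = name
    for record in cursor.fetchall():
        row = ET.SubElement(table, tei("row"))
        for value in record:
            ET.SubElement(row, tei("cell")).text = "" if value is None else str(value)
    table.set("rows", str(len(table)))  # header row plus data rows
    table.set("cols", str(len(columns)))
    return table


# Invented data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE census (town TEXT, year INTEGER, inhabitants INTEGER)")
conn.executemany(
    "INSERT INTO census VALUES (?, ?, ?)",
    [("Town A", 1900, 1200), ("Town B", 1900, 850), ("Town A", 1910, 1350)],
)
table = query_to_tei_table(
    conn, "SELECT town, inhabitants FROM census WHERE year = ? ORDER BY town", (1900,)
)

if __name__ == "__main__":
    print(ET.tostring(table, encoding="unicode"))
```

Generating the table from a query, rather than typing it into the text, keeps the published figures in step with the database they come from.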

Use case: linguistically annotated corpora and dictionaries

Tomaž Erjavec

Computer text corpora form the basis for the empirical study of language, both in basic linguistic research and in applied linguistics, lexicography in particular. The TEI Guidelines have a special module for encoding corpora and an additional module for the linguistic annotations that can be added to texts, which make a corpus much more useful. In this lecture we will look at some examples of linguistically annotated corpora of Slovenian, and then at examples of encoding dictionary data, for which the Guidelines also offer a separate module.
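Linguistic annotation of this kind can be sketched as follows: each token of a sentence becomes a TEI `<w>` carrying `lemma` and `pos` attributes, and punctuation becomes `<pc>`. The example sentence is annotated by hand; the Universal Dependencies-style part-of-speech labels and the helpers `annotate` and `forms_of` are assumptions for illustration, and a real corpus would typically use a richer morphosyntactic tagset.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
ET.register_namespace("", TEI_NS)

# Hand-annotated tokens: (word form, lemma, part of speech).
TOKENS = [
    ("Prešeren", "Prešeren", "PROPN"),
    ("je", "biti", "AUX"),
    ("napisal", "napisati", "VERB"),
    ("Zdravljico", "Zdravljica", "PROPN"),
    (".", ".", "PUNCT"),
]


def tei(tag: str) -> str:
    """Qualify a local element name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def annotate(tokens) -> ET.Element:
    """Wrap tokens in a TEI <s>: words as <w lemma pos>, punctuation as <pc>."""
    sentence = ET.Element(tei("s"))
    for form, lemma, pos in tokens:
        if pos == "PUNCT":
            token = ET.SubElement(sentence, tei("pc"))
        else:
            token = ET.SubElement(sentence, tei("w"), lemma=lemma, pos=pos)
        token.text = form
    return sentence


def forms_of(sentence: ET.Element, lemma: str) -> list[str]:
    """Return the word forms in the sentence that share the given lemma."""
    return [w.text for w in sentence.findall(f"tei:w[@lemma='{lemma}']", NS)]


if __name__ == "__main__":
    s = annotate(TOKENS)
    print(ET.tostring(s, encoding="unicode"))
    print(forms_of(s, "napisati"))  # ['napisal']
```

Storing the lemma alongside each inflected form is what lets a search for one lemma find all its forms, which matters for a highly inflected language such as Slovenian.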