Early Modern Slovenian Manuscripts Between Description, Critical Edition and Lexicon

Encoded by   Vanessa Hannesschläger

Encoded by   Daniel Schopper


The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.


Even though the early modern period, especially the 17th and 18th centuries, is important for the development of Slovenian literature, the manuscripts from this period have received only sketchy treatment or have not entered the scholarly record at all. Recent research has shown, however, that many Slovenian Baroque manuscripts possess considerable literary, cultural or spiritual value, yet were never published in print, because their authors continued, for various reasons, to rely on manuscript culture as the main medium of their textual oeuvre.

For this reason, new research initiatives have been undertaken to analyse, transcribe, and digitally process these texts. Over the last 15 years (2001-2016), several TEI-conformant projects have been launched, which converge on three main methodological approaches.

  • The foundational research consists in the analysis of the manuscripts as primary sources, expressed in structured msDesc elements and brought together online in the (beta version) Register of early modern Slovenian manuscripts1.
  • Digital scholarly editions are an established route for the text-critical study, processing, and presentation of selected early Slovenian texts in the eZISS project2. One of its leading principles is the strict differentiation and inclusion of both a diplomatic and a critical edition of the text in question, which raises interesting issues of both encoding and on-screen presentation.
  • The third approach to the same texts is still in progress: we are trying to find an optimized workflow to generate, for each edition, a lexicon of the words and word forms attested in its text. Here again, questions of diplomatic versus critical word forms complicate matters considerably. We observe, and aim to address, a range of normalizations leading to contemporary standard Slovene word forms and their lemmatization with part-of-speech tagging, as demonstrated in the IMP project3. We are experimenting with various machine-learning methods for these normalizations, concentrating on character-based statistical machine translation and conditional random fields. The primary aim of developing these methods is to facilitate lexicographic work, but they could also prove useful in preparing new editions: during the preparation of a new edition, the training dataset for the translator between diplomatic and critical words and sentences could be bootstrapped as the translation progresses.
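The pairing of diplomatic and critical readings described above, together with the lemmatization needed for the lexicon, can be expressed in TEI with the standard choice/orig/reg mechanism and the attributes of the w element. A minimal sketch follows; the word forms, lemma, and part-of-speech value shown are invented for illustration and are not taken from any of the editions mentioned:

```xml
<!-- A diplomatic reading (orig) paired with its critical, normalized
     counterpart (reg). The lemma and pos attributes on <w> carry the
     lemmatization and part-of-speech tagging that feed the lexicon.
     All values here are hypothetical examples. -->
<w lemma="beseda" pos="NOUN">
  <choice>
    <orig>beſſeda</orig>
    <reg>beseda</reg>
  </choice>
</w>
```

Encoded this way, a diplomatic edition can be generated by selecting the orig children, a critical edition by selecting the reg children, and the lexicon by collecting the w elements with their lemma and pos values.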