Title Encoding a Dictionary of Russian Dialects in TEI and linking to LOD Resources

Encoded by   Vanessa Hannesschläger

Encoded by   Daniel Schopper


The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.


Within a cooperation project between the Russian and Austrian Academies of Sciences, we are investigating the TEI encoding of the Dictionary of Russian dialects, which contains more than 300,000 entries distributed over 48 volumes. The goal of the study is to increase the accessibility, interoperability and reusability of this rich source of dialectal data. Our current proposal for a TEI representation encodes the official Russian word as a TEI entry element and uses a cit element for each occurrence of a dialect form, with the form itself given in a quote element. Within each cit element, a usg element records the geographic region in which the dialect form is used. Finally, the cit block carries the available bibliographical information (bibl), in most cases the source from which the dialect word was collected.
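To make the proposed structure concrete, the following is a minimal sketch in Python of what such an entry might look like and how it could be read back. The element names entry, cit, quote, usg and bibl come from the proposal above; everything else is an assumption made for illustration: the form/orth wrapper for the headword, the type attribute values, and all text content, which consists of placeholders rather than data from the dictionary.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical entry: all text values are placeholders, and the form/orth
# wrapper and the type attributes are assumptions, not the project's schema.
SAMPLE_ENTRY = """
<entry xmlns="http://www.tei-c.org/ns/1.0">
  <form type="lemma"><orth xml:lang="ru">HEADWORD</orth></form>
  <cit type="dialectForm">
    <quote xml:lang="ru">DIALECT-FORM-1</quote>
    <usg type="geo">REGION-1</usg>
    <bibl>SOURCE-1</bibl>
  </cit>
  <cit type="dialectForm">
    <quote xml:lang="ru">DIALECT-FORM-2</quote>
    <usg type="geo">REGION-2</usg>
    <bibl>SOURCE-2</bibl>
  </cit>
</entry>
"""


def read_dialect_forms(entry_xml):
    """Return the headword and one record per cit (form, region, source)."""
    entry = ET.fromstring(entry_xml)
    headword = entry.findtext("tei:form/tei:orth", default="", namespaces=TEI)
    forms = []
    for cit in entry.findall("tei:cit", TEI):
        forms.append({
            "form": cit.findtext("tei:quote", default="", namespaces=TEI),
            "region": cit.findtext("tei:usg", default="", namespaces=TEI),
            "source": cit.findtext("tei:bibl", default="", namespaces=TEI),
        })
    return headword, forms


headword, forms = read_dialect_forms(SAMPLE_ENTRY)
for row in forms:
    print(headword, row["form"], row["region"], row["source"])
```

Keeping form, region and source together in one cit block means that each dialect attestation can be extracted as a single self-contained record.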

The original dictionary gives the meaning of each entry as free text. We are currently working on adding more structure to this part of the original entries by tagging relevant parts as names (name) or, where more flexibility is needed because the expression is not a proper noun, as "referring strings" (rs). The name element also lets us include conceptual information, for example that an entry names a family of plants (… <name type="botanicFamily">сложноцветных</name> …), which we can then link to the scientific name of this family: <name type="plant" subtype="scientific" key="taxonid:.." xml:lang="la">Taraxacum officinale Wegg.</name>. This way, we can easily link the original entry in the dialectal dictionary to taxonomic datasets available in the Linked Open Data cloud, and to other language data in the Linguistic Linked Open Data cloud.
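As a rough illustration of how the key attribute could feed a linking step, the sketch below collects every name element that carries a key from a definition. The two name elements are taken from the example above, including the elided taxonid value, which is left as given. The enclosing def element and the placeholder definition text are assumptions, and the sketch deliberately stops short of resolving keys to Linked Open Data URIs, since that scheme is not specified here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The def wrapper and the placeholder text are assumptions; the two name
# elements follow the example in the text, with the key left elided.
SAMPLE_DEF = """
<def xmlns="http://www.tei-c.org/ns/1.0">
  PLACEHOLDER-DEFINITION-TEXT
  <name type="botanicFamily">сложноцветных</name>
  <name type="plant" subtype="scientific" key="taxonid:.." xml:lang="la">Taraxacum officinale Wegg.</name>
</def>
"""


def collect_keyed_names(def_xml):
    """Return one record per name element that has a key attribute."""
    root = ET.fromstring(def_xml)
    found = []
    for name in root.iter("{%s}name" % TEI_NS):
        key = name.get("key")
        if key is None:
            continue  # names without a key cannot be linked yet
        found.append({
            "key": key,
            "label": "".join(name.itertext()).strip(),
            "type": name.get("type"),
            "subtype": name.get("subtype"),
            "lang": name.get(XML_LANG),
        })
    return found


keyed = collect_keyed_names(SAMPLE_DEF)
for record in keyed:
    print(record["key"], record["label"], record["lang"])
```

Only the keyed scientific name is returned; the untyped family name stays in the text as conceptual annotation until it receives an identifier of its own.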