Encoding a Dictionary of Russian Dialects in TEI and linking to LOD
Encoded by Vanessa Hannesschläger
Encoded by Daniel Schopper
The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.
Within a cooperation project between the Russian and Austrian Academies of Sciences, we
are investigating the TEI encoding of the Dictionary of Russian Dialects, which contains
more than 300,000 entries distributed over 48 volumes. The goal of the study is to
increase the accessibility, interoperability and reusability of this rich source of
dialectal data. Our current proposal for a TEI representation consists in encoding the
official Russian word as a TEI <entry> element and using the <cit> element for each
occurrence of a dialect form. Within the <cit> element, we also include the geo-location
indicating the region in which the dialect form is used. Finally, we include in the <cit>
block the available bibliographical information (<bibl>), in most cases the source from
which the dialect word has been recorded.
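The entry structure described above can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration, not the project's encoding: the wrapper `<usg type="geographic">` for the region, the `<form>/<orth>` children and the `type` values on `<form>` and `<cit>` are assumptions, the function name `build_entry` is hypothetical, and the output has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialize TEI elements in the default namespace (no "ns0:" prefixes).
ET.register_namespace("", TEI_NS)


def tei(tag):
    """Return a namespace-qualified TEI tag name in ElementTree's {uri}tag form."""
    return f"{{{TEI_NS}}}{tag}"


def build_entry(lemma, dialect_forms):
    """Build a TEI <entry> for an official Russian word (hypothetical helper).

    dialect_forms is an iterable of (form, region, source) tuples; source may
    be None when no bibliographical information is available.
    """
    entry = ET.Element(tei("entry"))
    # The official Russian word heads the entry.
    lemma_form = ET.SubElement(entry, tei("form"), type="lemma")
    ET.SubElement(lemma_form, tei("orth")).text = lemma
    for dialect_form, region, source in dialect_forms:
        # One <cit> per occurrence of a dialect form.
        cit = ET.SubElement(entry, tei("cit"), type="dialect")
        form = ET.SubElement(cit, tei("form"))
        ET.SubElement(form, tei("orth")).text = dialect_form
        # Assumed markup for the geo-location of the dialect form.
        ET.SubElement(cit, tei("usg"), type="geographic").text = region
        # Bibliographical source, where the dictionary gives one.
        if source:
            ET.SubElement(cit, tei("bibl")).text = source
    return entry


if __name__ == "__main__":
    # Placeholder values only; no real dictionary data is used here.
    example = build_entry("LEMMA", [("DIALECT_FORM", "REGION", "SOURCE")])
    print(ET.tostring(example, encoding="unicode"))
```

Keeping region and source inside each `<cit>` ties them to one specific dialect occurrence rather than to the entry as a whole, which matches the per-occurrence description in the text.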
The meaning of the entry is given in the original dictionary as free text. We are
currently working on adding more structure to this part of the original entries, with
relevant parts tagged as <name> elements or, where more flexibility is needed (i.e. not
only for proper nouns), as <rs> (referring string) elements. Through the <name> element,
we also encode conceptual information, for example that a word in an entry is the name of
a family of plants (… <name type="botanicFamily">сложноцветных</name> …), which we can
then link to the scientific Latin name of the plant: <name type="plant"
subtype="scientific" key="taxonid:.." xml:lang="la">Taraxacum officinale
Wegg.</name>. This way, we can easily link the original entry in the dialectal
dictionary to taxonomic datasets available in the Linked Open Data cloud, and to other
language data included in the Linguistic Linked Open Data (LLOD) cloud.
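The linking step can be sketched as follows: collect every `<name>` that carries a `@key` and expand the prefixed key into a URI that a Linked Open Data resource could resolve. This is a hypothetical illustration. The `<sense>/<def>` wrapper, the helper names, the key `taxonid:PLACEHOLDER` (standing in for the elided identifier) and the base URI `https://example.org/taxon/` are all assumptions, not part of the described encoding or of any real taxonomic service.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def extract_name_links(sense_xml):
    """Return a dict per <name> element that has a @key attribute (hypothetical helper)."""
    root = ET.fromstring(sense_xml)
    links = []
    # ElementTree's limited XPath supports the [@attr] existence predicate.
    for name in root.iterfind(".//tei:name[@key]", NS):
        links.append({
            "type": name.get("type"),
            "subtype": name.get("subtype"),
            "key": name.get("key"),
            "lang": name.get(XML_LANG),
            "text": "".join(name.itertext()).strip(),
        })
    return links


def key_to_uri(key, prefixes):
    """Expand a prefixed key like 'taxonid:ID' using a prefix -> base-URI map.

    Returns None when the key has no prefix or the prefix is unknown.
    """
    prefix, sep, local = key.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + local


# Hypothetical sample: the family name carries no key, the scientific name does.
SAMPLE = """<sense xmlns="http://www.tei-c.org/ns/1.0">
  <def>… <name type="botanicFamily">сложноцветных</name> …
    <name type="plant" subtype="scientific" key="taxonid:PLACEHOLDER"
          xml:lang="la">Taraxacum officinale Wegg.</name></def>
</sense>"""

if __name__ == "__main__":
    prefixes = {"taxonid": "https://example.org/taxon/"}  # hypothetical base URI
    for link in extract_name_links(SAMPLE):
        print(link["text"], "->", key_to_uri(link["key"], prefixes))
```

Storing a compact prefixed key in `@key` and resolving it through a separate prefix map keeps the TEI source independent of any particular dataset's URI scheme; only the map needs to change if the target taxonomy changes.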