Title Exploring data models for heterogeneous dialect data: the case of explore.bread.AT!

Encoded by   Vanessa Hannesschläger

Encoded by   Daniel Schopper


The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.


The project "exploreAT! exploring Austria’s culture through the language glass" aims to explore a collection of heterogeneous 20th-century data on Bavarian dialects from the former area of the Austro-Hungarian Empire. Within the project, specific topics such as bread, colors, and plants, and their cultural associations, are investigated in greater detail. Schopper, Bowers and Wandl-Vogt (2015) described the process of converting the Database of Bavarian Dialects in Austria (DBOE) from a TUSTEP database format to a basic XML format, and of then converting this data and its supporting resources into LOD-compatible TEI, together with some of the unique issues involved.

This paper discusses issues related to the digital modeling of the data from the subproject "explore.bread.AT! exploring Austria’s bread culture dialectally". The goals of the subproject include extracting culturally and linguistically relevant information about bread-related topics that may be specific to a given place or time, and enhancing the linguistic and semantic description of the dataset using standards, including adding etymological markup and analysis as per Bowers and Romary (2016).

As is the case with much of the rest of the DBOE, this dataset originates from a set of questionnaires that are a complicated mixture of semasiological (term-based) and onomasiological (concept-based) phrasing, and the content and formatting of the original database entries reflect this. In approaching the modeling of this dataset using markup standards beyond basic XML, we compare and discuss how the data does and does not fit within the semasiological model of the TEI dictionary, and where it may fit within a TBX-TEI hybrid (Romary 2014) that combines certain aspects of the onomasiological model of TBX with the TEI dictionary model.
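The difference between the two orientations can be illustrated with a minimal sketch. It assumes the data can be reduced to (term, concept) pairs. The names `records`, `semasiological`, and `onomasiological` are hypothetical, and the values are placeholders rather than actual DBOE entries. It is not the project's data model, only an illustration of how the same pairs can be grouped by term or by concept.

```python
from collections import defaultdict

# Hypothetical placeholder records: (term, concept) pairs.
# These are NOT actual DBOE data, only stand-ins for illustration.
records = [
    ("term_A", "concept_1"),
    ("term_A", "concept_2"),
    ("term_B", "concept_1"),
]


def semasiological(pairs):
    """Term-based view: each term lists the concepts (senses) it expresses."""
    index = defaultdict(list)
    for term, concept in pairs:
        index[term].append(concept)
    return dict(index)


def onomasiological(pairs):
    """Concept-based view: each concept lists the terms that name it."""
    index = defaultdict(list)
    for term, concept in pairs:
        index[concept].append(term)
    return dict(index)


by_term = semasiological(records)
by_concept = onomasiological(records)
```

Here `by_term` corresponds loosely to a dictionary-style arrangement (entry, then senses), and `by_concept` to a terminology-style arrangement (concept, then terms). A questionnaire that mixes both kinds of phrasing yields records that do not fall cleanly under either grouping alone.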

Thus, in addition to shedding light on the dialectal data from "explore.bread.AT!", the issues discussed offer a representative look at core issues in the remodeling of the larger legacy database.


  • Bowers, Jack/Romary, Laurent: Deep encoding of etymological information in TEI, 2016.
  • Romary, Laurent: TBX goes TEI: Implementing a TBX basic extension for the Text Encoding Initiative Guidelines, in: Terminology and Knowledge Engineering (TKE), Berlin, Germany, 2014.
  • Schopper, Daniel/Bowers, Jack/Wandl-Vogt, Eveline: dboe@TEI: remodelling a database of dialects into a rich LOD resource, in: TEI conference 2015, Lyon, France, 2015.