TEI and lemon: a comparative study of lexical encoding and interoperability

In this study, we examine questions of interoperability between the Text Encoding Initiative (TEI) Guidelines for encoding dictionaries [1] and the lexicon model for ontologies (lemon), which has been developed in the context of the W3C Ontology-Lexica Community Group. The OntoLex Community Group Specification [2] contains a section dedicated to clarifying the system's relation to other models of lexical encoding. That section covers SKOS(-XL), LMF (Lexical Markup Framework), and the Open Annotation standard, but it lacks a comparison with TEI. We think such a comparison is needed, particularly since bringing the TEI into the world of Linked Open Data and making use of ontologies have continued to gain interest within the TEI community.

We compare TEI and the different modules of lemon with respect to their technical data models and their respective capacities to adequately express certain linguistic features. For this purpose, we have encoded examples in both systems from multiple sources, including historical and dialectal dictionaries, with particular attention to compounds and to different types of variation in the lexical entries. By identifying the incompatible aspects of the two systems, we seek to point out and discuss the potential linguistic and technical benefits and downsides of using each system for one’s lexical data, also taking into consideration the publication of lexicographic data in the Linked Data cloud.


