Info

Title Towards a Repository of Senses for Use in TEI encoded Dictionaries
responsible

Encoded by   Vanessa Hannesschläger

Encoded by   Daniel Schopper

License

The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.

Abstract

The presentation is based on the observation that information about senses is often repeated within and across larger TEI-encoded dictionaries. This has led us to the idea of setting up a repository of senses that can be shared by entries in distinct dictionaries, similar to the ISOcat repository for data categories, which can be accessed for encoding part-of-speech and morphological information of lexical entries.

The TEI approach to the encoding of senses is described in the dictionary module of the TEI Guidelines. There, an entry is defined as a component-level element (tagged as <entry>) that "contains a single structured entry in any kind of lexical resource, such as a dictionary or lexicon". A sense (<sense>) is supposed to group "together all information related to one word sense in a dictionary entry, for example definitions, examples, and translation equivalents". As such, a sense is a component of an entry or of an element nested within an entry, such as a homonym.
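The entry and sense structure described above can be made concrete with a minimal sketch. The entry below is a hypothetical example invented for illustration and is not taken from any of the dictionaries discussed. The helper name list_senses is likewise an assumption. The sketch only shows how senses group definitions and examples inside one entry.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical entry, invented for illustration only.
ENTRY_XML = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="bank_en">
  <form type="lemma"><orth>bank</orth></form>
  <gramGrp><pos>noun</pos></gramGrp>
  <sense n="1">
    <def>a financial institution that accepts deposits</def>
    <cit type="example"><quote>She opened an account at the bank.</quote></cit>
  </sense>
  <sense n="2">
    <def>the land alongside a river</def>
  </sense>
</entry>
"""


def list_senses(entry_xml):
    """Return (sense number, definition text) pairs for one entry."""
    entry = ET.fromstring(entry_xml)
    senses = []
    # Only senses that are direct children of the entry are listed here.
    for sense in entry.findall("tei:sense", NS):
        definition = sense.findtext("tei:def", default="", namespaces=NS)
        senses.append((sense.get("n"), definition.strip()))
    return senses


if __name__ == "__main__":
    for number, definition in list_senses(ENTRY_XML):
        print(number, definition)
```

Note that the definition text is free prose, which is why two dictionaries can describe the same sense in entirely different words.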

There are no defined restrictions on how the content of a sense is encoded, and any character string appears to be allowed. This makes comparing senses across lexicons difficult, if not impossible. In general, we do not want to rely on string matching to state a relation between senses in different entries. We therefore advocate the creation of a repository of senses that entries in TEI dictionaries can refer to and share. Our experiments made use of technologies such as SKOS-XL, LMF and lemon. To establish efficient linking mechanisms, we used the TEI <ptr> element.
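As a rough illustration of the linking idea, the sketch below assumes that each sense carries a pointer whose target attribute names an item in a shared sense repository. The repository file name senses.xml, the sense identifiers, the two dictionary fragments and the function index_by_sense are all hypothetical and are not the encoding used in our experiments. The point is only that entries can be related through shared pointer targets rather than through matching definition strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragments of two distinct dictionaries. The pointer targets
# (senses.xml#...) stand for items in an assumed shared sense repository.
DICT_A = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="a_bank">
    <form><orth>bank</orth></form>
    <sense><ptr target="senses.xml#fin_institution"/>
      <def>an institution that keeps and lends money</def></sense>
  </entry>
</body>"""

DICT_B = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <entry xml:id="b_Bank">
    <form><orth>Bank</orth></form>
    <sense><ptr target="senses.xml#fin_institution"/>
      <def>Geldinstitut</def></sense>
  </entry>
  <entry xml:id="b_Ufer">
    <form><orth>Ufer</orth></form>
    <sense><ptr target="senses.xml#river_bank"/>
      <def>Rand eines Gewaessers</def></sense>
  </entry>
</body>"""


def index_by_sense(*dictionaries):
    """Map each repository sense target to the ids of entries pointing at it."""
    index = defaultdict(list)
    for xml_text in dictionaries:
        root = ET.fromstring(xml_text)
        for entry in root.iter("{%s}entry" % TEI):
            # Collect pointers from senses at any depth inside the entry.
            for ptr in entry.findall(".//tei:sense/tei:ptr", NS):
                index[ptr.get("target")].append(entry.get(XML_ID))
    return dict(index)


if __name__ == "__main__":
    for target, entry_ids in index_by_sense(DICT_A, DICT_B).items():
        print(target, entry_ids)
```

In this sketch the entries a_bank and b_Bank are related because they point at the same repository item, even though their definition strings have nothing in common.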

References

  • TEI Consortium (eds.): TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 3.0.0, 2016. http://www.tei-c.org/release/doc/tei-p5-doc/en/Guidelines.pdf