Towards a Repository of Senses for Use in TEI encoded Dictionaries
by
Thierry Declerck
Karlheinz Mörth
Info
Title | Towards a Repository of Senses for Use in TEI encoded Dictionaries |
---|---|
Responsible | Encoded by Vanessa Hannesschläger; encoded by Daniel Schopper |
License | The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text. |
Abstract
The presentation is based on the observation that information about senses is often repeated within and across larger TEI encoded dictionaries. This has led us to the idea of setting up a repository of senses that can be shared by entries in distinct dictionaries, similar to the ISOcat repository of data categories, which can be accessed for encoding part-of-speech and morphological information of lexical entries.
The TEI approach to the encoding of senses is described in the dictionary module of the
TEI Guidelines [1]. There, an entry is defined as a component-level element (tagged as
<entry>) that "contains a single structured entry in any kind of lexical resource, such
as a dictionary or lexicon" [2]. A sense (<sense>) is supposed to group "together all
information related to one word sense in a dictionary entry, for example definitions,
examples, and translation equivalents" [3]. As such, a sense is a component of an entry
or of elements of an entry, like homonyms.
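The entry and sense structure described above can be illustrated with a minimal sketch. The dictionary entry below is invented for illustration and is not taken from any of the dictionaries discussed here; the helper name `list_senses` is likewise hypothetical. The sketch only assumes the TEI namespace and the standard TEI dictionary elements `entry`, `form`, `orth`, `sense`, `def`, `cit` and `quote`, and uses Python's standard library to read them.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical entry, invented for illustration only.
ENTRY_XML = f"""
<entry xmlns="{TEI_NS}" xml:id="bank_n">
  <form type="lemma"><orth>bank</orth></form>
  <gramGrp><pos>noun</pos></gramGrp>
  <sense n="1">
    <def>a financial institution that accepts deposits</def>
    <cit type="example"><quote>She opened an account at the bank.</quote></cit>
  </sense>
  <sense n="2">
    <def>the land alongside a river</def>
    <cit type="translation" xml:lang="de"><quote>Ufer</quote></cit>
  </sense>
</entry>
"""


def list_senses(entry_xml):
    """Return the lemma and the information grouped under each <sense>."""
    entry = ET.fromstring(entry_xml)
    lemma = entry.findtext("tei:form/tei:orth", namespaces=NS)
    senses = []
    for sense in entry.findall("tei:sense", NS):
        senses.append({
            "n": sense.get("n"),
            # Free-text definition: any character string is allowed here.
            "definition": sense.findtext("tei:def", namespaces=NS),
            # Examples and translation equivalents, as (type, text) pairs.
            "citations": [
                (cit.get("type"), cit.findtext("tei:quote", namespaces=NS))
                for cit in sense.findall("tei:cit", NS)
            ],
        })
    return lemma, senses


if __name__ == "__main__":
    lemma, senses = list_senses(ENTRY_XML)
    print(lemma)
    for sense in senses:
        print(sense["n"], sense["definition"], sense["citations"])
```

Each sense carries its own definition, examples and translation equivalents as free text inside the entry, which is why the same sense information tends to be repeated wherever another entry needs it.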
There are no defined restrictions on how the content of a sense is to be encoded, and
arbitrary character strings seem to be allowed. This makes the comparison of senses
across lexicons difficult, if not impossible. In general, we do not want to rely on
string matching to state a relation between senses included in different entries.
We advocate the creation of a repository of senses, which can be referred to (and
shared) by entries in TEI dictionaries. Our experiments made use of technologies such as
SKOS-XL, LMF and lemon. To establish efficient linking mechanisms, we made use of the
TEI <ptr> element.
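The linking idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain Python dictionary stands in for the sense repository (the experiments mentioned above used SKOS-XL, LMF and lemon), and the `http://example.org/senses#...` URIs, the two entries and the function names are all hypothetical. What the sketch shows is that two entries from different dictionaries can be related by comparing the `target` values of their `ptr` elements instead of comparing definition strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Hypothetical shared repository: sense URI -> description.
SENSE_REPOSITORY = {
    "http://example.org/senses#financial_institution":
        "an institution that accepts deposits and lends money",
    "http://example.org/senses#riverbank":
        "the land alongside a river",
}

# Hypothetical entries from two distinct dictionaries.
DICT_EN = f"""
<entry xmlns="{TEI_NS}" xml:id="en_bank">
  <form type="lemma"><orth>bank</orth></form>
  <sense n="1"><ptr target="http://example.org/senses#financial_institution"/></sense>
  <sense n="2"><ptr target="http://example.org/senses#riverbank"/></sense>
</entry>
"""

DICT_DE = f"""
<entry xmlns="{TEI_NS}" xml:id="de_ufer">
  <form type="lemma"><orth>Ufer</orth></form>
  <sense><ptr target="http://example.org/senses#riverbank"/></sense>
</entry>
"""


def sense_targets(entry_xml):
    """Return the lemma and the repository URIs its senses point to."""
    entry = ET.fromstring(entry_xml)
    lemma = entry.findtext("tei:form/tei:orth", namespaces=NS)
    targets = [ptr.get("target") for ptr in entry.findall("tei:sense/tei:ptr", NS)]
    return lemma, targets


def lemmas_by_sense(entries):
    """Group lemmas by shared sense URI, rejecting unknown targets."""
    index = defaultdict(list)
    for entry_xml in entries:
        lemma, targets = sense_targets(entry_xml)
        for target in targets:
            if target not in SENSE_REPOSITORY:
                raise KeyError(f"unknown sense: {target}")
            index[target].append(lemma)
    return dict(index)


if __name__ == "__main__":
    for uri, lemmas in lemmas_by_sense([DICT_EN, DICT_DE]).items():
        print(uri, lemmas, "-", SENSE_REPOSITORY[uri])
```

Because the relation between "bank" and "Ufer" is established through an identical pointer target, no string matching over definitions is involved, and the description of the sense is stored only once.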