We are TEI. You will be assimilated!
by
Fabio Ciotti
Info
Title | We are TEI. You will be assimilated! |
---|---|
Responsible | Encoded by Vanessa Hannesschläger; encoded by Daniel Schopper |
License | The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text. |
Abstract
TEI was originally conceived as a document-centric markup language for the digital representation of textual documents in the humanities: mainly literary sources (in a broad sense), archival documents, and language resources1. At the time it was developed, no markup-based metadata standards existed, so great attention was given to defining a metadata-oriented section of the encoding scheme, namely the TEI header2. The same happened for other kinds of data representation, such as formulas, graphs, graphic elements, and feature structures. Over the decades of its evolution, even though the ecosystem of data and metadata encoding languages has changed dramatically, to the point of becoming overpopulated, this approach has been not only confirmed but also reinforced.
Over the years TEI has developed markup modules for prosopographical and biographical data3, geographical indexes, taxonomies, and other types of structured data that are not strictly textual; it has also "exapted" existing markup constructs, or introduced new ones, to deal with new representational functions such as the expression of Linked Data-like structures or of stand-off annotations4. In general, we can say that TEI takes an "assimilation" stance towards any representational need emerging from its user community: we could call it a "one markup language does it all" (OMD) approach.
The consequences of this approach are manifold. The TEI encoding scheme has grown gigantic in scope, size, and structure, and hence in maintenance complexity. Some TEI modules aimed at representing data that are not strictly textual are not completely satisfactory for their adopters, and are consequently subject to strong "pressure" from the user community, resulting in tag abuse or continuous requests for extensions and additions. Finally, the overall consistency of the abstract semantics of the TEI is not assured (a critical analysis of some of these drawbacks in the TEI architecture is in Schmidt5).
In this paper, we propose that future development of the TEI should take a more pluralistic approach, enabled by various recent developments in XML technologies: namespaces, NVDL (Namespace-based Validation Dispatching Language6) and multi-schema validation, and XInclude.
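The idea behind namespace-based dispatching can be illustrated with a minimal Python sketch. It is not an NVDL processor and performs no validation: it only finds the elements at which the namespace changes and looks up which schema each such section would be handed to. The sample document, the `SCHEMA_FOR_NS` table, and the schema file names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
MODS_NS = "http://www.loc.gov/mods/v3"

# Hypothetical dispatch table: namespace URI -> schema file (names are illustrative).
SCHEMA_FOR_NS = {
    TEI_NS: "tei_all.rng",
    MODS_NS: "mods.xsd",
}

# Illustrative document: a TEI file with a MODS record inside xenoData.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:mods="http://www.loc.gov/mods/v3">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
      <publicationStmt><p>Unpublished</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
    <xenoData>
      <mods:mods>
        <mods:titleInfo><mods:title>Sample</mods:title></mods:titleInfo>
      </mods:mods>
    </xenoData>
  </teiHeader>
  <text><body><p>Hello.</p></body></text>
</TEI>
"""


def namespace_of(element):
    """Return the namespace URI of an element ('' if it has none)."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def local_name(element):
    """Return the element name without its namespace part."""
    return element.tag.split("}", 1)[-1]


def section_roots(element, parent_ns=None):
    """Yield (namespace, element) wherever the namespace differs from the parent's."""
    ns = namespace_of(element)
    if ns != parent_ns:
        yield ns, element
    for child in element:
        yield from section_roots(child, ns)


def validation_plan(xml_text):
    """List (section root name, schema) pairs in document order."""
    root = ET.fromstring(xml_text.strip())
    return [(local_name(el), SCHEMA_FOR_NS.get(ns)) for ns, el in section_roots(root)]


print(validation_plan(SAMPLE))
```

A real NVDL processor would additionally group sibling elements of the same namespace into one section and actually invoke a validator for each section; the sketch only shows where the dispatch points fall.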
Recently TEI has extended its capacity to cooperate with external data and metadata representation languages introducing the new element xenoData, a container element into which metadata in "non-TEI" formats may be placed7. We propose some possible extensions of this element, in order to pursue this approach more thoroughly:
- a special attribute to state the type of metadata vocabulary used (for validation purposes);
- a flexible method to assert that a set of external metadata (not necessarily in XML format) applies to a specific part of the TEI file: for instance, to give a MODS description of bibliographic items cited in the text; to express MIX technical metadata of facsimile images; or to give a formal definition to a term (or any other) element by way of an RDF triple expressing a SKOS-based thesaurus;
- a way to adopt external metadata sets (MODS, VRA, PREMIS, etc.) to substitute for or implement the internal ones.
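To make the first two proposals more concrete, here is a minimal Python sketch of how such an extended xenoData could be read. The attribute names `vocabulary` and `appliesTo` are purely hypothetical: they are not part of the TEI Guidelines and are introduced here only to illustrate declaring the metadata vocabulary and pointing at the part of the file the metadata describes. The sample document is likewise an assumption.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative TEI fragment. 'vocabulary' and 'appliesTo' are HYPOTHETICAL
# attributes, used only to sketch the proposed extensions.
SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:mods="http://www.loc.gov/mods/v3">
  <teiHeader>
    <xenoData vocabulary="MODS" appliesTo="#bibl1">
      <mods:mods>
        <mods:titleInfo><mods:title>A cited work</mods:title></mods:titleInfo>
      </mods:mods>
    </xenoData>
  </teiHeader>
  <text><body><p>See <bibl xml:id="bibl1">A cited work</bibl>.</p></body></text>
</TEI>
"""


def xeno_links(xml_text):
    """Return (vocabulary, pointer, target element name) for each xenoData pointer."""
    root = ET.fromstring(xml_text.strip())
    # Index every element that carries an xml:id so pointers can be resolved.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    links = []
    for xeno in root.iter(f"{{{TEI_NS}}}xenoData"):
        vocabulary = xeno.get("vocabulary")
        # The hypothetical 'appliesTo' holds whitespace-separated '#id' pointers.
        for pointer in (xeno.get("appliesTo") or "").split():
            target = by_id.get(pointer.lstrip("#"))
            name = target.tag.split("}", 1)[-1] if target is not None else None
            links.append((vocabulary, pointer, name))
    return links


print(xeno_links(SAMPLE))
```

In this sketch the declared vocabulary could drive the choice of schema for validation, while the pointer ties the MODS record to the single `bibl` it describes rather than to the TEI file as a whole; an unresolvable pointer is reported with `None` as its target.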
We advocate, in sum, a wider conception of xenoData and a more fine-grained way to assert the possible relationships between the metadata it contains and their objects. This could be an operational intermediate step towards a more general redesign of the TEI encoding schema.