Abstract

TEI was originally conceived as a document-centric markup language for the digital representation of textual documents in the humanities: mainly literary sources (in a broad sense), archival documents, and language resources. When it was developed, no markup-based metadata standards existed, so great attention was given to defining a metadata-oriented section of the encoding scheme, namely the TEI header. The same happened for other kinds of data representation, such as formulas, graphs, graphic elements, and feature structures. Over its decades-long evolution, even though the ecosystem of data and metadata encoding languages has changed dramatically, to the point of becoming overpopulated, this approach has been not only confirmed but reinforced.

Over the years, TEI has developed markup modules for prosopographical and biographical data, geographical indexes, taxonomies, and other types of structured data that are not strictly textual. It has also "exapted" existing markup constructs, or introduced new ones, to serve new representational functions such as the expression of Linked Data-like structures or of stand-off annotations. In general, TEI takes an "assimilation" stance towards any representational need emerging from its user community: we could call it a "one markup language does it all" (OMD) approach.

The consequences of this approach are manifold. The TEI encoding scheme has grown gigantic in scope, size, and structure, and hence in maintenance complexity. Some TEI modules aimed at representing data that are not strictly textual do not fully satisfy their adopters and are consequently under strong "pressure" from the user community, resulting in tag abuse or continuous requests for extensions and additions. Finally, the overall consistency of the abstract semantics of the TEI is not assured (a critical analysis of some of these drawbacks in the TEI architecture is given in Schmidt 2014).

In this paper, we propose that future development of the TEI take a more pluralistic approach, made possible by several recent developments in XML technologies: Namespaces, NVDL (Namespace-based Validation Dispatching Language) and multi-schema validation, and XInclude.
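The idea behind namespace-based dispatching can be illustrated with a minimal Python sketch. It is not an NVDL implementation: it only shows how the elements of a mixed-vocabulary document could be routed to different schemas according to their namespace. The schema file names and the sample document are placeholders chosen for illustration, and the sample is not claimed to be a valid TEI file.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumption: placeholder schema names, one per namespace.
SCHEMA_FOR_NAMESPACE = {TEI_NS: "tei_all.rng", DC_NS: "dublin_core.xsd"}

# Illustrative fragment only; not a complete or valid TEI document.
MIXED_DOCUMENT = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample</title></titleStmt>
    </fileDesc>
    <xenoData xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample</dc:title>
      <dc:creator>Anonymous</dc:creator>
    </xenoData>
  </teiHeader>
</TEI>"""


def namespace_of(element):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def dispatch(xml_text):
    """Group element local names by the schema assigned to their namespace."""
    plan = {}
    for element in ET.fromstring(xml_text).iter():
        schema = SCHEMA_FOR_NAMESPACE.get(namespace_of(element), "(no schema)")
        local_name = element.tag.split("}", 1)[-1]
        plan.setdefault(schema, []).append(local_name)
    return plan


if __name__ == "__main__":
    for schema, names in dispatch(MIXED_DOCUMENT).items():
        print(schema, names)
```

In a real workflow, an NVDL script would declare such namespace-to-schema rules and an NVDL processor would validate each section against its own schema; the sketch only makes the routing step visible.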

Recently, TEI has extended its capacity to cooperate with external data and metadata representation languages by introducing the new element xenoData, a container element into which metadata in "non-TEI" formats may be placed. We propose some possible extensions of this element in order to pursue this approach more thoroughly:

  • a special attribute to state the type of metadata vocabulary used (for validation purposes);
  • a flexible method to assert that a set of external metadata (not necessarily in XML format) applies to a specific part of the TEI file: for instance, to give a MODS description of bibliographic items cited in the text, to express MIX technical metadata for facsimile images, or to give a formal definition to a term element (or any other element) by way of an RDF triple based on a SKOS thesaurus;
  • a way to adopt external metadata sets (MODS, VRA, PREMIS, etc.) to replace or supplement the internal ones.
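The first two proposals can be sketched as follows. The attribute names `vocab` and `target` on xenoData are invented for this illustration only; the sketch does not claim that the TEI schema allows them on xenoData, and the sample is not a complete TEI file. It shows a MODS record attached to one bibliographic item in the text, and a few lines of Python that resolve the link.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: 'vocab' and 'target' are hypothetical attributes on xenoData.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <xenoData vocab="MODS" target="#bibl1">
      <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
        <mods:titleInfo>
          <mods:title>Towards an Interoperable Digital Scholarly Edition</mods:title>
        </mods:titleInfo>
      </mods:mods>
    </xenoData>
  </teiHeader>
  <text><body>
    <p>See <bibl xml:id="bibl1">Schmidt 2014</bibl> for a critical analysis.</p>
  </body></text>
</TEI>"""


def link_metadata(xml_text):
    """Return (vocabulary, target element name, target text) for each link."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    links = []
    for xeno in root.iter(TEI + "xenoData"):
        vocab = xeno.get("vocab")
        # A target may hold several space-separated '#id' pointers.
        for pointer in (xeno.get("target") or "").split():
            element = by_id.get(pointer.lstrip("#"))
            if element is not None:
                name = element.tag.replace(TEI, "")
                links.append((vocab, name, "".join(element.itertext())))
    return links


if __name__ == "__main__":
    print(link_metadata(SAMPLE))
```

A declared vocabulary type would let a processor choose the right schema for the embedded record, while the pointer makes explicit which part of the document the record describes.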

In sum, we advocate a wider conception of xenoData and a more fine-grained way to assert the possible relationships between the metadata it contains and the objects they describe. This could be an operational intermediate step towards a more general redesign of the TEI encoding scheme.

References

  • Bański, Piotr: Why TEI standoff annotation doesn’t quite work: and why you might want to use it nevertheless in: Proceedings of Balisage: The Markup Conference. Balisage Series on Markup Technologies, 5, 2010.
  • Burnard, Lou/Popham, Michael: Putting Our Headers Together: A Report on the TEI Header Meeting 12 September 1997 in: Computers and the Humanities, 33/1-2, 1999.
  • Ciotti, Fabio/Tomasi, Francesca: Formal ontologies, Linked Data and TEI semantics in: Journal of the Text Encoding Initiative, 9, forthcoming.
  • Eide, Øyvind: Ontologies, Data Modeling, and TEI in: Journal of the Text Encoding Initiative, 8, 2015. http://jtei.revues.org/1191
  • ISO/IEC 19757-4 NVDL (Namespace-based Validation Dispatching Language), 2006. http://standards.iso.org/ittf/PubliclyAvailableStandards/c038615_ISO_IEC_19757-4_2006(E).zip
  • Ore, Christian-Emil/Eide, Øyvind: TEI and cultural heritage ontologies: Exchange of information? in: Literary and Linguistic Computing, 29/4, 2009.
  • Schmidt, Desmond: Towards an Interoperable Digital Scholarly Edition in: Journal of the Text Encoding Initiative, 7, 2014. http://jtei.revues.org/979
  • Sperberg-McQueen, C. M./Burnard, Lou: The design of the TEI encoding scheme in: Computers and the Humanities, 29/17, 1995.
  • Wedervang-Jensen, Eva/Driscoll, Matthew: Report on XML Mark-up of Biographical and Prosopographical data, 2006. http://www.tei-c.org/activities/workgroups/pers/persw02.xml