This paper provides an overview of, and an update on, the ongoing proposal to create a standOff component within the TEI architecture. It outlines the conceptual background of embedding stand-off annotations within a TEI document and the consequences in terms of primary source preservation, multiple annotation views, and the possible export of annotation content into autonomous TEI documents. It demonstrates the range of possible use cases, from manual annotation to fully automated information extraction processes, and shows the importance of implementing, right from the outset, the possibility of using any kind of internal or external vocabulary to represent annotation bodies (e.g. to deal with structural or conceptual annotations). An important prospect here is that the standOff construct could simplify the development of TEI-aware online services such as Named Entity Recognizers.

We will relate the proposal to ongoing initiatives and show the necessity of aligning it with the W3C Web Annotation Data Model as well as with the recent introduction of the annotationBlock element for speech transcription (as part of the work carried out on the ISO standard 24624) as an elementary annotation crystal in the sense of Romary and Wegstein [1]. In this context, we will tackle the issue of implicitness in the representation of annotations and open the debate on the trade-off between a terse model and a highly flexible one.

We will conclude by illustrating how the current proposal is already being applied in various projects related to data mining or scientific information, in particular for the representation of annotated scholarly content.


