From DOCX via TEI to Literature Map
Encoded by Vanessa Hannesschläger
Encoded by Daniel Schopper
The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.
Data in literary studies often carry references to place and time: writers travel, and they often set scenes of their novels in places they have visited. By combining these data with geographical maps, we can create a so-called literature map that gives an overview of the relationship between an author's biography and writing. We can also compare depictions of a place across several authors, which is of interest to historians and geographers. These data are, of course, also useful for literary travel guides.
The project "Tyrol / South Tyrol - A literary topography" (funded by the Austrian Science Fund FWF, P26039) aims to gather such geo- and time-referenced data and to create literature maps for the Tyrolean region. Our database consists of two parts: a document repository (for TEI data and images) and an RDBMS (managing metadata on authors, places, keywords, texts, and dates, as well as the relationships among them). Open government data and authority records are used for raw maps and for metadata on places and people; TEI and DOCX are used for gathering literary data. This allows a text-centric workflow based on annotating texts, which is more natural for scholars in the humanities.
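The relational part of such a database can be pictured with a small sketch. The following is a minimal illustration using Python's built-in sqlite3 module, not the project's actual schema: all table and column names (author, place, passage, passage_place, tei_path) and the demo rows are assumptions made for this example, and only a subset of the entities named above is modeled.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not the project's actual database layout.
SCHEMA = """
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    authority_id TEXT              -- identifier from an authority record
);
CREATE TABLE place (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    lat REAL,
    lon REAL
);
CREATE TABLE passage (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title TEXT,
    tei_path TEXT                  -- pointer into the document repository
);
CREATE TABLE passage_place (
    passage_id INTEGER NOT NULL REFERENCES passage(id),
    place_id INTEGER NOT NULL REFERENCES place(id),
    date TEXT,                     -- ISO 8601 date of the reference, if known
    PRIMARY KEY (passage_id, place_id)
);
"""


def build_demo_db():
    """Create an in-memory database filled with invented demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO author (id, name) VALUES (1, 'Example Author')")
    # Coordinates are approximate and serve only as demo values.
    conn.executemany(
        "INSERT INTO place (id, name, lat, lon) VALUES (?, ?, ?, ?)",
        [(1, "Innsbruck", 47.27, 11.39), (2, "Bozen", 46.50, 11.35)],
    )
    conn.execute(
        "INSERT INTO passage (id, author_id, title, tei_path) "
        "VALUES (1, 1, 'Demo text', 'demo.xml')"
    )
    conn.executemany(
        "INSERT INTO passage_place (passage_id, place_id, date) VALUES (?, ?, ?)",
        [(1, 1, "1910-08"), (1, 2, None)],
    )
    return conn


def places_for_author(conn, author_name):
    """Return the names of all places referenced in an author's texts."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM place p
        JOIN passage_place pp ON pp.place_id = p.id
        JOIN passage t ON t.id = pp.passage_id
        JOIN author a ON a.id = t.author_id
        WHERE a.name = ?
        ORDER BY p.name
        """,
        (author_name,),
    )
    return [row[0] for row in rows]
```

A query such as places_for_author(build_demo_db(), "Example Author") would list the places to plot for one author; comparing depictions of one place across authors would be the same join read in the other direction.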
For each author and each place, we create a DOCX document. A table at the beginning of the document captures the entity's metadata; biographical notes and primary texts follow the table. Using the style and comment functions of MS Word, we add annotations on places, dates and keywords to the texts. The data-ingest routine converts DOCX to TEI and then extracts data from the TEI document for the relational database. After scholars have reviewed the conversion, the data are ingested into the document repository and the relational database.
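The extraction step, from TEI document to rows for the relational database, might look roughly like the following sketch. It assumes that the converted TEI marks places with placeName elements carrying a ref attribute and dates with date elements carrying a when attribute; the abstract does not specify the actual encoding, and the sample document and the function name extract_annotations are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document; the project's real encoding may differ.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>In <date when="1910-08">August 1910</date> she arrived in
       <placeName ref="#place_1">Innsbruck</placeName> and later walked to
       <placeName ref="#place_2">Hall</placeName>.</p>
  </body></text>
</TEI>"""


def extract_annotations(tei_xml):
    """Collect place and date annotations from a TEI string, in document order."""
    root = ET.fromstring(tei_xml)
    places = [
        {"ref": el.get("ref"), "name": "".join(el.itertext()).strip()}
        for el in root.iterfind(".//tei:placeName", TEI_NS)
    ]
    dates = [
        {"when": el.get("when"), "text": "".join(el.itertext()).strip()}
        for el in root.iterfind(".//tei:date", TEI_NS)
    ]
    return {"places": places, "dates": dates}
```

The returned dictionaries could then be shown to a reviewing scholar before anything is written to the database, which mirrors the review-then-ingest order described above.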