Title: Upconversion and Migration: Generating a TEI-EpiDoc Corpus of Sicilian Inscriptions

This paper introduces I.Sicily, a TEI P5 / EpiDoc corpus of inscriptions on stone from ancient Sicily. The corpus aims to include all texts inscribed on stone in Sicily, in any language, between approximately the seventh century BC and the seventh century AD. It currently contains records for over 2,500 texts, and when complete is likely to contain c. 4,000. The corpus is built upon a conversion from a legacy dataset maintained in MS Access and Excel into EpiDoc TEI XML. The XML records are held in an eXist database for XQuery access, and are used to generate other outputs, such as a full-text search using Solr / Lucene. The corpus and related information (museum list, bibliography) are published as Linked Data, and are manipulated through a RESTful API. The records are queried and viewed through a web interface built with AngularJS and jQuery JavaScript components. Mapping is provided in the browser by the Google Maps API, and ZPR (Zoom, Pan, Rotate) image viewing is provided by the IIIP image server.

This paper will report on the main conversion of MS Access and MS Excel files into TEI-EpiDoc XML. This conversion uses a combination of existing TEI stylesheets and customized transformations to generate thousands of individual TEI-EpiDoc files. These files incorporate references to standardized vocabularies, taken from MS Excel files listing canonical entries. As a result, individual inscriptions link through to information about places using Pleiades, to lists of museums, and to epigraphic types, materials, and supports using the URIs for EAGLE vocabularies. This paper reports not only on the conversion, providing helpful advice on how to undertake such conversions, but also on the project itself.
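The general shape of such a conversion can be illustrated with a minimal sketch. This is not the project's pipeline, which relies on TEI stylesheets and customized transformations; it substitutes plain Python for those, and every name in it is an assumption made for illustration: the column names (`id`, `title`, `place`, `material`), the lookup tables, and the `example.org` URIs, which stand in for real Pleiades and EAGLE identifiers. The output is a simplified TEI skeleton and has not been validated against the EpiDoc schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

# Hypothetical canonical lists (in practice these would be read from the
# vocabulary spreadsheets); the URIs are placeholders, not real identifiers.
PLACES = {"Syracusae": "https://example.org/places/syracusae"}
MATERIALS = {"marble": "https://example.org/material/marble"}


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def add_linked(parent, tag, label, vocabulary):
    """Add an element holding `label`, with @ref set if the label is canonical."""
    element = ET.SubElement(parent, tei(tag))
    element.text = label
    uri = vocabulary.get(label)
    if uri is not None:
        element.set("ref", uri)
    return element


def row_to_tei(row, places=PLACES, materials=MATERIALS):
    """Build a simplified TEI record from one row of the legacy table."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = row["title"]
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ms_desc = ET.SubElement(source_desc, tei("msDesc"))
    ms_identifier = ET.SubElement(ms_desc, tei("msIdentifier"))
    ET.SubElement(ms_identifier, tei("idno")).text = row["id"]
    # Physical description: the material links to a vocabulary URI.
    phys_desc = ET.SubElement(ms_desc, tei("physDesc"))
    object_desc = ET.SubElement(phys_desc, tei("objectDesc"))
    support_desc = ET.SubElement(object_desc, tei("supportDesc"))
    support = ET.SubElement(support_desc, tei("support"))
    add_linked(support, "material", row["material"], materials)
    # History: the place of origin links to a gazetteer URI.
    history = ET.SubElement(ms_desc, tei("history"))
    origin = ET.SubElement(history, tei("origin"))
    orig_place = ET.SubElement(origin, tei("origPlace"))
    add_linked(orig_place, "placeName", row["place"], places)
    return root


def convert(csv_text):
    """Map each record identifier to its serialized TEI document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["id"]: ET.tostring(row_to_tei(row), encoding="unicode")
        for row in reader
    }
```

Calling `convert` on a CSV export yields one XML string per row, keyed by identifier, which could then be written out as individual files. A label that is missing from the canonical list is kept as text but receives no `ref` attribute, so unmatched values remain visible for later cleanup rather than being silently dropped.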