Title Islamic Manuscripts in Cambridge Digital Library & the German-French virtual library project Paleocoran

Encoded by   Vanessa Hannesschläger

Encoded by   Daniel Schopper


The Creative Commons Attribution 4.0 International (CC BY 4.0) License applies to this text.


European libraries have been actively collecting manuscripts from the Islamic world since the 16th century. Until recently, the only way of discovering these collections was through inconsistently produced printed catalogues. In 2009, Cambridge (UL) and Oxford (Bodley) embarked on a joint project to create an online catalogue of Islamic manuscripts and a structured approach to Islamic manuscript description using TEI. The focus was on establishing a standard practice for the TEI description of Islamic manuscripts in order to promote interoperability of data. The result was FIHRIST, a union catalogue of Oxford's and Cambridge's holdings, which now includes twelve further institutions. Content (TEI files) and methodology (schema, practice) were reused and expanded in Cambridge Digital Library, fostering collaborations with external projects. In this paper, I will discuss the challenges and rewards of using TEI for Islamic manuscript description and our work in creating a "TEI community" within the field.
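To make the idea of a shared TEI manuscript description more concrete, the following is a minimal sketch in Python that builds a bare-bones record from standard TEI elements (msDesc, msIdentifier, msContents). It does not reproduce the schema actually used by the catalogue; the function name, the shelfmark "MS Example 1", and the title are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_ms_desc(settlement, repository, idno, title, lang_code, lang_name):
    """Build a minimal TEI-style <msDesc> element (illustrative sketch only).

    A real TEI file would place these elements in the TEI namespace
    (http://www.tei-c.org/ns/1.0) inside a full <teiHeader>; that is
    omitted here to keep the example short.
    """
    ms_desc = ET.Element("msDesc")

    # Where the manuscript is held and how it is identified.
    ident = ET.SubElement(ms_desc, "msIdentifier")
    ET.SubElement(ident, "settlement").text = settlement
    ET.SubElement(ident, "repository").text = repository
    ET.SubElement(ident, "idno").text = idno

    # What the manuscript contains.
    contents = ET.SubElement(ms_desc, "msContents")
    item = ET.SubElement(contents, "msItem")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "textLang", {"mainLang": lang_code}).text = lang_name

    return ms_desc


# Hypothetical record: the shelfmark and title are placeholders, not real data.
record = build_ms_desc(
    settlement="Cambridge",
    repository="Cambridge University Library",
    idno="MS Example 1",
    title="Example title",
    lang_code="ar",
    lang_name="Arabic",
)
print(ET.tostring(record, encoding="unicode"))
```

Because every institution fills the same elements in the same way, records of this kind can be merged into one union catalogue without per-library conversion.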

PALEOCORAN aims at bridging the gap between

  • the general history of the Qurʾān as known through Arabic sources and the latest paleographic research on the manuscripts, and
  • the actual reception of various aspects of the text as documented in the library of the ʿAmr mosque in al-Fusṭāṭ.

The fragmentary state of early Qurʾānic manuscripts, scattered across various collections, has hindered many attempts at thoroughly examining all the manuscript evidence. The digital reconstruction of the Fusṭāṭ collection brings together fragments from all over the world, most of them kept in Gotha, Saint Petersburg, and Paris (approx. 11,000 fol.), and unites them in a virtual online library.

Besides the unified cataloguing of its approx. 360 fragments, an important part of the project focuses on the development of the Arabic script (paleography, letter shapes, diacritical signs, vowel system) and the process of canonization of the Qurʾān within this collection. The manuscripts will thus be approached in a multidisciplinary way, combining philology, paleography, codicology, art history, and physico-chemical analyses (ink analysis and 14C). While some of these data can easily be gathered and analyzed with common data models, encoding the variant readings applied to a given Arabic word within a manuscript requires more careful consideration in the design of the XML schema. Furthermore, the researchers need a convenient way of entering and editing these data, i.e. Arabic words with multiple markups on single vowel points; no such approach has yet been applied to a collection of comparable size.
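The encoding problem described above can be illustrated with a small Python sketch. It is not the PALEOCORAN schema: it only assumes the standard TEI critical-apparatus elements (app, lem, rdg), and the function names and witness identifiers "#A" and "#B" are hypothetical. The example word is the well-known variant in Q 1:4 (maliki versus māliki). The first function shows why markup on single vowel points is awkward: each vowel point is a separate Unicode combining character that has to be addressed individually.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Q 1:4: "maliki" and "maaliki", written with explicit code points
# so that the vowel points (fatha U+064E, kasra U+0650) are visible.
MALIKI = "\u0645\u064E\u0644\u0650\u0643\u0650"
MAALIKI = "\u0645\u064E\u0627\u0644\u0650\u0643\u0650"


def split_clusters(word):
    """Pair each base letter with the combining marks that follow it."""
    clusters = []
    for ch in word:
        if unicodedata.combining(ch) and clusters:
            # Vowel points are combining marks: attach to the previous letter.
            clusters[-1][1].append(ch)
        else:
            clusters.append([ch, []])
    return [(base, "".join(marks)) for base, marks in clusters]


def build_app(lemma, readings):
    """Build a TEI-style <app> with one <lem> and one <rdg> per witness."""
    app = ET.Element("app")
    ET.SubElement(app, "lem").text = lemma
    for witness, text in readings:
        ET.SubElement(app, "rdg", {"wit": witness}).text = text
    return app


# "#A" and "#B" are placeholder witness sigla, not real manuscripts.
app = build_app(MALIKI, [("#A", MALIKI), ("#B", MAALIKI)])
print(ET.tostring(app, encoding="unicode"))

for base, marks in split_clusters(MALIKI):
    print(unicodedata.name(base), [unicodedata.name(m) for m in marks])
```

Word-level apparatus entries like this are simple; attaching several layers of markup to an individual vowel point would require wrapping each combining mark in its own element, which is where schema design and editing tools become the harder problem.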


  • Cambridge Digital Library. Mingana-Lewis Palimpsest.
  • Cambridge University Library Special Collections.