Preservation through digitisation of the Tangut collection at the Institute of Oriental Studies, St Petersburg Branch, Russian Academy of Sciences (EAP140)

Aims and objectives

The Institute of Oriental Studies holds 4,600 manuscripts and 3,765 block-prints in the Tangut language, the largest collection worldwide. The collection constitutes a unique archive of information on a Silk Road people who established their own kingdom between the 10th and 13th centuries in present-day northwest China. Their language belonged to the Sino-Tibetan group and was closer to Tibetan. They subsequently invented their own script, based on Chinese, and translated the whole of the Buddhist canon into Tangut. They were annihilated by the Mongol conquerors in 1227 and the spoken language disappeared completely. Full access to this material through digitisation would allow a reappraisal and full exposition of the historical legacy of the Tanguts from primary sources, as opposed to the more subjective secondary sources of neighbouring peoples.

This project will digitise the Tangut collection. These unique historical, literary, and administrative texts are of immense value for understanding Tangut language and culture. The metadata and images will be made freely available on the International Dunhuang Project database and websites in Russia, Britain, China, Germany and Japan. This will open the material to scholars worldwide, who are currently unable to study it firsthand because of distance and because the originals are unsuitable for handling; high-resolution digital images will help solve this problem. The manuscripts are very fragile, suffering from paper degradation and fungal damage, and there are currently no surrogate copies: they are unique documents. The 8,365 Tangut items will require a minimum of 30,000-35,000 images, since some are in scroll format requiring more than one shot, and both the recto and verso of each item are always photographed to ensure a complete archive.

The project will start by digitising the Buddhist part of the Tangut collection, about 600 items (5,000 images). These are Tangut sutras of the 12th century (mainly translations from Tibetan and Chinese, taking into account the Sanskrit texts). They are unique, and some of them (Mahaprajnaparamita, Mahaparinirvana, Suvarnaprabhasa) exist in a number of handwritten or wood-printed copies, which would allow codicological and palaeographical research to be undertaken. The project will then move on to the remaining 7,700 items, which would require a further 25,000-30,000 images. These are historical, literary, administrative and other texts and are of immense value for understanding both the Tangut language and the culture.

The material will be checked by the conservator from the Institute of Oriental Studies before being passed for digitisation. In some cases the conservator might need to carry out stabilisation work to allow safe digitisation, or to flatten and unfold material. A grant from other sources was awarded in 2006 for the conservation of this material, so it is probable that most of it will be suitable for inclusion in this project.


The poor condition of the material meant that extensive restoration work had to be undertaken prior to digitisation. This, together with the fragility of the material, which required handling with special care, meant that only 15,500 images could be taken. These cover 260 manuscripts with the pressmark Tang. 334 (the Tangut translation of the Mahaprajnaparamita sutra) and amount to 3.1 terabytes of data.

The digital material is now housed on a RAID array (8 x 750 GB HDDs) at the Institute of Oriental Manuscripts, Russian Academy of Sciences. The images and metadata are freely available on the International Dunhuang Project (IDP) database and websites in Russian, English, French, Chinese, German and Japanese. A copy of the digital collection has been deposited with the British Library.

The records copied by this project have been catalogued as:

Blog: Tangut manuscripts from St Petersburg - July 2014