The goal of this work is to provide relevant corpus representations for the manuscripts of British philosopher and social reformer Jeremy Bentham (1748-1832).
This is a large corpus, owned by University College London (UCL). Until recently, the manuscripts were for the most part untranscribed, so that very few people had access to the corpus to evaluate its content and its value. At UCL, the corpus is now being digitized and transcribed thanks to a large number of volunteers recruited through a crowd-sourcing initiative called Transcribe Bentham. The problem researchers are facing with such a corpus is clear: how to access the content, how to structure these (at the moment) 30,000 files, and how to get relevant access to this mass of data?
We first performed a lexical extraction in order to obtain the terms to model the corpus, using a technology called entity linking or wikification.
We then used CorText Manager to cluster the terms, and produce corpus visualizations, both static and dynamic through time. We used the following functions from the CorText platform:
– Network analysis: concept networks
– Heatmaps by decade: they show what areas of the corpus are more prominent per decade
– Tubes layout: evolution of concept clusters throug time
An interface showcasing project results so far is at: http://apps.lattice.cnrs.fr/bentham/
The work was presented at the 2016 Digital Humanities Conference in Krakow: