Revisiting the Marjory Fleming Project
- Joseph Nockels
- Aug 6, 2024
Updated: Jul 28
For this post, I wanted to briefly revisit some older work and look back at my time at the National Library of Scotland (NLS), where I helped create the Library's first dataset using AI-enabled Handwritten Text Recognition (HTR). Working with the Digital Scholarship Team, I helped produce an accurate transcription of the 19th-century diaries written by Scottish child author Marjory Fleming (1803-1811), published online in 2021.

Frontage of the first Fleming volume
The diaries were selected as an important historical resource, since few sources centre children as authentic voices of their own experiences. Presented to the Library in 1930, they form a rare exception and a rich resource for children's history. Their pages are filled with witticisms, moralisms, and copybook notes, all completed under the watchful gaze of Fleming’s tutor and older cousin Isabella Keith. That said, the diaries are not entirely humorous; they also record deep personal tragedy. The collection ends with a few pages written not in Fleming’s hand but in those of her cousin and mother, a change of hand we accounted for when manually training our HTR model. They express deep grief at Marjory’s untimely death aged 8, in December 1811, from measles and meningitis. In their words, Marjory was left powerless to act against '… so heavenly mercy’s plan'.
It was essential, therefore, for the Library to extend its digitisation workflows and provide accurate transcriptions of these intimate and personal materials. Taking scanned images from the NLS digital gallery, we used the HTR platform Transkribus to automatically convert them into computer-readable text. For a deeper explanation of the process, see this NLS blog post: https://blog.nls.uk/automatically-transcribing-the-marjorie-fleming-diaries/. Without the kind of 'Collections as Data' (Padilla et al., 2019) guides now accessible through organisations such as LIBER (European Association of Research Libraries), with advice, links and training materials (Candela, 2024), we were initially flying blind.

Transkribus downloaded client, showing metadata and training data of the Fleming model, entitled 'Early 19th Century (Child)'
Nonetheless, once we had got to grips with Transkribus, the model reached an eventual character error rate (CER) of 1.85%. The resulting transcriptions required only minimal correction and formed the base for the outputs, in various file formats, now available to researchers via the NLS's Data Foundry, a dynamic destination where data is stored, welded together, replayed and reused: https://data.nls.uk/data/digitised-collections/marjory-fleming/
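For readers unfamiliar with CER, here is a minimal sketch of how such a figure is commonly calculated: the number of character-level edits (insertions, deletions, substitutions) needed to turn the model's output into the ground-truth transcription, divided by the length of the ground truth. This is the standard textbook definition and an assumption on my part; Transkribus may normalise text or count characters differently, and the function names and sample strings below are invented for illustration rather than taken from the diaries.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the hypothesis
                current[j - 1] + 1,      # extra character in the hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Invented example strings, not lines from the Fleming diaries.
ground_truth = "the weather is fine today"
model_output = "the wether is fine todav"

cer = character_error_rate(ground_truth, model_output)
print(f"CER: {cer:.2%}")  # prints "CER: 8.00%" (2 edits over 25 characters)
```

On this definition, a CER of 1.85% corresponds to roughly two characters needing correction in every hundred, which fits with the transcriptions requiring only light manual checking.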
This work was well received, which is always nice! We took up some column inches in the Fife press and the Scotsman and, more recently, William Kilbride included our transcriptions as a case study in his 2024 keynote on Artificial Intelligence and Digital Preservation to IFLA (International Federation of Library Associations). READ-COOP, who maintain and develop Transkribus, also used the images and transcriptions from the Fleming diaries to launch parts of their new web app earlier this year.

Transkribus launch of their web publishing tool, Transkribus Sites, using the Fleming diaries, with the always energetic and engaging Matthias Sorg
Personally, transcribing Fleming's diaries has deepened my thinking about the different histories we can tell when given free-flowing access to personal archives, where appropriate. These archives, unlike many printed business or organisational archives, are not created with an intended audience in mind. They are also often situated closer to events (Simpson, 2020) and, for historians especially, therefore represent the pinnacle of source material when attempting accurate retellings of the past. How do these histories emerge when digital technologies are leveraged to make materials more widely available? Can they tell us more than our printed, regimented, standardised, industrialised, mechanical documents can? Or will they simply differ? If so, how?
Now, with the NLS becoming more involved with Transkribus against the backdrop of proactively constructing AI ethics guidelines and promoting greater automation internally, this 'Collections as Data' project appears to have acted as one small step for a library primed to take a giant (or at least large) leap into adopting such approaches. AI in archives certainly suffers from a hype curve. NLS staff are increasingly aware of this, dealing with more and more user requests for transcriptions, AI outputs or general advice (another reason for using Transkribus!). However, such technology has clear value when properly directed, both in generating data and even in preserving it (Kilbride, 2024). As with the Fleming diaries, AI can provide access to authentic and overlooked voices. Kilbride says as much in his presentation:
'There are so many voices from so many digitization projects that are just waiting to be heard.'
References
Candela, G. (2024) ‘Collections as Data: Getting Started’, Digital Scholarship & Data Science Essentials for Library Professionals. https://libereurope.github.io/dsessentials/collectionsasdata.html
Crow, A. (2021) 'AI transcribes historical diaries of child prodigy pet marjorie', Fife Today, February 4, https://www.fifetoday.co.uk/news/people/artificial-intelligence-transcribes-historic-diaries-of-child-prodigy-pet-marjorie-3124330
Kilbride, W. (2024) 'DP and Artificial Intelligence - A Four Point Plan', Digital Preservation Coalition, https://www.dpconline.org/blog/dp-and-artificial-intelligence-a-4-point-plan
Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E., Varner, S. (2019) Final Report - Always Already Computational: Collections as Data. https://zenodo.org/record/3152935#.X6WOf-LPzIU
Simpson, K. (2020) ‘The digital archive as space and place in the constitution, production and circulation of knowledge’, in Ben Fletcher-Watson and Jana Phillips (eds.), Humanities of the Future: Perspectives from the Past and Present, vol. 22, pp. 65-82. Edinburgh, IASH Occasional Papers.
Transkribus. (2024) Publish Documents with Transkribus Sites, https://www.youtube.com/watch?v=LlApHRWN6Uo
For further reference
Transkribus. (2024) https://www.transkribus.org/

