Thoughts from the BRAID Cultural Heritage Forum
- Joseph Nockels
- Nov 10
- 8 min read
Updated: Nov 11
Last week, I had the pleasure of attending the Responsible AI for Cultural Heritage Forum, supported by Bridging Responsible AI Divides (BRAID) and hosted by the School of Advanced Study (SAS) at Senate House Library.
The day had multiple aims -
1) To soft-launch the RESHAPED platform, a set of AI workbenches and learning materials for cultural heritage practitioners, led by Dr Anna-Maria Sichani.
2) To discuss how best to engage with AI's affordances and limitations against the fragmented environment of cultural heritage, a tension best expressed by some audience members asking whether the sector even needed AI, and others suggesting that cultural heritage should simply look to the health domain for its algorithmic testing and technical sophistication.
3) To network and build out research interests.

Throughout the event, it was clear that the sector's fragmented response to AI stemmed from variable willingness, capacity, resource distribution, coordination and infrastructure. The last of these was typologised in several ways: 'learning / training as infrastructure', mentioned by Anna-Maria and SAS colleagues; the (re)use of raw datasets as infrastructure, demonstrated by Daniel van Strien's presentation on Small Language Models; and more explicit scaffolding maintained by human operators. In the text of my talk below, I aimed to address this last sense of '... as infrastructure'.

Other presentations paid careful attention to the role of Large Language Models (LLMs) in mitigating Linked Open Data (LOD) challenges and lowering barriers to findability; made the case for critical AI approaches institutionally; explored using LLMs for annotating argumentation in data; and considered how human rights - beyond copyright - relate to AI for cultural heritage. I was also made aware of Better Images of AI - clearly, I've been living under a rock. The outfit aims to nuance stock images of AI, currently depicted as blue brains, cascading green binary or white-clad robotics, instead commissioning art that uses copyright-free archival images to better inform non-expert audiences. Here's an example below, which explores fears and concerns around AI eroding and replacing critical thinking, creativity and other core human attributes. It reminds me of Rauschenberg's mid-20th century prints, which similarly depict angst through collaged effect - albeit the angst of the Cold War.

In any case, below is my presentation. It builds on the aims and objectives of my National Library of Scotland Digital Fellowship, which relates interpretable benchmarking to automated transcription. As is often the case, parallels and better examples emerged during the event. I - jokingly - suggested that this had led me to 'process' while 'presenting', mulling over other speakers' presentations, projects and arguments. It also led to a lot of scribbling over my hard-copy script. I've highlighted these parallels, not originally included, and linked them to the other contributions. The event was recorded, so hopefully the recording will be accessible to everyone soon enough.
My slides can be found via this Zenodo link - 10.5281/zenodo.17572139.
Intro -
Thank you for the invitation.
I’ll mainly be speaking about my project, Recognising Text, Recognising Processes: eXplainable Automatic Text Recognition for Scottish Spiritualist Newspapers, supported by the National Library of Scotland (NLS) and University of Sheffield’s Digital Humanities Institute, where I am currently based.
As context, I'll begin with a sort of problem statement (since I live with a philosopher), then our approach, some anticipated and already-faced challenges, and lastly a few preliminary findings. This is the first month of a three-month fellowship, so feel free to count the caveats. It is also worth mentioning, especially following Anna-Maria's great work undertaking community needs reporting for BRAID, that Knowledge Exchange activities are planned. These will mainly involve NLS staff training once a preferred transcription approach is settled. However, this presentation will provide more of a general overview of the project's methodology and aims.
Background -
Although I rely on Web of Science (WoS), I am equally conscious that this is a commercial provider reporting that its own financial market is expanding. Clarivate's 'Pulse of the Library' report - published last week - details a steady rise in AI adoption within libraries, though for most institutions this remains at the early evaluation stage [1]. This echoes a similar trend found in Cox's 2021 CILIP-commissioned study of AI awareness within the sector, with both reports viewing AI literacy and confidence as foundational to greater model deployment within institutional workflows [2]. The need for growing confidence and technical fluency among cultural heritage professionals strikes me as a core theme of today's event, mentioned by several speakers regardless of which precise AI tools were tested.
Despite the growing awareness of AI capabilities, a gap remains between conceptualising these tools and operationalising them at the level of workflows, policies, positions, collections and infrastructure [3]. Bridging this gap - the aim of BRAID, it's in the name - requires, for libraries, what Twidale and Nichols (2006) define as 'computational sense': an understanding which "extends beyond the notion of programming to encompass an ability to understand the broader notions of the capabilities of software and the socio-technical issues of usability, system deployment and maintenance" [4]. In using this definition, we can see that foundational conceptualisations of digital cultural heritage work remain relevant, despite being introduced before the recent wave of AI. It remains a technology that can be registered against older frameworks. This relates to Paul's [Blundell, Head of Digital Research and Development, Arts Marketing Association] earlier comment that approaching AI in cultural heritage can often be a case of "applying what you've got" - whether that's expertise, a knowledge of collections, or critical expectations of software.
The need for computational sense, however, is increasingly pertinent as libraries seek to contextualise a growing volume of automated outputs, with staff expressing anxieties about legitimising inadequate systems - easily done, given libraries' longstanding status as trustworthy informational repositories [5].
RQs -
So - in contending with a small part of this move to operationalisation, our project focuses on Automatic Text Recognition (ATR), the AI-enabled process of converting images-of-text into machine-processable datasets, a necessary tool for greater collection accessibility, data quality and digital scholarship. ATR also forms a perfect case study for assessing computational sense in libraries, with its outputs often requiring the use of multiple systems, formats, sufficient ground truth and manual correction.
Specifically, we ask how far AI eXplainability (XAI), the effort to enable non-technical users to comprehend algorithmic results, can be applied to libraries' ATR processes [6]?
Moving beyond ATR's proven ability to make collections readable, we instead assess to what degree XAI is required for national libraries to utilise ATR at scale.
Lastly, we ask: does added eXplainability impact ATR model performance? This last question is prompted by technical papers that often cite increased eXplainability as compromising model performance and scalability [7].
Method -
So how are we trying to unpack this? Well - alongside research-in-practice discussions with relevant NLS staff, we are experimenting with six predominant ATR tools over the same 50 pages of The Spiritualist [point to table]. These tools represent the range of ATR business structures, from open-source (OS) tools to commercial LLMs, with each model's Character Error Rate (CER) measured against qualitative / interpretive coding of each provider's publicly available how-to manuals, guides and advice within NVivo.
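For readers unfamiliar with the metric, CER is the standard Levenshtein-based measure: the minimum number of character edits (insertions, deletions, substitutions) needed to turn a model's output into the ground truth, divided by the length of the ground truth. A minimal sketch in Python - the example strings are illustrative, not our actual transcription outputs:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # Dynamic-programming edit distance over characters (two-row variant).
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else 0.0

# One substituted character over six: CER of roughly 0.167.
print(round(cer("seance", "seanse"), 4))  # 0.1667
```

Note that CER can exceed 1.0 when a model hallucinates far more text than the page contains, which is one reason a single headline figure benefits from the qualitative context described above.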
This approach also enables us to make The Spiritualist, currently available via the NLS's Data Foundry as 'dirty' OCR, more accessible - a core source for how the religious movement articulated its seances with the deceased. A nice parallel emerges: mediums often channelled spirits through 'automatic writing', linking to our efforts in articulating a similarly abstracted system with its own internal logic [8].
Challenges -
The Alan Turing Institute's 2025 Doing AI Differently white paper highlights that, as AI moves into domains where interpretive judgement is required - such as automatically transcribing collections - traditional benchmarks can easily break down. In our use of CER, are we instead widening the gap between technical performance and real-world relevance? That remains an open question.
Ensuring a fair technical benchmark is also difficult across systems, with some ATRs requiring tailored layout analysis approaches to produce accurate outputs, while others are designed to be streamlined in their workflows. Developmental priorities are also constantly changing: while commercial ATRs have full-time maintenance, OS tools will likely undergo fewer updates [9].
Very Preliminary Findings -
To finish, here are some (very) preliminary findings:
Currently, our testing has found that for most ATRs - especially on newspapers - manual Layout Analysis (LA) is still required before accurate baseline detection and text recognition can occur. This is despite providers listing automated models for recognising regions and document structures. We are currently testing how Vision Language Models (VLMs), which process visual and textual features simultaneously, perform - so this is likely to shift. That said - at present - the most accurate, fully automated workflow for processing newspapers at scale appears to involve general-purpose OCR techniques (Tesseract with OpenCV) (CER, 25.30%); Tesseract has been developed by Google since 2006, building on a legacy system launched by Hewlett-Packard in 1984 [10]. It should be said that these metrics were calculated externally against a reference text and are specific to The Spiritualist.
In terms of XAI in tool guidance, we compared our initial CERs to the % coverage of interpretability and opacity codes labelled manually in NVivo. Using a Spearman ranking, which shows monotonic relationships (i.e., how consistently the ordering of one variable matches another), we ordered each tool across several criteria (interpretability included) [point to table]. We then correlated their values as a data matrix. Although super preliminary, greater interpretability is seen to improve tool accuracy (negative correlation with CER, -1.0). Likewise, there is a moderate relationship between opacity (where a guide does not fully explain a model function for non-technical users) and worsened CER (-0.5): less transparency, more errors. This challenges the adage that increasing time spent on explaining technical processes decreases model performance. However, this is in need of more verification. Of course - the CERs could stem from my inability to apply these tools outside of standardised and efficient workflows. However, this raises an interesting, almost self-ethnographic, critique about whether 'this is just a me problem.'
Ghostwriter Model -
Finally - our reference text, corrected within Transkribus, has been used to train a model entitled Ghostwriter (Halloween has passed, I know). The model is now publicly available via the platform, with a reported CER of 0.89%. I'd be interested - if anybody is willing to test it - in how far it is applicable to other 19th century newspapers. Dr Sarah Ames, NLS Digital Scholarship Librarian, is also interested in lifting this model out of the Transkribus environment with aid from Hugging Face, as a means of enabling further re-use of material. This links to Daniel's [van Strien, Hugging Face Librarian] suggestion that centralised workflows can potentially wait in favour of producing datasets - with structured data itself forming a sort of infrastructure through (re)use and further modification.
We're also developing, at the DHI, a web interface that will enable users to toggle between each of the six ATR transcriptions, to further compare and contrast results, with some more context on the coding process. Having seen RESHAPED first-hand, we will be taking notes from the School of Advanced Study, especially in terms of navigability, interactables, and curated interdisciplinary hubs.
Thank you for listening
Clarivate is a commercial AI analytics and database provider, managing searchability in Web of Science and ProQuest, as well as aiding in automated metadata generation within Alma. Clarivate (2025). Pulse of the Library. https://clarivate.com/pulse-of-the-library/
A. Cox (2021). The impact of AI, machine learning, automation and robotics on the information profession. CILIP. 1-56. https://www.cilip.org.uk/page/researchreport
T. Padilla (2019). Responsible Operations: Data Science, Machine Learning, and AI in Libraries. OCLC. https://www.oclc.org/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html
M.B. Twidale, D.M. Nichols (2006). Computational sense: the role of technology in the education of digital librarians. Hamilton, New Zealand: The University of Waikato. https://researchcommons.waikato.ac.nz/handle/10289/40
L. McCarron (2023). Interview by Joseph Nockels [Microsoft Teams]. 27 April.
J. Van Wessel (2020). AI in Libraries: Seven Principles, https://zenodo.org/records/3865344
Z.C. Lipton (2016) ‘The Mythos of Model Interpretability’, Paper presented at 4th ICML Workshop on Human Interpretability in Machine Learning, June 23, 2016, New York. 1-8. doi: 10.48550/arXiv.1606.03490. S. Ali et al. (2023) ‘Explainable Artificial Intelligence (XAI): What We Know and What Is Left to Attain Trustworthy Artificial Intelligence’, Information Fusion, 99: 1-52. doi: 10.1016/j.inffus.2023.101805.
M. Foot (2023). Modern Spiritualism and Scottish Art - Scots, Spirits and Seances, 1860-1940. London: Bloomsbury. p. 73.
J. Van Wessel (2020). AI in Libraries.
R. Smith (2007). An Overview of the Tesseract OCR Engine. Paper presented at the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 23-26 September, Curitiba, Brazil. pp. 629-633. doi: 10.1109/ICDAR.2007.4376991