Scaling Down, The Place of Limited Studies in the Digital Humanities Landscape

Mar 10

Updated: Mar 26

Sinem Görücü / https://betterimagesofai.org / https://creativecommons.org/licenses/by/4.0/

In getting closer to my new University of Glasgow role, I have been thinking more about the place of small-scale projects within the broader DH landscape.

Is the main aspect of my role to support students in completing smaller projects, from which they can evidence transferable skills to employers? Or do these tighter, time-bound projects have greater value to DH, beyond self-learning?

Of course, the answer is somewhere in the middle, but how do we approach the place of smaller DH projects, at a time of AI hype cycles, rampant LLM use, and automation, especially when these factors encourage staff and students alike to conduct research on a greater scale?

Smaller projects, such as a research placement, cataloguing internship, practical portfolio, or end-of-term essay, do not necessarily need to benefit wider DH. Their primary purpose is to benefit the student, especially if that student is paying tuition. That is already more than enough, and I am sure most educators would say the same. But there is also value in showing students that they can contribute to a research field, beyond their own early-career development. We should be mindful to balance such encouragement against setting expectations too high. Otherwise, a certain pressure ensues for our DH students: to completely solve an emerging issue, provide an entirely new perspective or research method, or crunch a dataset that has proved unwieldy for all before them.

At the BA and MA level, we value originality, but also synthesised reviews of extant work and methods. As an MA thesis supervisor at the University of Sheffield’s (UoS) Digital Humanities Institute, I find myself more often than not recommending that students scale back their approaches, often to their own disappointment. To illustrate this, during a recent visit to our Special Collections, Heritage and Archives, I joined a colleague in outlining potential projects to their Digital Cultural Heritage students: ‘You could use linked data approaches to better understand how an individual appears in this archival text, compared to that one’; ‘You could use automated transcription to identify previously unknown authors, using stylometry’. Though excited by these research prospects, one student appeared conflicted while looking over the example collection laid out by the archivist in front of them. After the visit, they questioned whether they needed to deploy state-of-the-art digital tools, and whether they had enough technical expertise. They also worried about the scale of their work, especially in comparison to the DH suggested reading we provided; ergo, they felt their work held less value …

All I could say was that ‘every bit of information / data helps’, like I was part of an off-brand Tesco commercial, encouraging students to work hard until we had ‘total’ coverage of this or that archive. It was clear that we both left the interaction somewhat unsatisfied …

Returning to this exchange, I increasingly feel that it touched on some important epistemics of Digital Humanities, both in terms of how we got here - with students vocalising such concerns over scale, and where the field may be going, in light of greater AI capability.

Those in DH know that there is no singular definition of the discipline, though it is convincingly encapsulated by Nyhan and Flinn (2016: 1) as aiming to transform how the Humanities encounter, transmit, question, interpret, problematise, and imagine artefacts and attitudes. In doing so, they suggest that ‘it tends to differentiate itself from routine uses of computing in research and teaching …’ (Nyhan & Flinn, 2016: 1).

DH is amorphous. That is why many are attracted to it from more traditional research backgrounds, including some of my own colleagues. They often express that they were motivated to work within a more dynamic, and less hierarchical, field than Italian Studies, History or Archaeology, while continuing to be recognised authorities on these subjects.

However, because of its flexibility, students new to DH are pulled across the divide between the ‘hacks’, those who code, and the ‘yacks’, those who interpret tools (Nowviskie, 2014). I still contend with such anxieties, especially when working with more technical colleagues. Against this long-standing tension, DH can easily impress on students a certain kind of interdisciplinary pressure. This is increasingly compounded by the perceived need to work across large-scale collections, or to distance research approaches even more, through the use of AI models, tools and approaches.

***

Last year, we wrote a paper on the levels of explicit religiosity in the abolitionist Frederick Douglass’s (1818 - 1895) autobiographical writings. Broadly, we assessed whether his mention of religion decreased over time, as historians previously suggested. The full paper is referenced below (see Nockels et al., 2025). What emerged as a secondary theme, however, was our advocacy for small-scale text analysis, and how such approaches can reveal nuances in autobiographical history.

Text analysis, like other DH-aligned methods, often prioritises working on larger datasets to identify patterns for further research that are otherwise indiscernible manually (Yu et al., 2023). One reviewer of the article stated as much, wanting greater justification for how our method could be considered robust. That method combined Handwritten Text Recognition (HTR), the process of converting images of manuscripts into machine-processable text (Pinche and Stokes, 2024), with text analysis using R (https://cran.r-project.org/index.html). They suggested that upping the study’s scale was the only way. Even now, I agree that the efficacy of digital workflows is often proven by ratcheting scales up, and increasing amounts of training and evaluation data. Nonetheless, we suggested there was worth in showing how more honed methods could illuminate knowledge of comparably smaller archives. They eventually agreed, thanks to our re-purposing of Guldi's (2023: 132) view that text analysis can be tailored to ensure consulted data confirms known information and facilitates discovery. Alongside this notion, starting with smaller datasets presented a less daunting prospect for our readers, students, and academic colleagues outside of formal DH and STEM set-ups.

As part of this informal blog, I return to such work in an attempt to understand the place of limited work in our DH landscape, hopefully as a means to support student confidence in their smaller - still innovative - projects.

***

Having discussed how small-scale analysis may inform wider DH, the next section presents an ‘off cut’ of our HTR to text analysis project, supported by the Library of Congress and University of Edinburgh in 2024. We are currently using this case study to further understand how HTR can benefit dynamic, temporal understandings of 19th century abolitionist history, by coupling automated transcription with network visualisation methods, but the material below was seen by reviewers as extraneous.

The project aimed to fully incorporate unpublished material through automated transcription, which had been previously left out of holistic studies of the abolitionist’s self-asserted political theory. However, due to the limited availability of such material, digitally captured in-house by the LoC in 2000, our analysis of Douglass’s writing remained small-scale. Nonetheless, the work still provided a scalable, accurate and reproducible method for foregrounding autobiography research based on handwritten library collections as data. In focusing on DH research purely in terms of scale, we can easily lose track of the broader transferability of such approaches, and their potential in extending historical understandings of the past.

A large part of this work aimed to show how studies pairing Optical Character Recognition (OCR) with text analysis, seen since the 1990s, could now be extended through HTR: enabling studies of more intimate documentation, compared to published works intended for public audiences (Nockels et al., 2024: 153). By extension, this research needed to be conducted on scant, ephemeral, and unstandardised text, instead of reams of printed pages.

Therefore, we fully incorporated Douglass’s unpublished, and lesser studied, travel diary of Europe and North Africa (1886 - 1887, mss11879 1:1), alongside his published works: Narrative of the Life of Frederick Douglass (NLFD, 1845), My Bondage and My Freedom (MBMF, 1855) and Life and Times of Frederick Douglass (LTFD, 1881). The diary forms part of the Library of Congress’s Frederick Douglass Papers (1841 - 1964) (https://www.loc.gov/collections/frederick-douglass-papers/about-this-collection/).

Our transcription model of Douglass’s travel diary, trained within Transkribus (https://www.transkribus.org/), and shareable through contact, covered 12,642 words (70 pages), with 94.28% character accuracy. This transcription was then analysed alongside Project Gutenberg’s 2006 edition of NLFD (1845) (https://www.gutenberg.org/ebooks/23), 1995 version of MBMF (1855) (https://www.gutenberg.org/ebooks/202) and 1892 edition of the LTFD (https://www.gutenberg.org/ebooks/71893). This created a total dataset, including the diary (hereafter DD), of 361,543 words, by all accounts a smaller-scale study within DH.

***

Against our background discussion on how small-scale work still holds merit for wider DH research, we provide a snippet from our text recognition to analysis work on Douglass’s autobiography. The code used for this work can be found through our GitHub (https://github.com/jnockelsss/Douglass-Project-LoC-2024). Acknowledgement must be given to Gina Nguyen, Research Assistant at the Library of Congress’s Kluge Center, who supported this work, as well as those who scanned the underlying image data at the Library’s Manuscript Division.

Word Frequency Data

In beginning with foundational word frequencies, we provide a baseline for understanding the vocabulary of each of Douglass’s autobiographies. Analysing word frequencies enables initial comparisons to be drawn between the themes of the abolitionist’s published and unpublished autobiographies. Across his three published works, words denoting enslavement: ‘slave/slaves’, ‘slavery’, ‘free’ and ‘master’ were among the most frequent, shown in Table 1 as normalised proportions based on each autobiography’s total length. The predominant nature of these terms appears as expected, with NLFD (1845), MBMF (1855), and LTFD (1881) written to condemn the practice of human bondage, whether as self-assertive narrative or broader political theory.

By contrast, these terms are relatively infrequent in his 1886-1887 diary. Instead, words such as ‘day’, ‘people’ and ‘city’ appear more frequently. This data provides a subtle indication of Douglass’s more positive circumstances in 1886 - 1887, compared to his previous voyages to Britain. Blackett (1983: 13) frames Douglass in 1845 as contending with ongoing enslavement by participating in the ‘well-oiled and pretty efficient propaganda machine’ to promote anti-slavery support through lectures abroad. In 1859, following John Brown’s (1800-1859) failed Harpers Ferry raid, Douglass returned to Britain as a fugitive to avoid reprisal due to his close personal association with the revolutionary (Murray 2023). DD represents a third and final British visit, directed by more leisurely activities, reflected in word frequencies that gravitate towards terms away from abolitionism.

Term	NLFD (1845)	MBMF (1855)	LTFD (1881)	DD (1886-1887)
slave	3.32	4.19	2.40	0.16
slavery	2.14	3.66	2.35	0.08
time	2.87	1.93	2.35	1.03
master	2.82	2.76	1.66	0.08
slaves	3.00	2.23	1.41	0.08
people	0.41	1.29	2.02	2.45
colored	0.91	1.45	1.99	1.19
life	0.66	0.84	1.39	0.16
day	1.14	0.99	1.04	3.64
free	0.82	1.03	1.11	0.32

Table 1. Ten highest word frequencies across Douglass’s published autobiographies and 1886-1887 travel diary, normalised as a proportion of each autobiography’s length.
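
The normalisation behind Table 1 can be sketched in a few lines. The project itself used R (see the GitHub repository linked earlier); the Python below is only an illustrative stand-in, not the project code. The tokeniser, the tiny stop-word list, the per-1,000-words scaling and the toy sentences are all assumptions made for the example. The scaling is at least consistent with the Notes: 31 occurrences of ‘people’ in a 12,642-word diary works out at roughly 2.45 per 1,000 words, matching the DD column.

```python
import re
from collections import Counter

# Illustrative stop-word list only; the list used in the project is not given here.
STOP_WORDS = {"the", "and", "a", "in", "was", "of", "to"}

def tokenize(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def normalised_frequencies(text, per=1000):
    """Return occurrences per `per` words for every non-stop-word token.

    The denominator is the text's total token count (stop words included),
    so that works of different lengths can be compared.
    """
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return {word: count * per / total for word, count in counts.items()}

# Toy stand-ins for two texts; these are NOT passages from Douglass.
published = "the slave and the master and the slave"
diary = "the day in the city was a fine day"

for name, text in [("published", published), ("diary", diary)]:
    freqs = normalised_frequencies(text)
    top = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(name, [(word, round(value, 1)) for word, value in top])
```

Dividing by each text's own length is what allows a 12,642-word diary to sit in the same table as works many times its size.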

Though Table 1’s word frequencies reaffirm the content of Douglass’s autobiographical work, they only provide minimal context for the relationship between his published autobiographies and personal accounts. As such, we calculated word frequency similarities using the distance between co-occurring words as a metric (1). Figure 1 shows Douglass’s published accounts as separated terms, measured against DD for similarity, and normalised as a percentage of word count. Greater similarity (lower distance between terms) is shown through a point’s proximity to the mid-line. In contrast, word tokens with a higher or lower normalised frequency than DD appear closer to the 1.0 or -1.0 limits (2).

Fig 1. Word frequency similarities between Douglass’s autobiographies and his travel diary, created within RStudio using ggplot2.

Figure 1 suggests NLFD (1845) is closest to DD (1886 - 1887) in language, despite the two being furthest apart in date and Douglass’s own circumstances having shifted from those of a fugitive to those of an esteemed intellectual and public figure. As Blassingame (2003: xxx) suggests, NLFD (1845) was received well by the press but deemed limited in linguistic range and technique. This could explain its similarity to a travel account of simple phrases, notes and limited narrative construction. Nonetheless, suggestions that Douglass’s later autobiographies form intellectual leaps require further analysis through charting infrequent terms.
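
The benchmark comparison described in the Notes (each word's percentage frequency in a published work, minus its percentage frequency in the diary) can be sketched as follows. This is a hedged Python illustration rather than the R code used in the project. The function names, the toy token lists and the reading of the cut-off as a combined count across both texts are assumptions. It reproduces only the subtraction step; whatever further rescaling produces Figure 1's -1.0 to 1.0 axis is not described here and is not attempted. The sign records direction, with negative values meaning a word is relatively more frequent in the diary.

```python
from collections import Counter

def percent_frequencies(tokens):
    """Map each token to its share of the text, as a percentage of all tokens."""
    total = len(tokens)
    return {word: count * 100 / total for word, count in Counter(tokens).items()}

def distance_from_benchmark(work_tokens, benchmark_tokens, min_count=1):
    """Subtract benchmark percentage frequencies from those of another work.

    Words whose combined count across both texts falls below `min_count`
    are dropped (an assumed reading of the cut-off mentioned in the Notes).
    """
    work_counts = Counter(work_tokens)
    bench_counts = Counter(benchmark_tokens)
    work_pct = percent_frequencies(work_tokens)
    bench_pct = percent_frequencies(benchmark_tokens)
    distances = {}
    for word in set(work_counts) | set(bench_counts):
        if work_counts[word] + bench_counts[word] < min_count:
            continue
        distances[word] = work_pct.get(word, 0.0) - bench_pct.get(word, 0.0)
    return distances

# Toy token lists, not drawn from the corpus.
work = ["slave", "slave", "slave", "people"]
diary = ["people", "people", "day", "day"]
print(distance_from_benchmark(work, diary))

# The 'people' example from the Notes: MBMF 0.13% minus travel diary 0.25%.
note_example = round(0.13 - 0.25, 2)
print(note_example)
```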

Sentence Length

In Nockels et al. (2025), we calculated the presence of unique word tokens (hapax legomena) to corroborate Blassingame et al.’s (2003) notion that Douglass’s writing style became more complex throughout his writing career. We found that the abolitionist’s lexical diversity, seen as the presence of unique vocabulary, gradually increased over time. To provide an additional measure for this, we also calculated the average sentence length across his autobiographies by splitting text strings using punctuation-based regular expressions. Douglass’s published works show a gradual lengthening of sentences, which coincides with his growing use of vocabulary: NLFD (21.49 words), MBMF (25.78), LTFD (27.00). However, DD returns a lower average sentence length (16.25), possibly because his diary writing includes memos, short self-reflections and tabulations of personal names and addresses. This also points to a difference between honed public writing as autobiography and more private musings not written for wider consumption.
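
A punctuation-based split of this kind can be sketched in Python as below. This is an illustration under stated assumptions, not the project's R code: the regular expression (full stops, exclamation marks and question marks as sentence boundaries) and the whitespace word count are guesses at a reasonable minimal version, and the sample sentence is invented.

```python
import re

def average_sentence_length(text):
    """Return the mean number of words per sentence.

    Sentences are split on runs of '.', '!' or '?'. Abbreviations such as
    'Mr.' would be miscounted as boundaries by a pattern this simple.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(sentence.split()) for sentence in sentences]
    return sum(word_counts) / len(sentences)

# Invented sample text, not a quotation from the diary.
sample = "We left at dawn. The city was quiet! Rain?"
print(average_sentence_length(sample))
```

A pattern this simple will treat memo-like fragments and lists of names as very short sentences, which is one plausible route to a lower average for diary material.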

Personal Tense in Douglass’s Autobiography

In following linguists’ understanding that person tense can act as a means of self-constructed identity, this section maps Douglass’s use of personal pronouns throughout his autobiographical writings. Dar and Masroor (2020: 136) suggest that pronouns are more than fixed noun replacements, instead revealing how the self (I/my), group identities (we/us) and significant associations (they/them) interact. Iliopoulou (2019: 40), in categorising the qualities of second person storytelling, similarly highlights that pronouns, often treated as reductive and a closed category of words, hold social and political implications due to their impact on rhetorical structures that reflect connotations beyond simple denotation. For our autobiography study, such data may uncover detail as to the genre of Douglass’s work and whether it mainly constitutes political theory or narrative storytelling. All four autobiographies were uploaded into RStudio, pre-processed, and converted to a one-word-per-row structure (Silge and Robinson, 2017). This allowed for a simple matching exercise between constructed First, Second and Third person pronoun dictionaries and Douglass’s written work. These figures were then normalised per 1,000 words to provide length-adjusted statistics.
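
The matching exercise can be sketched as follows, again in Python rather than the R and tidytext workflow the project used. The pronoun dictionaries here are illustrative guesses, since the project's actual lists are not reproduced in this post, and the sample sentence is invented. The list of tokens plays the role of the one-word-per-row table.

```python
import re
from collections import Counter

# Illustrative dictionaries only; the project's own lists may differ.
PRONOUNS = {
    "first": {"i", "me", "my", "mine", "myself",
              "we", "us", "our", "ours", "ourselves"},
    "second": {"you", "your", "yours", "yourself", "yourselves"},
    "third": {"he", "him", "his", "himself", "she", "her", "hers", "herself",
              "it", "its", "itself", "they", "them", "their", "theirs",
              "themselves"},
}

def pronoun_rates(text, per=1000):
    """Count dictionary matches for each grammatical person, per `per` words."""
    tokens = re.findall(r"[a-z']+", text.lower())  # one word per list entry
    total = len(tokens)
    if total == 0:
        return {person: 0.0 for person in PRONOUNS}
    counts = Counter(tokens)
    return {
        person: sum(counts[word] for word in words) * per / total
        for person, words in PRONOUNS.items()
    }

# Invented sample sentence, not a quotation from Douglass.
sample = "I told you that they gave my book to him"
print(pronoun_rates(sample))
```

Scaling to a rate per 1,000 words is what makes the figures for a short diary comparable with those for much longer published works.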

First person tense is shown to be most prevalent across all four autobiographies. However, this steadily declines throughout Douglass’s career (NLFD, 48.48; MBMF, 45.43; LTFD, 40.81; DD, 30.53). As Dar and Masroor (2020) discuss, First person tense enables readers to understand the narrative author’s feelings, depressions, sufferings, pains, personal issues and experiences. This observation that First person language declines can therefore be explained through Douglass’s narrative shifts, with NLFD (1845) and MBMF (1855) recounting his own experience of slavery. LTFD (1881) reads more as social commentary against postbellum prejudice, while DD (1886 – 1887) engages with enslavement less directly, as shown through Table 1’s word frequencies.

The presence of Second person pronouns is relatively minimal and similarly declines as Douglass moves away from self-reflecting on his own enslavement (NLFD, 4.09; MBMF, 3.23; LTFD, 2.06; DD, 0.08). In following Iliopoulou’s (2019: 24) characterisation of Second person narratives, which are often neglected in binary opposite studies around internal and external perspective, we expected a higher prevalence of related pronouns. This is due to the Second person having numerous employments, functions and characteristics, including a ludic (spontaneous and playful), self-reflective, and direct quality when engaging the reader (Iliopoulou, 2019: 7). These qualities hold clear relation to Douglass’s biography as a touring speaker for the Anti-Slavery Society and avowed social critic on issues ranging from obvious abolitionist causes to Union recruitment during the American Civil War (1861 – 1865).

The prevalence of Third person pronouns follows the same trend (NLFD, 42.09; MBMF, 34.17; LTFD, 30.32; DD, 19.22) and shows the multivocal nature of Douglass’s work. Each autobiographical work therefore contains strong First person articulations alongside distant Third person narration, with Freeborn (1996: 206) characterising such narration as almost omniscient. Douglass thus appears as both direct and distant narrator, articulating his own feelings, the experiences of others and society more generally.

***

The above section highlights how, in working over just a few texts, robust historical findings can still be reached using critical digital methods. In our case, we used pre-built workflows, as opposed to hard-to-implement and overly technical methods. Together, these tools can be wielded to better understand the linguistic features and structures of 19th century autobiographical material. This work is made reproducible through our project GitHub (https://github.com/jnockelsss/Douglass-Project-LoC-2024), and informed our approaches on wider material related to Douglass (see Nockels, 2025).

As a standalone study, the textual features uncovered may not reshape our entire autobiographical understanding of the abolitionist, but they move the needle slightly. Moreover, the project involved training a Research Assistant to use R-based approaches, as well as upskilling on my part. I had not used R before. Therefore, in both informing our future work and developing our digital research skills, this ‘off cut’ still holds merit to the broader DH ecosystem.

Viewing this work purely in terms of scale obscures such wider benefits.

As a last, woolier point, this research was also enjoyable, despite it not covering everything.

Let us scale back then, focus more on our immediate surroundings, and encourage our students to do the same: to work on what interests them, feels manageable, and still sets them up for whatever they want to do next.

Notes

(1) Each term, or word token, was counted across the four autobiographical works. This was then normalised as a proportion of total words for each autobiography, with the equivalent word frequency in Douglass's 1886-1887 diary subtracted from the resulting frequencies. The diary acted as a benchmark. For example, the word ‘people’ occurs 166 times in MBMF and 31 times in his travel diary, with these counts normalised as a percentage of total words: MBMF (0.13%), travel diary (0.25%), resulting in 0.12% distance. Words appearing fewer than 15 times were cut off for clearer visualisation.
(2) A result of 1.0 indicates the word only appears in Douglass’s published works, and -1.0 that it only appears in DD.

References

Blackett, R.J.M. (1983) Building an Antislavery Wall: Black Americans in the Atlantic Abolitionist Movement, 1830-1860. London: Cornell University Press.

Blassingame, J.W. (2003) ‘Introduction’, in Blassingame, J.W., McKivigan, J.R., Hinks, P.P. (eds.), The Frederick Douglass Papers, Series Two, Autobiographical Writings, Volume 2: My Bondage and My Freedom. New Haven, CT: Yale University Press, pp. 1-10.

Dar, S. R., Masroor, F. (2020) ‘Depiction of Self & Others: A Corpus-Based Study of Personal Pronouns in Autobiographies’, Global Language Review, 5(1), pp. 135-145. doi: 10.31703/GLR.2020(V-I).15

Freeborn, D. (1996) Style: Text Analysis and Linguistic Criticism. Cham, Switzerland: Springer.

Guldi, J. (2023) The dangerous art of text mining: A methodology for digital history. Cambridge: Cambridge University Press.

Iliopoulou, E. (2019) Because of You - Understanding Second-person Storytelling. Bielefeld: Transcript Verlag.

Murray, H.R. (2023) ‘Frederick Douglass in Britain and Ireland’. Available at: http://frederickdouglassinbritain.com

Nockels, J., Nguyen, G., Charlton, A., Terras, M. (2025) ‘God on the Stage: A Text Analysis of Frederick Douglass’s Religiosity (1845 - 1887)’, International Journal of Arts and Humanities Computing, 19(2): 1-25. doi: 10.3366/ijhac.2025.0352

Nockels, J., Gooding, P., Terras, M. (2024) ‘The implications of handwritten text recognition for accessing the past at scale’, Journal of Documentation, 80(7): 148–167. doi: 10.1108/JD-09-2023-0183

Nowviskie, B. (2014). ‘On the Origin of ‘Hack’ and ‘Yack’’, Journal of Digital Humanities, 3(2). Available at: https://journalofdigitalhumanities.org/3-2/on-the-origin-of-hack-and-yack-by-bethany-nowviskie/

Nyhan, J., Flinn, A. (2016) ‘Introduction’, in Nyhan, J., Flinn, A. (eds.), Computation and the Humanities. Cham, Switzerland: Springer Open, pp. 1-20. doi: 10.1007/978-3-319-20170-2_1

Pinche, A., Stokes, P. (2024) ‘Historical documents and Automatic Text Recognition: Introduction’, Journal of Data Mining and Digital Humanities, 1-11. doi: 10.46298/jdmdh.13247.

Silge, J., Robinson, D. (2017) ‘Text Mining with R: A Tidy Approach’. Available at: https://www.tidytextmining.com

Yu, L., Charlton, A., Askins, W., Terras, M., Filgueira, R. (2023) ‘frances: Cloud-based historical text mining with deep learning and parallel processing’, Proceedings of the 19th International Conference on e-Science, Limassol, Cyprus, 9-13 October, 1-10. doi: 10.1109/E-Science58273.2023.10254798


Joe Nockels
